Using Mothur to Determine Bacterial Community Composition and Structure in 16S Ribosomal RNA Datasets

Sruthi Chappidi, Sruthi Chappidi, Erika C. Villa, Erika C. Villa, Brandi L. Cantarel, Brandi L. Cantarel

Published: 2019-06-20 DOI: 10.1002/cpbi.83

Abstract

The 16S ribosomal RNA (rRNA) gene is one of the scaffolding molecules of the prokaryotic ribosome. Because this gene is slow to evolve and has very well conserved regions, this gene is used to reconstruct phylogenies in prokaryotes. Universal primers can be used to amplify the gene in prokaryotes including bacteria and archaea. To determine the microbial composition in microbial communities using high-throughput short-read sequencing techniques, primers are designed to span two or three of the nine variable regions of the gene. Mothur, developed in 2009, is a suite of tools to study the composition and structure of bacterial communities. This package is freely available from the developers (https://www.mothur.org). This protocol will show how to (1) perform preprocessing of sequences to remove errors, (2) perform operational taxonomic unit (OTU) analysis to determine alpha and beta diversity, and (3) determine the taxonomic profile of OTUs and the environmental sample. © 2019 The Authors.

INTRODUCTION

Advances in next-generation sequencing platforms enable researchers to investigate the diversity of the microbial community that plays a crucial role in human health and disease. The 16S ribosomal RNA (rRNA) gene, which is a subcomponent of the 30S subunit of the prokaryotic ribosome, has been used to understand the microbial diversity in different environments including oceans, soil, and the various body sites in animals. The 16S rRNA gene is slow to evolve and conserved across prokaryotic species, thus it makes an informative genetic marker to determine the taxonomic composition of microbiota communities (Chakravorty, Helb, Burday, Connell, & Alland, 2007; Fouhy, Clooney, Stanton, Claesson, & Cotter, 2016; Klindworth et al., 2013; Lim, Song, Kim, Lee, & Nam, 2018). The 1500 to 1550 bp 16S rRNA gene sequence consists of nine hypervariable regions (V1 to V9) that can be used to estimate organism diversity among the bacterial community (Chakravorty et al., 2007; Clarridge, 2004; Janda & Abbott, 2007). Species identification or classification among a bacterial community can be achieved using the sequence diversity demonstrated by nine hypervariable regions within the 16S rRNA gene (Chakravorty et al., 2007). Typically, only a portion of this gene, containing two or three of these variable regions, is sequenced using amplicons because inexpensive next-generation sequencing technologies can sequence 250 to 300 bp reads. With overlapping paired-end sequencing, contigs can be generated of the 400 to 450 bp targeted region.

Several studies have examined how the selection of variable regions can yield different bacterial compositions with the same samples because of nucleotide heterogeneity and classification criteria (Fouhy et al., 2016). A comparison of these variable regions showed that the V2, V3, and V6 regions together enable identification of 110 bacterial species (Chakravorty et al., 2007). Furthermore, this study showed that: (1) V1 is able to identify common pathogenic Streptococcus species; (2) V6 region, which is only 58 bp in length, could differentiate among most bacterial species except Enterobacteriaceae; and (3) regions V4, V5, V7, and V8 are more conserved and have limited utility for bacterial phylogenetic identification. The V1 to V4 or V1 to V3 regions were shown to be more reliable compared to other regions for taxonomic classification to distinguish bacterial organisms in a study that examined the utility of 454 pyrosequencing and partial gene sequencing (Kim, Morrison, & Yu, 2011). More recently, using Illumina sequencing, Yang, Wang, and Qian (2016) concluded that V4 to V6 was more reliable for compositional determination and that V2 and V8 were the least reliable regions in bacterial taxonomic profiling. The Human Microbiome Project (HMP) used the V3 to V5 regions of the 16S rRNA gene for all samples after a pilot determined these regions to be the most accurate for taxonomic profiling in all eighteen body sites, compared to the V1 to V3 and V6 to V9 regions (Schloss, Gevers, & Westcott, 2011).

There are several biological and technical confounders that will affect estimates of the composition of microbial communities. Biological effects include metabolic factors, such as diet, age, medication, and disease in the case of gut samples or environmental factors such as temperature, salinity, and moisture. Technical confounders that effect microbiome profiles include sampling, processing, storage techniques, DNA extraction methods, PCR primers selection of hypervariable regions, and sequencing methods (Chakravorty et al., 2007; Debelius et al., 2016; Fouhy et al., 2016; Xue, Wu, He, Liang, & Wen, 2018). For example, DNA extraction methods have been shown to be very important for bacterial community composition determination (Wagner Mackenzie, Waite, & Taylor, 2015; Santiago et al., 2014; Wesolowska-Andersen et al., 2014; Yuan, Cohen, Ravel, Abdo, & Forney, 2012). These differences can complicate the comparison of studies. For example, HMP and MetaHIT DNA extraction methods resulted in good yields and purity for sequencing, however the HMP protocol worked better for bacterial extraction whereas MetaHIT most efficiently extracted eukaryotic DNA (Wesolowska-Andersen et al., 2014). These technical confounders should be considered when interpreting results comparing previously published studies.

Basic Protocol 1: PREPROCESSING AND ALIGNMENT OF 16S rRNA SEQUENCES

The purpose of this protocol is to (1) remove poor quality sequences, (2) chimeric sequences, and (3) reduce the number of similar sequences to optimize computation. Typically, a sequenced library is generated from PCR products of about 400 base pairs from two or three variable regions of the 16S rRNA gene. For this basic protocol, we assume that sequences are generated from paired-end 250 or 300 bp sequences generated from an Illumina MiSeq. This pipeline will include the following steps (1) contig generation; (2) quality filtering; (3) alignment to a reference; and (4) the removal of chimeric sequences.

Necessary Resources

Hardware

  • 64-bit computer running Linux, Mac OS X, or Windows operating system

Software

  • Mothur (mothur, RRID:SCR_011947; the program can be downloaded from the software git repo, https://github.com/mothur/mothur/releases, where software executables are available for Windows, Linux, and Macintosh computers)

Files

Table 1. Example Input File for the Make_Contigs Command
Sample1 Sample1.R1.fastq.gz Sample1.R2.fastq.gz
Sample2 Sample2.R1.fastq.gz Sample2.R2.fastq.gz

1.Generate contigs from FASTQ/FASTA data.

Note
Paired FASTQ or FASTA files containing paired end sequencing reads, typically have the same name where the forward read file is labeled with an R1 and the reverse read file with an R2. The make.contigs command combines these overlapping reads from the sample sequence cluster into contigs. This step requires an input file containing the names of the FASTQ or FASTA files that will be used in the analysis. The input sequence list file can be generated manually or through a mothur script called make.file.

Note
The output for this command is fasta files of the contigs and report files. Samples are called “groups” in mothur.

From the command-line prompt (using Terminal in Mac OS X and in Linux or Putty in Windows), run mothur using its full path from the directory where you have saved the data files, for example, if you download the data into microbiomeprotocol/test_data and you installed mothur in the directory:

/home/user/mothur

Then execute mothur using:

  • $ cd microbiomeprotocol/test_data
  • $ /home/user/mothur/mothur
  • make.contigs(file=microbiome.files)

2.Filter sequences with user defined criteria.

Note
Checking the quality of your contig fasta file could help you decide how to proceed with removal of low-quality data. In order to determine the quality of your data, you can run the summary.seqs command to produce a summary of the contig sequences created in step 1. This step is optional but recommended if you are new to 16S rRNA analysis. Because this protocol is based on PCR, sequences that are larger than the target region are likely errors and should be removed, as should any sequence with an excessive number of homopolymer strings and ambiguous bases. The screen.seq command is used to remove low-quality reads or those likely to contain errors. The input for the screen.seq command is the contig FASTA and group files from step 1 and the summary file that you will create with the command summary.seq, if this step was performed. Each option defines the criteria for quality including the maximum number of ambiguous (N) bases (maxambig = 0) and the maximum length. Sometimes, it is desirable to filter on a metric, such as read length, without defining the specific criteria, minlength >200. The optimize and criteria options are removed based on that metric using the percentile, so to make this criterion based on maxlength using the 90th percentile, the options add is: optimize=minlength, criteria=90. The user could choose to optimize based on other factors such as maxlength or maximum homopolymers. The screen.seqs command produces a new file called microbiome.trim.contigs.good.fasta.

  • summary.seqs(fasta=microbiome.trim.contigs.fasta)

  • screen.seqs(fasta=microbiome.trim.contigs.fasta, summary=microbiome.trim.contigs.summary, group=microbiome.contigs.groups, maxambig=0, optimize=maxlength, criteria=90)

3.Create FASTA of unique sequences and a table of the counts for each of those representative sequences.

Note
In order to reduce downstream computational resources and speed up compute time, duplicate sequences can be removed with their duplicate count taken into account for abundance measurements. The unique.seqs command removes duplicate sequences and creates a file that tracks the number of sequences represented by the reference sequence. Use the command to get unique sequences from the filtered FASTA file created in step 2. Output will be a FASTA file with unique sequences and a names files with the name of the sequences that are identical to the retained reference sequence. With the count.seq command, generate a table of the count of identical sequences for each group. This table is necessary for determining the abundances of taxonomic classification and OTUs in downstream steps. This command simplifies the names file created in unique.seqs and the groups file from step 2 by setting the rows as the names of the unique sequences and the columns as the names of the groups. Finally, create a summary stats file using the summary.seqs command.

  • unique.seqs(fasta=microbiome.trim.contigs.good.fasta)

  • count.seqs(name=microbiome.trim.contigs.good.names, group=microbiome.contigs.good.groups)

  • summary.seqs(fasta=microbiome.trim.contigs.good.unique.fasta,count=microbiome.trim.contigs.good.count_table)

4.Generate a sequence alignment.

Note
Align the sequences from the FASTA file generated in step 3 to an alignment database. Using your alignments, sequences that do not align to the desired 16S region can be determined and removed with the screen.seqs step and filter.seqs steps. The following command removes (1) homopolymers of length >8 using the option maxhomop=8 and (2) start, end, and minimum lengths that fall outside the 90th percentile bracket using the options optimize=start-end-minlength and criteria=90. Alternatively, a threshold can be set for each metric. The filter.seq command trims the gap ends of the alignment.

  • align.seqs(fasta=microbiome.trim.contigs.good.unique.fasta, reference=template.align)

  • screen.seqs(fasta=microbiome.trim.contigs.good.unique.align, count=microbiome.trim.contigs.good.count_table, summary=microbiome.trim.contigs.good.unique.summary, optimize=start-end-minlength, criteria=90, maxhomop=8)

  • filter.seqs(fasta=microbiome.trim.contigs.good.unique.good.align, vertical=T, trump=.)

5.Reduce sequence redundancy.

Note
The unique.seqs command reduces the number of sequences for analysis, while maintaining them in the sequences count. The pre.cluster command uses the fasta file to cluster the sequences with a maximum number of base differences and collapses possible PCR error.

  • unique.seqs(fasta=microbiome.trim.contigs.good.unique.good.filter.fasta,count=microbiome.trim.contigs.good.good.count_table)
  • pre.cluster(fasta=microbiome.trim.contigs.good.unique.good.filter.unique.fasta,
  • count=microbiome.trim.contigs.good.unique.good.filter.count_table, diffs=2)

6.Identify and remove chimeras.

Note
Chimeras, which are single DNA sequences arising from multiple parent sequences, can exist in your sequencing reads because of laboratory manipulation or PCR artifacts. Potential chimeras can be identified using the chimera.vsearch command and they can be removed using the remove.seqs command.

  • chimera.vsearch(fasta=microbiome.trim.contigs.good.unique.good.filter.unique.precluster.fasta,
  • count=microbiome.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
  • remove.seqs(fasta=microbiome.trim.contigs.good.unique.good.filter.unique.precluster.fasta,
  • accnos=microbiome.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.accnos)
  • rename.file(fasta=microbiome.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta,
  • count=microbiome.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, prefix=final)

Basic Protocol 2: TAXONOMIC CLASSIFICATION

The purpose of this protocol is to determine the taxonomic classification of the sequences passing quality control filters from Basic Protocol 1.Classification is based on a reference phylogenetic tree and variability of sequences in a taxonomic group can be classified at the phylum, class, order, family, genus, or species rank. Species classification is difficult with only a proportion of the 16S rRNA gene.

Necessary Resources

Hardware

  • 64-bit computer running Linux, Mac OS X, or Windows operating system (use the name software and datafiles shown in Basic Protocol 1)

Software

  • Mothur (mothur, RRID:SCR_011947; program downloaded from the software git repo: https://github.com/mothur/mothur/releases, where software executables are available for Windows, Linux, and Macintosh computers)

Files

  • Name software and datafiles (produced in Basic Protocol 1)
  • Fasta and count files (generated in Basic Protocol 1)
  • Reference database of 16S rRNA sequences
  • Taxonomic classification of the sequences in the reference database (Taxonomies and reference sequences can be obtained at https://mothur.org/wiki/Taxonomy_outline. We recommend the taxonomy file downloaded in Basic Protocol 1.)

1.Classify sequences.

Note
Taxonomic classification is based on a database of reference sequences and a phylogenetic tree. There are available methods: Wang and kNN. Wang uses the profile of k-mers to match sequences into taxonomic groups using the k-mers profiles of each classification in the reference tree. The kNN method assigns taxonomy by the taxonomic assignments of the k-Nearest-Neighbors. Therefore, a sequence classification can differ based on the method and phylogenetic tree. There are three widely used taxonomic classification phylogenies: (1) Ribosomal Database Project (RDP; https://mothur.org/w/images/d/dc/Trainset16_022016.rdp.tgz), (2) SILVA (https://mothur.org/w/images/3/32/Silva.nr_v132.tgz), and (3) Greengenes (http://www.mothur.org/w/images/6/68/Gg_13_8_99.taxonomy.tgz). While most of these taxonomies overlap well, differences can affect taxonomic classification (Balvočiute & Huson, 2017). The program classify.seqs is used to predict the taxonomic classification for each read or contig.

  • classify.seqs(fasta=final.fasta,count=final.count_table, reference=reference.fasta, taxonomy=reference.tax, cutoff=80)

2.Remove sequences based on taxon.

Note
False positive classification can be removed. Because this example is targeting 16S rRNA from eubacteria, we are removing all non-bacterial classifications.

  • remove.lineage(fasta=final.fasta, count=final.count_table, taxonomy=final.reference.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

3.Get summary of taxonomy.

Note
To create a data matrix table that has the count of each taxonomic classification for each sample, use the summary.tax command. The resulting data matrix can be used for plotting the taxonomic profiles (see Support Protocol).

  • summary.tax(taxonomy=final.reference.wang.taxonomy,count=final.pick.count_table)

Support Protocol: TAXONOMIC PROFILE PLOTTING

In order to assess the taxonomic composition of samples, the taxonomic profiles can be plotted in order to compare samples. The purpose of this protocol is to (1) open the taxonomy table from the above protocol, (2) select the abundances of the taxa in a particular rank (kingdom to genus), (3) normalize the counts, and (4) create a stacked barplot.

Necessary Resources

  • R (https://cloud.r-project.org)
  • R Libraries from CRAN reshape2, tidyverse, RColorBrewer, installed using the following command:
    • install.packages(c("reshape2","tidyverse","RColorBrewer"))

1.Prepare a table of taxonomic abundance of the family level.

Note
The tax summary table uses a column called taxlevel to indicate the distance from the root of the phylogenetic tree where 1 =Kingdom, 2=Phylum, 3=Class, 4=Order, 5=Family, 6=Genus, and 7=Species. The following code can be run using the output tax summary table. The code generates a plot of the phylum (Fig. 1).

Open an R interactive session using your favorite environment such as R Studio.

  • tsum <- read.table(file=’ final.reference.wang.tax.summary’,header=TRUE)

  • header <- names(tsum)

  • dds <- tsum[rowMax(tsum[c(6:length(header))]) > 10, ]

  • dds.header <- names(dds)

  • keepcols <- header[c(6:length(names(dds)))]

  • tlevel <- select(filter(dds,taxlevel == 2),keepcols)

  • libSizes <- melt(colSums(tlevel))

  • colnames(libSizes) <- c('SampleCt')

  • tleveltaxon)

  • temp <- melt(tlevel,id.vars='taxon')

Taxonomic profile at the family level of four mock community samples.
Taxonomic profile at the family level of four mock community samples.

2.Determine a normalized count.

  • temp2 <- unique(merge(libSizes,temp, by.y=‘variable', by.x=‘row.names'))

  • temp2value/temp2$SampleCt)),0)

  • taxontbl<-

  • merge(sgroups,temp2,by.y=‘Row.names',by.x=‘SampleID')
  • tbl <- unique(select(taxontbl,‘SampleName', ‘SampleGroup', ‘taxon',‘normct'))

3.Create a PDF file containing a stacked bar plot of phyla by sample.

  • pdf(file=paste(study,".taxon_barchart",levelname[i],‘.pdf', sep=''), paper=‘USr')

  • p2 <- ggplot(data=tbl, aes(x=SampleName, y=normct, fill=taxon)) + geom_bar(stat="identity") + guides(guide = guide_legend(ncol=2)) +theme(axis.text.x=element_text(angle=45, hjust=1, size = 5)) + theme(legend.title=element_text(size=8), legend.text = element_text(size=8))

  • print(p2)

  • dev.off()

Basic Protocol 3: OPERATIONAL TAXONOMIC UNIT DETERMINATION AND COMMUNITY DIVERSITY ASSESSMENT

Operational taxonomic units (OTUs) are clusters of similar sequences that represent a taxonomic unit such as species or genus. Due to the lack of consistent prokaryotic taxonomies and reliance on a reference database to calculate diversity and assess differentially abundance taxa, it is more beneficial to use OTUs rather than taxonomic assignment. Alpha and beta diversity could be used to explain biological diversity of OTUs. Alpha diversity is diversity within a specific sample whereas beta diversity represents the diversity between samples within a microbial community structure. The diversity can be explained using richness, the number of different species, and evenness, distribution of species in OTU.

Necessary Resources

Hardware

  • 64-bit computer running Linux, Mac OS X, or Windows operating system

Software

  • Mothur (mothur, RRID:SCR_011947)

Files

  • Name software and datafiles (produced in Basic Protocol 1 and Basic Protocol 2)
  • Fasta and count files (from Basic Protocol 1)
  • Taxonomic assignment (from Basic Protocol 2)

1.Determine OTU abundances and taxonomically classify OTUs.

Note
OTU classification requires that (1) a distance matrix is calculated between sequence pairs and (2) sequences are clustered by distance. The dist.seqs command creates a distance matrix and any distances >0.03 will not be included in matrix. The cluster command determines the OTUs at difference distances. A shared file provides the abundance of each OTU by sample and is produced using the make.shared command. The classify.otu command identifies the consensus taxonomic classification for OTUs.

  • dist.seqs(fasta=final.fasta, cutoff=0.03)

  • cluster(column=final.dist, count=final.count_table)

  • make.shared(list=final.opti_mcc.list,count=final.count_table, label=0.03)

  • classify.otu(list=final.opti_mcc.list,count=final.count_table, taxonomy=final.reference.wang.pick.taxonomy, label=0.03)

2.Create a file for input into applications for visualization.

Note
There are programs that will allow for further analysis and visualization of the results. The make.biom files creates a biom file using your shared file and the OTU taxonomic classification. Biom files can be imported into external programs such as MEGAN (RRID:SCR_011942; Huson & Weber, 2013; Huson et al., 2016) to perform additional analysis and create plots.

  • make.biom(shared= final.opti_mcc.shared, constaxonomy=final.opti_mcc.0.03.cons.taxonomy)

3.OTU richness estimation:

Note
When sampling a new environment, in order to get a sense of the richness, you can calculate the number of new OTUs observed with each new sample. With a small number of samples, you would expect many new OTUs added. As number of OTUs levels offs and few new species are detected, the accuracy of your species richness increases. Generate a rarefaction curve to estimate the OTUs richness with this command:

  • rarefaction.single(shared=final.opti_mcc.shared)

4.Diversity calculation:

Note
Alpha diversity combines OTU richness and evenness to determine the diversity of a sample. Samples with a few dominating OTUs will have a lower diversity than samples with the same number of OTUs with equally distributed abundances. To determine the OTU diversity we use the summary.single command. The calc option determines the diversity distance metrics to be calculated including Shannon and Inverse-Simpson (Fig. 2).

  • summary.single(shared= final.opti_mccshared, calc=npshannon-invsimpson)

Operational taxonomic unit (OTU) diversity. Diversity of mock community samples using the (A) Inverse-Simpson and (B) Shannon calculation methods.
Operational taxonomic unit (OTU) diversity. Diversity of mock community samples using the (A) Inverse-Simpson and (B) Shannon calculation methods.

5.Sample comparison:

Note
Pairwise distances can be calculated pairwise between samples using the dist.shared command. The calc option determines the diversity distance metrics to be calculated. In this example, we are using Jaccard coefficient, the Yue and Clayton theta, and the Bray-Curtis index to determine OTU diversity. Using the resultant distance matrixes, samples can be visualized by plotting principal coordinates (PCoA) or non-metric multidimensional scaling (NMDS; Fig. 3).

  • dist.shared(shared=final.opti_mcc.shared, calc=thetayc-braycurtis)

  • pcoa(phylip=final.opti_mcc.thetayc.0.03.lt.dist)

  • nmds(phylip=final.opti_mcc.thetayc.0.03.lt.dist, mindim=2, maxdim=5)

Plot of (A) the principal coordinates (PCoA) and (B) non-metric multidimensional scaling (NMDS) of the 4 mock community samples.
Plot of (A) the principal coordinates (PCoA) and (B) non-metric multidimensional scaling (NMDS) of the 4 mock community samples.

COMMENTARY

Background Information

Basic Protocol 1

Data quality control and filtering is one of the less glamorous but one of the most important steps of bioinformatics data analysis. The steps laid out in this protocol are designed to remove data with poor data quality or are artifacts of the assay. These quality control metrics are based on previously reported studies (Kozich, Westcott, Baxter, Highlander, & Schloss, 2013, Schloss et al., 2011).

Basic Protocol 2

The important considerations for taxonomic classification are the reference database and taxonomic classification of those sequences. The mothur resources include the Greengenes, NCBI, and RDP databases. The newest mothur formatted Greengenes, RDP, and SILVA databases were generated in 2013, 2016, and 2018, respectively. The SILVA resources also include the taxonomic classifications of RDP and Greengenes and are probably the most comprehensive (Glöckner et al., 2017). Furthermore, the SILVA taxonomic classification examines anomalies compared to the other classification schema. Taxonomic classification gives a coarse overview of the overall community structure, while overall community structure, diversity, and comparison should be made at the OTU level.

Basic Protocol 3

OTU analysis allows for a deeper examination of the community structure and diversity estimate. OTU determination provides a reference-independent way of classifying sequences into “taxonomic groups.” The default method used is the opti-mcc, which has been shown to improve classification over less computationally intensive methods like Vsearch (Westcott & Schloss, 2017). If the dataset is very large, the computational time can be reduced by splitting up sequences by a taxonomic rank (order, class, or family) for the OTU classification using the cluster.split command, which can replace the dist.seqs and cluster commands in the protocol. The biom file can be imported into packages such as MEGAN (RRID:SCR_011942) for downstream analysis and visualization. Community comparisons between samples can include comparing metrics of diversity or dimension reduction methods such as NMDS or PoCA.

Critical Parameters

Basic Protocol 1

The mothur website includes a wiki for all commands that can be found at the mothur website: https://www.mothur.org/wiki/. The code allows for the user to set the number of processors when the option is: processors=X, where X is the number of CPUs. For many commands, mothur will use the maximum available processors. Mothur will add the command name to the file names at each step.

Basic Protocol 2

The remove.lineage step should be tailored to your particular experiment. In the above example, we remove sequences that classify as chloroplast, mitochondria, unknown, archaea, and eukaryota, which are appropriate for 16S rRNA primers targeting eubacteria. However, mothur can also be used to examine 16S sequences from primers targeting archaea where you would not want to remove archaea classification. Alternatively, if contamination is discovered by sequencing a negative control sample, the classified sequences of the negative control can be removed with this command.

Basic Protocol 3

The label parameter is the distance used to cluster OTUs. Roughly a distance of 0.03 translates to 97% identify. OTUs can be clustered at any distance or only for unique sequences. The 0.03 threshold is typically for genus/species level clustering.

Troubleshooting

Basic Protocol 1

Mothur usually gives an informative error message. Typical issues are related to names of groups (samples), where the characters are limited to alphanumeric characters and underscores “_”. Mothur also will generate a warning or error message if the multiple input files do not match in content. Otherwise, there is a forum where users can post questions and see common errors posted by other users (forum.mothur.org).

Guidelines for Understanding Results

Upon successful execution of the steps described in the protocols above, one obtains the abundances of different taxa at different taxonomic ranks; the abundance of OTU at a distance of 0.03; and several metrics for sample comparison including alpha diversity, sample distances using the Yue and Clayton theta and the Bray-Curtis index; and a biom file that can be loaded into external analysis packages like MEGAN.

Here are several considerations that are important for interpreting the results:

  1. Because every effort is taken to remove sequences with errors, this could result in the loss of more than half of the initially sequenced reads.
  2. A rarefaction curve can help determine whether the samples were sequenced to enough depth in order to accurately survey the species richness. If the number of species does not level off, then the sequence depth is too low.
  3. Because of the nature of environmental sample collection, it is possible that there is some contamination, for example, in the sample collection, storage, or sample preparation. It is recommended that a negative control sample is “collected” and processed in the same manner as real samples.
  4. Without the entire 16S gene, any classification of species is error-prone. Also, many bacterial species can have pathogenic and commensal strains with similar or identical 16S rRNA gene sequences. Therefore, it will be very difficult to use 16S rRNA sequencing to reliably differentiate between these different strains without other evidence.

Suggestions for Further Analysis

One package for comparing samples is called UniFrac (Lozupone, Lladser, Knights, Stombaugh, & Knight, 2011). UniFrac takes a phylogenetic tree generated from the 16S rRNA sequences and uses the tree structure to calculate distance between samples, not only on what sequences are shared but it also weights the distances—not on sequence differences but distances on the tree. The phylogenetic tree can be generated using the alignment generated in mothur as a fasta file, which is the input for the dist.seqs command. In mothur, a tree can be generated using the clearcut command and unifrac distances can be generated using the unifrac.weighted and unifrac.unweighted commands.

For statistical analysis to identify differentially abundant OTUs, mothur includes the algorithms for lefse (RRID:SCR_014609) and metastats (RRID:SCR_014610).

Acknowledgments

The funding for this work is provided by Cancer Prevention and Research Institute of Texas (RP150596).

Literature Cited

  • Balvočiute, M., & Huson, D. H. (2017). SILVA, RDP, Greengenes, NCBI and OTT—how do these taxonomies compare? BMC Genomics , 18, 114. doi: 10.1186/s12864-017-3501-4.
  • Chakravorty, S., Helb, D., Burday, M., Connell, N., & Alland, D. (2007). A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. Journal of Microbiological Methods , 69, 330–339. doi: 10.1016/j.mimet.2007.02.005.
  • Clarridge, J. E. (2004). Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clinical Microbiology Reviews , 17(4), 840–862. doi: 10.1128/CMR.17.4.840-862.2004.
  • Debelius, J., Song, S. J., Vazquez-Baeza, Y., Xu, Z. Z., Gonzalez, A., & Knight, R. (2016). Tiny microbes, enormous impacts: What matters in gut microbiome studies? Genome Biology , 17(1), 217. doi: 10.1186/s13059-016-1086-x.
  • Fouhy, F., Clooney, A. G., Stanton, C., Claesson, M. J., & Cotter, P. D. (2016). 16S rRNA gene sequencing of mock microbial populations-impact of DNA extraction method, primer choice and sequencing platform. BMC Microbiology , 16(1), 123. doi: 10.1186/s12866-016-0738-z.
  • Glöckner, F. O., Yilmaz, P., Quast, C., Gerken, J., Beccati, A., Ciuprina, A., … Ludwig, W. (2017). 25 years of serving the community with ribosomal RNA gene reference databases and tools. Journal of Biotechnology , 261, 169–176. doi: 10.1016/j.jbiotec.2017.06.1198.
  • Huson, D. H., Beier, S., Flade, I., Górska, A., El-Hadidi, M., Mitra, S., … Tappu, R. (2016). MEGAN community edition - interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Computational Biology , 12(6), e1004957. doi: 10.1371/journal.pcbi.1004957.
  • Huson, D. H., & Weber, N. (2013). Microbial community analysis using MEGAN. Methods in Enzymology , 531, 465–485. doi: 10.1016/B978-0-12-407863-5.00021-6.
  • Janda, J. M., & Abbott, S. L. (2007). 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: Pluses, perils, and pitfalls. Journal of Clinical Microbiology , 45(9), 2761–2764. doi: 10.1128/JCM.01228-07.
  • Kim, M., Morrison, M., & Yu, Z. (2011). Evaluation of different partial 16S rRNA gene sequence regions for phylogenetic analysis of microbiomes. Journal of Microbiological Methods , 84(1), 81–87. doi: 10.1016/j.mimet.2010.10.020.
  • Klindworth, A., Pruesse, E., Schweer, T., Peplies, J., Quast, C., Horn, M., & Glöckner, F. O. (2013). Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Research , 41(1), e1–e1. doi: 10.1093/nar/gks808.
  • Kozich, J. J., Westcott, S. L., Baxter, N. T., Highlander, S. K., & Schloss, P. D. (2013). Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the miseq illumina sequencing platform. Applied and Environmental Microbiology , 79(17), 5112–5120. doi: 10.1128/AEM.01043-13.
  • Lim, M. Y., Song, E. J., Kim, S. H., Lee, J., & Nam, Y. D. (2018). Comparison of DNA extraction methods for human gut microbial community profiling. Systematic and Applied Microbiology , 41(2), 151–157. doi: 10.1016/j.syapm.2017.11.008.
  • Lozupone, C., Lladser, M. E., Knights, D., Stombaugh, J., & Knight, R. (2011). UniFrac: An effective distance metric for microbial community comparison. ISME Journal , 5(2), 169. doi: 10.1038/ismej.2010.133.
  • Santiago, A., Panda, S., Mengels, G., Martinez, X., Azpiroz, F., Dore, J., … Manichanh, C. (2014). Processing faecal samples: A step forward for standards in microbial community analysis. BMC Microbiology , 14(1), 112. doi: 10.1186/1471-2180-14-112.
  • Schloss, P. D., Gevers, D., & Westcott, S. L. (2011). Reducing the effects of PCR amplification and sequencing artifacts on 16s rRNA-based studies. PLoS One , 6(12), e27310. doi: 10.1371/journal.pone.0027310.
  • Wagner Mackenzie, B., Waite, D. W., & Taylor, M. W. (2015). Evaluating variation in human gut microbiota profiles due to DNA extraction method and inter-subject differences. Frontiers in Microbiology , 6, 130. doi: 10.3389/fmicb.2015.00130.
  • Wesolowska-Andersen, A., Bahl, M. I., Carvalho, V., Kristiansen, K., Sicheritz-Pontén, T., Gupta, R., & Licht, T. R. (2014). Choice of bacterial DNA extraction method from fecal material influences community structure as evaluated by metagenomic analysis. Microbiome , 2(1), 19. doi: 10.1186/2049-2618-2-19.
  • Westcott, S. L., & Schloss, P. D. (2017). OptiClust, an improved method for assigning amplicon-based sequence data to operational taxonomic units. mSphere , 2(2), e00073–17. doi: 10.1128/mSphereDirect.00073-17.
  • Xue, M., Wu, L., He, Y., Liang, H., & Wen, C. (2018). Biases during DNA extraction affect characterization of the microbiota associated with larvae of the Pacific white shrimp, Litopenaeus vannamei. PeerJ , 6, e5257. doi: 10.7717/peerj.5257.
  • Yang, B., Wang, Y., & Qian, P. Y. (2016). Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis. BMC Bioinformatics , 17(1), 135. doi: 10.1186/s12859-016-0992-y.
  • Yuan, S., Cohen, D. B., Ravel, J., Abdo, Z., & Forney, L. J. (2012). Evaluation of methods for the extraction and purification of DNA from the human microbiome. PLoS One , 7(3), e33865. doi: 10.1371/journal.pone.0033865.

Citing Literature

Number of times cited according to CrossRef: 33

  • Lin Hu, Yajuan Xu, Jingjing Li, Miao Zhang, Zongzong Sun, Yanjie Ban, Xin Tian, Dong Liu, Lulu Hu, Gut microbiome characteristics of women with hypothyroidism during early pregnancy detected by 16S rRNA amplicon sequencing and shotgun metagenomic, Frontiers in Cellular and Infection Microbiology, 10.3389/fcimb.2024.1369192, 14 , (2024).
  • Chenxi Zhu, Weiwei Lv, Shuang Hong, Mingming Han, Weiguo Song, Chengbin Liu, Chunxia Yao, Qichen Jiang, Gradual effects of gradient concentrations of perfluorooctane sulfonate on the antioxidant ability and gut microbiota of red claw crayfish (Cherax quadricarinatus), Science of The Total Environment, 10.1016/j.scitotenv.2024.172962, (172962), (2024).
  • Gabriela Feix Pereira, Taiah Rajeh Rosin, Bibiana Braga, Harry Pilz Junior, Gertrudes Corção, Evaluation of biofilm inhibition and detachment by commercial biocides in sulfate-reducing bacteria consortia from oil fields, Journal of Water Process Engineering, 10.1016/j.jwpe.2024.105547, 63 , (105547), (2024).
  • Jianting Fan, Siqun Li, Chong Li, Dongping Chen, Peipei Zhu, Jingya Yu, Meiqi Ma, Gut microbiota in a leaf beetle enhance the toxicity of insecticide Dursban to host, Industrial Crops and Products, 10.1016/j.indcrop.2024.119692, 222 , (119692), (2024).
  • Md Nasir Uddin, Yasmin Akter, Mohammad Al-baruni Chowdhury, Kazuyuki Shimizu, Lolo Wal Marzan, Exploring alkaline serine protease production and characterization in proteolytic bacteria Stenotrophomonas maltophilia: Insights from real-time PCR and fermentation techniques, Biocatalysis and Agricultural Biotechnology, 10.1016/j.bcab.2024.103186, 58 , (103186), (2024).
  • Zhuangzhuang Gao, Peiwang Li, Changzhu Li, Ruichang Tang, Minghuai Wang, Jingzhen Chen, Yan Yang, Zhenxiang He, Zhihong Xiao, Yingzi Ma, Yunzhu Chen, Identification, functional annotation, and isolation of phosphorus-solubilizing bacteria in the rhizosphere soil of Swida wilsoniana (Wanger) Sojak, Applied Soil Ecology, 10.1016/j.apsoil.2023.105207, 194 , (105207), (2024).
  • Zhenbin Zhang, Yiquan Sun, Xinhuang Zhong, Jun Zhu, Sihan Yang, Yalan Gu, Xiang Yu, Yue Lu, Zhiqi Lu, Xuezhao Sun, Mengzhi Wang, Dietary crude protein and protein solubility manipulation enhances intestinal nitrogen absorption and mitigates reactive nitrogen emissions through gut microbiota and metabolome reprogramming in sheep, Animal Nutrition, 10.1016/j.aninu.2024.04.003, 18 , (57-71), (2024).
  • Jie Yang, Zhe Du, Caihong Huang, Wei Li, Beidou Xi, Lin Zhu, Xinxin Wu, Dynamics of microbial functional guilds involved in the humification process during aerobic composting of chicken manure on an industrial scale, Environmental Science and Pollution Research, 10.1007/s11356-024-32390-2, 31 , 14, (21044-21056), (2024).
  • Zhalaga Ao, Juan Xia, Honoka Seino, Katsuhiro Inaba, Yukitsugu Takahashi, Chie Hayakawa, Hideaki Hirai, Isamu Maeda, Adaptations of Potential Nitrogenase Activity and Microbiota with Long-Term Application of Manure Compost to Paddy Soil, Environments, 10.3390/environments10060103, 10 , 6, (103), (2023).
  • Ya-Zhen Chen, Wan-Tao Rong, Ying-Can Qin, Lin-Yuan Lu, Jing Liu, Ming-Jie Li, Lei Xin, Xiao-Dong Li, De-Long Guan, Integrative analysis of microbiota and metabolomics in chromium-exposed silkworm (Bombyx mori) midguts based on 16S rDNA sequencing and LC/MS metabolomics, Frontiers in Microbiology, 10.3389/fmicb.2023.1278271, 14 , (2023).
  • Qun Lan, Yuju Lian, Peiya Peng, Long Yang, Heng Zhao, Peng Huang, Haiming Ma, Hongjiang Wei, Yulong Yin, Mei Liu, Association of gut microbiota and SCFAs with finishing weight of Diannan small ear pigs, Frontiers in Microbiology, 10.3389/fmicb.2023.1117965, 14 , (2023).
  • Heng Dai, Yiyu Zhang, Wen Fang, Juan Liu, Jun Hong, Chaowang Zou, Jin Zhang, Microbial community structural response to variations in physicochemical features of different aquifers, Frontiers in Microbiology, 10.3389/fmicb.2023.1025964, 14 , (2023).
  • Sufang Zhao, Renju Liu, Juan Wang, Shiwei Lv, Benjuan Zhang, Chunming Dong, Zongze Shao, Biodegradation of polyethylene terephthalate (PET) by diverse marine bacteria in deep‐sea sediments, Environmental Microbiology, 10.1111/1462-2920.16460, 25 , 12, (2719-2731), (2023).
  • Xiaowen Yu, Xueyu Gao, Li Shang, Xiaoyu Wang, Yutian Jiao, Xiao-Hua Zhang, Xiaochong Shi, Spatial and temporal change determined co-occurrence networks stability and community assembly processes of epipelagic seawater microbial community in the Nordic Sea, Science of The Total Environment, 10.1016/j.scitotenv.2022.160321, 859 , (160321), (2023).
  • N. Venkata Raju, Jithin S. Sunny, Daniel Andrew Gideon, Karuganti Sukumar, Safia Riaz, Sarfraz Nawaz, Asad Syed, Rajalakshmanan Eswaramoorthy, Prabhat Kumar Pankaj, Abhinav Parashar, Deciphering the influence of soil and feed on the nutritional status of ruminants in rainfed areas using metagenomic analysis, Journal of King Saud University - Science, 10.1016/j.jksus.2023.102601, 35 , 4, (102601), (2023).
  • A. Lin, X. Yan, R. Xu, H. Wang, Y. Su, W. Zhu, Effects of lactic acid bacteria-fermented formula milk supplementation on colonic microbiota and mucosal transcriptome profile of weaned piglets, animal, 10.1016/j.animal.2023.100959, 17 , 9, (100959), (2023).
  • Ying Yao, Yanju Wang, Qiang Liu, Ying Li, Junwei Yan, Mechanism of HMBR in Reducing Membrane Fouling under Different SRT: Effect of Sludge Load on Microbial Properties, Membranes, 10.3390/membranes12121242, 12 , 12, (1242), (2022).
  • Claudio Costantini, Emilia Nunzi, Luigina Romani, From the nose to the lungs: the intricate journey of airborne pathogens amid commensal bacteria, American Journal of Physiology-Cell Physiology, 10.1152/ajpcell.00287.2022, 323 , 4, (C1036-C1043), (2022).
  • Jeferson Menezes Lourenco, Christina Breanne Welch, Using microbiome information to understand and improve animal performance, Italian Journal of Animal Science, 10.1080/1828051X.2022.2077147, 21 , 1, (899-913), (2022).
  • Weihong Zhang, Chunxia Jiang, Lu Chen, Geetika Bhagwat, Palanisami Thava, Yuyi Yang, Spatial turnover of core and occasional bacterial taxa in the plastisphere from a plateau river, China, Science of The Total Environment, 10.1016/j.scitotenv.2022.156179, 838 , (156179), (2022).
  • Atif Khurshid Wani, Priyanka Roy, Vijay Kumar, Tahir ul Gani Mir, Metagenomics and artificial intelligence in the context of human health, Infection, Genetics and Evolution, 10.1016/j.meegid.2022.105267, 100 , (105267), (2022).
  • Dan Chen, Ruyu Bai, Huimin Yong, Shuai Zong, Changhai Jin, Jun Liu, Improving the digestive stability and prebiotic effect of carboxymethyl chitosan by grafting with gallic acid: In vitro gastrointestinal digestion and colonic fermentation evaluation, International Journal of Biological Macromolecules, 10.1016/j.ijbiomac.2022.06.170, 214 , (685-696), (2022).
  • Lu Li, Chunyan Peng, Zicong Yang, Yu He, Meng Liang, Hongmin Cao, Qinghua Qiu, Jingjing Song, Youlu Su, Bin Gong, Microbial communities in swamps of four mangrove reserves driven by interactions between physicochemical properties and microbe in the North Beibu Gulf, China, Environmental Science and Pollution Research, 10.1007/s11356-021-18134-6, 29 , 25, (37582-37597), (2022).
  • Zhilong Lu, Yanling Wu, Ying Chen, Xiaoling Chen, Renzhi Wu, Qi Lu, Dong Chen, Ribo Huang, Role of spt23 in Saccharomyces cerevisiae thermal tolerance, Applied Microbiology and Biotechnology, 10.1007/s00253-022-11920-3, 106 , 9-10, (3691-3705), (2022).
  • Hend Altaib, Kohei Nakamura, Mayuko Abe, Yassien Badr, Emiko Yanase, Izumi Nomura, Tohru Suzuki, Differences in the Concentration of the Fecal Neurotransmitters GABA and Glutamate Are Associated with Microbial Composition among Healthy Human Subjects, Microorganisms, 10.3390/microorganisms9020378, 9 , 2, (378), (2021).
  • Hai-Ning Sun, Chun-Mei Yu, Hui-Hui Fu, Peng Wang, Zai-Guang Fang, Yu-Zhong Zhang, Xiu-Lan Chen, Fang Zhao, Diversity of Marine 1,3-Xylan-Utilizing Bacteria and Characters of Their Extracellular 1,3-Xylanases, Frontiers in Microbiology, 10.3389/fmicb.2021.721422, 12 , (2021).
  • Chiqian Zhang, Ke Qin, Ian Struewing, Helen Buse, Jorge Santo Domingo, Darren Lytle, Jingrang Lu, The Bacterial Community Diversity of Bathroom Hot Tap Water Was Significantly Lower Than That of Cold Tap and Shower Water, Frontiers in Microbiology, 10.3389/fmicb.2021.625324, 12 , (2021).
  • Sara García-Davis, Carolina P. Reyes, Irene Lagunes, José M. Padrón, Eugenio Fraile-Nuez, José J. Fernández, Ana R. Díaz-Marrero, Bioprospecting Antiproliferative Marine Microbiota From Submarine Volcano Tagoro, Frontiers in Marine Science, 10.3389/fmars.2021.687701, 8 , (2021).
  • J.Y. Shang, Y. Wu, B. Huo, L. Chen, E.T. Wang, Y. Sui, W.F. Chen, C.F. Tian, W.X. Chen, X.H. Sui, Potential of Bradyrhizobia inoculation to promote peanut growth and beneficial Rhizobacteria abundance, Journal of Applied Microbiology, 10.1111/jam.15128, 131 , 5, (2500-2515), (2021).
  • Ying Li, Wei Chen, Xiaoying Zheng, Qiang Liu, Wei Xiang, Jixiang Qu, Chengfang Yang, Microbial community structure analysis in a hybrid membrane bioreactor via high-throughput sequencing, Chemosphere, 10.1016/j.chemosphere.2021.130989, 282 , (130989), (2021).
  • Veronica Ricci, Davide Carcione, Simone Messina, Gualtiero I. Colombo, Yuri D’Alessandra, Circulating 16S RNA in Biofluids: Extracellular Vesicles as Mirrors of Human Microbiome?, International Journal of Molecular Sciences, 10.3390/ijms21238959, 21 , 23, (8959), (2020).
  • Shujing Sun, Fan Li, Xin Xu, Yunchao Liu, Xuqiang Kong, Jianqiu Chen, Ting Liu, Liding Chen, Study on the community structure and function of symbiotic bacteria from different growth and developmental stages of Hypsizygus marmoreus, BMC Microbiology, 10.1186/s12866-020-01998-y, 20 , 1, (2020).
  • Qingguo Chen, Bo Bao, Yijing Li, Mei Liu, Baikang Zhu, Jun Mu, Zhi Chen, Effects of marine oil pollution on microbial diversity in coastal waters and stimulating indigenous microorganism bioremediation with nutrients, Regional Studies in Marine Science, 10.1016/j.rsma.2020.101395, 39 , (101395), (2020).

推荐阅读

Nature Protocols
Protocols IO
Current Protocols