BASIC PROTOCOL 3: Population Single Nucleotide Variant Calling
miriam.goldman, chunyu.zhao
Abstract
This protocol describes the SNV module of MIDAS2, which takes as input metagenomic sequencing reads from a set of samples and generates files with SNV genotypes for each sample for all detected species. The SNV module has two steps: (1) single-sample allele tallying with the midas2 run_snps command and (2) population SNV calling with the midas2 merge_snps command. Basic Protocols 1 (Species) and 2 (MIDASDB) should be run before this protocol.
Steps
Perform species prescreening as described in Basic Protocol 1.
Download MIDASDB as described in Basic Protocol 2.
Execute the run_snps command for each sample.
for sample_name in SRR172902 SRR172903
do
    midas2 run_snps \
        --sample_name "${sample_name}" \
        -1 "reads/${sample_name}.fastq.gz" \
        --midasdb_name uhgg --midasdb_dir midasdb_uhgg \
        --select_by median_marker_coverage,unique_fraction_covered \
        --select_threshold=0,0.6 \
        --num_cores 8 midas2_output
done
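Before launching the loop, it can help to confirm that every input file is in place. The sketch below is a hypothetical helper, not part of MIDAS2; the function name check_reads is invented here, and it assumes the reads/<sample_name>.fastq.gz layout used in the command above.

```shell
# Hypothetical helper (not part of MIDAS2): report samples whose FASTQ
# file is missing or empty. Assumes files are named <sample_name>.fastq.gz.
check_reads() {
    local reads_dir="$1"
    shift
    local missing=0
    local sample_name
    for sample_name in "$@"; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "${reads_dir}/${sample_name}.fastq.gz" ]; then
            echo "missing or empty: ${reads_dir}/${sample_name}.fastq.gz" >&2
            missing=1
        fi
    done
    # Return 0 when all inputs are present, 1 otherwise.
    return "${missing}"
}
```

For the two samples in this protocol, it could be called as: check_reads reads SRR172902 SRR172903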
Prepare the sample manifest file for merging pileup results across samples. The file list_of_samples.tsv generated in step 6 of Basic Protocol 1 can be reused here.
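If the manifest needs to be regenerated, a minimal sketch follows. It assumes a tab-separated file with a header row and two columns, sample_name and midas_outdir; this layout and the helper name write_manifest are assumptions and should be checked against the file produced in Basic Protocol 1.

```shell
# Hypothetical helper: write a two-column, tab-separated manifest.
# The column names (sample_name, midas_outdir) are an assumption;
# confirm them against the manifest from Basic Protocol 1.
write_manifest() {
    local manifest="$1"
    local midas_outdir="$2"
    shift 2
    local sample_name
    # Header row, then one row per sample.
    printf 'sample_name\tmidas_outdir\n' > "${manifest}"
    for sample_name in "$@"; do
        printf '%s\t%s\n' "${sample_name}" "${midas_outdir}" >> "${manifest}"
    done
}
```

For this protocol, the call would be: write_manifest list_of_samples.tsv midas2_output SRR172902 SRR172903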
Once run_snps has completed for every sample listed in list_of_samples.tsv, compute the population SNVs with the merge_snps command.
midas2 merge_snps --samples_list list_of_samples.tsv \
--midasdb_name uhgg --midasdb_dir midasdb_uhgg \
--genome_coverage 0.4 --genome_depth 0.1 --sample_counts 2 \
--snp_type bi --num_cores 8 midas2_output/merge
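Because merge_snps depends on the per-sample results, a quick check that each sample in the manifest has run_snps output can catch an incomplete run before merging. The sketch below is hypothetical: the function name is invented, and it assumes a two-column, tab-separated manifest with a header row and per-sample results under <midas_outdir>/<sample_name>/snps.

```shell
# Hypothetical helper: flag manifest samples with no run_snps output.
# Assumes columns sample_name and midas_outdir (tab-separated, with header)
# and results under <midas_outdir>/<sample_name>/snps; both are assumptions.
check_run_snps_done() {
    local manifest="$1"
    local missing=0
    local tab
    local header sample_name midas_outdir
    tab=$(printf '\t')
    {
        # Discard the header row.
        read -r header
        # The trailing test keeps a last line that lacks a final newline.
        while IFS="${tab}" read -r sample_name midas_outdir || [ -n "${sample_name}" ]; do
            if [ ! -d "${midas_outdir}/${sample_name}/snps" ]; then
                echo "no run_snps output for ${sample_name}" >&2
                missing=1
            fi
        done
    } < "${manifest}"
    # Return 0 when every sample has output, 1 otherwise.
    return "${missing}"
}
```

It could be run as: check_run_snps_done list_of_samples.tsv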
Population SNV analysis has finished successfully when all of the following output files have been created under the directory midas2_output/merge/snps/ and no error messages were reported.