BASIC PROTOCOL 4: Pan-genome Copy Number Variant Calling
miriam.goldman, chunyu.zhao
Abstract
This protocol describes the CNV module of MIDAS2, which takes metagenomic sequencing reads from a set of samples as input and generates CNV genotype files for each sample across all detected species. Population CNV calling has two steps: (1) single-sample quantification of the copy number of each gene in the pangenome of each species with the midas2 run_genes command, and (2) population CNV calling across samples with the midas2 merge_genes command. Basic Protocols 1 (Species) and 2 (MIDASDB) should be run before this protocol.
Steps
Perform species prescreening as described in Basic Protocol 1.
Download MIDASDB as described in Basic Protocol 2.
Execute the run_genes command for each sample:
# midas2 run_genes
for sample_name in SRR172902 SRR172903
do
    midas2 run_genes \
        --sample_name "${sample_name}" \
        -1 "reads/${sample_name}.fastq.gz" \
        --midasdb_name uhgg --midasdb_dir midasdb_uhgg \
        --species_list 100122,100277 \
        --select_by median_marker_coverage,unique_fraction_covered \
        --select_threshold=0,0.6 \
        --num_cores 8 \
        midas2_output
done
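Before merging, it can help to confirm that run_genes produced output for every sample. The following is a minimal sketch, not part of MIDAS2 itself: the helper name check_run_genes is hypothetical, and the per-sample path midas2_output/<sample_name>/genes/genes_summary.tsv is an assumption about the run_genes output layout that should be checked against your own output directory.

```shell
# Report samples whose run_genes summary file is missing or empty.
# ASSUMPTION: run_genes writes <outdir>/<sample_name>/genes/genes_summary.tsv;
# adjust the path if your MIDAS2 version lays out its output differently.
# check_run_genes is a hypothetical helper, not a MIDAS2 command.
check_run_genes() {
    outdir=$1
    shift
    missing=0
    for sample_name in "$@"
    do
        summary="${outdir}/${sample_name}/genes/genes_summary.tsv"
        if [ -s "${summary}" ]; then
            echo "OK       ${sample_name}"
        else
            echo "MISSING  ${sample_name} (${summary})"
            missing=$((missing + 1))
        fi
    done
    # The return status is the number of incomplete samples (0 = all present).
    return "${missing}"
}

check_run_genes midas2_output SRR172902 SRR172903 \
    || echo "Some samples are incomplete; rerun run_genes for them before merging."
```

Any sample reported as MISSING should be rerun before proceeding to the merge step.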
Prepare the sample manifest file for merging. The list_of_samples.tsv file generated in step 6 of Basic Protocol 1 can be reused here.
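If the manifest from Basic Protocol 1 is not at hand, it can be regenerated. The sketch below assumes the manifest is a tab-separated file with a header line and two columns, sample_name and midas_outdir (the directory passed as the last argument to run_genes); verify this layout against the file described in Basic Protocol 1 before relying on it. Note that it overwrites any existing list_of_samples.tsv in the working directory.

```shell
# Rebuild the sample manifest for the two example samples.
# ASSUMPTION: columns are sample_name and midas_outdir, tab-separated,
# with one row per sample; confirm against Basic Protocol 1, step 6.
outdir=midas2_output
printf 'sample_name\tmidas_outdir\n' > list_of_samples.tsv
for sample_name in SRR172902 SRR172903
do
    printf '%s\t%s\n' "${sample_name}" "${outdir}" >> list_of_samples.tsv
done

# Print the manifest for a quick visual check.
cat list_of_samples.tsv
```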
Once run_genes has completed for all samples listed in list_of_samples.tsv, merge the CNV profiles across samples with the merge_genes command.
midas2 merge_genes \
    --samples_list list_of_samples.tsv \
    --midasdb_name uhgg --midasdb_dir midasdb_uhgg \
    --min_copy 0.5 \
    --num_cores 2 \
    midas2_output/merge
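To illustrate what a copy number cutoff such as --min_copy 0.5 means in practice, the sketch below thresholds a toy copy number table into a presence/absence table. It assumes that a gene is called present in a sample when its estimated copy number is greater than or equal to the cutoff; this is an illustration of the idea, not the MIDAS2 implementation. The file names toy_copynum.tsv and toy_presabs.tsv and all values in the table are made up for the example.

```shell
# Toy illustration of thresholding copy numbers into presence/absence calls.
# ASSUMPTION: copy number >= min_copy means "present" (1), otherwise "absent" (0).
# The file names and all values below are hypothetical, not MIDAS2 output.
min_copy=0.5

printf 'gene_id\tSRR172902\tSRR172903\n' >  toy_copynum.tsv
printf 'geneA\t1.20\t0.00\n'             >> toy_copynum.tsv
printf 'geneB\t0.45\t0.50\n'             >> toy_copynum.tsv
printf 'geneC\t0.80\t2.10\n'             >> toy_copynum.tsv

# Keep the header as is; convert every sample column to 1 or 0.
awk -v min_copy="${min_copy}" '
    BEGIN { FS = OFS = "\t" }
    NR == 1 { print; next }
    {
        for (i = 2; i <= NF; i++) {
            $i = ($i + 0 >= min_copy + 0) ? 1 : 0
        }
        print
    }
' toy_copynum.tsv > toy_presabs.tsv

cat toy_presabs.tsv
```

In this toy table, geneB falls below the cutoff in SRR172902 (0.45) and is called absent there, but meets it in SRR172903 (0.50) and is called present.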
Population pangenome CNV analysis has finished successfully when all of the following output files have been created under the directory midas2_output/merge/genes/ without any error messages: