Generating Sequencing Depth and Coverage Map for Organelle Genomes
Yang Ni, Jingling Li, Chang Zhang, Chang Liu
Abstract
Success in developing molecular markers, phylogenetic analysis, and genetic engineering relies heavily on high-quality organelle genomes. Any error in the assembly of these genomes can lead to inaccurate downstream analyses. In this article, we present a complete protocol for assessing the assembly quality of organelle genomes using a sequencing depth and coverage plot. The protocol consists of nine steps in three sections, which allow users to map sequence reads to the assembly, calculate sequencing depth and coverage, and plot the data using a custom script, respectively. We provide detailed instructions for setting up the computational environment and running the analytical software tools. This protocol is particularly suitable for users with little bioinformatics experience and helps ensure high assembly quality for organelle genomes.
Before start
A commonly used Linux operating system, such as CentOS or Ubuntu
Conda (version 4.7.12)
SRA Toolkit (version 2.5.7)
BWA (version 0.7.17)
SAMtools (version 1.16.1)
Steps
Generating Sequencing Depth and Coverage Map for Organelle Genomes
1. Set up the working environment
1.1 Download the latest Miniconda3 (Python 3) installation script for Linux by running the following command in the terminal:
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
1.2 Install at least the following software: SRA Toolkit, BWA, and SAMtools (see "Before start" for the versions used).
1.3 Install the following software (Optional):
$ conda install -c bioconda -c conda-forge hisat2 star gmap tophat novoalign bowtie2
1.4 Install the Python packages required by the plotting script provided in this protocol if you want to visualize the sequencing depth and coverage map (Optional).
2. Download the test data and reference genome
2.1 Download the reference genome (MT872375.fasta)
$ wget "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=MT872375&rettype=fasta&retmode=text" -O MT872375.fasta
# Download the reference genome with the accession number MT872375
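The query string in the wget command can be hard to read at a glance. As an illustration only, the following Python sketch rebuilds the same efetch URL from its individual parameters; the function name efetch_fasta_url is hypothetical and not part of the protocol, and the sketch only constructs the URL without downloading anything.

```python
from urllib.parse import urlencode

# Base address taken from the wget command in step 2.1
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build the efetch URL that requests one nucleotide record as FASTA text.

    Hypothetical helper for illustration; it does not contact the server.
    """
    params = {
        "db": "nucleotide",    # database to query
        "id": accession,       # accession number of the record
        "rettype": "fasta",    # return the record in FASTA format
        "retmode": "text",     # return plain text
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    print(efetch_fasta_url("MT872375"))
```

Changing the accession argument would, under the same assumptions, give the URL for a different reference genome.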
2.2 Download the test data (SRR12597239)
3. Create the index for the reference genome
# In the original directory
$ bwa index MT872375.fasta
# Use BWA to create an index for the reference genome
# This generates new .amb, .ann, .bwt, .pac, and .sa files
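Before moving on to the alignment, it can help to confirm that indexing produced all five files. The sketch below is one way to check this in Python; it assumes that BWA writes each index file next to the FASTA file, with the suffix appended to the full file name (for example MT872375.fasta.amb). The function names are hypothetical.

```python
from pathlib import Path

# Suffixes of the index files listed in step 3
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_bwa_index_files(fasta_path):
    """Return the index file paths expected for a given FASTA file."""
    return [str(fasta_path) + suffix for suffix in BWA_INDEX_SUFFIXES]


def missing_bwa_index_files(fasta_path):
    """Return the expected index files that do not exist on disk."""
    return [p for p in expected_bwa_index_files(fasta_path)
            if not Path(p).is_file()]


if __name__ == "__main__":
    missing = missing_bwa_index_files("MT872375.fasta")
    if missing:
        print("Missing index files:", ", ".join(missing))
    else:
        print("All BWA index files are present.")
```

An empty list from missing_bwa_index_files suggests that step 3 completed and the alignment in step 4 can proceed.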
4. Align the raw paired-end reads to the reference genome and generate a SAM file
$ bwa mem -t 10 MT872375.fasta SRR12597239.lite.1_1.fastq SRR12597239.lite.1_2.fastq > out.sam
# Align using the BWA-MEM algorithm; -t 10 specifies 10 threads, the input files are the paired-end reads, and the output is written to out.sam
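5. Convert the SAM file to a BAM file
$ samtools view -bS out.sam > out.bam
# Use samtools to convert the SAM file to the more compact BAM format; -b requests BAM output and -S marks the input as SAM (samtools 1.x detects the input format automatically, so -S is accepted but ignored)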
6. Filter BAM data
$ samtools view -bF 12 out.bam > out.12F.bam
# Keep only read pairs in which both reads align to the reference genome. -F 12 excludes every record with FLAG bit 4 (read unmapped) or bit 8 (mate unmapped) set, that is, pairs in which at least one read did not align
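The value 12 is the sum of two SAM FLAG bits, 4 (read unmapped) and 8 (mate unmapped). The following Python sketch mimics the exclusion test applied by -F on a few example FLAG values; it is an illustration of the bitwise logic only, not a replacement for samtools, and the function name is hypothetical.

```python
READ_UNMAPPED = 0x4   # FLAG bit 4: this read did not align
MATE_UNMAPPED = 0x8   # FLAG bit 8: the mate of this read did not align


def passes_exclude_filter(flag, exclude_mask=READ_UNMAPPED | MATE_UNMAPPED):
    """Return True if a record would be kept by `samtools view -F exclude_mask`.

    A record is kept only when none of the bits in exclude_mask are set.
    """
    return (flag & exclude_mask) == 0


if __name__ == "__main__":
    # Example FLAG values: 99 and 147 are a properly paired, fully mapped pair;
    # 73 has bit 8 set, 133 has bit 4 set, and 77 has both bits set.
    for flag in (99, 147, 73, 133, 77):
        status = "kept" if passes_exclude_filter(flag) else "removed"
        print(flag, status)
```

Because the default mask is 4 + 8 = 12, a record is removed as soon as either the read or its mate is unmapped, which is why only fully aligned pairs remain in out.12F.bam.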
7. Sort the BAM file
$ samtools sort -@ 8 out.12F.bam -o sort_out.12F.bam
# Sort the retained paired-end reads by their position on the reference genome; -o specifies the output file name and -@ 8 uses 8 threads (optionally, -T sets the prefix for temporary files)
8. Convert BAM data to TXT
$ samtools depth sort_out.12F.bam > sort_out.12F.bam.txt
# Calculate the sequencing depth at each position from the paired-end reads where both reads align to the reference genome, and write the results to a text file
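The text file written by samtools depth has three tab-separated columns: reference name, 1-based position, and depth. By default, positions with zero depth are omitted unless the -a option is given, so the genome length has to be supplied separately when computing coverage. The sketch below summarizes such a file under the assumption that it contains a single reference sequence; the function name and the genome_length argument are hypothetical and are not part of the provided plotting script.

```python
def summarize_depth(lines, genome_length):
    """Summarize `samtools depth` rows for a single reference sequence.

    lines: iterable of text rows (reference, 1-based position, depth).
    genome_length: length of the reference in bases, supplied by the user.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0       # number of positions with depth > 0
    total_depth = 0   # sum of depth over all reported positions
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        depth = int(fields[2])
        total_depth += depth
        if depth > 0:
            covered += 1
    return {
        "mean_depth": total_depth / genome_length,
        "breadth": covered / genome_length,   # fraction of positions covered
        "uncovered": genome_length - covered,
    }


if __name__ == "__main__":
    # Tiny made-up example: 3 reported positions on a 5-base reference
    rows = ["ref\t1\t10\n", "ref\t2\t20\n", "ref\t4\t0\n"]
    print(summarize_depth(rows, genome_length=5))
```

To apply it to the real output, open sort_out.12F.bam.txt and pass the file object as lines, together with the length of the reference genome. A breadth below 1 or a nonzero uncovered count points to regions of the assembly that no read pair supports.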
9. Draw the genomic sequencing depth and coverage map
$ python Draw_SequencingDepth.py sort_out.12F.bam.txt Depth
# Input data: sort_out.12F.bam.txt; output: a JPG-format figure whose file name starts with the prefix Depth
# The Draw_SequencingDepth.py script is provided in supplementary file 1. Before using this script, please install the Python packages described in subsection 1.4. The final result is shown in Figure 1.
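Per-base depth values are often noisy, and a depth plot is easier to read when the values are averaged over fixed-size windows. The sketch below shows one simple way to do this; it is a hypothetical preprocessing step for illustration and does not describe how Draw_SequencingDepth.py works internally.

```python
def window_means(depths, window):
    """Average per-base depths in consecutive, non-overlapping windows.

    depths: list of per-base depth values in genome order.
    window: window size in bases; the last window may be shorter.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


if __name__ == "__main__":
    # Made-up depths for five positions, averaged in windows of two bases
    print(window_means([10, 20, 30, 40, 50], window=2))
```

A sharp drop in the windowed mean relative to neighboring windows is the kind of signal that a depth and coverage map is meant to expose.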