Stranded Mapping from Oriented Long Reads
David A Eccles
Abstract
This protocol demonstrates how to map strand-oriented long reads to a genome, and visualise them in a genome browser.
The general idea is to use minimap2 to create stranded BAM files, which are split for forward/reverse orientation then converted into BigWig format for display in a genome browser.
Input(s) :
-
stranded fastq files (see protocol Preparing Reads for Stranded Mapping)
-
a FASTA file containing the genome / sequence of interest. Output(s):
-
Genome-mapped stranded BAM files
-
Genome-mapped stranded BigWig files
Before start
Steps
Orient Reads
Orient reads as per protocol Preparing Reads for Stranded Mapping.
If this has been done, then the following command should produce output without errors:
for bc in $(awk '{print $2}' barcode_counts.txt);
do ls oriented/${bc}_reads_dirAdjusted.fq.gz;
done
```Example output:
oriented/BC03_reads_dirAdjusted.fq.gz oriented/BC04_reads_dirAdjusted.fq.gz oriented/BC05_reads_dirAdjusted.fq.gz oriented/BC06_reads_dirAdjusted.fq.gz oriented/BC07_reads_dirAdjusted.fq.gz oriented/BC08_reads_dirAdjusted.fq.gz
Index Preparation
Prepare genome index for spliced alignment
Software
Value | Label |
---|---|
minimap2 | NAME |
Linux | OS_NAME |
https://github.com/lh3/minimap2 | REPOSITORY |
Heng Li | DEVELOPER |
https://github.com/lh3/minimap2/releases | LINK |
2.16 | VERSION |
minimap2 -d mmus_ucsc_all-splice.idx -Q -t 10 -x splice mmus_ucsc_all.fa
Read Mapping
Map the long reads to the genome using minimap2, using samtools to covert to a sorted BAM format. This is where the reverse complementing done during demultiplexing gives a big saving of effort. As this BAM file is one of the main outputs, the run name is added to the file name.
Software
Value | Label |
---|---|
SAMtools | NAME |
Linux | OS_NAME |
https://github.com/samtools/samtools | REPOSITORY |
Wellcome Trust Sanger Institute | DEVELOPER |
http://www.htslib.org/download/ | LINK |
1.8 | VERSION |
runName="CHANGE_THIS";
mkdir -p mapped;
for bc in $(awk '{print $2}' barcode_counts.txt);
do echo ${bc};
minimap2 -t 10 -a -x splice mmus_ucsc_all-splice.idx oriented/${bc}_reads_dirAdjusted.fq.gz | \
samtools view -b | samtools sort > mapped/mm2_${runName}_called_${bc}_vs_MmusG.bam;
done
Creating BigWig Coverage Files
A bedGraph of coverage is created using samtools mpileup and mpileupDC.pl, excluding any skipped intronic sequence. When 'mpileupDC.pl' is provided with a single file, it will output a bedGraph file with a header line starting with '##'; this header line is removed.
To simplify output naming in later steps, the run name is added to the file name.
The particular JBrowse plugin that I use for stranded display requires that the reverse strand have negative coverage values, so that file needs to be changed:
runName="CHANGE_THIS";
for bc in $(awk '{print $2}' barcode_counts.txt);
do echo ${bc};
samtools view -b -F 0x10 mapped/mm2_called_${bc}_vs_MmusG.bam | \
samtools mpileup -A -B -Q 0 -q 0 -I -q 0 -Q 0 - | \
mpileupDC.pl | tail -n +2 > mapped/mm2_${runName}_called_${bc}_vs_MmusG.bg.plus
samtools view -b -f 0x10 mapped/mm2_called_${bc}_vs_MmusG.bam | \
samtools mpileup -A -B -Q 0 -q 0 -I -q 0 -Q 0 - | \
mpileupDC.pl | tail -n +2 > mapped/mm2_${runName}_called_${bc}_vs_MmusG.bg.minus
perl -i -pe 's/([0-9]+)$/-$1/' mapped/mm2_${runName}_called_${bc}_vs_MmusG.bg.minus
done;
```As an alternative, BEDTools can be used to generate coverage. The default options for BEDTools treat sequence deletions (which happen frequently in nanopore reads) as a drop in coverage, which can make exon hunting and coverage calculation more difficult. I submitted [a pull request](https://github.com/arq5x/bedtools2/pull/868) to add an additional _ignoreD_ parameter to the command line to allow cDNA reads with split coverage across introns to ignore deletions when considering coverage; this request has now been incorporated into the main BEDtools repository (as of v2.30.0):
Software
| Value | Label |
| --- | --- |
| BEDtools | NAME |
| Linux | OS_NAME |
| https://github.com/arq5x/bedtools2 | REPOSITORY |
| Aaron Quinlan | DEVELOPER |
| https://github.com/arq5x/bedtools2/releases/tag/v2.30.0 | LINK |
| 2.30.0 | VERSION |
runName="CHANGE_THIS";
for bc in
Stranded bedgraph files are converted to bigwig. This requires BEDTools and a genome information file containing chromosome lengths (one for Mmus/mm10 is attached to this step). As this bigwig file is one of the main outputs, the run name is added to the file name.
runName="CHANGE_THIS";
for bc in $(awk '{print $2}' barcode_counts.txt);
do echo ${bc};
basename="mapped/mm2_${runName}_called_${bc}_vs_MmusG"
bedGraphToBigWig ${basename}.bg.plus Mmus_genome.chrInfo.txt ${basename}.bw.plus
bedGraphToBigWig ${basename}.bg.minus Mmus_genome.chrInfo.txt ${basename}.bw.minus
done
```[Mmus_genome.chrInfo.txt](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.b54iq8ue/bgevjpt61.txt)
JBrowse Configuration
Each track should have its own JBrowse configuration section using the StrangedBigWig class and StrandedXYPlot type. BAM tracks can also be added. An example is shown here:
[tracks.bam-CG005_Nov18_BC03-track]
# settings for what data is shown in the track
storeClass = JBrowse/Store/SeqFeature/BAM
urlTemplate = raw/mm2_called_CG005AB_BC03_vs_MmusG.bam
baiUrlTemplate = raw/mm2_called_CG005AB_BC03_vs_MmusG.bam.bai
chunkSizeLimit = 10000000
maxHeight = 3000
# settings for how the track looks
category = MinION / Alignments
type = JBrowse/View/Track/Alignments2
key = Minimap2 alignments from 4T1.ρ0#C [CG005]
[tracks.bw-CG005_Nov18_BC03-both-track]
storeClass = StrandedPlotPlugin/Store/SeqFeature/StrandedBigWig
urlTemplate = bw/mm2_called_CG005AB_BC03_vs_MmusG.bw
category = MinION / Coverage
type = StrandedPlotPlugin/View/Track/Wiggle/StrandedXYPlot
key = Minimap2 coverage from 4T1.ρ0#C [CG005]
scale = log
scoreType = maxScore
autoscale = global
style.pos_color = #228B22
style.neg_color = lightskyblue
[tracks.BWCG004-4T1-BC04-both-track ] storeClass = StrandedPlotPlugin/Store/SeqFeature/StrandedBigWig urlTemplate = bw/mm2_called_CG004_BC04_vs_MmusG.bw category = MinION - Coverage type = StrandedPlotPlugin/View/Track/Wiggle/StrandedXYPlot key = MinION minimap2 coverage from CG004-4T1-WT (combined strands) scale = log scoreType = maxScore autoscale = global style.pos_color = darkred style.neg_color = darkgreen
./makeMinIONTemplate.sh mm2_called_CG005AB_BC03_vs_MmusG CG005_Nov18_BC03 '4T1.ρ0#C [CG005]' >> tracks.conf
[makeMinIONTemplate.sh](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.b54iq8ue/cnn7jpt63.sh)
Sanity Check
If this has worked properly, then mapping human or mouse to the mitochondrial genome should show most expression appearing on the positive strand, with a small scattering of negative-strand expression, a bit like the Expected Results shown here.
If not, check for the following issues:
- Tracks not displaying at all in JBrowse -- make sure track IDs inside square brackets are of the form [ tracks.
-track ] - JBrowse track is reflected in the X axis -- make sure that the reverse bedgraph file is orientated the correct way; it should be created with the '-f 0x10' flag (no capitalisation).
- JBrowse track only shows one direction -- make sure that the reverse bedgraph file has negative values, and re-generate the bigwig file