Stranded Mapping from Oriented Long Reads

David A Eccles

Published: 2022-03-09 DOI: 10.17504/protocols.io.b54iq8ue

Abstract

This protocol demonstrates how to map strand-oriented long reads to a genome, and visualise them in a genome browser.

The general idea is to use minimap2 to create stranded BAM files, which are split for forward/reverse orientation then converted into BigWig format for display in a genome browser.

Input(s) :

  • stranded fastq files (see protocol Preparing Reads for Stranded Mapping)

  • a FASTA file containing the genome / sequence of interest. Output(s):

  • Genome-mapped stranded BAM files

  • Genome-mapped stranded BigWig files

Before start

You will need access to the following free and open-source software program(s):

And the following additional data file(s):

  • a FASTA file containing the genome / sequence of interest.

Steps

Orient Reads

1.

Orient reads as per protocol Preparing Reads for Stranded Mapping.

If this has been done, then the following command should produce output without errors:

for bc in $(awk '{print $2}' barcode_counts.txt);
  do ls oriented/${bc}_reads_dirAdjusted.fq.gz;
done
```Example output:

oriented/BC03_reads_dirAdjusted.fq.gz oriented/BC04_reads_dirAdjusted.fq.gz oriented/BC05_reads_dirAdjusted.fq.gz oriented/BC06_reads_dirAdjusted.fq.gz oriented/BC07_reads_dirAdjusted.fq.gz oriented/BC08_reads_dirAdjusted.fq.gz




Index Preparation

2.

Prepare genome index for spliced alignment

Software

ValueLabel
minimap2NAME
LinuxOS_NAME
https://github.com/lh3/minimap2REPOSITORY
Heng LiDEVELOPER
https://github.com/lh3/minimap2/releasesLINK
2.16VERSION
minimap2 -d mmus_ucsc_all-splice.idx -Q -t 10 -x splice mmus_ucsc_all.fa

Read Mapping

3.

Map the long reads to the genome using minimap2, using samtools to covert to a sorted BAM format. This is where the reverse complementing done during demultiplexing gives a big saving of effort. As this BAM file is one of the main outputs, the run name is added to the file name.

Software

ValueLabel
SAMtoolsNAME
LinuxOS_NAME
https://github.com/samtools/samtoolsREPOSITORY
Wellcome Trust Sanger InstituteDEVELOPER
http://www.htslib.org/download/LINK
1.8VERSION
runName="CHANGE_THIS";
mkdir -p mapped;
for bc in $(awk '{print $2}' barcode_counts.txt);
  do echo ${bc};
  minimap2 -t 10 -a -x splice mmus_ucsc_all-splice.idx oriented/${bc}_reads_dirAdjusted.fq.gz | \
    samtools view -b | samtools sort > mapped/mm2_${runName}_called_${bc}_vs_MmusG.bam;
done

Creating BigWig Coverage Files

4.

mpileupDC.pl

A bedGraph of coverage is created using samtools mpileup and mpileupDC.pl, excluding any skipped intronic sequence. When 'mpileupDC.pl' is provided with a single file, it will output a bedGraph file with a header line starting with '##'; this header line is removed.

To simplify output naming in later steps, the run name is added to the file name.

The particular JBrowse plugin that I use for stranded display requires that the reverse strand have negative coverage values, so that file needs to be changed:

runName="CHANGE_THIS";
for bc in $(awk '{print $2}' barcode_counts.txt);
  do echo ${bc};
  samtools view -b -F 0x10 mapped/mm2_called_${bc}_vs_MmusG.bam | \
    samtools mpileup -A -B -Q 0 -q 0 -I -q 0 -Q 0 - | \
    mpileupDC.pl | tail -n +2 > mapped/mm2_${runName}_called_${bc}_vs_MmusG.bg.plus
  samtools view -b -f 0x10 mapped/mm2_called_${bc}_vs_MmusG.bam | \
    samtools mpileup -A -B -Q 0 -q 0 -I -q 0 -Q 0 - | \
    mpileupDC.pl | tail -n +2 > mapped/mm2_${runName}_called_${bc}_vs_MmusG.bg.minus
  perl -i -pe 's/([0-9]+)$/-$1/' mapped/mm2_${runName}_called_${bc}_vs_MmusG.bg.minus
done;
```As an alternative, BEDTools can be used to generate coverage. The default options for BEDTools treat sequence deletions (which happen frequently in nanopore reads) as a drop in coverage, which can make exon hunting and coverage calculation more difficult. I submitted [a pull request](https://github.com/arq5x/bedtools2/pull/868) to add an additional  _ignoreD_  parameter to the command line to allow cDNA reads with split coverage across introns to ignore deletions when considering coverage; this request has now been incorporated into the main BEDtools repository (as of v2.30.0):




Software

| Value | Label |
| --- | --- |
| BEDtools | NAME |
| Linux | OS_NAME |
| https://github.com/arq5x/bedtools2 | REPOSITORY |
| Aaron Quinlan | DEVELOPER |
| https://github.com/arq5x/bedtools2/releases/tag/v2.30.0 | LINK |
| 2.30.0 | VERSION |





runName="CHANGE_THIS"; for bc in Extra open brace or missing close brace(awk '{print 2}' barcode_counts.txt); do echo Missing superscript or subscript argument{bc}; samtools view -b mapped/mm2_{runName}calledMissing superscript or subscript argument{bc}_vs_MmusG.bam | \ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam - > mapped/mm2_{runName}calledMissing superscript or subscript argument{bc}_vs_MmusG.bg.plus samtools view -b mapped/mm2_{runName}calledMissing superscript or subscript argument{bc}_vs_MmusG.bam | \ bedtools genomecov -bga -strand '-' -split -ignoreD -ibam - > mapped/mm2_{runName}called/-Missing superscript or subscript argument1/' mapped/mm2_{runName}called${bc}_vs_MmusG.bg.minus done;

5.

Stranded bedgraph files are converted to bigwig. This requires BEDTools and a genome information file containing chromosome lengths (one for Mmus/mm10 is attached to this step). As this bigwig file is one of the main outputs, the run name is added to the file name.

runName="CHANGE_THIS";
for bc in $(awk '{print $2}' barcode_counts.txt);
  do echo ${bc};
  basename="mapped/mm2_${runName}_called_${bc}_vs_MmusG"
  bedGraphToBigWig ${basename}.bg.plus Mmus_genome.chrInfo.txt ${basename}.bw.plus
  bedGraphToBigWig ${basename}.bg.minus Mmus_genome.chrInfo.txt ${basename}.bw.minus 
done
```[Mmus_genome.chrInfo.txt](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.b54iq8ue/bgevjpt61.txt)







JBrowse Configuration

6.

Each track should have its own JBrowse configuration section using the StrangedBigWig class and StrandedXYPlot type. BAM tracks can also be added. An example is shown here:

[tracks.bam-CG005_Nov18_BC03-track]
# settings for what data is shown in the track
storeClass     = JBrowse/Store/SeqFeature/BAM
urlTemplate    = raw/mm2_called_CG005AB_BC03_vs_MmusG.bam
baiUrlTemplate = raw/mm2_called_CG005AB_BC03_vs_MmusG.bam.bai
chunkSizeLimit = 10000000
maxHeight      = 3000
# settings for how the track looks
category = MinION / Alignments
type = JBrowse/View/Track/Alignments2
key  = Minimap2 alignments from 4T1.ρ0#C [CG005]

[tracks.bw-CG005_Nov18_BC03-both-track]
storeClass     = StrandedPlotPlugin/Store/SeqFeature/StrandedBigWig
urlTemplate    = bw/mm2_called_CG005AB_BC03_vs_MmusG.bw
category       = MinION / Coverage
type           = StrandedPlotPlugin/View/Track/Wiggle/StrandedXYPlot
key            = Minimap2 coverage from 4T1.ρ0#C [CG005]
scale          = log
scoreType      = maxScore
autoscale      = global
style.pos_color = #228B22
style.neg_color = lightskyblue

[tracks.BWCG004-4T1-BC04-both-track ] storeClass = StrandedPlotPlugin/Store/SeqFeature/StrandedBigWig urlTemplate = bw/mm2_called_CG004_BC04_vs_MmusG.bw category = MinION - Coverage type = StrandedPlotPlugin/View/Track/Wiggle/StrandedXYPlot key = MinION minimap2 coverage from CG004-4T1-WT (combined strands) scale = log scoreType = maxScore autoscale = global style.pos_color = darkred style.neg_color = darkgreen






./makeMinIONTemplate.sh mm2_called_CG005AB_BC03_vs_MmusG CG005_Nov18_BC03 '4T1.ρ0#C [CG005]' >> tracks.conf




[makeMinIONTemplate.sh](https://static.yanyin.tech/literature_test/protocol_io_true/protocols.io.b54iq8ue/cnn7jpt63.sh) 





Sanity Check

7.

If this has worked properly, then mapping human or mouse to the mitochondrial genome should show most expression appearing on the positive strand, with a small scattering of negative-strand expression, a bit like the Expected Results shown here.

If not, check for the following issues:

  • Tracks not displaying at all in JBrowse -- make sure track IDs inside square brackets are of the form [ tracks.-track ]
  • JBrowse track is reflected in the X axis -- make sure that the reverse bedgraph file is orientated the correct way; it should be created with the '-f 0x10' flag (no capitalisation).
  • JBrowse track only shows one direction -- make sure that the reverse bedgraph file has negative values, and re-generate the bigwig file

Citation
Stranded BigWig JBrowse tracks for four samples, demonstrating log expression on the mitochondrial chromosome.
Stranded BigWig JBrowse tracks for four samples, demonstrating log expression on the mitochondrial chromosome.

推荐阅读

Nature Protocols
Protocols IO
Current Protocols
扫码咨询