High-throughput DNA barcoding library construction and sequencing protocol for BIOSCAN using unpurified non-destructively extracted DNA from arthropods
Emma Dawson, Naomi Park, Scott Thurston, Ian Johnston, Lyndall Pereira da Conceicoa, Mara Lawniczak, Abdulrahman Tuameh, Marco M Mosca
Abstract
This SOP describes the procedure for high-throughput generation of mitochondrial cytochrome c oxidase subunit I (COI) DNA barcode amplicons using very small quantities of crude DNA extracted non-destructively (i.e., without grinding or disruption to the organism) from arthropods (see LysisCextractionSOPV1.pdf). The use of an inhibitor-tolerant polymerase enables amplification directly from crude lysate, avoiding a purification step that can add significant cost. The first PCR amplifies the target of choice using untailed primers. Here, we target the COI mitochondrial locus, but in principle the locus could be any amplicon. In a second PCR step, long-read-compatible 16-mer combinatorial dual-indexed amplicons are then made directly from the first PCR product. Although full-length indexed amplicons can be made in a single PCR step, using non-tailed COI primers first markedly improves sensitivity to low template inputs. Insects alone can range across three orders of magnitude in size and can be as small as 0.2 mm, so increasing sensitivity to low-quantity inputs without oversequencing individuals with much greater DNA quantities is desirable. After the two-step PCR is complete, as many as 9216 PCRs are equivolume pooled and quantitated prior to long-read library construction. This single library is then sequenced on a single PacBio 8M SMRT Cell.
This SOP is entitled BIOSCAN as it supports the current global endeavour of the International Barcode of Life (https://ibol.org/programs/bioscan/) to massively increase species discovery using barcoding. Additionally, this SOP is being used for the Sanger BIOSCAN project to study 1M insects across the UK (https://www.sanger.ac.uk/collaboration/bioscan/).
This 2-step indexing PCR approach is an adaptation of the COVID-19 ARTIC Illumina library construction - tailed method, which can be found here:
Before start
Steps
COI amplification (PCR1)
Important! This step must be performed in a pre-PCR environment in which post PCR COI amplicons are not present, to minimise risk of sample contamination.
Input into COI amplification is unpurified non-destructively extracted DNA from arthropods.
Generate the COI primer pool (2.5 µM each primer) by combining the following in a 2mL Eppendorf DNA LoBind tube and vortex to mix.
Non-tailed COI primer | Sequence | Concentration (µM) | Volume (µl) |
---|---|---|---|
LepF1 | ATTCAACCAATCATAAAGATATTGG | 100 | 40 |
LepR1 | TAAACTTCTGGATGTCCAAAAAATCA | 100 | 40 |
LCO1490 | GGTCAACAAATCATAAAGATATTGG | 100 | 40 |
HCO2198 | TAAACTTCAGGGTGACCAAAAAATCA | 100 | 40 |
Qiagen EB | | | 1440 |
Total | | | 1600 |
COI non-tailed primer mix. Order primers with STD purification. Pool volumes may be scaled to the required sample throughput.
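The pool volumes in the table follow from the usual dilution relation C1 x V1 = C2 x V2. The sketch below is only an illustration of that arithmetic for scaling the pool; the function name `primer_pool_volumes` and its parameters are hypothetical and not part of the SOP.

```python
def primer_pool_volumes(n_primers, stock_um, final_um, total_ul):
    """Return (µL of each primer stock, µL of buffer) for an equimolar pool.

    Uses C1 * V1 = C2 * V2 for each primer, then tops up with buffer.
    """
    per_primer_ul = final_um * total_ul / stock_um
    buffer_ul = total_ul - n_primers * per_primer_ul
    if buffer_ul < 0:
        raise ValueError("Stocks are too dilute to reach the requested concentration")
    return per_primer_ul, buffer_ul


# Values from the table: four 100 µM primers, 2.5 µM each, 1600 µL total.
per_primer, eb = primer_pool_volumes(n_primers=4, stock_um=100.0,
                                     final_um=2.5, total_ul=1600.0)
print(per_primer, eb)  # 40.0 1440.0
```

Changing `total_ul` scales the pool while keeping each primer at 2.5 µM.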
Prepare the following COI PCR master mix and mix thoroughly by vortexing on full power. Keep on ice whilst preparing for subsequent steps.
Weighted PCR Primer Pool 1 Master Mix | Vol/PCR RXN (µl) | Vol/384 plate (µl) inc. 20% excess |
---|---|---|
COI Primer mix (2.5µM each) | 0.25 | 115 |
RepliQa HiFi ToughMix | 2.5 | 1150 |
Nuclease-free water | 2.15 | 989 |
Total | 4.9 | 2254 |
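As a rough check of the master mix table, the per-plate volumes can be recomputed from the per-reaction volumes, 384 wells and 20% excess. This is a minimal sketch; the names `master_mix` and `PER_REACTION_UL` are illustrative only. The exact products (115.2, 1152 and 990.72 µL) are slightly larger than the tabulated values, which appear to have been rounded for convenience. The final primer concentration assumes a 5µL reaction (4.9µL master mix plus 100nL lysate).

```python
WELLS_PER_PLATE = 384

# Per-reaction volumes (µL) taken from the master mix table.
PER_REACTION_UL = {
    "COI Primer mix (2.5µM each)": 0.25,
    "RepliQa HiFi ToughMix": 2.5,
    "Nuclease-free water": 2.15,
}


def master_mix(per_reaction_ul, n_reactions, excess=0.20):
    """Scale per-reaction volumes to n_reactions plus a fractional excess."""
    factor = n_reactions * (1 + excess)
    return {name: vol * factor for name, vol in per_reaction_ul.items()}


plate = master_mix(PER_REACTION_UL, WELLS_PER_PLATE)
for name, vol in plate.items():
    print(f"{name}: {vol:.1f} µL")
print(f"Total: {sum(plate.values()):.1f} µL")

# Final concentration of each primer in the assumed 5 µL reaction.
final_primer_um = 2.5 * 0.25 / (4.9 + 0.1)
print(f"Each primer in reaction: {final_primer_um:.3f} µM")
```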
Use the SPT Labtech Dragonfly Discovery to predispense 4.9µL mastermix per well into 384 well plates.
Select 4 x 96 well plates containing crude lysate, centrifuge at 2000rpm for 2 minutes, and remove the seal.
Use the SPT Labtech Mosquito LV to transfer 100nL of crude lysate into the plate containing the COI PCR master mix, maintaining the same well locations throughout. The Mosquito LV must be set up with a fixed aspirate height so that it aspirates from the upper 50µL of the 100µL well contents. Immediately proceed to the next step.
Heat seal and mix the plate e.g. on a BioShake iQ for 1 minute at 2000rpm, and centrifuge briefly at 3000rpm.
Important! Heat seal to minimise evaporation during PCR.
Place the plates onto a thermocycler and run the following program:
Step | Temperature | Time |
---|---|---|
1 | 98°C | 10 seconds |
2 | 45°C | 5 seconds |
3 | 68°C | 5 seconds |
4 | Repeat steps 1 - 3 for a total of 40 cycles | |
5 | 10°C | ∞ |
PAUSE POINT Amplified DNA can be stored at 4°C (overnight) or -20°C (up to 6 months).
Indexing amplified DNA (PCR2)
Defrost the COI indexing plates, being careful to record which index plate # is to be combined with which PCR 1 plate.
Use the SPT Labtech Mosquito LV to transfer 100nL of COI PCR 1 product into the dual indexed plate containing the tailed primers, maintaining the same well locations throughout. Immediately proceed to the next step.
Use the SPT Labtech Dragonfly Discovery to dispense 6.25µL of Kapa HiFi 2X Mastermix into the dual indexed plate from step 11, and place on ice immediately. The dispense is sufficient to mix all the reagents.
Heat seal and place the plate onto a thermocycler and run the following program.
Important! Heat seal to minimise evaporation.
Step | Temperature | Time |
---|---|---|
1 | 95°C | 5 minutes |
2 | 98°C | 30 seconds |
3 | 53°C | 20 minutes |
4 | 72°C | 2 minutes |
Repeat steps 2-4 once more | | |
5 | 98°C | 30 seconds |
6 | 62°C | 30 seconds |
7 | 72°C | 2 minutes |
Repeat steps 5-7 six more times | | |
8 | 72°C | 5 minutes |
9 | 10°C | ∞ |
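The "repeat once more" and "repeat six more times" rows can be easy to misread when programming a thermocycler. The sketch below encodes the PCR2 table as repeat blocks and expands it, reading "once more" as 2 passes and "six more times" as 7 passes. Temperatures and times are copied as written in the table; the names `PCR2_PROGRAM`, `expand` and `amplification_cycles` are illustrative, and counting only multi-step blocks as cycles is an assumption of this sketch.

```python
# Each entry is (number of passes, [(temperature, time), ...]).
PCR2_PROGRAM = [
    (1, [("95°C", "5 minutes")]),
    (2, [("98°C", "30 seconds"), ("53°C", "20 minutes"), ("72°C", "2 minutes")]),
    (7, [("98°C", "30 seconds"), ("62°C", "30 seconds"), ("72°C", "2 minutes")]),
    (1, [("72°C", "5 minutes")]),
    (1, [("10°C", "hold")]),
]


def expand(program):
    """Flatten repeat blocks into the full ordered list of steps."""
    return [step
            for passes, steps in program
            for _ in range(passes)
            for step in steps]


def amplification_cycles(program):
    """Count passes through multi-step (denature/anneal/extend) blocks."""
    return sum(passes for passes, steps in program if len(steps) > 1)


flat = expand(PCR2_PROGRAM)
print(len(flat), "steps,", amplification_cycles(PCR2_PROGRAM), "cycles")
```

Under this reading, PCR2 consists of 2 low-temperature annealing cycles followed by 7 cycles at 62°C, 9 cycles in total.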
PAUSE POINT Amplified indexed products can be stored at 4°C (overnight) or -20°C (up to 6 months).
Construction of equivolume pool
In a post-PCR lab, use a VBLOK200 reservoir to collect the entire contents of a single post indexed COI plate by upside down centrifugation at 1000rpm for 1 minute.
Transfer the contents in the reservoir to a 5mL Eppendorf tube and vortex to mix. The same VBLOK200 reservoir may be used to collect the contents of multiple plates which will eventually be pooled together (up to a maximum of 24 plates).
Optional QC step: Dilute each pool 1:10 with Elution Buffer and run directly on TapeStation High Sensitivity D5000. A single peak at ~890bp is expected, although residual salts cause the sizing to run ~150bp smaller.
PAUSE POINT Pools can be stored at 4°C (overnight) or -20°C (up to 6 months).
Manually combine 30µL of each of the 24 pools together, and mix by vortexing to form an equivolume pool of 9216 samples.
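The sample count and volume of the equivolume pool follow directly from the plate format. A minimal sketch of that arithmetic is given below, assuming every well of each 384 well plate holds one sample; the function name `equivolume_pool` is hypothetical.

```python
WELLS_PER_PLATE = 384


def equivolume_pool(n_plates, ul_per_plate_pool=30.0):
    """Return (number of samples, pooled volume in µL) for n_plates plate pools."""
    return n_plates * WELLS_PER_PLATE, n_plates * ul_per_plate_pool


n_samples, pool_ul = equivolume_pool(24)
print(n_samples, pool_ul)  # 9216 720.0
```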
Equivolume pool SPRI bead cleanup
Allow AMPure XP beads to equilibrate to room temperature (~30 minutes). Ensure solution is homogenous prior to use.
Add 0.6X volume (300µL) of AMPure XP beads per 500µL of pooled product, and mix well by vortexing.
Incubate for 6 minutes at room temperature (20°C).
Transfer the tube to a magnet and allow 4 minutes for the beads to form a pellet.
Carefully remove and discard the supernatant, taking care not to disturb the bead pellet.
Wash the beads with 1000µL 75% ethanol for 15 seconds, then carefully remove and discard the ethanol. (First wash)
Wash the beads with 1000µL 75% ethanol for 15 seconds, then carefully remove and discard the ethanol. (Second wash)
Pulse spin the tube and return it to the magnet to remove residual 75% ethanol. Leave ~1 minute to dry (being careful not to overdry).
Remove the tube from the magnet and resuspend the beads in 100µL elution buffer, then mix well by vortexing.
Incubate for 3 minutes at room temperature (20°C).
Transfer the tube to the magnet and allow 5 minutes for the beads to form a pellet.
Carefully transfer supernatant into a new tube, taking care not to disturb the bead pellet.
The clean equivolume pool may be quantified using Qubit Fluorometer, and sizing checked on TapeStation D5000.
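The bead volume used in the cleanup above scales linearly with the volume of pooled product. The helper below is a small sketch of that 0.6X calculation for cases where a volume other than 500µL is cleaned up; the function name `spri_bead_volume_ul` is illustrative and not part of the SOP.

```python
def spri_bead_volume_ul(sample_ul, ratio=0.6):
    """Return the bead volume (µL) for a given sample volume and bead ratio."""
    if sample_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return round(sample_ul * ratio, 1)


print(spri_bead_volume_ul(500.0))  # 300.0, as stated in the cleanup step
print(spri_bead_volume_ul(720.0))  # 432.0, if the full 24 x 30 µL pool were used
```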
PacBio Library Preparation and Sequencing
We currently prepare our amplicon pool for PacBio sequencing using the protocol attached below, 'Preparing SMRTbell Libraries using PacBio Barcoded Universal Primers for Multiplexing Amplicons', starting with DNA Damage Repair.
The library, containing 9216 samples, is sequenced on a SMRT Cell 8M using the Sequel IIe system.
Sample setup recommendations for sequencing amplicon libraries <3 kb:
Sequencing Primer: Sequencing Primer v4
Binding Kit: Sequel II Binding Kit 2.1
Binding Time: 1 Hour
Sequencing Kit: Sequel II Sequencing Plate 2.0
On-Plate Loading Concentration: 100 pM
Recommended Run parameters:
Movie Time (hours): 10
Pre-Extension Time (hours): 0.5
Immobilization Time (hours): 2 (default)
Analysis using mBRAVE
PacBio sequence data de-multiplexing is performed using the rapid and highly configurable mBRAVE (Multiplex Barcode Research And Visualization Environment) online analysis platform http://www.mbrave.net/. mBRAVE builds on the BOLD platform, http://www.boldsystems.org/, to support species identification and discovery.
The index set currently in use at Sanger is registered on mBRAVE as 'S
ONT Library Preparation and Sequencing
The amplicon pool generated in steps 1-32 is also compatible with Oxford Nanopore sequencing.
The amplicon pool can be prepared for Oxford Nanopore sequencing using the protocol attached below, 'Ligation sequencing amplicons V14 (SQK-LSK114)'.
The library is then sequenced on an R10.4.1 MinION flow cell (FLO-MIN114).
ligation-sequencing-amplicons-sqk-lsk114-ACDE_9163_v114_revJ_29Jun2022-gridion.pdf
Custom demultiplexing for Oxford Nanopore sequence data
Each sample was identified by a pair of index sequences: a front index f_i and a rear index r_j. Individual index sequences are not unique, i.e. a front index is paired with more than one rear index and vice versa (f1-sample1-r1, f2-sample2-r1, …). The pair f_i + r_j uniquely identifies a sample s.
Since the ONT deplexer (guppy_barcoder) cannot handle non-unique single indexes, the deplexing was customised. ONT advised us to use nanoplexer to perform custom deplexing.
Nanoplexer (v0.1.2) takes as input a fastq/fastq.gz file and a configuration file describing a set of indexes. It outputs one file per index containing the classified reads. In order to deplex the pooled samples, the software was run twice; firstly, for a rear index set R and secondly, for a front index set F. The following steps were used to deplex the sample pool:
- Deplex by rear indexes r_j ∈ R
- For each set of classified reads (by r_j):
  - Deplex the set by front indexes f_i ∈ F
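The two-pass logic above can be sketched in a few lines of Python. This is not nanoplexer and does not reproduce its alignment-based classification: it uses exact prefix and suffix matching on read strings purely to illustrate how a rear-index pass followed by a front-index pass resolves combinatorial pairs. The index sequences, sample names and function names (`classify`, `demultiplex`) are all hypothetical, and strand orientation and sequencing errors are ignored.

```python
from collections import defaultdict


def classify(reads, indexes, matches):
    """Bin reads by the first index for which matches(read, index_seq) is true."""
    bins = defaultdict(list)
    for read in reads:
        hit = next((name for name, seq in indexes.items() if matches(read, seq)),
                   "unclassified")
        bins[hit].append(read)
    return bins


def demultiplex(reads, front, rear, sample_sheet):
    """Pass 1: bin by rear index. Pass 2: bin each rear bin by front index."""
    by_sample = defaultdict(list)
    for r_name, r_reads in classify(reads, rear, str.endswith).items():
        if r_name == "unclassified":
            continue
        for f_name, f_reads in classify(r_reads, front, str.startswith).items():
            sample = sample_sheet.get((f_name, r_name))
            if sample is not None:  # skips unclassified or unused index pairs
                by_sample[sample].extend(f_reads)
    return dict(by_sample)


# Toy data: 4-base indexes instead of the 16-mers used in the protocol.
front = {"f1": "AAAA", "f2": "CCCC"}
rear = {"r1": "GGGG", "r2": "TTTT"}
sample_sheet = {("f1", "r1"): "sample1", ("f2", "r1"): "sample2", ("f1", "r2"): "sample3"}
reads = ["AAAAACGTACGTGGGG", "CCCCACGTACGTGGGG", "AAAAGGCCATTTT", "ACGTACGTACGT"]

result = demultiplex(reads, front, rear, sample_sheet)
for sample, sample_reads in sorted(result.items()):
    print(sample, sample_reads)
```

Because r1 is shared by sample1 and sample2, the rear-index pass alone cannot separate them; the second pass on the r1 bin does. The last read matches no rear index and is left unassigned.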