Smar2C2: A Simple and Efficient Protocol for the Identification of Transcription Start Sites

Andrew Murray, Christopher Vollmers, Robert J. Schmitz

Published: 2023-03-22 DOI: 10.1002/cpz1.705

Abstract

Promoters and the noncoding sequences that drive their function are fundamental aspects of genes that are critical to their regulation. The transcription preinitiation complex binds and assembles on promoters where it facilitates transcription. The transcription start site (TSS) is located downstream of the promoter sequence and is defined as the location in the genome where polymerase begins transcribing DNA into RNA. Knowing the location of TSSs is useful for annotation of genes, identification of non-coding sequences important to gene regulation, detection of alternative TSSs, and understanding of 5′ UTR content. Several existing techniques make it possible to accurately identify TSSs, but are often difficult to perform experimentally, require large amounts of input RNA, or are unable to identify a large number of TSSs from a single sample. Many of these protocols take advantage of template switching reverse transcriptases (TSRTs), which reliably place an adaptor at the 5′ end of a first strand synthesis of cDNA. Here, we introduce a protocol that exploits TSRT activity combined with rolling circle amplification to identify TSSs with several unique advantages over existing methods. Sequence adaptors are placed on the 5′ and 3′ end of the full-length cDNA copy of a transcript. A splint compatible with those adaptors is then used to circularize the full-length cDNA. Linear DNA containing concatemers of the cDNA is generated using rolling circle amplification, and a sequencing library is formed by fragmenting the concatemers. This protocol is straightforward to execute, requiring limited bench time with relatively stable reagents. Using extremely low amounts of RNA input, this protocol produces large numbers of accurate, deduplicated TSSs genome wide. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol 1: Splint generation

Basic Protocol 2: RNA extraction

Basic Protocol 3: cDNA synthesis

Basic Protocol 4: cDNA circularization and amplification

Basic Protocol 5: Library generation

INTRODUCTION

Promoters and the noncoding sequences that comprise them are critical to the normal function of a gene. They serve as the site where the transcription preinitiation complex assembles and binds, and they contain regulatory noncoding elements that can influence the overall transcription rate of a gene. These regulatory noncoding sequences are oriented around the transcription start site (TSS), which is the first nucleotide transcribed into RNA by RNA polymerase. Essential and regulatory sequences can be positioned both upstream and downstream of the TSS, and for some of these sequences, the position relative to the TSS is critical to both their discovery and function. Knowing only the general location of transcription initiation is often insufficient for characterizing the noncoding sequences that drive transcription initiation. This knowledge is especially critical to research in plant biology, which lacks TSS data for many species important to agriculture and research.

The template switching reverse transcriptase (TSRT) is a reverse transcriptase that deposits a few ectopic cytosines when reaching the end of the transcript (Kulpa, Topping, & Telesnitsky, 1997). These C's allow for the anchoring of a template switching oligo (TSO) matching the C's and containing a short adaptor at the 5′ end of a transcript. This has been used extensively for the generation of full-length cDNA (Zhu, Machleder, Chenchik, Li, & Siebert, 2001), and has been adapted several times for the identification of TSSs (Batut & Gingeras, 2013; Islam et al., 2012; Policastro, Raborn, Brende, & Zentner, 2020). To efficiently generate libraries for next-generation sequencing, we incorporated rolling circle amplification into the TSRT protocol to generate short fragments of DNA that contain internal adaptors identifying TSSs.

We have successfully generated libraries with as little as 40 picograms of RNA extracted from fresh plant tissue, suggesting that the total RNA needed to successfully generate a library is extremely low in comparison to some existing methods for TSS identification. After reverse transcription, the full-length cDNA is circularized using a splint, and excess linear DNA is digested using a blend of 5′ and 3′ exonucleases. Circular DNA is then amplified into long linear fragments using rolling circle amplification and the phi29 DNA polymerase. The linear fragments are then broken into small pieces, and sequencing adaptors are attached. After sequencing, the fragments containing TSS adaptors are bioinformatically extracted using the 5′ adaptor and used to identify the TSSs.
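
The bioinformatic extraction step can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that a read carrying a TSS contains the constant part of the TSO-UMI oligo from the oligonucleotides list, followed by the 12-nt UMI, the bases `ATGGG`, and then the 5′ end of the transcript. The function name `extract_tss_fragment` and the `min_insert` threshold are hypothetical, and real data may need tolerance for sequencing errors and for a variable number of non-templated G's.

```python
import re

# Constant part of the TSO-UMI oligo from the oligonucleotides list.
TSO_ADAPTOR = "AAGCAGTGGTATCAACGCAGAGTAC"
UMI_LENGTH = 12        # the run of N's in the TSO-UMI oligo
TSO_LINKER = "ATGGG"   # "AT" plus the three G's at the 3' end of the TSO

# Adaptor, then the UMI (captured), then the linker, then the transcript (captured).
TSS_PATTERN = re.compile(
    f"{TSO_ADAPTOR}([ACGT]{{{UMI_LENGTH}}}){TSO_LINKER}([ACGT]+)"
)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tss_fragment(read, min_insert=20):
    """Return (umi, insert) if the read contains the TSO adaptor, else None.

    The first base of `insert` is the candidate TSS. Both orientations of
    the read are tried because tagmentation fragments can come from either
    strand (an assumption of this sketch).
    """
    for candidate in (read, reverse_complement(read)):
        match = TSS_PATTERN.search(candidate)
        if match and len(match.group(2)) >= min_insert:
            return match.group(1), match.group(2)
    return None
```

Because the poly-dT primer shares the same constant adaptor sequence, requiring the UMI and `ATGGG` after the adaptor is what separates 5′ (TSS) junctions from 3′ (poly-A) junctions in this sketch. The returned insert would then be mapped to the reference genome with an aligner of your choice.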

The protocol described here is broken down into several main steps. In Basic Protocol 1 we describe the formation of the splint, which is used downstream in the circularization reaction. In Basic Protocol 2 we describe how RNA is extracted and quantified. In Basic Protocol 3 we describe how the RNA is reverse transcribed using the TSRT, and how the adaptor is attached to the 5′ end of the transcript using the TSO (Fig. 1). In Basic Protocol 4 we describe how the cDNA is circularized and amplified with rolling circle amplification (Fig. 1), and in Basic Protocol 5 we describe how to produce a sequencing library from the linear DNA generated in the previous step (Fig. 1).

Figure 1. Overview and workflow for smar2C2. (A) cDNA is generated using a poly-dT primer containing an adaptor, which anneals to the poly-A tail of mRNA transcripts. (B) A TSRT generates a full-length cDNA transcript and deposits roughly three non-templated cytosines at the end of the transcript. (C) The non-templated cytosines then serve as a binding site for the TSO, which contains a second adaptor and a unique molecular identifier. (D) The TSO is used as a template to generate the full-length cDNA flanked by adaptors at the TSS and transcription termination site. (E) These adaptors are then used to amplify the cDNA and generate double-stranded full-length cDNA. (F) The full-length cDNA is circularized using a linker sequence that matches the adaptors deposited at the 5′ and 3′ end of the transcript. (G) The circular DNA is then amplified using rolling circle amplification, and (H) the linear concatemer is converted into a library using tagmentation. (I) The final library is sequenced, and (J) the sequences containing TSS adaptors are bioinformatically extracted and mapped to the reference genome.

CAUTION: All reactions should be completed using appropriate laboratory protective equipment including gloves, safety glasses, and a lab coat.

STRATEGIC PLANNING

Because completing this protocol involves manipulation of RNA, researchers should take care to ensure that reagents and equipment are RNase-free. We recommend using dedicated filtered pipette tips as well as dedicated RNase-free reagents until reverse transcription is complete. It is also advisable to use RNase-free workspaces and proper wet bench techniques to ensure that samples are not contaminated.

Basic Protocol 1: SPLINT GENERATION

The splint is used downstream for the circularization reaction, but it is easiest to generate it ahead of time as it can be stored long term. The original design of the splint includes the use of a unique molecular identifier (UMI), which is not necessary for this experiment. However, it does increase potential uses for the construct, such as allowing for deduplication of reads as described in the original R2C2 protocol (Volden et al., 2018). The original R2C2 methodology sequenced concatemerized full-length cDNA on the Oxford Nanopore Technologies (ONT) platform, and the inclusion of the UMI in the splint allows for deduplication when sequencing a full concatemer with their existing pipeline. At the end of this reaction you should have the full-length extended splint, with the unextended primers removed by the Select-a-Size cleanup.

Materials

  • Q5 High-Fidelity 2× Master Mix (NEB cat. no. M0492S)

  • Forward splint primer (see oligonucleotides list in recipe)

  • Reverse splint primer (see oligonucleotides list in recipe)

  • Zymo Select-a-Size DNA Clean & Concentrator Kit (Zymo Research cat. no. D4080)

  • 95% ethanol (EtOH)

  • Eppendorf tubes (2 ml centrifuge tubes)

  • PCR tubes

  • Microcentrifuge (>10,000 × g)

  • Thermocycler

Perform splint generation

1.Set up the initial PCR reaction in PCR tubes.

  • 12.5 µl High-Fidelity 2× Master Mix
  • 1 µl 100 µM Forward Splint Primer
  • 1 µl 100 µM Reverse Splint Primer
  • 10.5 µl Water

2.Run an initial extension reaction with the following conditions:

  1. 95°C for 3 min

  2. 98°C for 1 min

  3. 62°C for 1 min

  4. 72°C for 6 min

  5. Cool to 4°C

3.Using the Zymo Select-a-Size DNA Clean & Concentrator Kit, add 85 µl EtOH to 500 µl DNA Binding Buffer and mix by pipetting.

Note
The amount of EtOH relative to the binding buffer is specific to the size of the fragment you want to purify. If you modify the size of the linker then the ethanol needs to be adjusted as well. Please refer to Zymo Select-a-Size DNA Clean & Concentrator Kit protocol (https://static.yanyin.tech/literature/current_protocol/10.1002/cpz1.705/attachments/_d4080_select-a-size_dna_clean_concentrator.pdf) for instructions on how to adjust EtOH concentration for varying linker sizes.

4.Bring the PCR reaction up to 100 µl with DNA elution buffer.

5.Add the sample to the DNA binding buffer and mix thoroughly by pipetting.

6.Transfer the mixture to a Zymo-Spin IC-S column in a collection tube. Centrifuge 30 s at 10,000 × g , room temperature, and discard the flow-through.

7.Add 700 µl DNA wash buffer to the column and centrifuge 30 s at 10,000 × g , room temperature. Discard the flow-through.

8.Add 200 µl DNA wash buffer to the column and centrifuge 60 s at 10,000 × g , room temperature. Discard the flow-through and the collection tube.

Note
If you get any ethanol on the column or suspect any potential contamination, re-spin the column in the collection tube for 30 s at 10,000 × g, room temperature.

9.Transfer the column to an Eppendorf tube, add 10 µl DNA elution buffer directly to the column matrix and incubate for 1 minute at room temperature. Centrifuge 30 s at 10,000 × g , room temperature, to elute the DNA.

Note
Be sure not to disturb the column matrix when pipetting the elution buffer.

10.Measure the final DNA concentration to ensure reaction success.

Note
If the reaction failed, you will observe a very low concentration of DNA. In our experience, successful reactions usually contained between 900 and 1100 ng of DNA, while failed reactions contained between 90 and 130 ng of DNA. We recommend using a Qubit fluorometer (Thermo Fisher Scientific Inc.) to measure total DNA to allow for better control of total splint added downstream.

Basic Protocol 2: RNA EXTRACTION

The goal of this step is to extract full-length RNA transcripts with as little breakage and degradation as possible. Although most RNA extraction methods will likely work for this protocol, we opted to use a commercial column extraction kit for several reasons. Primarily, we were concerned with obtaining intact full-length mRNA, as any breaks would appear as an artifact TSS within the gene body. To help facilitate the isolation of intact full-length mRNA, we used a rapid protocol that reduced the time between RNA extraction and cDNA synthesis. In addition, the on-column DNase I digestion allowed for easy degradation of genomic DNA, which causes artifacts when the poly-dT primer binds to natural poly-A tracts in genomic DNA. Finally, the TSRT requires very low RNA input levels and will fail if too much RNA is added to the reaction, so column purification provides more than enough RNA. While we do not know the exact RNA input at which the reaction stops working, we suggest using no more than 400 ng of total input.

Be sure that all reagents and materials used for RNA extraction are RNase-free. The easiest way to do this is to use dedicated reagents and disposable materials for all RNA protocols that are known to be RNase-free upon purchase, and keep those materials sealed when not in use to avoid contamination. It is advisable to use dedicated filter tips for all pipetting, and to thoroughly clean the workstation to remove RNase contamination. It is advisable to immediately continue on to cDNA generation after RNA extraction to avoid degradation, but we have generated high-quality libraries with RNA stored for up to one month at −80°C. This section mostly follows the “Tough-to-Lyse” protocol provided by Monarch with minor specifications, additions, and commentary as we extracted RNA from plant tissue frozen with liquid nitrogen and ground in a mortar and pestle. Be sure to perform all initial preparatory steps outlined in the kit instructions. Very little tissue is required for this protocol, with successful libraries having been generated with as little as 40 pg of total RNA input.

Materials

  • RNaseZap (Thermo Fisher cat. no. AM9780)

  • Liquid nitrogen

  • Plant sample

  • Monarch Total RNA Miniprep Kit (NEB cat. no. T2010S)

  • Qubit RNA BR Assay Kit (Thermo Fisher cat. no. Q10210)

  • ≥95% EtOH

  • Mortar and pestle

  • Eppendorf tubes (1.5 ml centrifuge tubes)

  • Vortex shaker

  • Refrigerated (4°C) microcentrifuge (16,000 × g)

Perform RNA extraction

1.Clean all surfaces and the mortar and pestle with RNaseZap or another RNase decontamination solution.

Note
Avoiding RNase contamination of the sample is key to a successful experiment. Be sure to decontaminate the workspace, gloves, and any tools that will contact the lysed sample with an RNase decontamination solution. As this protocol is designed to initially generate full-length cDNA, breaks in RNA may lead to artifacts in the final experiment.

2.Pre-cool a mortar and pestle using liquid nitrogen. Place the plant sample into the mortar and slowly add liquid nitrogen. Grind the tissue into a fine powder for 3 min, adding more liquid nitrogen if necessary.

Note
Other methods of lysing the sample, such as a bead mill, will likely be successful as well. Ensuring that the sample remains as RNase-free as possible will increase the success of the reaction.

3.Add the ground sample to 800 µl 1× DNA/RNA protection reagent pre-cooled in an Eppendorf tube on ice and vortex to mix thoroughly.

Note
Keeping the sample cool from this point onward will improve RNA stability, decreasing potential artifacts and increasing experimental success. Variable amounts of total tissue input can be used, but it is advisable to keep your total tissue below 100 mg. Regardless of the total amount of tissue used, 800 µl of 1× DNA/RNA protection reagent can be used.

4.Spin 2 min at 16,000 × g , room temperature, to pellet debris. Transfer supernatant to a pre-cooled gDNA removal column (light blue) fitted with a collection tube.

Note
There is no reason to try and optimize the total RNA extracted from the sample in most cases, as smar2C2 libraries can be prepared with as little as 40 pg of total RNA. Be sure to not pipette any precipitate, as it is fine to leave a large amount of supernatant behind. 400 µl is sufficient for downstream applications, as after the addition of an equal volume of ethanol only 800 µl will be able to be added to the next column.

5.Spin 30 s at 16,000 × g , room temperature, to remove the genomic DNA.

Note
Be sure to save the flow-through in the collection tube, discarding the gDNA removal column.

6.Add an equal volume of EtOH and mix thoroughly by pipetting (do not vortex).

7.Transfer the mixture to a pre-cooled RNA purification column (dark blue) and collection tube. Spin 30 s at 16,000 × g , room temperature, and discard the flow-through.

8.Add 500 µl of RNA wash buffer and spin 30 s at 16,000 × g , room temperature, and discard flow-through.

9.Combine 5 µl DNase I with 75 µl DNase I reaction buffer and pipet directly to the top of the column matrix. Incubate at room temperature for 15 min.

Note
While the on-column DNase I digestion step is optional, we highly recommend performing it. Even with a DNase I digestion, we occasionally see artifacts, presumably from the poly-dT primer binding to poly-A tracts in the genomic DNA.

10.Add 500 µl RNA priming buffer and spin 30 s at 16,000 × g , room temperature. Discard the flow-through.

11.Add 500 µl RNA wash buffer and spin for 30 s at 16,000 × g , room temperature. Discard the flow-through.

12.Add 500 µl RNA wash buffer and spin for 2 min at 16,000 × g , room temperature. Discard the flow-through.

13.Spin the column for 1 min at 16,000 × g , room temperature.

Note
While this step is optional, it is likely wise to ensure no ethanol contamination in the final elution.

14.Add 50 µl nuclease-free water directly to the center of the column matrix and spin for 30 s at 16,000 × g , room temperature.

Note
This volume can be adjusted anywhere between 30 and 100 µl.

15.Place the eluted RNA on ice.

Note
When dealing with RNA one of the best ways of improving experimental success is keeping your sample cool.

16.Measure total RNA extracted using Qubit RNA Broad Range Assay Kit.

Note
A NanoDrop or other similar RNA quantification method will likely provide enough accuracy for this application. We have found that the addition of too much RNA into the reverse transcription reaction can inhibit experimental success, making moderately accurate knowledge of input important. It is advisable to proceed directly to Basic Protocol 3, but if necessary RNA samples can be frozen at −20°C for use the next day or at −80°C for longer-term storage.

Basic Protocol 3: cDNA SYNTHESIS

The goal of this step is to generate the cDNA using the TSRT and then attach the TSO containing the adaptor to the additional C's deposited at the end of the first-strand cDNA transcript. TSRTs can introduce an error called strand invasion (Tang et al., 2013) where the TSO binds with the first strand of cDNA before the reverse transcriptase has completed first-strand synthesis, usually at a site with sequence complementarity to the oligo. These artifacts can be removed bioinformatically, but methods should also be adjusted to attempt to reduce potential problems. We have opted to add a step where the poly-dT primer is annealed to the mRNA before the TSO is added. Continue to use RNase-free reagents and materials until after first-strand cDNA synthesis using the TSRT.
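
The bioinformatic removal of strand invasion artifacts mentioned above can be sketched as follows. This is a simplified heuristic in the spirit of Tang et al. (2013), not the authors' filter: it assumes that a candidate TSS is suspicious when the genomic bases immediately upstream of it resemble the 3′ end of the TSO (`ATGGG`), since that is where the oligo could have annealed internally. The function names, the 5-nt window, and the `max_mismatches` threshold are all assumptions of this sketch.

```python
# 3' end of the TSO-UMI oligo (the bases that follow the UMI).
TSO_3PRIME = "ATGGG"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_bases(genome_seq, tss_index, strand, length=len(TSO_3PRIME)):
    """Bases immediately upstream of a 0-based TSS, in transcript orientation.

    Returns None when the window would run off the end of the sequence.
    """
    if strand == "+":
        start = tss_index - length
        if start < 0:
            return None
        return genome_seq[start:tss_index]
    # Minus strand: "upstream" lies at higher genomic coordinates.
    end = tss_index + 1 + length
    if end > len(genome_seq):
        return None
    return reverse_complement(genome_seq[tss_index + 1:end])


def looks_like_strand_invasion(genome_seq, tss_index, strand, max_mismatches=1):
    """Flag a candidate TSS whose upstream bases nearly match the TSO 3' end."""
    upstream = upstream_bases(genome_seq, tss_index, strand)
    if upstream is None:
        return False
    mismatches = sum(a != b for a, b in zip(upstream, TSO_3PRIME))
    return mismatches <= max_mismatches
```

A stricter filter could also compare the bases further upstream with the UMI observed in the read. Flagged positions are candidates for removal rather than confirmed artifacts, because genuine TSSs can also lie downstream of G-rich sequence.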

Materials

  • SMARTScribe reverse transcriptase (Takara Bio cat. no. 639538)

  • 5× first-strand buffer (Takara Bio cat. no. 639538)

  • 20 mM dithiothreitol (DTT) (Takara Bio cat. no. 639538)

  • TSO (100 µM) (see oligonucleotides list in recipe)

  • ISPCR primer (10 µM) (see oligonucleotides list in recipe)

  • Poly-dT primer (10 µM) (see oligonucleotides list in recipe)

  • Deoxynucleotide (dNTP) solution mix (10 mM) (NEB cat. no. N0447S)

  • RNase-free water

  • RNase H (5000 U/ml) (NEB cat. no. M0297S)

  • Q5 High-Fidelity 2× Master Mix (NEB cat. no. M0492S)

  • Monarch PCR & DNA Cleanup Kit (NEB cat. no. T1030S)

  • Thermocycler

  • Eppendorf tubes (1.5-ml centrifuge tubes)

  • Vortex shaker

  • Refrigerated (4°C) microcentrifuge (16,000 × g)

  • Heat block

Perform cDNA synthesis

1.Set up your initial sample solution using:

  • Up to 400 ng of RNA sample
  • 2 µl poly-dT primer
  • 1 µl dNTPs
  • Water to 6 µl
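
The volumes for this mix follow from the measured RNA concentration. The short Python sketch below is illustrative arithmetic only. It assumes the concentration comes from the Qubit measurement in Basic Protocol 2, treats 400 ng as an upper limit rather than a target that must be reached, and uses a hypothetical function name, `rna_annealing_mix`.

```python
def rna_annealing_mix(rna_ng_per_ul, target_ng=400.0, total_ul=6.0,
                      primer_ul=2.0, dntp_ul=1.0):
    """Return RNA and water volumes (µl) for the 6 µl annealing mix.

    The RNA volume is capped by the space left after primer and dNTPs,
    so dilute samples simply contribute less than `target_ng`.
    """
    if rna_ng_per_ul <= 0:
        raise ValueError("RNA concentration must be positive")
    max_rna_ul = total_ul - primer_ul - dntp_ul       # 3 µl with the defaults
    rna_ul = min(target_ng / rna_ng_per_ul, max_rna_ul)
    water_ul = max_rna_ul - rna_ul
    return {
        "rna_ul": round(rna_ul, 2),
        "water_ul": round(water_ul, 2),
        "rna_ng": round(rna_ul * rna_ng_per_ul, 1),
    }
```

For example, a 200 ng/µl sample gives 2 µl RNA plus 1 µl water, while a 50 ng/µl sample fills the available 3 µl and delivers 150 ng. Very concentrated samples lead to volumes that are impractical to pipette and are better diluted first.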

2.Place the solution on a heat block at 74°C for 3 min, then remove the reaction and place it on ice.

3.Add to your reaction:

  • 2 µl 5× first-strand buffer
  • 1 µl DTT
  • 0.5 µl TSO
  • 0.5 µl SMARTScribe reverse transcriptase

4.Using a thermocycler, set the reaction to:

  1. 42°C for 90 min

  2. 75°C for 15 min

  3. Cool to 4°C

5.To your reaction add:

  • 12.5 µl Q5 High-Fidelity 2× Master Mix
  • 2 µl ISPCR primer
  • 1 µl RNase H

6.Using a thermocycler, set the reaction to:

  1. 37°C for 15 min

  2. 95°C for 1 min

  3. 65°C for 10 min

  4. 98°C for 45 s

  5. 98°C for 10 s

  6. 63°C for 30 s

  7. 72°C for 3 min

  8. Repeat steps 5 to 7 six times

Note
The initial step at 37°C allows RNase H to digest the remaining annealed RNA. The 10 min spent at 65°C allows for second-strand cDNA synthesis. The PCR amplification that follows should use a minimal number of cycles. If duplicates become a problem downstream, four cycles of PCR are likely sufficient, though the effects of this have not been tested.

7.Purify using Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 8 µl elution buffer.

Note
Any PCR cleanup commonly used is likely sufficient. Eluting into 8 µl allows for easy input into subsequent circularization reactions and conservation of circularization reagents but is not critical to the protocol. It is possible to pause after this step and store cDNA at 4°C for use in the near future or −20°C for more long-term storage.

Basic Protocol 4: cDNA CIRCULARIZATION AND ROLLING CIRCLE AMPLIFICATION

The goal of this step is to generate circular DNA from the cDNA fragments and amplify them into linear concatemers using rolling circle amplification. The phi29 polymerase used in this protocol is a high-fidelity DNA polymerase that is extremely processive and well suited to rolling circle amplification. Circularization and rolling circle amplification are what allow this protocol to generate fragments of DNA that contain the TSS. This approach has the added benefit that once the cDNA is circularized, any remaining linear DNA can be digested away using effective, inexpensive, and stable exonucleases. A combination of exonucleases specific to linear DNA ensures complete digestion of all non-circularized DNA: exonuclease I shows a preference for linear single-stranded DNA, exonuclease III for linear double-stranded DNA, and lambda exonuclease for nicked and 5′-phosphorylated double-stranded linear DNA. After digestion, the library should therefore contain only circular molecules generated through the sequence similarity between the adaptors on the cDNA and the previously generated splint.

Materials

  • NEBuilder HiFi DNA Assembly Master Mix (NEB cat. no. E2621)

  • DNA splint (Basic Protocol 1)

  • NEBuffer 2 (NEB cat. no. B7002S)

  • Lambda exonuclease (5000 U/ml) (NEB cat. no. M0262S)

  • Exonuclease I (E. coli) (20,000 U/ml) (NEB cat. no. M0293S)

  • Exonuclease III (E. coli) (100,000 U/ml) (NEB cat. no. M0206S)

  • Monarch PCR & DNA Cleanup Kit (NEB cat. no. T1030S)

  • Exo-resistant random primer (Thermo Fisher Scientific cat. no. SO181)

  • phi29 DNA polymerase (10,000 U/ml) (NEB cat. no. M0269S)

  • phi29 DNA polymerase reaction buffer (10×) (NEB cat. no. M0269S)

  • Deoxynucleotide (dNTP) solution mix (10 mM) (NEB cat. no. N0447S)

  • Thermocycler

  • Eppendorf tubes (1.5 ml centrifuge tubes)

  • PCR tube

  • Vortex shaker

  • Refrigerated (4°C) microcentrifuge (16,000 × g)

  • Heat block

cDNA circularization and rolling circle amplification

1.In a PCR tube combine:

  • 200 ng splint (from Basic Protocol 1)
  • 10 µl 2× NEBuilder HiFi DNA Assembly Master Mix
  • 8 µl sample
  • Add water to 20 µl

Note
NEBuilder is very effective at generating circular fragments using the cDNA and splint, but other methods of circularization via splint ligation are likely sufficient as well. If your sample is in more than 8 µl due to your cleanup step, increase the total volume of the reaction to up to 42.5 µl by adding equal amounts of NEBuilder.

2.Incubate the reaction at 50°C for 60 min.

3.Add to the reaction:

  • 5 µl NEBuffer2
  • 1 µl lambda exonuclease
  • 1 µl exonuclease I
  • 0.5 µl exonuclease III
  • Add water to 50 µl

4.Incubate the reaction at 37°C for 60 min, 80°C for 20 min, and then cool to 4°C or place on ice.

Note
The 60 min at 37°C can be extended to up to six hours depending on the needs of the experimenter, though most digestion should be done in the first hour. Heating to 80°C for 20 min is to inactivate the nucleases to ensure no activity once linear DNA is generated via rolling circle amplification.

5.Purify using Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 20 µl elution buffer.

Note
Any DNA PCR cleanup method is likely sufficient at this step. Simply ensure that your cleanup does not specifically remove circular DNA.

6.Add to the eluted circular DNA:

  • 2.5 µl dNTP
  • 2.5 µl exo-resistant random primer
  • 5 µl 10× phi29 buffer
  • 1 µl phi29 polymerase
  • Add water to 50 µl

7.Incubate at:

  1. 30°C for 4 hr

  2. 65°C for 10 min

  3. Cool to 4°C

Note
The total time spent at 30°C can be adjusted significantly, and we have tested it for up to 16 hr (overnight). We have found that 4 hr produces sufficient linear DNA for our needs, but if the yield is too low the reaction can be left running overnight.

8.Purify using Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 18 µl elution buffer.

Note
While any DNA PCR cleanup method is likely sufficient for this step, we have encountered problems when using magnetic beads. Our beads tended to not stick well to the side of the tube when placed on the magnetic rack, forming a looser clump that protruded away from the side of the tube. This made it difficult to remove the reaction buffer and ethanol used in the cleanup and resulted in a significant loss of the sample.

Basic Protocol 5: LIBRARY PREPARATION

At this stage, long linear fragments of DNA need to be broken down into short fragments that can be sequenced, and sequencing adaptors need to be attached. Any protocol that is commonly used to sequence genomic DNA on Illumina sequencers will likely work at this stage. We have used Tn5 loaded with sequencing adaptors to generate libraries (with an equivalent product referenced in the materials) using a tagmentation protocol, but this is not critical to the success of the protocol.

Materials

  • Tn5 loaded with sequencing adaptors (Illumina cat. no. 20034197)

  • TD buffer (see recipe)

  • Illumina Nextera barcoded primers

  • Q5® High-Fidelity 2× Master Mix (NEB cat. no. M0492S)

  • Monarch PCR & DNA Cleanup Kit (NEB cat. no. T1030S)

  • Thermocycler

  • Eppendorf tubes (1.5 ml centrifuge tubes)

  • PCR tube

  • Vortex shaker

  • Refrigerated (4°C) microcentrifuge (16,000 × g)

  • Heat block

Prepare sequencing library

1.Add to the sample:

  • 20 µl TD buffer
  • 2 µl Tn5

2.Incubate at 30°C for 30 min.

Note
This step allows for the integration of Tn5.

3.Purify using Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 10 µl elution buffer.

4.Add to the reaction:

  • 12.5 µl Q5 High-Fidelity 2× Master Mix
  • 1.25 µl 10 µM forward Nextera barcoded primer
  • 1.25 µl 10 µM reverse Nextera barcoded primer

5.Incubate at:

  1. 72°C for 5 min

  2. 98°C for 2 min

  3. 98°C for 10 s

  4. 63°C for 30 s

  5. 72°C for 90 s

  6. Repeat steps 3 to 5 eight times

Note
The initial extension step at the start of this program is critical for extending off of the DNA fragment inserted by Tn5.

6.Purify using Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 10 µl elution buffer.

Note
Any pre-sequencing DNA PCR cleanup should work at this stage.

REAGENTS AND SOLUTIONS

Oligonucleotides

  • Forward splint primer
    • ACTCTGCGTTGATACCACTGCTTTGAGGCTGATGAGTTCCATANNNNNTATATNNNNNATCACTACTTAGTTTTTTGATAGCTTCAAGCCAGAGTTGTCTTTTTCTCTTTGCTGGCAGTAAAAG
  • Reverse splint primer
    • ACTCTGCGTTGATACCACTGCTTAAAGGGATATTTTCGATCGCNNNNNATATANNNNNTTAGTGCATTTGATCCTTTTACTCCTCCTAAAGAACAACCTGACCCAGCAAAAGGTACACAATACTTTTACTGCCAGCAAAGAG
  • TSO-UMI
    • AAGCAGTGGTATCAACGCAGAGTACNNNNNNNNNNNNATrGrG+G
  • Poly dT primer
    • AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN
  • ISPCR_oligo
    • AAGCAGTGGTATCAACGCAGAGT
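
Before ordering or redesigning these oligonucleotides, two properties of the sequences listed above can be checked with a short script. The Python sketch below is an illustration, not part of the published protocol. It checks that the two splint primers are complementary at their 3′ ends, which is what the extension reaction in Basic Protocol 1 relies on under our reading of that step, and that both primers begin with the reverse complement of the ISPCR sequence, which is how the splint ends can match the cDNA adaptors. The function names are hypothetical.

```python
FORWARD_SPLINT = (
    "ACTCTGCGTTGATACCACTGCTTTGAGGCTGATGAGTTCCATANNNNNTATATNNNNNATCACTACTTAG"
    "TTTTTTGATAGCTTCAAGCCAGAGTTGTCTTTTTCTCTTTGCTGGCAGTAAAAG"
)
REVERSE_SPLINT = (
    "ACTCTGCGTTGATACCACTGCTTAAAGGGATATTTTCGATCGCNNNNNATATANNNNNTTAGTGCATTTG"
    "ATCCTTTTACTCCTCCTAAAGAACAACCTGACCCAGCAAAAGGTACACAATACTTTTACTGCCAGCAAAGAG"
)
ISPCR = "AAGCAGTGGTATCAACGCAGAGT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def three_prime_overlap(forward, reverse, min_len=10):
    """Length of the longest 3' end of `forward` that can pair with the
    3' end of `reverse`, or 0 if it is shorter than `min_len`."""
    rc_reverse = reverse_complement(reverse)
    for k in range(min(len(forward), len(reverse)), min_len - 1, -1):
        if forward[-k:] == rc_reverse[:k]:
            return k
    return 0


def splint_ends_match_adaptor(primer, adaptor=ISPCR):
    """True if the primer starts with the reverse complement of the adaptor."""
    return primer.startswith(reverse_complement(adaptor))


overlap = three_prime_overlap(FORWARD_SPLINT, REVERSE_SPLINT)
print("3' overlap between splint primers (nt):", overlap)
print("forward primer matches adaptor:", splint_ends_match_adaptor(FORWARD_SPLINT))
print("reverse primer matches adaptor:", splint_ends_match_adaptor(REVERSE_SPLINT))
```

If the linker is redesigned, rerunning these checks on the new sequences is a quick way to catch a broken overlap before the extension reaction is attempted.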

TD Buffer

  • 0.38 g/L KCl
  • 0.1 g/L Na2HPO4·7H2O
  • 8.0 g/L NaCl
  • 3.0 g/L Tris

COMMENTARY

Background Information

This protocol was adapted from R2C2 (Volden et al., 2018), which was originally designed as a technique to help improve error correction in Oxford Nanopore Technologies long-read sequencing technology by sequencing full-length cDNA concatemers to obtain a consensus sequence. It was realized that if these long concatemers were broken up into individual short read sequences they could be used to generate TSS data on any Illumina sequencing platform (Fig. 1) (Cole, Byrne, Adams, Volden, & Vollmers, 2020).

Several small modifications were made to the technique to optimize cDNA generation and rolling circle output, with less concern for retaining the integrity of the concatemer as the sequence would be fragmented regardless. The UMI sequence was originally present in the splint but was moved into the TSO to allow for a greater chance of detecting the UMI and the TSS sequence in a read fragment. We tested the technique using fresh plant leaf and root tissue.
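
Because the UMI now sits next to the TSS in the read, deduplication can be done per position. The Python sketch below is a minimal illustration that assumes each TSS read has already been reduced to a tuple of chromosome, 5′ position, strand, and UMI. It collapses exact duplicates only and makes no attempt to correct sequencing errors in the UMI. The function name `count_unique_tss_molecules` is hypothetical.

```python
from collections import Counter


def count_unique_tss_molecules(records):
    """Count distinct UMIs observed at each (chrom, position, strand).

    `records` is an iterable of (chrom, position, strand, umi) tuples.
    Reads sharing all four values are treated as copies of one molecule.
    """
    unique_molecules = set(records)
    return Counter(
        (chrom, position, strand)
        for chrom, position, strand, _umi in unique_molecules
    )
```

For instance, three reads at the same position carrying two different UMIs would be counted as two molecules. Rolling circle amplification produces many copies of each circularized cDNA, which is one reason a UMI-aware count of this kind is preferable to a raw read count.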

Many other techniques are available that make use of a TSRT to attach adaptors to the 5′ end of an RNA transcript. STRIPE-seq (Policastro et al., 2020) uses a randomer (primer composed of random nucleotides) instead of a poly-dT primer, size selecting for transcripts which can be sequenced. The use of a randomer allows the technique to measure the TSS of non-polyadenylated RNA, but comes at the cost of reaction efficiency and necessitates the removal of rRNA. Single-cell tagged reverse transcription (STRT) (Islam et al., 2012) can be used on bulk libraries as well as single-cell libraries (Adiconis et al., 2018), and it retains the ultra-low input and ease-of-execution advantages of Smar2C2. For use with Illumina sequencing platforms, STRT uses a restriction enzyme to randomly digest the full-length cDNA before ligating an adaptor to the 3′ end of the transcript (Islam et al., 2012). Although this is very effective in accurately identifying TSSs, it has been reported that when used on bulk data there are limits on the total number of TSSs that can be obtained from a single sample, and these do not improve with increased input (Adiconis et al., 2018).

Critical Parameters

Clean extraction of full-length RNA is key to the success of the protocol, and any significant degradation will generate poor data. For this reason, it is important to use dedicated reagents, clean workspaces, and proper lab technique when handling RNA.

Do not input large amounts of RNA when generating cDNA from RNA using the TSRT. We recommend limiting your input to 400 ng of total RNA, and lower amounts will usually produce comparable results.

Proper linker formation is also critical to the circularization reaction. If yields are low when generating the linker, as outlined in the protocol, then it is safer to attempt linker formation a second time. Fortunately, the current protocol generates enough linkers for many library preparations, and they are very stable when stored at −20°C.

Troubleshooting

Most problems encountered in this protocol will be the result of poor RNA extractions or expired reagents. Because cDNA levels are low before circularization, the first indicator of a failed reaction is little to no DNA generation after rolling circle amplification. Check the expiration date on all enzymes and reagents, and ensure that all steps involving RNA use RNase-free solutions.

When sequencing data is aligned, reads that cluster around poly-A tracts in the genome may indicate high genomic DNA contamination. If your protocol for RNA extraction does not include a genomic DNA digestion (as used in step 9 of Basic Protocol 2), consider including this in your protocol. Ensure that your DNase I is not expired and consider increasing the digestion time to ensure less contamination. While DNase digestions are not always included in RNA extraction protocols [such as with acid phenol:chloroform extraction and lithium chloride (LiCl) precipitation] (Green & Sambrook, 2019), they are highly advisable here.

Understanding Results

The number of TSSs that can be measured using this technique is largely dependent on the sequencing depth used. We were able to measure 70 million unique TSS reads from a single input (Murray, Mendieta, Vollmers, & Schmitz, 2022), and we have not reached or been able to determine a sequencing saturation point. If extremely low levels of RNA input are used then sequencing saturation may be reached earlier, but currently we do not know what these levels are.

While there are many methods that can be used to validate the results, the easiest and most readily available is likely genome annotations. Although gene annotations often do not precisely reflect true TSSs, we still expect empirically measured TSSs that are located proximally to gene annotations to cluster around the annotated TSS (Fig. 2). The degree to which this occurs may be somewhat dependent on the quality of the annotation that is available for the species being studied. However, correlations between annotations derived from RNA-seq (or comparable methods) should still be expected.

Figure 2. Expected Distribution of TSSs Relative to an Annotation. Although some TSSs identified via Smar2C2 do not fall directly on an annotated TSS, most TSSs that are located proximal to annotated genes fall within 250 base pairs of an annotated TSS.
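The comparison between measured and annotated TSSs can be sketched as a short calculation. The following is a minimal illustration, not the analysis used by the authors: it assumes the annotated TSS coordinates for one chromosome and strand are available as a sorted list, and the function names are hypothetical. The 250-bp window simply mirrors the distance mentioned above.

```python
from bisect import bisect_left


def signed_distance_to_nearest(tss_pos, strand, annotated_positions):
    """Signed distance from a measured TSS to the nearest annotated TSS.

    annotated_positions: sorted annotated TSS coordinates on the same
    chromosome and strand (an assumption of this sketch).
    Negative values are upstream of the annotation relative to the
    direction of transcription; returns None if no annotation is given.
    """
    if not annotated_positions:
        return None
    i = bisect_left(annotated_positions, tss_pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = annotated_positions[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda a: abs(tss_pos - a))
    offset = tss_pos - nearest
    return offset if strand == "+" else -offset


def fraction_within(distances, window=250):
    """Fraction of measured TSSs lying within `window` bp of an annotated TSS."""
    kept = [d for d in distances if d is not None]
    if not kept:
        return 0.0
    return sum(abs(d) <= window for d in kept) / len(kept)
```

Plotting a histogram of the signed distances gives a quick visual check of whether measured TSSs cluster around the annotation.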

Initial processing of data (Fig. 3) usually includes identifying TSS reads, discovering individual TSS positions in the genome, and then clustering TSSs that are located in close proximity into transcriptional start regions (TSRs). We recommend using the Cutadapt software (Martin, 2011) to identify reads containing a TSS, extract them from the main sequence file, quality trim, and remove the TSS adaptor. Once the reads containing a TSS had been extracted and trimmed, we opted to align them to a reference genome using STAR (Dobin et al., 2013).
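As a rough sketch of this extraction and alignment step, the commands could be assembled as below. The adaptor sequence, file names, quality cutoff (20), minimum length (20), and thread count are placeholders chosen for illustration, not values from the protocol. UMI handling and reads that carry the adaptor in reverse-complement orientation are not covered here.

```python
import shlex


def build_commands(fastq, adaptor, genome_dir, prefix, threads=4):
    """Return (cutadapt_argv, star_argv) for TSS read extraction and alignment.

    All argument values are supplied by the user; nothing here is taken
    from the protocol itself.
    """
    trimmed = f"{prefix}.tss.fastq.gz"
    cutadapt = [
        "cutadapt",
        "-g", adaptor,            # 5' adaptor: removes it and anything before it
        "--discard-untrimmed",    # keep only reads in which the adaptor was found
        "-q", "20",               # quality-trim low-quality 3' ends (placeholder cutoff)
        "-m", "20",               # drop reads that become very short (placeholder)
        "-o", trimmed,
        fastq,
    ]
    star = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", trimmed,
        "--readFilesCommand", "zcat",  # input is gzip-compressed
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", f"{prefix}.",
    ]
    return cutadapt, star


if __name__ == "__main__":
    # Hypothetical inputs; replace the adaptor placeholder with the real sequence.
    for argv in build_commands("reads.fastq.gz", "<TSO_ADAPTOR_SEQUENCE>",
                               "genome_index", "sample"):
        print(shlex.join(argv))
```

Printing the commands rather than running them makes it easy to inspect and adjust the parameters before committing to a full run.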

Figure 3. A simple flow chart for data analysis. This basic data processing flowchart can be used to generate TSS data in a variety of formats for further downstream analysis.

The resulting BAM file can then be used to identify individual TSSs and TSRs. There are many available programs to do this (Adiconis et al., 2018; Haberle, Forrest, Hayashizaki, Carninci, & Lenhard, 2015; Thodberg, Thieffry, Vitting-Seerup, Andersson, & Sandelin, 2019). We opted to use TSRexploreR for its broad range of features and ease of execution (Policastro, Mcdonald, Brendel, & Zentner, 2021). TSRexploreR rapidly generates both single-nucleotide TSS and TSR BED files from BAM files produced by mapping the TSS reads extracted from raw sequencing files.
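To illustrate what clustering TSSs into TSRs means in practice, a minimal distance-based grouping is sketched below. This is not the algorithm used by any of the packages cited above; it is a simple single-linkage approach, and the 25-bp maximum gap is an arbitrary value chosen for the example.

```python
def cluster_tss(position_counts, max_gap=25):
    """Group TSS positions from one chromosome and strand into TSRs.

    position_counts: dict mapping TSS coordinate -> read count.
    Neighbouring positions separated by at most max_gap bp are merged.
    Returns a list of (start, end, total_reads) tuples.
    """
    clusters = []
    for pos in sorted(position_counts):
        if clusters and pos - clusters[-1][1] <= max_gap:
            # Extend the current TSR and add this position's reads.
            start, _, total = clusters[-1]
            clusters[-1] = (start, pos, total + position_counts[pos])
        else:
            # Start a new TSR at this position.
            clusters.append((pos, pos, position_counts[pos]))
    return clusters
```

For example, positions 100 and 110 would be merged into a single TSR, while a position at 200 would start a new one.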

It is worth noting that raw BAM files can be used directly to identify TSSs, as the strand and first nucleotide position can easily be extracted from individual reads. Although this lacks the convenience of software packages and makes more complicated downstream analysis more difficult, it does allow the experimenter to interact directly with the data set if desired.
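One way to perform this direct extraction is sketched below, working on SAM-formatted text lines (for instance, the output of `samtools view`). The sketch assumes single-end reads whose first base corresponds to the TSS and whose alignment strand matches the transcript strand. It also ignores soft-clipped bases at the 5′ end, so it should be treated as a starting point rather than a complete solution.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def five_prime_end(flag, pos, cigar):
    """Return (strand, 1-based reference coordinate of the read's 5' end)."""
    strand = "-" if flag & 16 else "+"
    if strand == "+":
        # POS is the leftmost aligned base, which is the 5' end on the + strand.
        return strand, pos
    # On the - strand the 5' end is the rightmost aligned base.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return strand, pos + ref_len - 1


def tss_from_sam_line(line):
    """Parse one SAM alignment line; return (chrom, strand, tss) or None if unmapped."""
    fields = line.rstrip("\n").split("\t")
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 4:  # read is unmapped
        return None
    strand, tss = five_prime_end(flag, pos, cigar)
    return chrom, strand, tss
```

Counting the returned tuples (for example with `collections.Counter`) gives a per-position read count that can feed into thresholding or clustering.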

When initially interpreting the data, the vast majority of reads should surround TSSs, but a small number of reads are expected within gene bodies and some intergenic space; these are usually treated as experimental noise. Such noise is present in most TSS data sets and has traditionally been removed by setting a minimum number of reads that any single position needs in order to be considered a TSS. This threshold has previously been defined by looking at the percent of reads at each threshold value that occur at an expected location versus unexpected intergenic locations (Policastro et al., 2021).

This specific methodology can become a problem due to the high numbers of unique TSSs generated by Smar2C2. Highly expressed genes can create intragenic TSSs that appear as true TSSs due to the sheer number of reads present, but that account for only a small proportion of the reads present in the gene. Setting a threshold that is high enough to remove these reads can also remove many TSSs that appear to be valid from genes with lower base transcriptional levels. Our solution to this problem was to retain a threshold from which to consider individual loci, but to only consider TSRs that contain at least 10% of the total reads present at any gene annotation. This allows for a sliding scale that removes noise from genes based on their total transcription levels rather than trying to find a threshold that best suits an entire genome.
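The per-gene filter described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: each TSR is assumed to have already been assigned to a gene annotation, and the gene total is taken to be the sum of reads over all TSRs assigned to that gene. The dictionary keys are hypothetical.

```python
from collections import defaultdict


def filter_tsrs_by_gene_fraction(tsrs, min_fraction=0.10):
    """Keep TSRs holding at least min_fraction of their gene's total TSR reads.

    tsrs: list of dicts with keys 'tsr_id', 'gene', and 'reads'
    (a hypothetical layout chosen for this sketch).
    """
    gene_totals = defaultdict(int)
    for tsr in tsrs:
        gene_totals[tsr["gene"]] += tsr["reads"]
    return [
        tsr for tsr in tsrs
        if tsr["reads"] >= min_fraction * gene_totals[tsr["gene"]]
    ]


if __name__ == "__main__":
    example = [
        {"tsr_id": "A1", "gene": "geneA", "reads": 900},  # main promoter
        {"tsr_id": "A2", "gene": "geneA", "reads": 60},   # intragenic noise
        {"tsr_id": "B1", "gene": "geneB", "reads": 12},   # lowly expressed gene
    ]
    print([t["tsr_id"] for t in filter_tsrs_by_gene_fraction(example)])
```

In this toy example, a fixed cutoff high enough to remove the 60-read intragenic TSR would also remove the 12-read TSR of the lowly expressed gene, whereas the per-gene fraction keeps the latter.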

In addition to defining TSSs and TSRs, the single nucleotide position that is most prominent within a TSR can be defined, called a primary transcription start site (pTSS). This piece of information is useful for examining the positional enrichment of sequence motifs relative to the TSS, as critical promoter elements such as the TATA box are often positioned precisely relative to the TSS (Haberle & Stark, 2018). Novel motif discovery, known motif enrichment, and positional motif enrichment can be performed using the MEME Suite software (Bailey, Johnson, Grant, & Noble, 2015), and having a single position from which to search greatly aids in discovery.
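A pTSS can be picked from per-position read counts with very little code. The sketch below assumes the pTSS is the position with the most reads and breaks ties by taking the lowest coordinate, which is an arbitrary choice. The strand-aware window helper and its default sizes (50 bp upstream, 10 bp downstream) are likewise illustrative values for preparing motif-search regions, not recommendations from the protocol.

```python
def primary_tss(position_counts):
    """Return the coordinate with the highest read count in one TSR.

    position_counts: dict mapping coordinate -> read count.
    Ties are resolved in favour of the lowest coordinate (an assumption).
    """
    return min(position_counts, key=lambda p: (-position_counts[p], p))


def promoter_window(ptss, strand, upstream=50, downstream=10):
    """Return (start, end) of a strand-aware window around a pTSS.

    Upstream and downstream are defined relative to the direction of
    transcription, so they swap sides on the minus strand.
    """
    if strand == "+":
        return ptss - upstream, ptss + downstream
    return ptss - downstream, ptss + upstream
```

Sequences extracted from such windows share a common reference point, which is what makes positional motif enrichment relative to the TSS straightforward to evaluate.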

Time Considerations

Initial RNA extraction and reverse transcription will require a dedicated block of time to complete, as it is not recommended to extract RNA and then pause before converting to cDNA. However, once cDNA has been generated it is easy and feasible to pause between most major steps of the protocol. Using a thermocycler, both the rolling circle amplification and the exonuclease digestion can be programmed and allowed to run overnight. If you find that you need an extended rolling circle amplification to achieve significant DNA output, then we suggest allowing this step to run overnight for up to 16 hr.

The protocol can be completed over the course of two days, with many lengthy incubations allowing for relatively low amounts of hands-on time at the bench. We have never attempted to process more than 16 samples at a single time, but can see no reason that large sample batches would not be viable and easily processed. Depending on the difficulty of the RNA extraction and the protocol used, large sample sizes may result in freshly extracted RNA sitting on ice for an extended period of time, during which it has the potential to degrade and reduce experimental output quality.

Acknowledgments

The authors would like to acknowledge the Georgia Genomics & Bioinformatics Core as well as the Georgia Advanced Computing Resource Center for research support. This research was supported by grants from the National Science Foundation (IOS-1546867 and IOS-1856627) to RJS.

Author Contributions

Andrew Murray: Conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing-original draft, writing-review and editing; Christopher Vollmers: Conceptualization, methodology, writing-review and editing; Robert J. Schmitz: Funding acquisition, methodology, project administration, resources, supervision, writing-review and editing.

Conflict of Interest

The authors declare no conflict of interest.

Open Research

Data Availability Statement

All raw and processed sequencing data generated by this technique and in previous applications of this technique have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE197144. All data used in this manuscript is available and can be accessed under this accession.

Literature Cited

  • Adiconis, X., Haber, A. L., Simmons, S. K., Levy Moonshine, A., Ji, Z., Busby, M. A., … Levin, J. Z. (2018). Comprehensive comparative analysis of 5′-end RNA-sequencing methods. Nature Methods, 15(7), 505–511. doi: 10.1038/s41592-018-0014-2
  • Bailey, T. L., Johnson, J., Grant, C. E., & Noble, W. S. (2015). The MEME Suite. Nucleic Acids Research, 43(W1), W39–W49. doi: 10.1093/nar/gkv416
  • Batut, P., & Gingeras, T. R. (2013). RAMPAGE: Promoter activity profiling by paired-end sequencing of 5′-complete cDNAs. Current Protocols in Molecular Biology, 104, 25B.11.1–25B.11.16. doi: 10.1002/0471142727.mb25b11s104
  • Cole, C., Byrne, A., Adams, M., Volden, R., & Vollmers, C. (2020). Complete characterization of the human immune cell transcriptome using accurate full-length cDNA sequencing. Genome Research, 30(4), 589–601. doi: 10.1101/gr.257188.119
  • Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., … Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England), 29(1), 15–21. doi: 10.1093/bioinformatics/bts635
  • Green, M. R., & Sambrook, J. (2019). Removing DNA contamination from RNA samples by treatment with RNase-free DNase I. Cold Spring Harbor Protocols, 2019(10), pdb–prot101725. doi: 10.1101/pdb.prot101725
  • Haberle, V., Forrest, A. R., Hayashizaki, Y., Carninci, P., & Lenhard, B. (2015). CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses. Nucleic Acids Research, 43(8), e51. doi: 10.1093/nar/gkv054
  • Haberle, V., & Stark, A. (2018). Eukaryotic core promoters and the functional basis of transcription initiation. Nature Reviews. Molecular Cell Biology, 19(10), 621–637. doi: 10.1038/s41580-018-0028-8
  • Islam, S., Kjällquist, U., Moliner, A., Zajac, P., Fan, J. B., Lönnerberg, P., & Linnarsson, S. (2012). Highly multiplexed and strand-specific single-cell RNA 5′ end sequencing. Nature Protocols, 7(5), 813–828. doi: 10.1038/nprot.2012.022
  • Kulpa, D., Topping, R., & Telesnitsky, A. (1997). Determination of the site of first strand transfer during Moloney murine leukemia virus reverse transcription and identification of strand transfer-associated reverse transcriptase errors. EMBO Journal, 16(4), 856–865. doi: 10.1093/emboj/16.4.856
  • Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal, 17(1), 10–12. doi: 10.14806/ej.17.1.200
  • Murray, A., Mendieta, J. P., Vollmers, C., & Schmitz, R. J. (2022). Simple and accurate transcriptional start site identification using Smar2C2 and examination of conserved promoter features. The Plant Journal, 112(2), 583–596. doi: 10.1111/tpj.15957
  • Policastro, R. A., Mcdonald, D. J., Brendel, V. P., & Zentner, G. E. (2021). Flexible analysis of TSS mapping data and detection of TSS shifts with TSRexploreR. NAR Genomics and Bioinformatics, 3(2), lqab051. doi: 10.1093/nargab/lqab051
  • Policastro, R. A., Raborn, R. T., Brendel, V. P., & Zentner, G. E. (2020). Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Research, 30(6), 910–923. doi: 10.1101/gr.261545.120
  • Tang, D. T., Plessy, C., Salimullah, M., Suzuki, A. M., Calligaris, R., Gustincich, S., & Carninci, P. (2013). Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching. Nucleic Acids Research, 41(3), e44. doi: 10.1093/nar/gks1128
  • Thodberg, M., Thieffry, A., Vitting-Seerup, K., Andersson, R., & Sandelin, A. (2019). CAGEfightR: analysis of 5′-end data using R/Bioconductor. BMC Bioinformatics, 20(1), 487. doi: 10.1186/s12859-019-3029-5
  • Volden, R., Palmer, T., Byrne, A., Cole, C., Schmitz, R. J., Green, R. E., & Vollmers, C. (2018). Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA. Proceedings of the National Academy of Sciences of the United States of America, 115(39), 9726–9731. doi: 10.1073/pnas.1806447115
  • Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R., & Siebert, P. D. (2001). Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. BioTechniques, 30(4), 892–897. doi: 10.2144/01304pf02
