Smar2C2: A Simple and Efficient Protocol for the Identification of Transcription Start Sites
Andrew Murray, Christopher Vollmers, Robert J. Schmitz
Cis-regulatory elements
promoter
rolling circle amplification
template switching reverse transcriptase
transcription start site
Abstract
Promoters and the noncoding sequences that drive their function are fundamental aspects of genes that are critical to their regulation. The transcription preinitiation complex binds and assembles on promoters where it facilitates transcription. The transcription start site (TSS) is located downstream of the promoter sequence and is defined as the location in the genome where polymerase begins transcribing DNA into RNA. Knowing the location of TSSs is useful for annotation of genes, identification of non-coding sequences important to gene regulation, detection of alternative TSSs, and understanding of 5′ UTR content. Several existing techniques make it possible to accurately identify TSSs, but are often difficult to perform experimentally, require large amounts of input RNA, or are unable to identify a large number of TSSs from a single sample. Many of these protocols take advantage of template switching reverse transcriptases (TSRTs), which reliably place an adaptor at the 5′ end of a first strand synthesis of cDNA. Here, we introduce a protocol that exploits TSRT activity combined with rolling circle amplification to identify TSSs with several unique advantages over existing methods. Sequence adaptors are placed on the 5′ and 3′ end of the full-length cDNA copy of a transcript. A splint compatible with those adaptors is then used to circularize the full-length cDNA. Linear DNA containing concatemers of the cDNA are generated using rolling circle amplification, and a sequencing library is formed by fragmenting the concatemers. This protocol is straightforward to execute, requiring limited bench time with relatively stable reagents. Using extremely low amounts of RNA input, this protocol produces large numbers of accurate, deduplicated TSSs genome wide. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Splint generation
Basic Protocol 2: RNA extraction
Basic Protocol 3: cDNA synthesis
Basic Protocol 4: cDNA circularization and amplification
Basic Protocol 5: Library generation
INTRODUCTION
Promoters and the noncoding sequences that comprise them are critical to the normal function of a gene. They serve as the site where the transcription preinitiation complex assembles and binds, and they contain regulatory noncoding elements that can influence the overall transcription rate of a gene. These regulatory noncoding sequences are oriented around the transcription start site (TSS), which is the first nucleotide transcribed into RNA by RNA polymerase. Essential and regulatory sequences can be positioned both upstream and downstream of the TSS, and for some of these sequences, the position relative to the TSS is critical to both their discovery and function. Knowing only the general location of transcription initiation is therefore often insufficient for characterizing the noncoding sequences that drive it. Precise TSS data are especially needed in plant biology, where they are lacking for many species critical to agriculture and research.
The template switching reverse transcriptase (TSRT) is a reverse transcriptase that deposits a few ectopic cytosines when reaching the end of the transcript (Kulpa, Topping, & Telesnitsky, 1997). These C's allow for the anchoring of a template switching oligo (TSO) matching the C's and containing a short adaptor at the 5′ end of a transcript. This has been used extensively for the generation of full-length cDNA (Zhu, Machleder, Chenchik, Li, & Siebert, 2001), and has been adapted several times for the identification of TSSs (Batut & Gingeras, 2013; Islam et al., 2012; Policastro, Raborn, Brende, & Zentner, 2020). To efficiently generate libraries for next-generation sequencing, we incorporated rolling circle amplification into the TSRT protocol to generate short fragments of DNA that contain internal adaptors identifying TSSs.
We have successfully generated libraries with as little as 40 picograms of RNA extracted from fresh plant tissue, suggesting that the total RNA needed to successfully generate a library is extremely low in comparison to some existing methods for TSS identification. After reverse transcription, the cDNA is circularized using a splint, and excess linear DNA is digested using a blend of 5′ and 3′ exonucleases. Circular DNA is then amplified into long linear concatemers by rolling circle amplification with the phi29 DNA polymerase. The concatemers are then fragmented, and sequencing adaptors are attached. After sequencing, the fragments containing TSS adaptors are bioinformatically extracted using the 5′ adaptor and used to identify the TSSs.
The protocol described here is broken down into several main steps. In Basic Protocol 1 we describe the formation of the splint, which is used downstream in the circularization reaction. In Basic Protocol 2 we describe how RNA is extracted and quantified. In Basic Protocol 3 we describe how the RNA is reverse transcribed using the TSRT, and how the adaptor is attached to the 5′ end of the transcript using the TSO (Fig. 1). In Basic Protocol 4 we describe how the cDNA is circularized and amplified with rolling circle amplification (Fig. 1), and in Basic Protocol 5 we describe how to produce a sequencing library from the linear DNA generated in the previous step (Fig. 1).

CAUTION: All reactions should be completed using appropriate laboratory protective equipment including gloves, safety glasses, and a lab coat.
STRATEGIC PLANNING
Because completing this protocol involves manipulation of RNA, researchers should take care to ensure that reagents and equipment are RNase-free. We recommend using dedicated filtered pipette tips as well as dedicated RNase-free reagents until reverse transcription is complete. It is also advisable to use RNase-free workspaces and proper wet bench techniques to ensure that samples are not contaminated.
Basic Protocol 1: SPLINT GENERATION
The splint is used downstream in the circularization reaction, but it is easiest to generate it ahead of time because it can be stored long term. The original design of the splint includes a unique molecular identifier (UMI), which is not necessary for this experiment. However, it does increase the potential uses for the construct, such as allowing for deduplication of reads as described in the original R2C2 protocol (Volden et al., 2018). The original R2C2 methodology sequenced concatemerized full-length cDNA on the Oxford Nanopore Technologies (ONT) platform, and the inclusion of the UMI in the splint allows for deduplication when sequencing a full concatemer with their existing pipeline. At the end of this reaction you should have the full-length splint, with the initial primer sequences removed via the Select-a-Size kit.
Materials
- Q5 High-Fidelity 2× Master Mix (NEB cat. no. M0492S)
- Forward splint primer (see oligonucleotides list in recipe)
- Reverse splint primer (see oligonucleotides list in recipe)
- Zymo Select-a-Size DNA Clean & Concentrator Kit (Zymo Research cat. no. D4080)
- 95% ethanol (EtOH)
- Eppendorf tubes (2 ml centrifuge tubes)
- PCR tubes
- Microcentrifuge (>10,000 × g)
- Thermocycler
Perform splint generation
1.Set up the initial PCR reaction in PCR tubes.
- 12.5 µl High-Fidelity 2× Master Mix
- 1 µl 100 µM Forward Splint Primer
- 1 µl 100 µM Reverse Splint Primer
- 10.5 µl Water
2.Run an initial extension reaction with the following conditions:
- 95°C for 3 min
- 98°C for 1 min
- 62°C for 1 min
- 72°C for 6 min
- Cool to 4°C
3.Using the Zymo Select-a-Size DNA Clean & Concentrator Kit add 85 µl EtOH to 500 µl DNA Binding Buffer and mix via pipetting.
4.Bring the PCR reaction up to 100 µl with DNA elution buffer.
5.Add the sample to the DNA binding buffer and mix thoroughly by pipetting.
6.Transfer the mixture to a Zymo-Spin IC-S column in a collection tube. Centrifuge 30 s at 10,000 × g , room temperature, and discard the flow-through.
7.Add 700 µl DNA wash buffer to the column and centrifuge 30 s at 10,000 × g , room temperature. Discard the flow-through.
8.Add 200 µl DNA wash buffer to the column and centrifuge 60 s at 10,000 × g , room temperature. Discard the flow-through and the collection tube.
9.Transfer the column to an Eppendorf tube, add 10 µl DNA elution buffer directly to the column matrix and incubate for 1 minute at room temperature. Centrifuge 30 s at 10,000 × g , room temperature, to elute the DNA.
10.Measure the final DNA concentration to ensure reaction success.
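As a quick sanity check on the reaction in step 1, the component volumes should sum to 25 µl, which places each splint primer at 4 µM. The arithmetic can be sketched in Python as follows; the variable and function names are illustrative and not part of the protocol.

```python
# Step 1 reaction components (µl), copied from the protocol above.
SPLINT_PCR_UL = {
    "Q5 High-Fidelity 2x Master Mix": 12.5,
    "Forward splint primer (100 µM)": 1.0,
    "Reverse splint primer (100 µM)": 1.0,
    "Water": 10.5,
}


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Solve C1 * V1 = C2 * V2 for C2 (same units as stock_conc)."""
    return stock_conc * stock_volume_ul / total_volume_ul


total_ul = sum(SPLINT_PCR_UL.values())
primer_um = final_concentration(100.0, 1.0, total_ul)
print(f"Total volume: {total_ul} µl; each primer at {primer_um} µM")
```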
Basic Protocol 2: RNA EXTRACTION
The goal of this step is to extract full-length RNA transcripts with as little breakage and degradation as possible. Although most RNA extraction methods will likely work for this protocol, we opted to use a commercial column extraction kit for several reasons. Primarily, we were concerned with obtaining intact full-length mRNA, as any breaks would appear as an artifact TSS within the gene body. To help facilitate the isolation of intact full-length mRNA, we used a rapid protocol that reduced the time between RNA extraction and cDNA synthesis. In addition, the on-column DNase I digestion allowed for easy degradation of genomic DNA, which causes artifacts when the poly-dT primer binds to natural poly-A tracts in genomic DNA. Finally, the TSRT requires very low RNA input levels and will fail if too much RNA is added to the reaction, so column purification provides more than enough RNA. Although we do not know the exact RNA input at which the reaction begins to fail, we suggest not using more than 400 ng of total RNA.
Be sure that all reagents and materials used for RNA extraction are RNase-free. The easiest way to do this is to use dedicated reagents and disposable materials for all RNA protocols that are known to be RNase-free upon purchase, and keep those materials sealed when not in use to avoid contamination. It is advisable to use dedicated filter tips for all pipetting, and to thoroughly clean the workstation to remove RNase contamination. It is advisable to immediately continue on to cDNA generation after RNA extraction to avoid degradation, but we have generated high-quality libraries with RNA stored for up to one month at −80°C. This section mostly follows the “Tough-to-Lyse” protocol provided by Monarch with minor specifications, additions, and commentary as we extracted RNA from plant tissue frozen with liquid nitrogen and ground in a mortar and pestle. Be sure to perform all initial preparatory steps outlined in the kit instructions. Very little tissue is required for this protocol, with successful libraries having been generated with as little as 40 pg of total RNA input.
Materials
- RNaseZap (Thermo Fisher cat. no. AM9780)
- Liquid nitrogen
- Plant sample
- Monarch Total RNA Miniprep Kit (NEB cat. no. T2010S)
- Qubit RNA BR Assay Kit (Thermo Fisher cat. no. Q10210)
- ≥95% EtOH
- Mortar and pestle
- Eppendorf tubes (1.5 ml centrifuge tubes)
- Vortex shaker
- Refrigerated (4°C) microcentrifuge (16,000 × g)
Perform RNA extraction
1.Clean all surfaces and the mortar and pestle with RNaseZap or another RNase decontamination solution.
2.Pre-cool a mortar and pestle using liquid nitrogen. Place the plant sample into the mortar and slowly add liquid nitrogen. Grind the tissue into a fine powder for 3 min, adding more liquid nitrogen if necessary.
3.Add the ground sample to 800 µl 1× DNA/RNA protection reagent pre-cooled in an Eppendorf tube on ice and vortex to mix thoroughly.
4.Spin 2 min at 16,000 × g , room temperature, to pellet debris. Transfer supernatant to a pre-cooled gDNA removal column (light blue) fitted with a collection tube.
5.Spin 30 s at 16,000 × g , room temperature, to remove the genomic DNA.
6.Add an equal volume of EtOH and mix thoroughly by pipetting (do not vortex).
7.Transfer the mixture to a pre-cooled RNA purification column (dark blue) and collection tube. Spin 30 s at 16,000 × g , room temperature, and discard the flow-through.
8.Add 500 µl of RNA wash buffer and spin 30 s at 16,000 × g , room temperature, and discard flow-through.
9.Combine 5 µl DNase I with 75 µl DNase I reaction buffer and pipet directly to the top of the column matrix. Incubate at room temperature for 15 min.
10.Add 500 µl RNA priming buffer and spin 30 s at 16,000 × g , room temperature. Discard the flow-through.
11.Add 500 µl RNA wash buffer and spin for 30 s at 16,000 × g , room temperature. Discard the flow-through.
12.Add 500 µl RNA wash buffer and spin for 2 min at 16,000 × g , room temperature. Discard the flow-through.
13.Spin the column for 1 min at 16,000 × g , room temperature.
14.Add 50 µl nuclease-free water directly to the center of the column matrix and spin for 30 s at 16,000 × g , room temperature.
15.Place the eluted RNA on ice.
16.Measure total RNA extracted using Qubit RNA Broad Range Assay Kit.
Basic Protocol 3: cDNA SYNTHESIS
The goal of this step is to generate the cDNA using the TSRT and then attach the TSO containing the adaptor to the additional C's deposited at the end of the first-strand cDNA transcript. TSRTs can introduce an error called strand invasion (Tang et al., 2013) where the TSO binds with the first strand of cDNA before the reverse transcriptase has completed first-strand synthesis, usually at a site with sequence complementarity to the oligo. These artifacts can be removed bioinformatically, but methods should also be adjusted to attempt to reduce potential problems. We have opted to add a step where the poly-dT primer is annealed to the mRNA before the TSO is added. Continue to use RNase-free reagents and materials until after first-strand cDNA synthesis using the TSRT.
Materials
- SMARTScribe reverse transcriptase (Takara Bio cat. no. 639538)
- 5× first-strand buffer (Takara Bio cat. no. 639538)
- 20 mM dithiothreitol (DTT) (Takara Bio cat. no. 639538)
- TSO (100 µM) (see oligonucleotides list in recipe)
- ISPCR primer (10 µM) (see oligonucleotides list in recipe)
- Poly-dT primer (10 µM) (see oligonucleotides list in recipe)
- Deoxynucleotide (dNTP) solution mix (10 mM) (NEB cat. no. N0447S)
- RNase-free water
- RNase H (5000 U/ml) (NEB cat. no. M0297S)
- Q5 High-Fidelity 2× Master Mix (NEB cat. no. M0492S)
- Monarch PCR & DNA Cleanup Kit (NEB cat. no. T1030S)
- Thermocycler
- Eppendorf tubes (1.5-ml centrifuge tubes)
- Vortex shaker
- Refrigerated (4°C) microcentrifuge (16,000 × g)
- Heat block
Perform cDNA synthesis
1.Set up your initial sample solution using:
- 400 ng of sample
- 2 µl poly-dT primer
- 1 µl dNTPs
- Water to 6 µl
2.Place the solution on a heat block at 74°C for 3 min, then remove the reaction and place it on ice.
3.Add to your reaction:
- 2 µl 5× SMART buffer
- 1 µl DTT
- 0.5 µl TSO
- 0.5 µl SMARTScribe reverse transcriptase
4.Using a thermocycler, set the reaction to:
- 42°C for 90 min
- 75°C for 15 min
- Cool to 4°C
5.To your reaction add:
- 12.5 µl Q5 High-Fidelity 2× Master Mix
- 2 µl ISPCR primer
- 1 µl RNase H
6.Using a thermocycler, set the reaction to:
a. 37°C for 15 min
b. 95°C for 1 min
c. 65°C for 10 min
d. 98°C for 45 s
e. 98°C for 10 s
f. 63°C for 30 s
g. 72°C for 3 min
h. Repeat steps e-g six times
The initial step at 37°C allows RNase H to digest the remaining annealed RNA. The 10 min spent at 65°C allows for second-strand cDNA synthesis. The PCR amplification that follows should use a minimal number of cycles. If duplicates become a problem downstream, four cycles of PCR are likely sufficient, though the effects of this have not been tested.
7.Purify using the Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 8 µl elution buffer.
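The mix in step 1 leaves only 3 µl for RNA and water once the poly-dT primer and dNTPs are added, so the Qubit concentration determines whether the full 400 ng can be loaded. A minimal Python sketch of that calculation is given below; the function name and its defaults are illustrative, and it assumes the RNA concentration is expressed in ng/µl.

```python
def rna_input_volumes(rna_ng_per_ul, target_ng=400.0, total_ul=6.0,
                      primer_ul=2.0, dntp_ul=1.0):
    """Return (rna_ul, water_ul, actual_ng) for the step 1 annealing mix.

    If the RNA is too dilute to reach target_ng in the available volume,
    all remaining volume is used for RNA and actual_ng falls below target_ng.
    """
    if rna_ng_per_ul <= 0:
        raise ValueError("RNA concentration must be positive")
    available_ul = total_ul - primer_ul - dntp_ul
    rna_ul = min(target_ng / rna_ng_per_ul, available_ul)
    water_ul = available_ul - rna_ul
    return rna_ul, water_ul, rna_ul * rna_ng_per_ul


if __name__ == "__main__":
    for conc in (200.0, 50.0):
        rna_ul, water_ul, ng = rna_input_volumes(conc)
        print(f"{conc} ng/µl: {rna_ul:.2f} µl RNA + {water_ul:.2f} µl water = {ng:.0f} ng")
```

Samples that cannot reach 400 ng in 3 µl are simply loaded at a lower input, which is consistent with the observation in this protocol that lower amounts usually produce comparable results.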
Basic Protocol 4: cDNA CIRCULARIZATION AND ROLLING CIRCLE AMPLIFICATION
The goal of this step is to generate circular DNA from the cDNA fragments and amplify them into linear concatemers using rolling circle amplification. The phi29 polymerase used in this protocol is a high-fidelity DNA polymerase that is extremely processive and well suited to rolling circle amplification. Circularization and rolling circle amplification are what allow this protocol to generate fragments of DNA that contain the TSS. An added benefit is that once a fragment of DNA is circularized, any remaining linear DNA can be digested away using effective, inexpensive, and stable exonucleases. A combination of exonucleases specific to linear DNA ensures complete digestion of all non-circularized DNA: exonuclease I shows preference for linear single-stranded DNA, exonuclease III shows preference for linear double-stranded DNA, and lambda exonuclease shows preference for nicked and 5′-phosphorylated double-stranded linear DNA. After digestion, the library should therefore contain only circular molecules generated through the sequence similarity between the adaptors on the cDNA and the previously generated splint.
Materials
- NEBuilder HiFi DNA Assembly Master Mix (NEB cat. no. E2621)
- DNA splint (Basic Protocol 1)
- NEBuffer 2 (NEB cat. no. B7002S)
- Lambda exonuclease (5000 U/ml) (NEB cat. no. M0262S)
- Exonuclease I (E. coli) (20,000 U/ml) (NEB cat. no. M0293S)
- Exonuclease III (E. coli) (100,000 U/ml) (NEB cat. no. M0206S)
- Monarch PCR & DNA Cleanup Kit (NEB cat. no. T1030S)
- Exo-resistant random primer (Thermo Fisher Scientific cat. no. SO181)
- phi29 DNA polymerase (10,000 U/ml) (NEB cat. no. M0269S)
- phi29 DNA polymerase reaction buffer (10×) (NEB cat. no. M0269S)
- Deoxynucleotide (dNTP) solution mix (10 mM) (NEB cat. no. N0447S)
- Thermocycler
- Eppendorf tubes (1.5 ml centrifuge tubes)
- PCR tube
- Vortex shaker
- Refrigerated (4°C) microcentrifuge (16,000 × g)
- Heat block
cDNA circularization and rolling circle amplification
1.In a PCR tube combine:
- 200 ng splint (from Basic Protocol 1)
- 10 µl 2× NEBuilder HiFi DNA Assembly Master Mix
- 8 µl sample
- Add water to 20 µl
2.Incubate the reaction at 50°C for 60 min.
3.Add to the reaction:
- 5 µl NEBuffer2
- 1 µl lambda exonuclease
- 1 µl exonuclease I
- 0.5 µl exonuclease III
- Add water to 50 µl
4.Incubate the reaction at 37°C for 60 min, 80°C for 20 min, and then cool to 4°C or place on ice.
5.Purify using the Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 20 µl elution buffer.
6.Add to the eluted circular DNA:
- 2.5 µl dNTP
- 2.5 µl exo-resistant random primer
- 5 µl 10× phi29 buffer
- 1 µl phi29 polymerase
- Add water to 50 µl
7.Incubate at:
- 30°C for 4 hr
- 65°C for 10 min
- Cool to 4°C
The total time spent at 30°C can be adjusted significantly, and we have tested it for up to 16 hr (overnight). We have found that 4 hr produces sufficient linear DNA for our needs, but if the yield is too low the reaction can be left running overnight.
8.Purify using the Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 18 µl elution buffer.
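Steps 3 and 6 both end with "add water to 50 µl", so the water volume depends on what is already in the tube. The short Python sketch below works through both cases. It assumes that the full 20 µl circularization reaction and the full 20 µl eluate are carried forward (actual recovery from a column may be slightly lower), and the helper name is illustrative.

```python
def water_to_volume(final_ul, *component_ul):
    """Return the water (µl) needed to bring the listed components to final_ul."""
    water_ul = final_ul - sum(component_ul)
    if water_ul < 0:
        raise ValueError("components already exceed the final volume")
    return water_ul


# Step 3: 20 µl circularization reaction + NEBuffer 2 + three exonucleases.
exo_water_ul = water_to_volume(50.0, 20.0, 5.0, 1.0, 1.0, 0.5)

# Step 6: 20 µl eluate + dNTP + random primer + 10x phi29 buffer + phi29 polymerase.
rca_water_ul = water_to_volume(50.0, 20.0, 2.5, 2.5, 5.0, 1.0)

print(f"Exonuclease digestion: add {exo_water_ul} µl water")
print(f"Rolling circle amplification: add {rca_water_ul} µl water")
```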
Basic Protocol 5: LIBRARY PREPARATION
At this stage, long linear fragments of DNA need to be broken down into short fragments that can be sequenced, and sequencing adaptors need to be attached. Any protocol that is commonly used to sequence genomic DNA on Illumina sequencers will likely work at this stage. We have used Tn5 loaded with sequencing adaptors to generate libraries (with an equivalent product referenced in the materials) using a tagmentation protocol, but this is not critical to the success of the protocol.
Materials
- Tn5 loaded with sequencing adaptors (Illumina cat. no. 20034197)
- TD buffer (see recipe)
- Illumina Nextera barcoded primers
- Q5® High-Fidelity 2× Master Mix (NEB cat. no. M0492S)
- Monarch PCR & DNA Cleanup Kit (NEB cat. no. T1030S)
- Thermocycler
- Eppendorf tubes (1.5 ml centrifuge tubes)
- PCR tube
- Vortex shaker
- Refrigerated (4°C) microcentrifuge (16,000 × g)
- Heat block
Prepare sequencing library
1.Add to the sample:
- 20 µl TD buffer
- 2 µl Tn5
2.Incubate at 30°C for 30 min.
3.Purify using the Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 10 µl elution buffer.
4.Add to the reaction:
- 12.5 µl Q5 High-Fidelity 2× Master Mix
- 1.25 µl 10 µM forward Nextera barcoded primer
- 1.25 µl 10 µM reverse Nextera barcoded primer
5.Incubate at:
a. 72°C for 5 min
b. 98°C for 2 min
c. 98°C for 10 s
d. 63°C for 30 s
e. 72°C for 90 s
f. Repeat steps c-e eight times
The initial extension step at the start of the protocol is critical for extending off of the DNA fragment inserted by Tn5.
6.Purify using the Monarch PCR & DNA Cleanup Kit with a 5:1 ratio of DNA binding buffer to sample. Elute using 10 µl elution buffer.
REAGENTS AND SOLUTIONS
Oligonucleotides
- Forward splint primer
- ACTCTGCGTTGATACCACTGCTTTGAGGCTGATGAGTTCCATANNNNNTATATNNNNNATCACTACTTAGTTTTTTGATAGCTTCAAGCCAGAGTTGTCTTTTTCTCTTTGCTGGCAGTAAAAG
- Reverse splint primer
- ACTCTGCGTTGATACCACTGCTTAAAGGGATATTTTCGATCGCNNNNNATATANNNNNTTAGTGCATTTGATCCTTTTACTCCTCCTAAAGAACAACCTGACCCAGCAAAAGGTACACAATACTTTTACTGCCAGCAAAGAG
- TSO-UMI
- AAGCAGTGGTATCAACGCAGAGTACNNNNNNNNNNNNATrGrG+G
- Poly dT primer
- AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTVN
- ISPCR_oligo
- AAGCAGTGGTATCAACGCAGAGT
TD Buffer
- 0.38 g/L KCl
- 0.1 g/L Na2HPO4·7H2O
- 8.0 g/L NaCl
- 3.0 g/L Tris
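Because only 20 µl of TD buffer is used per sample, it is usually prepared in volumes much smaller than 1 L. The recipe above can be scaled with a few lines of Python, as sketched below; the function name is illustrative, and the calculation relies only on the fact that 1 g/L equals 1 mg/ml.

```python
# TD buffer recipe from the list above (g/L).
TD_BUFFER_G_PER_L = {
    "KCl": 0.38,
    "Na2HPO4·7H2O": 0.1,
    "NaCl": 8.0,
    "Tris": 3.0,
}


def td_buffer_masses_mg(volume_ml):
    """Return the mass of each component (mg) for the requested volume (ml)."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    # g/L is numerically equal to mg/ml, so mg = (g/L) * ml.
    return {name: round(g_per_l * volume_ml, 3)
            for name, g_per_l in TD_BUFFER_G_PER_L.items()}


if __name__ == "__main__":
    for name, mg in td_buffer_masses_mg(50).items():
        print(f"{name}: {mg} mg")
```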
COMMENTARY
Background Information
This protocol was adapted from R2C2 (Volden et al., 2018), which was originally designed as a technique to help improve error correction in Oxford Nanopore Technologies long-read sequencing technology by sequencing full-length cDNA concatemers to obtain a consensus sequence. It was realized that if these long concatemers were broken up into individual short read sequences they could be used to generate TSS data on any Illumina sequencing platform (Fig. 1) (Cole, Byrne, Adams, Volden, & Vollmers, 2020).
Several small modifications were made to the technique to optimize cDNA generation and rolling circle output, with less concern for retaining the integrity of the concatemer as the sequence would be fragmented regardless. The UMI sequence was originally present in the splint but was moved into the TSO to allow for a greater chance of detecting the UMI and the TSS sequence in a read fragment. We tested the technique using fresh plant leaf and root tissue.
Many other techniques make use of TSRTs to attach adaptors to the 5′ end of an RNA transcript. STRIPE-seq (Policastro et al., 2020) uses a randomer (a primer composed of random nucleotides) instead of a poly-dT primer, and size selects for transcripts that can be sequenced. The use of a randomer allows the technique to measure the TSSs of non-polyadenylated RNA, but comes at the cost of reaction efficiency and necessitates the removal of rRNA. Single-cell tagged reverse transcription (STRT) (Islam et al., 2012) can be used on bulk libraries as well as single-cell libraries (Adiconis et al., 2018), and shares the ultra-low input and ease-of-execution advantages of Smar2C2. For use with Illumina sequencing platforms, STRT uses a restriction enzyme to randomly digest the full-length cDNA before ligating an adaptor to the 3′ end of the transcript (Islam et al., 2012). Although this is very effective in accurately identifying TSSs, it has been reported that when used on bulk samples there are limits on the total number of TSSs that can be obtained from a single sample, and that this number does not improve with increased input (Adiconis et al., 2018).
Critical Parameters
Clean extraction of full-length RNA is key to the success of the protocol, and any significant degradation will generate poor data. For this reason, it is important to use dedicated reagents, clean workspaces, and proper lab technique when handling RNA.
Do not input large amounts of RNA when generating cDNA from RNA using the TSRT. We recommend limiting your input to 400 ng of total RNA, and lower amounts will usually produce comparable results.
Proper splint formation is also critical to the circularization reaction. If yields are low when generating the splint as outlined in Basic Protocol 1, it is safer to repeat splint formation. Fortunately, the current protocol generates enough splint for many library preparations, and the splint is very stable when stored at −20°C.
Troubleshooting
Most problems encountered in this protocol will be the result of poor RNA extractions or expired reagents. Because cDNA levels are low before circularization, the first indicator of a failed reaction is little to no DNA after rolling circle amplification. Check the expiration dates on all enzymes and reagents, and ensure that all steps involving RNA use RNase-free solutions.
When sequencing data is aligned, reads that cluster around poly-A tracts in the genome may indicate high genomic DNA contamination. If your protocol for RNA extraction does not include a genomic DNA digestion (as used in step 9 of Basic Protocol 2), consider including this in your protocol. Ensure that your DNase I is not expired and consider increasing the digestion time to ensure less contamination. While DNase digestions are not always included in RNA extraction protocols [such as with acid phenol:chloroform extraction and lithium chloride (LiCl) precipitation] (Green & Sambrook, 2019), they are highly advisable here.
Understanding Results
The number of TSSs that can be measured using this technique is largely dependent on the sequencing depth used. We were able to measure 70 million unique TSS reads from a single input (Murray, Mendieta, Vollmers, & Schmitz, 2022), and we have not reached or been able to determine a sequencing saturation point. If extremely low levels of RNA input are used then sequencing saturation may be reached earlier, but currently we do not know what these levels are.
While there are many methods that can be used to validate the results, the easiest and most readily available is likely genome annotations. Although gene annotations often do not precisely reflect true TSSs, we still expect empirically measured TSSs that are located proximally to gene annotations to cluster around the annotated TSS (Fig. 2). The degree to which this occurs may be somewhat dependent on the quality of the annotation that is available for the species being studied. However, correlations between annotations derived from RNA-seq (or comparable methods) should still be expected.

Initial processing of data (Fig. 3) usually includes identifying TSS reads, discovering individual TSS positions in the genome, and then clustering TSSs that are located in close proximity into transcriptional start regions (TSRs). We recommend using the Cutadapt software (Martin, 2011) to identify reads containing a TSS, extract them from the main sequence file, quality trim them, and remove the TSS adaptor. Once the reads containing a TSS had been extracted and trimmed, we opted to align them to a reference genome using STAR (Dobin et al., 2013).
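To illustrate what the read extraction step is looking for, the Python sketch below searches a read (and its reverse complement) for the TSO-UMI oligo listed under Reagents and Solutions and returns the UMI together with the sequence immediately downstream of the adaptor. This is a simplified stand-in for Cutadapt, not a replacement: it requires an exact adaptor match, performs no quality trimming, and the `MIN_INSERT` cutoff is a hypothetical value chosen for the example. It also assumes that the ATrGrG+G end of the oligo reads as ATGGG in the sequenced DNA.

```python
import re

TSO_ADAPTOR = "AAGCAGTGGTATCAACGCAGAGTAC"  # constant 5' portion of the TSO-UMI oligo
UMI_LENGTH = 12                             # length of the N run in the TSO-UMI oligo
TSO_TAIL = "ATGGG"                          # assumed DNA equivalent of ATrGrG+G
MIN_INSERT = 20                             # hypothetical minimum transcript length kept

TSS_PATTERN = re.compile(
    TSO_ADAPTOR + "([ACGT]{%d})" % UMI_LENGTH + TSO_TAIL + "([ACGTN]+)"
)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tss_read(read):
    """Return (umi, transcript_5prime_sequence), or None if no TSS adaptor is found.

    Both orientations are checked because fragments of the concatemer
    can be sequenced from either strand.
    """
    read = read.upper()
    for candidate in (read, reverse_complement(read)):
        match = TSS_PATTERN.search(candidate)
        if match and len(match.group(2)) >= MIN_INSERT:
            return match.group(1), match.group(2)
    return None
```

Requiring the UMI and ATGGG after the adaptor also distinguishes the TSO from the poly-dT primer, which shares the same 25-nt adaptor sequence but is followed by a run of T's. The returned UMI can be kept alongside the read for later deduplication.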

This resulting BAM file can then be used to identify individual TSSs and TSRs. There are many available programs to do this (Adiconis et al., 2018; Haberle, Forrest, Hayashizaki, Carninci, & Lenhard, 2015; Thodberg, Thieffry, Vitting-Seerup, Andersson, & Sandelin, 2019). We opted to use TSRexplorer for its broad range of features and ease of execution (Policastro, Mcdonald, Brendel, & Zentner, 2021). TSRexplorer rapidly generates both single nucleotide TSS and TSR bed files using BAM file inputs generated by mapping TSS extracted from raw sequencing files.
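The idea of grouping nearby TSSs into TSRs can be illustrated with a simple distance-based clustering, sketched below in Python. This is not the algorithm implemented by TSRexplorer or any of the packages cited above; the `max_gap` value of 25 bp is a hypothetical parameter, and the input is assumed to be TSS positions from a single chromosome and strand.

```python
def cluster_tss(positions, max_gap=25):
    """Group TSS positions into (start, end) regions.

    Two consecutive positions fall in the same region when they are
    no more than max_gap bases apart.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][1] <= max_gap:
            clusters[-1][1] = pos          # extend the current region
        else:
            clusters.append([pos, pos])    # start a new region
    return [tuple(cluster) for cluster in clusters]


if __name__ == "__main__":
    print(cluster_tss([100, 105, 120, 300, 310, 1000]))
```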
It is worth noting that raw BAM files can be used directly to identify TSSs, as the strand and first nucleotide position can easily be extracted from individual reads. Although this lacks the convenience of software packages and makes more complicated downstream analysis more difficult, it does allow the experimenter to interact directly with the data set if desired.
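A minimal Python sketch of this direct approach is shown below. It works on the FLAG, POS, and CIGAR fields of a single alignment record as defined in the SAM format, and it assumes that the reads were trimmed so that their first base is the first transcribed nucleotide and that the read is in the same orientation as the transcript. The function name is illustrative. Reads whose 5′ end is soft clipped by the aligner are not handled specially here and may need to be filtered beforehand.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def tss_from_alignment(flag, pos, cigar):
    """Return (strand, tss_position) in 1-based coordinates, or None if unmapped.

    For forward-strand reads the TSS is the leftmost aligned base (POS).
    For reverse-strand reads it is the rightmost aligned base, which is
    POS plus the reference length covered by the CIGAR string, minus one.
    """
    if flag & 0x4:            # read unmapped
        return None
    if flag & 0x10:           # read aligned to the reverse strand
        ref_len = sum(int(length) for length, op in _CIGAR_OP.findall(cigar)
                      if op in _REF_CONSUMING)
        return "-", pos + ref_len - 1
    return "+", pos
```

Counting the returned (strand, position) pairs per chromosome gives a single-nucleotide TSS table comparable to the output of dedicated packages.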
When initially interpreting the data, note that while the vast majority of reads should be located around TSSs, a small number of reads are expected within gene bodies and some intergenic space; these are usually treated as experimental noise. Such noise is present in most TSS data sets and has traditionally been removed by setting a minimum number of reads that any single position must have to be considered a TSS. This threshold has previously been defined by looking at the percent of reads at each threshold value that occur at an expected location versus unexpected intergenic locations (Policastro et al., 2021).
This specific methodology can become a problem due to the high numbers of unique TSSs generated by Smar2C2. Highly expressed genes can produce intragenic positions that appear to be true TSSs due to the sheer number of reads present, but that account for only a small fraction of the reads in the gene. Setting a threshold high enough to remove these positions can also remove many apparently valid TSSs from genes with lower transcription levels. Our solution to this problem was to retain a threshold for considering individual loci, but to only consider TSRs that contain at least 10% of the total reads present at a given gene annotation. This allows for a sliding scale that removes noise from genes based on their total transcription levels rather than trying to find a single threshold that best suits an entire genome.
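The per-gene 10% filter can be sketched in Python as follows. The record layout (dictionaries with `gene`, `tsr_id`, and `reads` keys) is hypothetical, and the sketch assumes that the total for a gene is the sum of reads across all TSRs assigned to that gene annotation.

```python
from collections import defaultdict


def filter_tsrs_by_gene_fraction(tsrs, min_fraction=0.10):
    """Keep TSRs that hold at least min_fraction of their gene's total reads.

    tsrs: iterable of dicts with 'gene', 'tsr_id', and 'reads' keys.
    """
    tsrs = list(tsrs)
    gene_totals = defaultdict(int)
    for tsr in tsrs:
        gene_totals[tsr["gene"]] += tsr["reads"]
    return [
        tsr for tsr in tsrs
        if gene_totals[tsr["gene"]] > 0
        and tsr["reads"] / gene_totals[tsr["gene"]] >= min_fraction
    ]


if __name__ == "__main__":
    example = [
        {"gene": "geneA", "tsr_id": "A1", "reads": 950},  # main promoter
        {"gene": "geneA", "tsr_id": "A2", "reads": 40},   # intragenic noise
        {"gene": "geneB", "tsr_id": "B1", "reads": 6},    # lowly expressed gene
        {"gene": "geneB", "tsr_id": "B2", "reads": 4},
    ]
    print([tsr["tsr_id"] for tsr in filter_tsrs_by_gene_fraction(example)])
```

In this made-up example a fixed genome-wide cutoff of, say, 20 reads would keep the intragenic A2 region while discarding both geneB regions, whereas the per-gene fraction removes A2 and retains B1 and B2.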
In addition to defining TSSs and TSRs, the single nucleotide position that is most prominent within a TSR can be defined, called the primary transcription start site (pTSS). This piece of information is useful for examining the positional enrichment of sequence motifs relative to the TSS, as critical promoter elements such as the TATA box are often positioned precisely relative to the TSS (Haberle & Stark, 2018). Novel motif discovery, known motif enrichment, and positional motif enrichment can be performed using the MEME Suite software (Bailey, Johnson, Grant, & Noble, 2015), and having a single position from which to search greatly aids in discovery.
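Selecting a pTSS amounts to taking the position with the most reads within a region. A minimal Python sketch is shown below; the function name is illustrative, and breaking ties by choosing the lowest coordinate is an arbitrary assumption made for the example.

```python
def primary_tss(tss_counts):
    """Return the position with the most reads within one TSR.

    tss_counts: dict mapping genomic position -> read count.
    Ties are resolved in favor of the lowest coordinate.
    """
    if not tss_counts:
        raise ValueError("TSR contains no TSS positions")
    # max() returns the first maximal item, so iterating over sorted
    # positions makes the lowest coordinate win a tie.
    return max(sorted(tss_counts), key=tss_counts.get)


if __name__ == "__main__":
    print(primary_tss({100: 3, 101: 12, 103: 5}))
```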
Time Considerations
Initial RNA extraction and reverse transcription will require a dedicated block of time to complete, as it is not recommended to extract RNA and then pause before converting to cDNA. However, once cDNA has been generated it is easy and feasible to pause between most major steps of the protocol. Using a thermocycler, either the rolling circle amplification or the exonuclease digestion can be programmed and allowed to run overnight. If you find that you need an extended rolling circle amplification to achieve sufficient DNA output, we suggest allowing this step to run overnight for up to 16 hr.
The protocol can be completed over the course of two days, with many lengthy incubations allowing for relatively little hands-on time at the bench. We have never attempted to process more than 16 samples at a single time, but can see no reason that larger sample batches would not be viable and easily processed. Depending on the difficulty of the RNA extraction and the protocol used, large sample batches may result in freshly extracted RNA sitting on ice for an extended period of time, during which it may degrade and reduce the quality of the experimental output.
Acknowledgments
The authors would like to acknowledge the Georgia Genomics & Bioinformatics Core as well as the Georgia Advanced Computing Resource Center for research support. This research was supported by grants from the National Science Foundation (IOS-1546867 and IOS-1856627) to RJS.
Author Contributions
Andrew Murray: Conceptualization, data curation, formal analysis, investigation, methodology, software, validation, visualization, writing-original draft, writing-review and editing; Christopher Vollmers: Conceptualization, methodology, writing-review and editing; Robert J. Schmitz: Funding acquisition, methodology, project administration, resources, supervision, writing-review and editing.
Conflict of Interest
The authors declare no conflict of interest.
Open Research
Data Availability Statement
All raw and processed sequencing data generated by this technique and in previous applications of this technique have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE197144. All data used in this manuscript is available and can be accessed under this accession.
Literature Cited
- Adiconis, X., Haber, A. L., Simmons, S. K., Levy Moonshine, A., Ji, Z., Busby, M. A., … Levin, J. Z. (2018). Comprehensive comparative analysis of 5′-end RNA-sequencing methods. Nature Methods, 15(7), 505–511. doi: 10.1038/s41592-018-0014-2
- Bailey, T. L., Johnson, J., Grant, C. E., & Noble, W. S. (2015). The MEME Suite. Nucleic Acids Research, 43(W1), W39–W49. doi: 10.1093/nar/gkv416
- Batut, P., & Gingeras, T. R. (2013). RAMPAGE: Promoter activity profiling by paired-end sequencing of 5′-complete cDNAs. Current Protocols in Molecular Biology, 104, 25B.11.1–25B.11.16. doi: 10.1002/0471142727.mb25b11s104
- Cole, C., Byrne, A., Adams, M., Volden, R., & Vollmers, C. (2020). Complete characterization of the human immune cell transcriptome using accurate full-length cDNA sequencing. Genome Research, 30(4), 589–601. doi: 10.1101/gr.257188.119
- Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., … Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England), 29(1), 15–21. doi: 10.1093/bioinformatics/bts635
- Green, M. R., & Sambrook, J. (2019). Removing DNA contamination from RNA samples by treatment with RNase-free DNase I. Cold Spring Harbor Protocols, 2019(10), pdb–prot101725. doi: 10.1101/pdb.prot101725
- Islam, S., Kjällquist, U., Moliner, A., Zajac, P., Fan, J. B., Lönnerberg, P., & Linnarsson, S. (2012). Highly multiplexed and strand-specific single-cell RNA 5′ end sequencing. Nature Protocols, 7(5), 813–828. doi: 10.1038/nprot.2012.022
- Haberle, V., Forrest, A. R., Hayashizaki, Y., Carninci, P., & Lenhard, B. (2015). CAGEr: precise TSS data retrieval and high-resolution promoterome mining for integrative analyses. Nucleic Acids Research, 43(8), e51. doi: 10.1093/nar/gkv054
- Haberle, V., & Stark, A. (2018). Eukaryotic core promoters and the functional basis of transcription initiation. Nature Reviews. Molecular Cell Biology, 19(10), 621–637. doi: 10.1038/s41580-018-0028-8
- Kulpa, D., Topping, R., & Telesnitsky, A. (1997). Determination of the site of first strand transfer during Moloney murine leukemia virus reverse transcription and identification of strand transfer-associated reverse transcriptase errors. EMBO Journal, 16(4), 856–865. doi: 10.1093/emboj/16.4.856
- Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal, 17(1), 10–12. doi: 10.14806/ej.17.1.200
- Murray, A., Mendieta, J. P., Vollmers, C., & Schmitz, R. J. (2022). Simple and accurate transcriptional start site identification using Smar2C2 and examination of conserved promoter features. The Plant Journal, 112(2), 583–596. doi: 10.1111/tpj.15957
- Policastro, R. A., McDonald, D. J., Brendel, V. P., & Zentner, G. E. (2021). Flexible analysis of TSS mapping data and detection of TSS shifts with TSRexploreR. NAR Genomics and Bioinformatics, 3(2), lqab051. doi: 10.1093/nargab/lqab051
- Policastro, R. A., Raborn, R. T., Brendel, V. P., & Zentner, G. E. (2020). Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Research, 30(6), 910–923. doi: 10.1101/gr.261545.120
- Tang, D. T., Plessy, C., Salimullah, M., Suzuki, A. M., Calligaris, R., Gustincich, S., & Carninci, P. (2013). Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching. Nucleic Acids Research, 41(3), e44. doi: 10.1093/nar/gks1128
- Thodberg, M., Thieffry, A., Vitting-Seerup, K., Andersson, R., & Sandelin, A. (2019). CAGEfightR: analysis of 5′-end data using R/Bioconductor. BMC Bioinformatics, 20(1), 487. doi: 10.1186/s12859-019-3029-5
- Volden, R., Palmer, T., Byrne, A., Cole, C., Schmitz, R. J., Green, R. E., & Vollmers, C. (2018). Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA. Proceedings of the National Academy of Sciences of the United States of America, 115(39), 9726–9731. doi: 10.1073/pnas.1806447115
- Zhu, Y. Y., Machleder, E. M., Chenchik, A., Li, R., & Siebert, P. D. (2001). Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. BioTechniques, 30(4), 892–897. doi: 10.2144/01304pf02