Quality control analysis for 10X snRNA-seq
Dinh H Diep, Daniel Jacobsen
Abstract
Here we describe a computational protocol for performing quality control analysis on shallow sequencing data obtained from 10X snRNA-seq experiments. The workflow starts with raw MiSeq run folders and uses cellranger to generate count matrices. The raw count matrices are analyzed and sequencing saturation plots are generated. The saturation plots are then compared against plots from a reference set of libraries of varying quality (bad, fair, good, great), which allows both the sequencing requirements and the overall quality of each 10X snRNA-seq experiment to be assessed.
Steps
Install cellranger using instructions from https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/installation
Download the tar.gz file from this protocol.
Extract the tar.gz file from this protocol.
#Extract file (linux)
tar -xzf <FILENAME>
Install the anaconda or miniconda Python distribution following the instructions on the corresponding page.
Get anaconda from here: https://www.anaconda.com/products/distribution, OR
get miniconda from here: https://docs.conda.io/en/latest/miniconda.html
Install preseq using given instructions from http://smithlabresearch.org/software/preseq/.
Preseq requires the GSL libraries. Install GSL using the instructions from https://www.gnu.org/software/gsl/.
Create a symbolic link so that preseq can find the required gsl library.
#Create a symbolic link to GSL library (linux)
sudo ln -s /usr/local/lib/libgsl.so /usr/lib/libgsl.so.0
Install samtools using given instructions from http://www.htslib.org/download/.
Use conda to install bcl2fastq with the following terminal command:
#install bcl2fastq (linux)
conda install -c dranew bcl2fastq
Use conda to install required python packages with the following terminal command:
#Install python packages for 10X_snRNA_preseq_analysis package (linux)
conda install -c conda-forge numpy seaborn matplotlib pandas
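Before moving on, it can help to confirm that the tools installed in the steps above are actually visible on the PATH. The following is a minimal sketch, not part of the protocol package: check_tools is a hypothetical helper name, and the list of executable names in the usage line is an assumption based on the tools named above.

```shell
#Check that the installed tools are on PATH (linux)
# check_tools is a hypothetical helper, not part of the protocol package.
# It prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names below are assumed from the install steps above.
check_tools cellranger preseq samtools bcl2fastq python \
    || echo "Install the missing tools before continuing."
```

If a tool is reported as missing even though it was installed, the usual cause is that its install directory has not been added to PATH in the current shell.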
Run cellranger mkfastq to generate fastq files. Make sure that the following placeholders are set to the correct paths and desired names.
<FASTQ_OUT> is the name of the output folder
<RUN> is the path to the raw MiSeq run folder
<CSV> is the path to the sample sheet file
#Generate fastq files from raw run folders (linux)
cellranger mkfastq --id=<FASTQ_OUT> --run=<RUN> --sample-sheet=<CSV>
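A wrong path in the mkfastq command is easier to catch before the run is launched. The sketch below checks the two input paths; check_mkfastq_inputs is a hypothetical helper, not part of the protocol package. It assumes only that an Illumina run folder contains a RunInfo.xml file and that the sample sheet is a regular, readable file.

```shell
#Check mkfastq inputs before launching (linux)
# check_mkfastq_inputs is a hypothetical helper, not part of the protocol package.
# Arguments: 1 = run folder, 2 = sample sheet file.
check_mkfastq_inputs() {
    run_dir=$1
    sample_sheet=$2
    status=0
    if [ ! -d "$run_dir" ]; then
        echo "run folder not found: $run_dir" >&2
        status=1
    elif [ ! -f "$run_dir/RunInfo.xml" ]; then
        echo "no RunInfo.xml in run folder: $run_dir" >&2
        status=1
    fi
    if [ ! -f "$sample_sheet" ] || [ ! -r "$sample_sheet" ]; then
        echo "sample sheet not readable: $sample_sheet" >&2
        status=1
    fi
    return "$status"
}

# Usage, with the same placeholders as the command above:
# check_mkfastq_inputs <RUN> <CSV> && \
#     cellranger mkfastq --id=<FASTQ_OUT> --run=<RUN> --sample-sheet=<CSV>
```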
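Run cellranger count. Make sure that the following placeholders are set to the correct paths and desired names.
<ID> is the name of the output folder
<FASTQ_OUT> is the path to the fastq files generated by cellranger mkfastq
<SAMPLE> is the sample name as given in the sample sheet
<REF> is the path to the cellranger reference data folder
<NUM> is the expected number of cells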
#Generate the count matrix (linux)
cellranger count --id <ID> --fastqs <FASTQ_OUT> --sample <SAMPLE> --transcriptome <REF> --include-introns --expect-cells <NUM>
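Once cellranger count finishes, a quick check that the standard outputs exist can save a failed run of the next step. The sketch below looks for three files that cellranger count normally writes under <ID>/outs. check_count_outputs is a hypothetical helper, and which of these files the preseq script actually reads is not stated in this protocol, so treat the list as an assumption.

```shell
#Check that cellranger count produced the expected outputs (linux)
# check_count_outputs is a hypothetical helper, not part of the protocol package.
# Argument: 1 = the <ID> folder created by cellranger count.
check_count_outputs() {
    outs="$1/outs"
    status=0
    for item in metrics_summary.csv possorted_genome_bam.bam raw_feature_bc_matrix; do
        if [ -e "$outs/$item" ]; then
            echo "ok: $outs/$item"
        else
            echo "missing: $outs/$item"
            status=1
        fi
    done
    return "$status"
}

# Usage, with the same placeholder as the command above:
# check_count_outputs <ID>
```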
Run the preseq script in the folder downloaded from this protocol. Make sure that the following placeholders are set to the correct paths and names.
<PATH_TO_FOLDER> is the path to the folder that was extracted from the tar.gz file.
<ID> is the name of the cellranger count output folder from the previous step.
#Generate preseq plots from 10X snRNA output folder (linux)
<PATH_TO_FOLDER>/scripts/loop.preseq.r.sh <ID>
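When several libraries were processed with cellranger count, the same command can be repeated for each one. The following is a minimal sketch of such a loop. run_preseq_for_ids is a hypothetical helper, and it assumes that the script takes a single <ID> argument, as in the command above, and that it is run from the directory that contains the <ID> folders.

```shell
#Run the preseq script for several cellranger count folders (linux)
# run_preseq_for_ids is a hypothetical helper, not part of the protocol package.
# Arguments: 1 = folder extracted from the tar.gz file, 2.. = one or more IDs.
run_preseq_for_ids() {
    folder=$1
    shift
    for id in "$@"; do
        if [ ! -d "$id" ]; then
            echo "skipping $id: no cellranger count folder in this directory" >&2
            continue
        fi
        # Keep going with the remaining IDs if one library fails.
        "$folder/scripts/loop.preseq.r.sh" "$id" || echo "preseq failed for $id" >&2
    done
}

# Usage, run from the directory that holds the <ID> folders:
# run_preseq_for_ids <PATH_TO_FOLDER> <ID_1> <ID_2>
```

Skipped and failed IDs are reported on standard error, so the normal output of the script stays separate from the messages of the loop.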
View outputs.