Identification of differentially expressed long noncoding RNAs and pathways in liver tissues from rats with hepatic fibrosis
Xiong Xiao, Yan Wang, Xiaozhong Wang
function study
hepatic fibrosis
long noncoding RNAs
rat liver tissues
qRT-PCR
quantitative reverse transcription polymerase chain reaction
Disclaimer
DISCLAIMER – FOR INFORMATIONAL PURPOSES ONLY; USE AT YOUR OWN RISK
The protocol content here is for informational purposes only and does not constitute legal, medical, clinical, or safety advice, or otherwise; content added to protocols.io is not peer reviewed and may not have undergone a formal approval of any kind. Information presented in this protocol should not substitute for independent professional judgment, advice, diagnosis, or treatment. Any action you take or refrain from taking using or relying upon the information presented here is strictly at your own risk. You agree that neither the Company nor any of the authors, contributors, administrators, or anyone else associated with protocols.io, can be held responsible for your use of the information contained in or linked to this protocol or any of our Sites/Apps and Services.
Abstract
To identify long non-coding RNAs (lncRNAs) and their potential roles in CCl4-induced hepatic fibrosis in rat liver tissues, lncRNAs and genes were analyzed in fibrotic rat liver tissues by quantitative reverse transcription polymerase chain reaction (qRT-PCR).
Steps
RNA-Seq Raw Data Cleaning and Alignment
Raw reads containing more than two N (undetermined) bases were first discarded.
Adaptors and low-quality bases were then trimmed from the raw sequencing reads using FASTX-Toolkit (version 0.0.13), and reads shorter than 16 nt were also dropped.
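The two read-level filters above (more than two N bases, shorter than 16 nt) can be sketched in Python as follows. This is a minimal illustration, not FASTX-Toolkit itself: adaptor clipping is simplified to an exact-match cut at the first adaptor occurrence, low-quality trimming is omitted, and the helper names (`has_too_many_n`, `trim_adapter`, `clean_reads`) are hypothetical.

```python
from typing import Iterable, Iterator

MAX_N = 2     # reads with more than two N bases are discarded
MIN_LEN = 16  # reads shorter than 16 nt after trimming are discarded


def has_too_many_n(seq: str, max_n: int = MAX_N) -> bool:
    """Return True if the read contains more than max_n ambiguous (N) bases."""
    return seq.upper().count("N") > max_n


def trim_adapter(seq: str, adapter: str) -> str:
    """Cut the read at the first exact occurrence of the adaptor (simplified clipping)."""
    idx = seq.find(adapter)
    return seq if idx == -1 else seq[:idx]


def clean_reads(reads: Iterable[str], adapter: str) -> Iterator[str]:
    """Apply the N filter to raw reads, clip the adaptor, then apply the length filter."""
    for seq in reads:
        if has_too_many_n(seq):
            continue  # N filter is applied to the raw read
        trimmed = trim_adapter(seq, adapter)
        if len(trimmed) >= MIN_LEN:
            yield trimmed  # keep only reads of at least 16 nt


reads = ["ACGTNNNACGTACGTACGTACGT", "ACGTACGTACGTACGTACGTAGATCGGAAG", "ACGTAGATCGGAAG"]
print(list(clean_reads(reads, adapter="AGATCGGAAG")))
```

The N filter runs before trimming, matching the order in the text; in practice a dedicated trimmer also handles partial adaptor matches at the 3' end, which this sketch does not.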
Clean reads were then aligned to the GRCh38 genome with TopHat2 (Kim, Pertea et al. 2013), allowing 4 mismatches. Uniquely mapped reads were used to count reads per gene and to calculate FPKM (fragments per kilobase of transcript per million mapped fragments) (Trapnell, Williams et al. 2010).
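FPKM normalizes a gene's fragment count by its length in kilobases and by the library size in millions of mapped fragments, i.e. FPKM = count × 10^9 / (length_bp × total_mapped). A minimal Python sketch follows; the function name is hypothetical, and defaulting `total_mapped` to the sum of gene counts is an assumption (ideally pass the library's total number of uniquely mapped fragments).

```python
from typing import Dict, Optional


def fpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if total_mapped is None:
        # Assumption: approximate library size by the sum of gene-level counts.
        total_mapped = sum(counts.values())
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return {g: c * 1e9 / (lengths_bp[g] * total_mapped) for g, c in counts.items()}


values = fpkm({"geneA": 500, "geneB": 1500}, {"geneA": 2000, "geneB": 500}, total_mapped=1_000_000)
print(values)
```

Here geneB has three times as many fragments as geneA but is four times shorter, so its FPKM is twelve times higher.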
Differentially Expressed Gene (DEG) Analysis
The R Bioconductor package edgeR (Robinson, McCarthy et al. 2010) was used to screen for differentially expressed genes (DEGs). A false discovery rate (FDR) < 0.05 and a fold change > 2 or < 0.5 were set as the cut-off criteria for identifying DEGs.
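Once edgeR has produced a log2 fold change and an FDR for each gene, the cut-off step is a simple filter. The Python sketch below assumes the thresholds are FDR < 0.05 together with a fold change above 2 or below 0.5 (equivalently |log2FC| > 1); the `DEResult` record and `select_degs` function are hypothetical stand-ins for rows of an edgeR results table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DEResult:
    gene: str
    log_fc: float  # log2 fold change (fibrotic vs. control), as reported by edgeR
    fdr: float


def select_degs(results: List[DEResult], fdr_cutoff: float = 0.05,
                up: float = 2.0, down: float = 0.5) -> List[DEResult]:
    """Keep genes with FDR below the cut-off and a linear fold change > up or < down."""
    selected = []
    for r in results:
        fold_change = 2.0 ** r.log_fc  # convert log2 scale back to a linear ratio
        if r.fdr < fdr_cutoff and (fold_change > up or fold_change < down):
            selected.append(r)
    return selected


table = [DEResult("a", 1.5, 0.01), DEResult("b", -1.2, 0.03),
         DEResult("c", 0.5, 0.001), DEResult("d", 3.0, 0.2)]
print([r.gene for r in select_degs(table)])
```

Gene "c" is highly significant but changes only about 1.4-fold, and gene "d" changes 8-fold but fails the FDR cut-off, so both are excluded.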
Functional enrichment analysis
To identify the functional categories of DEGs, Gene Ontology (GO) terms and KEGG pathways were analyzed using the KOBAS 2.0 server (Xie, Mao et al. 2011). The hypergeometric test and the Benjamini-Hochberg FDR-controlling procedure were used to assess the enrichment of each term.
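The enrichment statistics named above can be illustrated with a small Python sketch. For a term annotated to K of N background genes, with k of the n DEGs annotated to it, the hypergeometric p-value is P(X ≥ k); the per-term p-values are then adjusted with the Benjamini-Hochberg step-up procedure. This is a generic sketch, not KOBAS code, and the function names are hypothetical.

```python
from math import comb
from typing import List


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(hypergeom_upper_tail(k=2, N=10, K=3, n=2))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The running minimum enforces that adjusted values are monotone in the rank of the raw p-values, which is what distinguishes BH q-values from simply multiplying each p-value by m / rank.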
LncRNA Prediction
The lncRNA prediction pipeline followed the method of a previous study (Cabili et al. 2011). The detailed pipeline and filtering thresholds were as follows:
(1) First, based on the RNA-Seq alignment results, transcripts were assembled with Cufflinks v2.2 (Trapnell et al. 2012) using default parameters. After the initial assembly, transcripts with FPKM of at least 0.3 were retained for further filtering.
(2) Cuffcompare, which is bundled with Cufflinks, was used to compare the transcripts with the known genes of the reference genome, and novel transcripts in intergenic, intronic and antisense regions were retained as candidate lncRNAs. Transcripts located within 1000 bp of known coding genes were regarded as UTRs and discarded.
(3) To filter out transcripts with coding potential, a coding potential score (CPS) was computed with the Coding Potential Calculator (CPC) software (Kong et al. 2007). CPC is a support vector machine-based classifier that assesses the protein-coding potential of transcripts based on six biologically meaningful sequence features. Transcripts with a CPS below zero were regarded as non-coding RNAs.
(4) Transcripts satisfying the above conditions were retained as lncRNAs if they were multi-exonic with a length of at least 200 bases, or single-exonic with a length of at least 1000 bases.
(5) Finally, known and predicted lncRNAs from all samples were combined to obtain the final lncRNA set, and the expression level of each lncRNA gene was recalculated. Antisense reads of lncRNAs were discarded.
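Steps (1) to (4) amount to a chain of per-transcript filters, sketched below in Python. Several details are assumptions, not taken from the protocol: the `Transcript` fields are hypothetical, "intergenic, intronic and antisense" is mapped to the Cuffcompare class codes `u`, `i` and `x`, the distance to the nearest coding gene is assumed to be precomputed, and the 200/1000-base thresholds are applied to total transcript length.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping of Cuffcompare class codes: u = intergenic,
# i = within a reference intron, x = exonic overlap on the opposite strand.
NOVEL_CLASS_CODES = {"u", "i", "x"}


@dataclass
class Transcript:
    tid: str
    fpkm: float
    class_code: str
    exon_lengths: List[int]
    dist_to_coding_gene: int  # bp to nearest known coding gene (hypothetical field)
    cpc_score: float


def is_candidate_lncrna(t: Transcript) -> bool:
    if t.fpkm < 0.3:                        # step 1: expression filter
        return False
    if t.class_code not in NOVEL_CLASS_CODES:
        return False                        # step 2: keep novel transcripts only
    if t.dist_to_coding_gene <= 1000:
        return False                        # step 2: treated as UTR extension
    if t.cpc_score >= 0:
        return False                        # step 3: CPC score must be below zero
    length = sum(t.exon_lengths)
    min_len = 200 if len(t.exon_lengths) > 1 else 1000
    return length >= min_len                # step 4: exon-dependent length rule


kept = is_candidate_lncrna(Transcript("T1", 1.2, "u", [150, 120], 5000, -1.1))
short_mono = is_candidate_lncrna(Transcript("T2", 1.2, "u", [800], 5000, -1.1))
print(kept, short_mono)
```

T1 passes as a 270-base multi-exonic transcript, whereas T2 is rejected because a single-exon candidate must reach 1000 bases.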
Differentially Expressed lncRNAs
After obtaining the expression levels of all lncRNAs in all samples, differentially expressed lncRNAs were identified using the R package edgeR (Robinson et al. 2010). For each lncRNA, the p-value was derived from a negative binomial model, and fold changes were also estimated with this package. A q-value of 0.05 and a 2-fold change were set as the thresholds for defining differentially expressed lncRNAs.
Cis-acting target analysis
Based on the expression of each mRNA and differentially expressed lncRNA (DElncRNA), the correlation coefficient and P-value were calculated for each mRNA-DElncRNA pair.
The results were then filtered using an absolute correlation coefficient of at least 0.6 and a P-value below 0.05, so that negatively correlated pairs (correlation coefficient < 0) were retained together with positively correlated ones. The filtered gene pairs formed the co-expression network. For each differentially expressed lncRNA, expressed genes located within 10,000 bases upstream or downstream were collected and intersected with its co-expressed genes to obtain the lncRNA targets.
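The cis-target step combines two sets per lncRNA: genes passing the co-expression filter and expressed genes lying within a 10,000-base window. The Python sketch below assumes correlation coefficients and P-values have already been computed (the protocol does not state which correlation measure was used), ignores strand, counts a gene as nearby if any part of it falls inside the extended window, and uses hypothetical names throughout.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Locus:
    chrom: str
    start: int  # 1-based, inclusive
    end: int


def coexpressed(pairs: Iterable[Tuple[str, str, float, float]],
                r_cut: float = 0.6, p_cut: float = 0.05) -> Dict[str, Set[str]]:
    """Keep (lncRNA, gene, r, p) pairs with |r| >= r_cut and p < p_cut."""
    out: Dict[str, Set[str]] = {}
    for lnc, gene, r, p in pairs:
        if abs(r) >= r_cut and p < p_cut:  # positive and negative pairs both pass
            out.setdefault(lnc, set()).add(gene)
    return out


def nearby_genes(lnc: Locus, genes: Dict[str, Locus], window: int = 10_000) -> Set[str]:
    """Expressed genes overlapping [start - window, end + window] on the same chromosome."""
    lo, hi = lnc.start - window, lnc.end + window
    return {g for g, loc in genes.items()
            if loc.chrom == lnc.chrom and loc.end >= lo and loc.start <= hi}


def cis_targets(pairs, lnc_loci: Dict[str, Locus], gene_loci: Dict[str, Locus],
                window: int = 10_000) -> Dict[str, Set[str]]:
    """Intersect each lncRNA's co-expressed genes with its genomic neighbours."""
    co = coexpressed(pairs)
    return {lnc: co.get(lnc, set()) & nearby_genes(loc, gene_loci, window)
            for lnc, loc in lnc_loci.items()}


lnc_loci = {"lncA": Locus("chr1", 100_000, 105_000)}
gene_loci = {"g1": Locus("chr1", 108_000, 112_000), "g2": Locus("chr1", 200_000, 210_000),
             "g3": Locus("chr2", 100_000, 101_000), "g4": Locus("chr1", 92_000, 95_000)}
pairs = [("lncA", "g1", 0.8, 0.01), ("lncA", "g2", 0.9, 0.001),
         ("lncA", "g3", -0.7, 0.02), ("lncA", "g4", -0.65, 0.03)]
print(cis_targets(pairs, lnc_loci, gene_loci))
```

All four genes are co-expressed with lncA, but only g1 and g4 lie within 10 kb of it, so they are the cis targets; g4 is kept despite its negative correlation.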