Clustering of differentially expressed genes
Ahmad Husaini AHS Suhaimi
Abstract
This pipeline clusters differentially expressed genes (DEGs) in R using the coseq package (Bioconductor release 3.17; Rau & Maugis-Rabusseau, 2018).
Steps
Clustering of differentially expressed genes (DEGs) using the coseq package in R
Load the required packages (coseq and matrixStats).
library(coseq)
library(matrixStats)
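If coseq or matrixStats is not installed yet, the library() calls above will fail. A minimal pre-check is sketched below. It assumes that coseq comes from Bioconductor and matrixStats from CRAN, both of which BiocManager::install() can install. The variable names pkgs and missing_pkgs are illustrative only.

```r
# Packages used in this protocol: coseq (Bioconductor) and matrixStats (CRAN)
pkgs <- c("coseq", "matrixStats")

# requireNamespace() returns FALSE instead of failing when a package is absent
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  message("Install the missing packages first, for example:")
  message('  install.packages("BiocManager"); BiocManager::install(c("',
          paste(missing_pkgs, collapse = '", "'), '"))')
}
```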
Run coseq on counts that have already been normalized and transformed.
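The object names used below (logclr, scTMM) suggest that the counts were TMM-normalized and log-CLR transformed before clustering, which is why coseq's own normalization and transformation are switched off in the calls that follow. For illustration only, the sketch applies a generic centered log-ratio (CLR) transform to a toy matrix. The function name clr_transform, the pseudocount of 1 and the toy values are assumptions, and coseq's built-in "logclr" transformation may differ in detail.

```r
# Generic centered log-ratio (CLR) transform applied to each gene (row).
# Assumption: a pseudocount of 1 avoids log(0); coseq's "logclr" may differ.
clr_transform <- function(counts, pseudocount = 1) {
  logx <- log(counts + pseudocount)
  # Subtracting each row's mean log value (the log of its geometric mean)
  # makes every transformed row sum to zero
  sweep(logx, 1, rowMeans(logx), "-")
}

# Hypothetical normalized counts: 3 genes x 3 samples
toy <- matrix(c(10, 20, 40,
                 5,  5,  5,
                 0, 12, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

clr_toy <- clr_transform(toy)
print(round(clr_toy, 3))
```

A gene with a flat profile (g2) becomes a row of zeros, so clustering on CLR values groups genes by the shape of their expression profile rather than by overall expression level.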
Example:
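Cluster the bud data (columns 1 to 15 of the transformed count table) over a range of candidate cluster numbers, K = 5 to 16. Because the counts were already normalized and transformed, both steps are switched off in coseq (normFactors = "none", transformation = "none").
The clustering is repeated 10 times, because coseq's default K-means algorithm starts from random initial centres and results can differ between runs.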
coseq_bud_logclr_1 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_2 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_3 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_4 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_5 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_6 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_7 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_8 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_9 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_10 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], K = 5:16, normFactors = "none", transformation = "none")
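Repeating the run is a way to judge how stable the clustering is. A common measure of agreement between two partitions of the same genes is the adjusted Rand index (ARI). It equals 1 for identical partitions, even when the cluster numbers are relabelled, and is near 0 for chance-level agreement. The base-R sketch below is illustrative. The function name adjusted_rand_index and the toy label vectors are hypothetical.

```r
# Adjusted Rand index between two cluster-label vectors for the same genes
adjusted_rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                          # contingency table of labels
  sum_ij <- sum(choose(tab, 2))               # gene pairs together in both runs
  sum_a <- sum(choose(rowSums(tab), 2))       # pairs together in run a
  sum_b <- sum(choose(colSums(tab), 2))       # pairs together in run b
  expected <- sum_a * sum_b / choose(length(a), 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Toy runs: run2 is run1 with the cluster numbers swapped; run3 disagrees
run1 <- c(1, 1, 2, 2, 3, 3)
run2 <- c(2, 2, 1, 1, 3, 3)
run3 <- c(1, 2, 1, 2, 1, 2)

ari_same <- adjusted_rand_index(run1, run2)
ari_diff <- adjusted_rand_index(run1, run3)
print(c(ari_same = ari_same, ari_diff = ari_diff))
```

With real results, a call such as adjusted_rand_index(clusters(coseq_bud_logclr_1), clusters(coseq_bud_logclr_2)) would compare two runs, assuming both runs contain the same genes in the same order. coseq also ships a compareARI() function; check its documentation before relying on it.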
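Inspect the summary of each run and note the number of clusters it selected, then decide on the typical (average) number of clusters across the 10 runs.
Choose one run with that number of clusters to carry forward into the subsequent steps.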
summary(coseq_bud_logclr_1)
summary(coseq_bud_logclr_2)
summary(coseq_bud_logclr_3)
summary(coseq_bud_logclr_4)
summary(coseq_bud_logclr_5)
summary(coseq_bud_logclr_6)
summary(coseq_bud_logclr_7)
summary(coseq_bud_logclr_8)
summary(coseq_bud_logclr_9)
summary(coseq_bud_logclr_10)
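The choice of the "average" number of clusters can be made reproducible. The sketch below assumes that "average" means the most frequent (modal) K, since K is an integer, and that the first run reaching that K is carried forward. The function pick_modal_run and the toy runs are hypothetical.

```r
# For a named list of cluster-label vectors (one per run), count the clusters
# in each run, find the most frequent K, and return the first run with it.
pick_modal_run <- function(label_list) {
  k_per_run <- vapply(label_list, function(labels) length(unique(labels)),
                      integer(1))
  k_counts <- table(k_per_run)
  modal_k <- as.integer(names(k_counts)[which.max(k_counts)])
  list(k_per_run = k_per_run,
       modal_k = modal_k,
       chosen_run = which(k_per_run == modal_k)[1])  # named index
}

# Toy labels for four runs over five genes (K = 3, 4, 3, 2)
toy_runs <- list(run1 = c(1, 1, 2, 2, 3),
                 run2 = c(1, 2, 2, 3, 4),
                 run3 = c(2, 2, 1, 1, 3),
                 run4 = c(1, 1, 1, 2, 2))
res <- pick_modal_run(toy_runs)
print(res)
```

Applied to the real runs, something like runs <- lapply(mget(paste0("coseq_bud_logclr_", 1:10)), clusters) followed by pick_modal_run(runs) would point to a candidate run; the printed summaries should still be checked by eye.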
Assigning clusters to transcripts
Retrieve and tabulate the cluster assignments from the clustering chosen in the previous step.
Example:
Here, coseq_bud_logclr_1 is chosen as the best clustering, and its cluster labels are stored in a new vector, results_coseq_bud_logclr.
results_coseq_bud_logclr = clusters(coseq_bud_logclr_1)
Convert the vector into a data frame.
results_coseq_bud_logclr = data.frame(results_coseq_bud_logclr)
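When data.frame() receives an unnamed argument, it names the new column after the variable passed in, and if the vector carries names, those names become the row names. This is why the next step refers to the column as results_coseq_bud_logclr. A toy check using a hypothetical vector toy_labels (so that the real object is not overwritten):

```r
# Hypothetical named vector of cluster labels, keyed by transcript ID
toy_labels <- c(geneA = 3L, geneB = 1L, geneC = 3L)

toy_df <- data.frame(toy_labels)
print(names(toy_df))     # column is named after the variable
print(rownames(toy_df))  # vector names become row names
```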
Create a column containing the assigned cluster number for each transcript in the read count data frame.
Example:
New column: bud_logclr
Data frame with read counts: tcounts_logclr_exp_bud_ORF_scTMM
# assumes results_coseq_bud_logclr lists transcripts in the same row order as the count table
tcounts_logclr_exp_bud_ORF_scTMM$bud_logclr = results_coseq_bud_logclr$results_coseq_bud_logclr
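The assignment above relies on both objects listing transcripts in the same order. A slightly more defensive variant matches rows by transcript ID instead. It is sketched below as an assumption, not as the protocol's own step; the function add_cluster_column and the toy data are hypothetical. Transcripts without a cluster label receive NA.

```r
# Add a cluster column to an expression data frame by matching row names
# (transcript IDs) to the names of a cluster-label vector.
add_cluster_column <- function(expr_df, labels, column = "cluster") {
  expr_df[[column]] <- unname(labels[match(rownames(expr_df), names(labels))])
  expr_df
}

# Toy data: labels deliberately listed in a different order than the rows
toy_expr <- data.frame(s1 = c(1.2, -0.4, 0.3),
                       s2 = c(0.8, 0.1, -0.9),
                       row.names = c("g1", "g2", "g3"))
toy_cluster_labels <- c(g3 = 2L, g1 = 1L, g2 = 1L)

out <- add_cluster_column(toy_expr, toy_cluster_labels, "bud_logclr")
print(out)
print(table(out$bud_logclr))  # number of transcripts per cluster
```

For the real data, a call such as add_cluster_column(tcounts_logclr_exp_bud_ORF_scTMM, clusters(coseq_bud_logclr_1), "bud_logclr") would follow the same pattern, assuming the clusters() output is named by transcript ID.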