Clustering of differentially expressed genes

Ahmad Husaini AHS Suhaimi

Published: 2023-07-06 DOI: 10.17504/protocols.io.rm7vzx82rgx1/v1

Abstract

This pipeline for clustering differentially expressed genes uses the coseq v3.17 package (Rau & Maugis-Rabusseau, 2018) in R.

Steps

Clustering of differentially expressed genes (DEGs) using the coseq package in R

1.

Load the coseq and matrixStats packages.

library(coseq)
library(matrixStats)
2.

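Run coseq on the transformed and normalized counts.

Example:

Clustering the bud data with a candidate number of clusters K ranging from 5 to 16. Because the counts are already transformed and normalized, both normalization and transformation are set to "none".

The clustering is repeated 10 times.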

coseq_bud_logclr_1 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_2 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_3 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_4 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_5 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_6 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_7 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_8 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_9 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
coseq_bud_logclr_10 = coseq(tcounts_logclr_exp_bud_ORF_scTMM[,1:15], K=5:16, normFactors = "none", transformation = "none")
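The ten repeated calls above could also be written as a loop that stores every run in one list. The following is only a sketch: `repeat_clustering` is a hypothetical helper that is not part of coseq, and the toy matrix with `stats::kmeans` merely stands in for the real data so that the example runs without the package.

```r
# Hypothetical helper (not part of coseq): run a clustering function several
# times and keep every result in one named list.
repeat_clustering <- function(data, n_runs, cluster_fun) {
  runs <- lapply(seq_len(n_runs), function(i) cluster_fun(data))
  names(runs) <- paste0("run_", seq_len(n_runs))
  runs
}

# Toy stand-in data (made-up values): 30 "genes" x 4 "samples".
set.seed(1)
toy <- matrix(rnorm(120), nrow = 30,
              dimnames = list(paste0("gene", 1:30), paste0("s", 1:4)))

# stats::kmeans is used here only so the sketch runs without coseq.
toy_runs <- repeat_clustering(toy, n_runs = 10,
                              cluster_fun = function(x) kmeans(x, centers = 3))

# With coseq loaded, the same pattern would look like this (not run here):
# coseq_bud_logclr_runs <- repeat_clustering(
#   tcounts_logclr_exp_bud_ORF_scTMM[, 1:15], n_runs = 10,
#   cluster_fun = function(x) coseq(x, K = 5:16, normFactors = "none",
#                                   transformation = "none"))
```

Individual runs are then accessed by name or index, for example `toy_runs[["run_1"]]`, rather than through ten separately named objects.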

3.

Manually inspect the results and decide on the average number of clusters across the 10 runs.

Choose one clustering result to proceed with in the subsequent steps.

summary(coseq_bud_logclr_1)
summary(coseq_bud_logclr_2)
summary(coseq_bud_logclr_3)
summary(coseq_bud_logclr_4)
summary(coseq_bud_logclr_5)
summary(coseq_bud_logclr_6)
summary(coseq_bud_logclr_7)
summary(coseq_bud_logclr_8)
summary(coseq_bud_logclr_9)
summary(coseq_bud_logclr_10)
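To support the manual inspection, the number of clusters found in each run can be tabulated. This is a minimal sketch under stated assumptions: `n_clusters` is a hypothetical helper, and the toy label vectors are made-up stand-ins for the output of `clusters()` on each run. Counting distinct labels is used as an approximation of the selected K.

```r
# Hypothetical helper: number of distinct clusters in a vector of labels.
n_clusters <- function(labels) length(unique(labels))

# Toy label vectors (made-up values) standing in for
# clusters(coseq_bud_logclr_1), clusters(coseq_bud_logclr_2), ...
toy_labels <- list(
  run_1 = c(1, 2, 3, 1, 2, 3),
  run_2 = c(1, 2, 3, 4, 1, 2),
  run_3 = c(1, 2, 3, 3, 2, 1)
)

# One cluster count per run.
selected_k <- sapply(toy_labels, n_clusters)

table(selected_k)  # how often each cluster count occurred
mean(selected_k)   # average number of clusters across runs

# With the real runs, the same idea would be (not run here):
# selected_k <- sapply(list(coseq_bud_logclr_1, coseq_bud_logclr_2),
#                      function(run) n_clusters(clusters(run)))
```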

Assigning clusters to transcripts

4.

Retrieve and tabulate the clustering information from the clustering chosen in the previous step.

Example:

coseq_bud_logclr_1 is chosen as the best clustering.

results_coseq_bud_logclr: the new vector of cluster labels, one per transcript.

results_coseq_bud_logclr = clusters(coseq_bud_logclr_1)

5.

Convert the vector into a data frame.

results_coseq_bud_logclr = data.frame(results_coseq_bud_logclr)

6.

Create a column containing the assigned cluster number for each transcript in the read count data frame.

Example:

New column: bud_logclr

Data frame with read counts: tcounts_logclr_exp_bud_ORF_scTMM

tcounts_logclr_exp_bud_ORF_scTMM$bud_logclr = results_coseq_bud_logclr$results_coseq_bud_logclr
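The assignment in step 6 is positional, so it relies on the cluster labels being in the same row order as the read count data frame. A more defensive variant aligns the labels by transcript ID. The sketch below uses made-up toy data and assumes the label vector carries transcript IDs as names; whether the vector returned by `clusters()` is named this way should be checked on the real object with `names()`.

```r
# Toy read-count table with transcript IDs as row names (made-up values).
toy_counts <- data.frame(s1 = c(5.1, 2.3, 7.8),
                         s2 = c(4.9, 2.8, 8.1),
                         row.names = c("tx_a", "tx_b", "tx_c"))

# Toy cluster labels, deliberately in a different order from the table rows.
toy_clusters <- c(tx_c = 2, tx_a = 1, tx_b = 1)

# Stop early if any transcript in the table has no cluster label.
stopifnot(all(rownames(toy_counts) %in% names(toy_clusters)))

# Index the labels by row name so each transcript gets its own label,
# regardless of the order of the two objects.
toy_counts$cluster <- unname(toy_clusters[rownames(toy_counts)])
```

Matching by name costs one extra line and guards against silent misassignment if either object was reordered or filtered between steps.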
