Transcriptome annotation

Thiago Mafra Batista, Rafael Rodrigues Ferrari

Published: 2024-06-13 DOI: 10.17504/protocols.io.5qpvok92bl4o/v1

Abstract

This protocol provides detailed, step-by-step instructions for students and researchers to annotate transcriptomes. In this tutorial, we will follow the Trinity -> TransDecoder -> Trinotate pipeline, using the SwissProt and Pfam databases for functional annotation of protein-coding transcripts.

Steps

FINDING CODING REGIONS WITHIN TRANSCRIPTS

1.

TransDecoder (https://github.com/TransDecoder/TransDecoder/wiki)

Extracting the long open reading frames (ORFs)

Prepare a .pbs file to run the analysis remotely on Sagarana

/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta
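A minimal sketch of what such a .pbs file might look like, written out with a here-document. The job name, the resource request (`nodes=1:ppn=1`), the walltime and the script file name are assumptions for illustration, not values from this protocol, and the scheduler syntax on Sagarana may differ (PBS Pro, for example, uses `select=` rather than `nodes=`). Only the TransDecoder.LongOrfs line is taken from the step above.

```shell
# Write a hypothetical PBS job script for the LongOrfs step.
# All "#PBS" values below are placeholders; adjust them to the cluster's policy.
pbs_file="transdecoder_longorfs.pbs"

cat > "$pbs_file" <<'EOF'
#!/bin/bash
#PBS -N td_longorfs
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
#PBS -j oe

# Run from the directory the job was submitted from, so that
# Trinity.fasta.transdecoder_dir is created there.
cd "$PBS_O_WORKDIR"

/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs \
  -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta
EOF

echo "Wrote $pbs_file; submit it with: qsub $pbs_file"
```

The same template could be reused for the later steps by swapping the command at the bottom and raising the core request for multi-threaded tools.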
***Including homology searches as ORF retention criteria***



**BlastP search**



*Prepare a .pbs file to run the analysis remotely on Sagarana*

blastp -query /home/fafinha/collaris/TransDecoder_run/2_homology_searches/blastp/Trinity.fasta.transdecoder_dir/longest_orfs.pep \
-db /home/fafinha/collaris/TransDecoder_run/uniprot_sprot.fasta -max_target_seqs 1 -outfmt 6 -evalue 1e-5 -num_threads 64 \
-out /home/fafinha/collaris/TransDecoder_run/2_homology_searches/blastp/blastp_output.fmt6




**Pfam search**

*Download the Pfam database (Pfam-A.hmm)*



$wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz





*Decompress the file*



$gzip -d Pfam-A.hmm.gz



*Index the database*

$/programs/hmmer-3.3.2/bin/hmmpress Pfam-A.hmm

*Search the long ORFs against Pfam*

/home/fafinha/anaconda3/bin/hmmscan --cpu 64 --domtblout /home/fafinha/collaris/TransDecoder_run/2_homology_searches/pfam/pfam.domtblout \
/home/fafinha/bin/pfam/Pfam-A.hmm /home/fafinha/collaris/TransDecoder_run/2_homology_searches/blastp/Trinity.fasta.transdecoder_dir/longest_orfs.pep




#Run the 'TransDecoder.Predict' script in the same directory where the 'Trinity.fasta.transdecoder_dir' folder is located



**Without homology**

$/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta




*BlastP*

$/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta \
--retain_blastp_hits /home/fafinha/collaris/TransDecoder_run/run2/homology/blast/blastp_output.fmt6

*Pfam*

$/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta \
--retain_pfam_hits /home/fafinha/collaris/TransDecoder_run/run2/homology/pfam/pfam.domtblout

*BlastP and Pfam*

$/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta \
--retain_blastp_hits /home/fafinha/collaris/TransDecoder_run/run2/homology/blast/blastp_output.fmt6 \
--retain_pfam_hits /home/fafinha/collaris/TransDecoder_run/run2/homology/pfam/pfam.domtblout

FUNCTIONAL ANNOTATION

2.

#Perform 'FINDING CODING REGIONS WITHIN TRANSCRIPTS' first

Trinotate (https://github.com/Trinotate/Trinotate.github.io/blob/master/index.asciidoc) (on kiko)

Generate databases

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
***BLAST searches against SwissProt (BlastP and BlastX)***



***Prepare a .pbs file to run the analysis remotely on Sagarana***

/programs/ncbi-blast-2.10.1+/bin/blastp -query /home/fafinha/collaris/Trinotate_run/1st_step/Trinity_reduced.fasta.transdecoder.pep \
-db /home/fafinha/collaris/Trinotate_run/1st_step/uniprot_sprot.fasta -num_threads 64 -outfmt 6 -evalue 1e-6 \
-out /home/fafinha/collaris/Trinotate_run/2nd_step/blastp.tab

***Keep only the best hit per query (highest bit score, then lowest e-value)***

$sort -k1,1 -k12,12nr -k11,11g blastp.tab | sort -k1,1 -u > blastp_besthits.tab
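To make the best-hit selection easier to check, here is a small self-contained sketch on a made-up three-line table in BLAST tabular format (column 1 = query, column 11 = e-value, column 12 = bit score). The function name `best_hits` and the demo hits are hypothetical. The sketch uses `sort -g` for the e-value column, because e-values are written in scientific notation, and it keeps the first line per query with awk instead of a second `sort -u`.

```shell
# best_hits FILE: print one line per query (column 1) from a BLAST
# -outfmt 6 table, preferring the highest bit score (column 12) and,
# on ties, the lowest e-value (column 11).
best_hits() {
  # -g (general numeric) is used for e-values because they are written
  # in scientific notation (e.g. 1e-50), which plain -n does not parse.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k12,12nr -k11,11g "$1" \
    | awk -F '\t' '!seen[$1]++'
}

# Hypothetical three-hit table: two hits for q1, one for q2.
demo=$(mktemp)
printf 'q1\tsA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t80.5\n'  > "$demo"
printf 'q1\tsB\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200\n'   >> "$demo"
printf 'q2\tsC\t85.0\t100\t15\t0\t1\t100\t1\t100\t3e-10\t60.2\n' >> "$demo"

# Show query, subject and bit score of the retained hits.
best_hits "$demo" | cut -f1,2,12
```

On the demo table this keeps hit sB for q1 (bit score 200) and the single hit sC for q2. The same function could be pointed at any table produced with `-outfmt 6`.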




***Prepare a .pbs file to run the analysis remotely on Sagarana***

/programs/ncbi-blast-2.10.1+/bin/blastx -query /home/fafinha/collaris/Trinotate_run/1st_step/Trinity_reduced.fasta \
-db /home/fafinha/collaris/Trinotate_run/1st_step/uniprot_sprot.fasta -num_threads 64 -outfmt 6 -evalue 1e-6 \
-out /home/fafinha/collaris/Trinotate_run/2nd_step/blastx.tab

$sort -k1,1 -k12,12nr -k11,11g blastx.tab | sort -k1,1 -u > blastx_besthits.tab

$/home/thiagomafra/instaladores/tmhmm-2.0c/bin/tmhmm --short < /home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta.transdecoder.pep \
> /home/thiagomafra/collaris/trinotate_run/fafinha/run2/tmhmm.out

$/home/thiagomafra/instaladores/hmmer-3.1b2-linux-intel-x86_64/binaries/hmmscan --cpu 64 \
--domtblout /home/thiagomafra/collaris/trinotate_run/fafinha/run2/TrinotatePFAM.out /home/thiagomafra/collaris/trinotate_run/fafinha/run2/Pfam-A.hmm \
/home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta.transdecoder.pep > /home/thiagomafra/collaris/trinotate_run/fafinha/run2/pfam.log

$/home/thiagomafra/instaladores/signalp-4.1/signalp -f short -n /home/thiagomafra/collaris/trinotate_run/fafinha/run2/signalp.out \
/home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta.transdecoder.pep

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/util/rnammer_support/RnammerTranscriptome.pl --transcriptome /home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta --path_to_rnammer /home/thiagomafra/instaladores/rnammer/rnammer

$/home/thiagomafra/instaladores/trinityrnaseq-v2.10.0/util/support_scripts/get_Trinity_gene_to_trans_map.pl \
/home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta > Trinity.fasta.gene_trans_map
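For orientation, the gene-to-transcript map is a two-column, tab-separated file (gene ID, then transcript ID). The sketch below approximates that mapping with awk, assuming Trinity-style identifiers in which the isoform is encoded as a trailing `_i<number>`. It only illustrates the file format and is not a replacement for the Trinity support script; the function name and the example headers are made up.

```shell
# gene_trans_map FASTA: print "gene<TAB>transcript" for each FASTA header,
# assuming Trinity-style IDs such as TRINITY_DN100_c0_g1_i2.
gene_trans_map() {
  awk '/^>/ {
    id = substr($1, 2)           # first word of the header, without ">"
    gene = id
    sub(/_i[0-9]+$/, "", gene)   # strip the isoform suffix to get the gene ID
    print gene "\t" id
  }' "$1"
}

# Hypothetical example with two isoforms of the same gene.
fa=$(mktemp)
printf '>TRINITY_DN100_c0_g1_i1 len=300\nATGC\n>TRINITY_DN100_c0_g1_i2 len=280\nATGG\n' > "$fa"

gene_trans_map "$fa"
```

Both isoforms in the example map to the same gene ID, TRINITY_DN100_c0_g1, which is how Trinotate groups transcripts under genes at the init step.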




***Loading transcripts and coding regions***

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite init --gene_trans_map ./Trinity.fasta.gene_trans_map \
--transcript_fasta /home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta \
--transdecoder_pep /home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta.transdecoder.pep

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_swissprot_blastp blastp_besthits.tab

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_swissprot_blastx blastx_besthits.tab

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_pfam TrinotatePFAM.out

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_tmhmm tmhmm.out

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_signalp signalp.out

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_rnammer Trinity_reduced.fasta.rnammer.gff

$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite report > trinotate_annotation_report.xls
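Despite the .xls extension, the report is a tab-delimited text file, so it can be summarised directly on the command line. The sketch below counts how many rows carry an annotation in a given column. It assumes that the header line starts with `#`, that missing values are written as `.`, and that the columns are named `sprot_Top_BLASTP_hit` and `Pfam`; check these against the header of your own report. The function name and the demo rows are hypothetical.

```shell
# count_annotated REPORT COLUMN: count data rows whose COLUMN is not "."
# The column is located by name in the header line, so its position
# does not need to be known in advance.
count_annotated() {
  awk -F '\t' -v col="$2" '
    NR == 1 {
      sub(/^#/, "", $1)                       # header line starts with "#"
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    $c != "." { n++ }
    END { if (c) print n + 0 }
  ' "$1"
}

# Hypothetical miniature report with three transcripts.
report=$(mktemp)
printf '#gene_id\ttranscript_id\tsprot_Top_BLASTP_hit\tPfam\n'  > "$report"
printf 'g1\tg1_i1\tHIT_A\tPF_A\n'                              >> "$report"
printf 'g2\tg2_i1\t.\tPF_B\n'                                  >> "$report"
printf 'g3\tg3_i1\t.\t.\n'                                     >> "$report"

count_annotated "$report" sprot_Top_BLASTP_hit
count_annotated "$report" Pfam
```

On the real output, a call such as `count_annotated trinotate_annotation_report.xls Pfam` would give a quick sanity check that the Pfam results were loaded before the report was generated.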
