Transcriptome annotation
Thiago Mafra Batista, Rafael Rodrigues Ferrari
Abstract
This protocol provides detailed, step-by-step instructions for students and researchers to annotate transcriptomes. In this tutorial, we will follow the Trinity -> TransDecoder -> Trinotate pipeline, using the SwissProt and Pfam databases for functional annotation of protein-coding transcripts.
Steps
FINDING CODING REGIONS WITHIN TRANSCRIPTS
TransDecoder (https://github.com/TransDecoder/TransDecoder/wiki)
Extracting the long open reading frames (ORFs)
Prepare a .pbs file to run the analysis remotely on Sagarana
/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta
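The protocol asks for a .pbs file but does not show one. The sketch below writes a minimal job script for the LongOrfs step. The job name, resource request, walltime, and the file name `longorfs.pbs` are assumptions made for illustration; the `#PBS` directives use Torque-style syntax and may need adjusting to the scheduler actually installed on Sagarana.

```shell
#!/usr/bin/env bash
# Write a minimal PBS job script for the TransDecoder.LongOrfs step.
# ASSUMPTIONS (not from the protocol): job name, resources, walltime, file name.
cat > longorfs.pbs <<'EOF'
#!/bin/bash
#PBS -N longorfs
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
#PBS -j oe

# Run from the directory where the job was submitted
cd "$PBS_O_WORKDIR"

/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs \
  -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta
EOF

echo "Wrote longorfs.pbs; submit it with: qsub longorfs.pbs"
```

The same template can be reused for the later steps by swapping the command and, for multi-threaded tools such as blastp or hmmscan, raising the requested core count to match `-num_threads` or `--cpu`.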
***Including homology searches as ORF retention criteria***
**BlastP search**
*Prepare a .pbs file to run the analysis remotely on Sagarana*
blastp -query /home/fafinha/collaris/TransDecoder_run/2_homology_searches/blastp/Trinity.fasta.transdecoder_dir/longest_orfs.pep \
-db /home/fafinha/collaris/TransDecoder_run/uniprot_sprot.fasta -max_target_seqs 1 -outfmt 6 -evalue 1e-5 -num_threads 64 \
-out /home/fafinha/collaris/TransDecoder_run/2_homology_searches/blastp/blastp_output.fmt6
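With `-outfmt 6`, BLAST writes one tab-separated line per alignment, with the query ID in the first column. A quick sanity check, sketched below, is to count how many distinct ORFs received at least one hit. The function name `count_queries_with_hits` and the toy input are hypothetical; on the real data the argument would be `blastp_output.fmt6`.

```shell
#!/usr/bin/env bash
# Count distinct query IDs (column 1) in a BLAST tabular (-outfmt 6) file.
count_queries_with_hits() {
  cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Toy input for illustration only (made-up IDs and scores, not real results).
toy=$(mktemp)
{
  printf 'orf1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n'
  printf 'orf1\tP1\t85.0\t50\t7\t0\t120\t170\t110\t160\t1e-10\t80\n'
  printf 'orf2\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150\n'
} > "$toy"

count_queries_with_hits "$toy"
```

Counting distinct IDs rather than lines matters because a single query can contribute several lines when it aligns to its target in more than one segment.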
**Pfam search**
*Download the Pfam database (Pfam-A.hmm)*
$wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
*Decompress the file*
$gzip -d Pfam-A.hmm.gz
*Index the database*
$/programs/hmmer-3.3.2/bin/hmmpress Pfam-A.hmm
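`hmmpress` writes four binary index files next to the database (`.h3f`, `.h3i`, `.h3m`, `.h3p`), and `hmmscan` will not run without them. The sketch below checks that all four are present; the function name `check_hmmpress_index` is hypothetical, and the example call assumes `Pfam-A.hmm` is in the current directory.

```shell
#!/usr/bin/env bash
# Return 0 if all four hmmpress index files exist (non-empty) for an HMM file.
check_hmmpress_index() {
  local hmm=$1 ext missing=0
  for ext in h3f h3i h3m h3p; do
    if [ ! -s "${hmm}.${ext}" ]; then
      echo "missing: ${hmm}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call; prints a message either way.
if check_hmmpress_index Pfam-A.hmm; then
  echo "Pfam-A.hmm is indexed"
else
  echo "Pfam-A.hmm is not fully indexed; run hmmpress first"
fi
```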
/home/fafinha/anaconda3/bin/hmmscan --cpu 64 --domtblout /home/fafinha/collaris/TransDecoder_run/2_homology_searches/pfam/pfam.domtblout \
/home/fafinha/bin/pfam/Pfam-A.hmm /home/fafinha/collaris/TransDecoder_run/2_homology_searches/blastp/Trinity.fasta.transdecoder_dir/longest_orfs.pep
#Run the 'TransDecoder.Predict' script in the same directory where the 'Trinity.fasta.transdecoder_dir' folder is located
**Without homology**
$/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta
**BlastP**
$/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta \
--retain_blastp_hits /home/fafinha/collaris/TransDecoder_run/run2/homology/blast/blastp_output.fmt6
**Pfam**
$/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta \
--retain_pfam_hits /home/fafinha/collaris/TransDecoder_run/run2/homology/pfam/pfam.domtblout
**BlastP and Pfam**
$/home/fafinha/bin/TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t /home/fafinha/collaris/Trinity_run/assembly/Trinity.fasta \
--retain_blastp_hits /home/fafinha/collaris/TransDecoder_run/run2/homology/blast/blastp_output.fmt6 \
--retain_pfam_hits /home/fafinha/collaris/TransDecoder_run/run2/homology/pfam/pfam.domtblout
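TransDecoder.Predict writes its results to the working directory as `Trinity.fasta.transdecoder.pep`, `.cds`, `.gff3`, and `.bed`. One simple way to compare the variants above is to count the predicted proteins each one retains. The sketch below does this by counting FASTA header lines; the function name `count_fasta_records` and the toy file are hypothetical, and on real data the argument would be the `.pep` file from each run.

```shell
#!/usr/bin/env bash
# Count sequences in a FASTA file by counting header lines.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Toy FASTA for illustration only (made-up records).
toy=$(mktemp)
printf '>orf1.p1\nMKT*\n>orf2.p1\nMAA*\n' > "$toy"

count_fasta_records "$toy"
```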
FUNCTIONAL ANNOTATION
#Perform 'FINDING CODING REGIONS WITHIN TRANSCRIPTS' first
Trinotate (https://github.com/Trinotate/Trinotate.github.io/blob/master/index.asciidoc) (on kiko)
Generate databases
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate
***BlastP and BlastX***
***Prepare a .pbs file to run the analysis remotely on Sagarana***
/programs/ncbi-blast-2.10.1+/bin/blastp -query /home/fafinha/collaris/Trinotate_run/1st_step/Trinity_reduced.fasta.transdecoder.pep \
-db /home/fafinha/collaris/Trinotate_run/1st_step/uniprot_sprot.fasta -num_threads 64 -outfmt 6 -evalue 1e-6 \
-out /home/fafinha/collaris/Trinotate_run/2nd_step/blastp.tab
$cat blastp.tab | sort -k1,1 -k12,12nr -k11,11n | sort -k1,1 -u > blastp_besthits.tab
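The two-pass `sort` above keeps one line per query: the first pass orders each query's hits by descending bit score (column 12), with e-value (column 11) as the tie-breaker, and the second pass keeps the first line for each query ID. A sketch of an equivalent single pipeline follows, with two changes that are my own assumptions rather than part of the protocol: `awk` does the de-duplication, so the result does not depend on which of several equal-key lines `sort -u` keeps, and the e-value column is sorted with `-g` instead of `-n`, because `-n` does not interpret exponent notation such as `1e-50`. The function name `best_hits` and the toy input are hypothetical.

```shell
#!/usr/bin/env bash
# Keep the best hit per query from BLAST -outfmt 6 lines read on stdin:
# highest bit score (col 12) first, lowest e-value (col 11) as tie-breaker.
best_hits() {
  sort -t $'\t' -k1,1 -k12,12nr -k11,11g | awk -F'\t' '!seen[$1]++'
}

# Toy input for illustration only (made-up IDs and scores, not real results).
toy=$(mktemp)
{
  printf 'orf1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150\n'
  printf 'orf1\tP9\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200\n'
  printf 'orf2\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t120\n'
} > "$toy"

# Show query, subject, and bit score of the retained hits.
best_hits < "$toy" | cut -f1,2,12
```

On the real data this would be called as `best_hits < blastp.tab > blastp_besthits.tab`.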
***Prepare a .pbs file to run the analysis remotely on Sagarana***
/programs/ncbi-blast-2.10.1+/bin/blastx -query /home/fafinha/collaris/Trinotate_run/1st_step/Trinity_reduced.fasta \
-db /home/fafinha/collaris/Trinotate_run/1st_step/uniprot_sprot.fasta -num_threads 64 -outfmt 6 -evalue 1e-6 \
-out /home/fafinha/collaris/Trinotate_run/2nd_step/blastx.tab
$cat blastx.tab | sort -k1,1 -k12,12nr -k11,11n | sort -k1,1 -u > blastx_besthits.tab
$/home/thiagomafra/instaladores/tmhmm-2.0c/bin/tmhmm --short < /home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta.transdecoder.pep > \
/home/thiagomafra/collaris/trinotate_run/fafinha/run2/tmhmm.out
$/home/thiagomafra/instaladores/hmmer-3.1b2-linux-intel-x86_64/binaries/hmmscan --cpu 64 \
--domtblout /home/thiagomafra/collaris/trinotate_run/fafinha/run2/TrinotatePFAM.out /home/thiagomafra/collaris/trinotate_run/fafinha/run2/Pfam-A.hmm \
/home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta.transdecoder.pep > /home/thiagomafra/collaris/trinotate_run/fafinha/run2/pfam.log
$/home/thiagomafra/instaladores/signalp-4.1/signalp -f short -n /home/thiagomafra/collaris/trinotate_run/fafinha/run2/signalp.out \
/home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta.transdecoder.pep
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/util/rnammer_support/RnammerTranscriptome.pl --transcriptome /home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta --path_to_rnammer /home/thiagomafra/instaladores/rnammer/rnammer
$/home/thiagomafra/instaladores/trinityrnaseq-v2.10.0/util/support_scripts/get_Trinity_gene_to_trans_map.pl \
/home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta > Trinity.fasta.gene_trans_map
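The resulting map is a two-column, tab-separated file: gene ID, then transcript ID. For Trinity accessions, the gene ID is the transcript ID without its trailing isoform suffix (`_i1`, `_i2`, and so on). The sketch below illustrates that format with `awk`. It is meant only to show what the file looks like, not to replace the Trinity script; the function name `fasta_to_gene_trans_map` and the toy headers are hypothetical.

```shell
#!/usr/bin/env bash
# Derive "gene<TAB>transcript" pairs from Trinity-style FASTA headers.
fasta_to_gene_trans_map() {
  awk '/^>/ {
    t = substr($1, 2)          # transcript ID: first word, leading ">" removed
    g = t
    sub(/_i[0-9]+$/, "", g)    # gene ID: drop the isoform suffix
    print g "\t" t
  }' "$1"
}

# Toy FASTA for illustration only (made-up records).
toy=$(mktemp)
printf '>TRINITY_DN10_c0_g1_i1 len=300\nACGT\n>TRINITY_DN10_c0_g1_i2 len=280\nACGT\n' > "$toy"

fasta_to_gene_trans_map "$toy"
```

Both toy transcripts map to the same gene ID, which is how Trinotate groups isoforms when it builds the report.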
***Loading transcripts and coding regions***
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite init --gene_trans_map ./Trinity.fasta.gene_trans_map \
--transcript_fasta /home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta \
--transdecoder_pep /home/thiagomafra/collaris/trinotate_run/Trinity_reduced.fasta.transdecoder.pep
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_swissprot_blastp blastp_besthits.tab
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_swissprot_blastx blastx_besthits.tab
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_pfam TrinotatePFAM.out
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_tmhmm tmhmm.out
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_signalp signalp.out
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite LOAD_rnammer Trinity_reduced.fasta.rnammer.gff
$/home/thiagomafra/instaladores/Trinotate-Trinotate-v3.2.2/Trinotate Trinotate.sqlite report > trinotate_annotation_report.xls
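Despite the `.xls` extension, the report is a tab-delimited text file with a header line, and fields without an annotation hold a single `.`. The sketch below counts how many rows carry a value in a given column, looking the column up by its header name rather than by position. The function name `count_annotated` and the toy report are hypothetical, and the column name `sprot_Top_BLASTX_hit` is assumed to match the header written by this Trinotate version; check the first line of the real report before relying on it.

```shell
#!/usr/bin/env bash
# Count rows whose named column is not "." in a tab-delimited report.
count_annotated() {
  local report=$1 column=$2
  awk -F'\t' -v col="$column" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    c && $c != "." { n++ }
    END { print n + 0 }
  ' "$report"
}

# Toy report for illustration only (made-up rows, reduced set of columns).
toy=$(mktemp)
{
  printf '#gene_id\ttranscript_id\tsprot_Top_BLASTX_hit\tPfam\n'
  printf 'g1\tg1_i1\tSOME_HIT\t.\n'
  printf 'g2\tg2_i1\t.\t.\n'
} > "$toy"

count_annotated "$toy" sprot_Top_BLASTX_hit
```

On the real data the call would be `count_annotated trinotate_annotation_report.xls sprot_Top_BLASTX_hit`. A column name that is not found in the header yields a count of 0.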