The annotation pipeline for the genome of a snake

Abstract

This protocol describes the detailed methods used to annotate various snake genomes.

Before start

Steps

Repeat annotation_de novo

1) Run RepeatModeler to build a de novo library based on the input assembled genome sequence.

2) Using the library constructed in step 1 as the database, run RepeatMasker (v. 3.3.0) to find and classify the repetitive sequences.

Note

For step 2, run RepeatMasker with the parameters "-nolow -no_is -norna -parallel 1".
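
The two steps above can be sketched as command lines assembled in Python. This is a minimal sketch, not the authors' exact invocation: the file names and the database name are hypothetical, and the library file name assumes the "<database>-families.fa" naming used by recent RepeatModeler releases. Only the RepeatMasker parameters quoted in the note come from the protocol.

```python
import shlex


def repeatmodeler_commands(genome_fasta, db_name):
    """Return the commands that build a de novo repeat library (step 1)."""
    return [
        # BuildDatabase prepares the sequence database that RepeatModeler reads.
        ["BuildDatabase", "-name", db_name, genome_fasta],
        ["RepeatModeler", "-database", db_name],
    ]


def repeatmasker_command(genome_fasta, library_fasta):
    """Return the RepeatMasker command (step 2) with the parameters from the note."""
    return [
        "RepeatMasker",
        "-lib", library_fasta,          # custom library from step 1
        "-nolow", "-no_is", "-norna",   # parameters quoted in the protocol
        "-parallel", "1",
        genome_fasta,
    ]


# Hypothetical file names, for illustration only.
genome = "snake_genome.fa"
library = "snake_db-families.fa"  # assumed RepeatModeler output name

commands = repeatmodeler_commands(genome, "snake_db")
commands.append(repeatmasker_command(genome, library))
for command in commands:
    print(shlex.join(command))
```

Printing the commands rather than executing them makes it easy to review them before submitting to a cluster scheduler.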

Repeat annotation_database

Run TRF (v. 4.09) to identify tandem repeats. Then run RepeatMasker and RepeatProteinMask (v. 3.3.0) to identify repeats in the genome at the DNA and protein levels, respectively, by aligning sequences against the Repbase library (v. 17.01).

Note

Use the parameters "-noLowSimple -pvalue 0.0001" when running RepeatProteinMask.

Gene prediction_preparation

Mask the repetitive regions identified in the two repeat annotation steps above (de novo and database-based) with 'N's.

Note

Before gene prediction, mask the TEs (transposable elements) in the genome.
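
The masking step can be illustrated with a short Python sketch. It assumes the repeat coordinates have already been collected as 1-based, inclusive (start, end) pairs, as in GFF files. The function names are hypothetical, and the protocol does not state which tool was actually used for masking.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def hard_mask(sequence, intervals):
    """Replace every base inside the given intervals with 'N'."""
    bases = list(sequence)
    for start, end in merge_intervals(intervals):
        # Clamp to the sequence so out-of-range coordinates are ignored.
        start = max(start, 1)
        end = min(end, len(bases))
        if start > end:
            continue
        bases[start - 1:end] = "N" * (end - start + 1)
    return "".join(bases)


# Repeats reported by different tools may overlap; merging avoids double work.
print(hard_mask("ACGTACGTAC", [(2, 4), (4, 6), (9, 12)]))
```

Merging first means that repeats reported by several tools for the same region are masked once, and the sequence length is never changed.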

Gene prediction_de novo

Run Augustus (v3.0.3) to de novo predict genes in the repeat-masked genome sequences.

Note

Use the parameters "--species=Ophiophagus_hannah --uniqueGeneId=true --noInFrameStop=true --gff3=on --strand=both" when running Augustus.
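
As a rough illustration, the Augustus call with these parameters could be assembled as follows. The input and output file names are hypothetical. Augustus writes its predictions to standard output, so the sketch shows a shell redirect to a GFF3 file.

```python
import shlex

# Parameters quoted in the note above.
AUGUSTUS_OPTIONS = [
    "--species=Ophiophagus_hannah",
    "--uniqueGeneId=true",
    "--noInFrameStop=true",
    "--gff3=on",
    "--strand=both",
]


def augustus_command(masked_genome_fasta):
    """Return the Augustus command for a repeat-masked genome FASTA file."""
    return ["augustus", *AUGUSTUS_OPTIONS, masked_genome_fasta]


# Hypothetical file names; predictions are redirected from stdout to a file.
print(shlex.join(augustus_command("snake_genome.masked.fa")) + " > augustus.gff3")
```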

Gene prediction_homolog

Download the publicly available protein sequences of representative homologous snake species and align them against the masked genome sequences with BLAT. Then, based on the BLAT mapping results, run GeneWise (v2.4.1) to predict the genes.
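
The protocol does not say how the BLAT results are passed to GeneWise. One common approach, sketched here as an assumption, is to keep the best BLAT hit per protein and extract the matching genomic region plus some flanking sequence for GeneWise. The function names and the 1,000 bp flank are hypothetical. The column positions follow the standard 21-column PSL format.

```python
def parse_psl_line(line):
    """Extract the fields needed from one PSL alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "matches": int(fields[0]),
        "strand": fields[8],
        "query": fields[9],
        "target": fields[13],
        "target_size": int(fields[14]),
        "target_start": int(fields[15]),  # 0-based
        "target_end": int(fields[16]),    # end-exclusive
    }


def best_hit_regions(psl_lines, flank=1000):
    """Map each protein to (scaffold, start, end, strand) of its best hit."""
    best = {}
    for line in psl_lines:
        fields = line.split("\t")
        # Skip the PSL header and any malformed lines.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        hit = parse_psl_line(line)
        previous = best.get(hit["query"])
        if previous is None or hit["matches"] > previous["matches"]:
            best[hit["query"]] = hit
    regions = {}
    for query, hit in best.items():
        start = max(0, hit["target_start"] - flank)
        end = min(hit["target_size"], hit["target_end"] + flank)
        # For protein-vs-translated-DNA hits the strand has two characters;
        # the last one is the genomic strand.
        regions[query] = (hit["target"], start, end, hit["strand"][-1])
    return regions
```

Each returned region can then be cut from the masked genome and given to GeneWise together with the corresponding protein, using the strand to choose the search direction.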

Gene prediction_transcriptome

Filter the RNA-seq data with Trimmomatic (v0.30) and assemble the filtered reads with Trinity (v2.13.2). Finally, use PASA (v2.0.2) to align the assembled transcripts against the snake genome of interest to obtain gene structures.

Note

Use default parameters.
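
A minimal sketch of the assembly and alignment calls is given below. The file names, CPU count, memory limit and the "blat,gmap" aligner choice are assumptions made for illustration; the protocol only states that default parameters were used. The Trimmomatic call is omitted because the protocol does not list its trimming steps.

```python
import shlex


def trinity_command(left_fastq, right_fastq, output_dir, cpu=4, max_memory="50G"):
    """Return a Trinity command for paired-end FASTQ reads."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--left", left_fastq,
        "--right", right_fastq,
        "--CPU", str(cpu),             # assumed value
        "--max_memory", max_memory,    # assumed value
        "--output", output_dir,
    ]


def pasa_command(config_file, genome_fasta, transcripts_fasta, cpu=4):
    """Return a PASA alignment-assembly command."""
    return [
        "Launch_PASA_pipeline.pl",
        "-c", config_file,             # PASA alignment-assembly config file
        "-C", "-R",                    # create the database and run the pipeline
        "-g", genome_fasta,
        "-t", transcripts_fasta,
        "--ALIGNERS", "blat,gmap",     # assumed aligner choice
        "--CPU", str(cpu),
    ]


# Hypothetical file names, for illustration only.
print(shlex.join(trinity_command("reads_1.fq", "reads_2.fq", "trinity_out_dir")))
print(shlex.join(pasa_command("alignAssembly.config", "snake_genome.fa", "transcripts.fa")))
```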

Final gene set_MAKER

Integrate the genes predicted by the three approaches above (de novo, homolog-based and transcriptome-based) into a consensus gene set using the MAKER pipeline (v3.01.03).
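
MAKER reads its inputs from a control file (maker_opts.ctl, generated with "maker -CTL"). The sketch below renders the entries that would pass precomputed GFF3 results to MAKER. How each evidence type was assigned to a MAKER option is not stated in the protocol, so the mapping here (Augustus to pred_gff, GeneWise to protein_gff, PASA to est_gff) is an assumption, and all file names are hypothetical.

```python
def maker_overrides(genome_fasta, augustus_gff, genewise_gff, pasa_gff):
    """Return the maker_opts.ctl entries that point MAKER at existing results."""
    return {
        "genome": genome_fasta,
        "pred_gff": augustus_gff,      # ab initio predictions (assumed mapping)
        "protein_gff": genewise_gff,   # protein homology evidence (assumed mapping)
        "est_gff": pasa_gff,           # transcript evidence (assumed mapping)
    }


def render_ctl(overrides):
    """Render the entries as 'key=value' lines in control-file style."""
    return "\n".join(f"{key}={value}" for key, value in overrides.items())


# Hypothetical file names, for illustration only.
print(render_ctl(maker_overrides(
    "snake_genome.fa", "augustus.gff3", "genewise.gff3", "pasa.gff3",
)))
```

These lines would replace the corresponding empty entries in the generated control file; the remaining options keep their defaults.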

Functional annotation

Map the protein sequences of the final gene set to existing databases, such as SwissProt, TrEMBL, KEGG and InterPro, to identify their functions or motifs.

Note

For SwissProt, TrEMBL and KEGG, use BLASTP. For InterPro, use InterProScan (v5.52-86.0) with seven different models (Profilescan, blastprodom, HmmSmart, HmmPanther, HmmPfam, FPrintScan and Pattern-Scan).
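
To assign a function to each gene, the BLASTP results are usually reduced to one best hit per protein. The sketch below assumes BLAST tabular output with the 12 default columns (outfmt 6) and keeps the hit with the highest bit score. The E-value cutoff of 1e-5 is a hypothetical value, not one stated in the protocol.

```python
def best_blast_hits(tabular_lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is too weak to support an annotation
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Proteins that have no hit below the cutoff are simply absent from the result, which makes it easy to count how many genes remain unannotated.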
