Exploring Manually Curated Annotations of Intrinsically Disordered Proteins with DisProt

Federica Quaglia, András Hatos, Edoardo Salladini, Damiano Piovesan, Silvio C. E. Tosatto

Published: 2022-07-05 DOI: 10.1002/cpz1.484

intrinsically disordered proteins

Abstract

DisProt is the major repository of manually curated data for intrinsically disordered proteins collected from the literature. Although lacking a stable three-dimensional structure under physiological conditions, intrinsically disordered proteins carry out a plethora of biological functions, some of them directly arising from their flexible nature. A growing number of scientific studies have been published during the last few decades to shed light on their unstructured state, their binding modes, and their functions. DisProt makes use of a team of expert biocurators to provide up-to-date annotations of intrinsically disordered proteins from the literature, making them available to the scientific community. Here we present a comprehensive description of how to use DisProt in different contexts and provide a detailed explanation of how to explore and interpret manually curated annotations of intrinsically disordered proteins. We describe how to search DisProt annotations, using both the web interface and the API for programmatic access. Finally, we explain how to visualize and interpret a DisProt entry, the SARS-CoV-2 Nucleoprotein, characterized by the presence of unstructured N-terminal and C-terminal regions and a flexible linker. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol 1: Performing a search in DisProt

Support Protocol 1: Downloading options

Support Protocol 2: Programmatic access with DisProt REST API

Basic Protocol 2: Exploring the DisProt Ontology page

Basic Protocol 3: Visualizing and interpreting DisProt entries–the SARS-CoV-2 Nucleoprotein use case

INTRODUCTION

Intrinsically disordered proteins (IDPs) are characterized by the presence of unstructured and highly flexible segments, termed “intrinsically disordered regions” (IDRs), that lack a stable three-dimensional structure. IDRs can be detected by several biophysical and biochemical methods, among which X-ray crystallography and NMR spectroscopy are the most commonly used (Tompa, 2010; van der Lee et al., 2014). In X-ray crystal structures, regions of missing electron density are due to unobserved atoms that fail to scatter X-rays properly, which denotes their structural flexibility (Tompa, 2010, 201; Uversky & Dunker, 2010). NMR spectroscopy is also widely used to assess the presence of unstructured protein segments, and it can recognize disordered regions that appear ordered in crystal structures because of the formation of crystal contacts (Dyson & Wright, 2019). Several additional methods can assess the presence of intrinsic disorder in a protein, such as circular dichroism, sensitivity to proteolysis, and small-angle X-ray scattering (Kragelund & Skriver, 2020; Tompa, 2010).

Intrinsically disordered proteins can also exist as partially structured folding intermediates, pre-molten globules and molten globules, that exhibit a higher degree of secondary structure than random coils while being less compact than native structures (van der Lee et al., 2014). IDPs can play a crucial role in several biological processes, such as membrane localization and interaction with protein chaperones, to name a few (Uversky & Dunker, 2010). The lack of structure in IDR segments in their unbound state provides several advantages due to their largely extended conformation, such as: (1) the possibility for a single IDR to interact with multiple structurally different partners; (2) the ability of several structured partners to bind to a single region; (3) coupled folding and binding, which enables high specificity; and (4) a reduced binding strength that allows for transient interactions (Bugge et al., 2020; Dogan, Gianni, & Jemth, 2014). IDRs can undergo a disorder-to-order transition upon binding of a partner, enabling them to play a central role as protein hubs, as in the case of p53 (DisProt identifier: DP00086) and α-synuclein (DisProt identifier: DP00070), or as targets of a structured hub, e.g., TAZ and KIX (Cumberworth, Lamour, Babu, & Gsponer, 2013; Dosztányi, Chen, Dunker, Simon, & Tompa, 2006; Oldfield et al., 2008; Wright & Dyson, 2015). Finally, IDPs can also be involved in the regulation of several biological processes, interacting with different types of binding partners such as proteins, nucleic acids, lipids, and small molecules (Tompa, 2005; van der Lee et al., 2014). Strikingly, some of the best characterized and most crucial functions of IDPs arise from their flexible nature: they can be flexible linkers connecting structured domains of a protein, or they can act as entropic clocks, bristles, and springs due to their entropic features (Uversky & Dunker, 2010; van der Lee et al., 2014).

DisProt is a service of the Italian node of ELIXIR, the European infrastructure for biological data, and a key resource for the recently established ELIXIR IDP user community (Davey et al., 2019). It is also the largest repository of manually curated annotations of intrinsically disordered proteins (IDPs) collected from the literature (Hatos et al., 2020; Piovesan et al., 2017; Quaglia et al., 2022a). A team of expert DisProt curators looks for new data on IDPs/IDRs from relevant publications and annotates them through a dedicated curation interface by means of intrinsic disorder–related annotation terms. DisProt relies on three different ontologies to annotate intrinsically disordered regions: the Intrinsically Disordered Proteins Ontology (IDPO), the Gene Ontology (GO), and the Evidence and Conclusion Ontology (ECO). IDPO is used to describe structural aspects of an IDP/IDR, self-functions and functions directly associated with their disordered state. Gene Ontology (Ashburner et al., 2000; Gene Ontology Consortium, 2021) is used to describe functional aspects of an IDP/IDR. The Evidence and Conclusion Ontology (Nadendla et al., 2022) describes the technique associated with an annotation. A DisProt entry corresponds to a protein isoform and unambiguously maps to a UniProt entry. DisProt annotations describe local properties of the protein sequence (e.g., intrinsically disordered regions), which are always supported by experimental evidence taken from the literature. Each DisProt annotation is uniquely identified by the DisProt entry accession number followed by a suffix starting with a lowercase letter r (example DP00086r003).
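The identifier convention described above lends itself to simple programmatic handling. The following Python sketch splits an identifier into its entry accession and region number. The regular expression is an assumption inferred from the examples given in this article (DP followed by digits, then an optional lowercase r followed by digits), and the names parse_disprot_id and DisProtId are illustrative and not part of any DisProt tool.

```python
import re
from typing import NamedTuple, Optional

# Pattern ASSUMED from the examples in the text: "DP" + digits, optionally
# followed by "r" + digits for a region annotation (e.g., DP00086r003).
_DISPROT_ID = re.compile(r"^(DP\d+)(?:r(\d+))?$")


class DisProtId(NamedTuple):
    entry: str              # entry accession, e.g., "DP00086"
    region: Optional[int]   # region number, or None for an entry-level identifier


def parse_disprot_id(identifier: str) -> DisProtId:
    """Split a DisProt identifier into entry accession and region number."""
    match = _DISPROT_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a DisProt identifier: {identifier!r}")
    entry, region = match.groups()
    return DisProtId(entry, int(region) if region is not None else None)


if __name__ == "__main__":
    print(parse_disprot_id("DP00086r003"))  # region annotation of the p53 entry
    print(parse_disprot_id("DP00070"))      # entry-level identifier
```

Raising an error for anything that does not match keeps UniProt accessions or PubMed identifiers from being silently treated as DisProt identifiers.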

In this article, we provide detailed protocols explaining how to perform a search in DisProt (Basic Protocol 1), explore the ontologies used in DisProt (Basic Protocol 2), and visualize and interpret annotations of a DisProt entry (Basic Protocol 3). We also describe the downloading options in DisProt (Support Protocol 1) and programmatic access with the DisProt REST API (Support Protocol 2).

Basic Protocol 1: PERFORMING A SEARCH IN DisProt

DisProt is freely accessible at https://disprot.org/. This protocol describes how to search entries and to retrieve information in DisProt. From the home page, users can also navigate the DisProt blog (https://disprot.org/blog) to read posts describing our updates or explore the DisProt Twitter account (https://twitter.com/disprot_db) (Fig. 1).

Necessary Resources

Hardware

While DisProt works best on laptop or desktop computers, it is also easily accessible from smartphones and tablets. An active and stable internet connection is required.

Software

Internet browser, e.g., Firefox (http://www.mozilla.org/firefox), Google Chrome (http://www.google.com/chrome), or Safari (http://www.apple.com/safari)

Input data

Free text search against the database

Performing a text search

1. Open a web browser and connect to DisProt at https://disprot.org/.

2. Searches in DisProt can be performed either by using the “Search” box at the top-middle of the DisProt home page or by clicking on the “Browse” button at the top-left of the home page.

Users can perform a search using the “Search” box on the top-middle of the DisProt home page to look for protein entries or entries referencing a specific publication.

Users can look for specific proteins, e.g., nucleoproteins, by typing the protein name “Nucleoprotein”. Users will be redirected to a list of all the nucleoprotein entries available in DisProt, e.g., Nucleoprotein from Measles virus (DisProt entry: DP00640).

Users might also be interested in looking for a specific publication. In this case, enter the corresponding PubMed identifier (PMID) of the publication in the search box. All entries that have at least one evidence referencing that publication will be displayed.

Alternatively, it is possible to perform an advanced search by clicking on the “Browse” button available on the top-left of the home page. Users will be redirected to an advanced search page, where they can refine their search and look for a specific query or a combination of queries (Fig. 2), e.g., a protein name and an organism.

Figure 2. Browse page–Text search. Users can perform advanced text searches, look for specific queries, and customize the results of their search.

3. Select “Text search” on the top-left side of the Browse page, then select a term from the drop-down menu.

Users can look for the following aspects:

A specific protein: select a “Protein name”, e.g., Nucleoprotein, or “UniProt”, e.g., P0DTC9.
A specific DisProt entry: select “DisProt”, e.g., DP03212.
A set of proteins from a specific organism: choose an “Organism”, e.g., “Gallus”, the “Taxon”, or “NCBI Taxon”.
UniProt Reference Clusters (UniRef): UniRef databases cluster UniProtKB sequences by grouping proteins based on their sequence similarity (Suzek, Wang, Huang, McGarvey, & Wu, 2015). Terms available are “UniRef50”, “UniRef90”, and “UniRef100” (clustering the sequences at 50%, 90%, and 100% identity, respectively).
Entries from a specific curator: select the “Curator name” term and start typing the name you are looking for.
A specific reference: users can look for a specific PMID, e.g., 8632448, by selecting the “Reference identifier” term or for the title of the corresponding publication, e.g., “Alternative arrangements of the protein chain are possible for the adenovirus single-stranded DNA binding protein”, by selecting the “Reference name” term.
A specific term from the ontologies adopted in DisProt:
An IDPO term: select “IDPO term name”, e.g., “flexible linker/spacer”, or “IDPO identifier”, e.g., IDPO:00502.
A Gene Ontology (GO) term: select “GO term name”, e.g., “modulation by virus of host cell cycle”, or “GO identifier”, e.g., GO:0060153.
An Evidence and Conclusion Ontology (ECO) term: select “ECO term name” or “ECO identifier”, e.g., ECO:0006163.

Users who wish to gain better insight into the terms of our ontologies and read their descriptions can refer to the Ontology page available at https://disprot.org/ontology.

Entries from a specific dataset: select “Dataset”, e.g., “Viral proteins”.
It is also possible to perform a free text search by selecting the “all fields” term in the drop-down menu.

4. It is possible to customize the table columns to visualize more details of an entry in the displayed results. Default columns include “DisProt ID”, “UniProt Accession”, “Protein Name”, “Organism”, “Sequence length”, and “Disorder content”. We suggest adding at least the “annotated terms” column to gain insight into the disorder aspects available for each entry.

5. Download the search results using the “Download selected” button at the top-left of the Browse page. Users can also choose to include ambiguous and/or obsolete entries by selecting the corresponding buttons above “Download selected”.

Select the type of evidence you want to download: structural state (IDPO), structural transition (IDPO), disorder function (IDPO), molecular function (GO), biological process (GO), or cellular component (GO).
Select the type of desired data, i.e., “regions” or “consensus”.
Select the file format. Available options for download are JSON, TSV, FASTA, and GAF.
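The “Disorder content” column shown in the results table can be reasoned about with a short calculation. The Python sketch below computes the fraction of residues covered by at least one annotated region, merging overlapping regions so that no residue is counted twice. This definition, the 1-based inclusive coordinates, and the function name disorder_content are assumptions made for illustration; the exact computation used by DisProt is not described in this protocol, and the regions in the usage example are invented.

```python
from typing import Iterable, Optional, Tuple


def disorder_content(regions: Iterable[Tuple[int, int]], sequence_length: int) -> float:
    """Return the fraction of residues covered by at least one region.

    Regions are (start, end) pairs, ASSUMED to be 1-based and inclusive.
    Overlapping or adjacent regions are merged before counting.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")

    covered = 0
    block_start: Optional[int] = None
    block_end: Optional[int] = None
    for start, end in sorted(regions):
        if block_end is None or start > block_end + 1:
            # A gap separates this region from the current block: close the block.
            if block_end is not None:
                covered += block_end - block_start + 1
            block_start, block_end = start, end
        else:
            # Overlapping or adjacent region: extend the current block.
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start + 1
    return covered / sequence_length


if __name__ == "__main__":
    # Hypothetical regions on a hypothetical 200-residue protein.
    example = [(1, 50), (40, 60), (100, 120)]
    print(f"{disorder_content(example, 200):.3f}")
```

Merging before counting matters because several pieces of evidence from different publications may annotate the same residues.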

Performing a sequence similarity search

6. Open a web browser and connect to DisProt at https://disprot.org/.

7. Click on the “Browse” button on the top-left side of the home page (Fig. 3) to be redirected to the advanced search page.

Figure 3. Browse page–BLAST. Users can perform BLAST searches of a specific protein sequence against the entries available in DisProt.

8. Select “BLAST” on the top-left side of the Browse page to perform a BLAST (Altschul, Gish, Miller, Myers, & Lipman, 1990) sequence similarity search against DisProt entries.

9. Insert a protein sequence in the corresponding box and click on “Submit”.

Note

DisProt entries that match the query will be displayed in the results.

10. It is possible to customize the table columns to visualize more details of an entry in the displayed results. Default columns include “DisProt”, “UniProt”, “Protein name”, “Organism”, “Sequence length”, and “Disorder content” along with “Bit-score”, “E-value”, “Identity”, and “Coverage”.

Note

Entries are sorted by lowest E-value.

11. Click on “See alignment” to visualize where the query and the subject sequences align.

12. Download the search results using the “Download selected” button at the top-left of the Browse page. Users can also choose to include ambiguous and/or obsolete entries by selecting the corresponding buttons above “Download selected”.

Select the type of evidence you want to download: structural state (IDPO), structural transition (IDPO), disorder function (IDPO), molecular function (GO), biological process (GO), or cellular component (GO).
Select the type of desired data, i.e., “regions” or “consensus”.
Select the file format. Available options for download are JSON, TSV, FASTA, and GAF.
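Users who post-process BLAST results outside the web interface may want to reproduce the ordering used in the results table, where entries are sorted by lowest E-value. The Python sketch below filters and ranks a list of hits. The BlastHit fields mirror the column names listed in step 10, whereas the class itself, the function rank_hits, the tie-breaking by bit-score, the default thresholds, and the example values are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BlastHit:
    """One row of a BLAST results table (fields named after the table columns)."""
    disprot_id: str
    bit_score: float
    e_value: float
    identity: float   # percent identity
    coverage: float   # percent coverage


def rank_hits(hits: Iterable[BlastHit],
              max_e_value: float = 1e-5,
              min_coverage: float = 0.0) -> List[BlastHit]:
    """Keep hits passing the (illustrative) thresholds, lowest E-value first.

    Ties on E-value are broken by highest bit-score, which is an assumption.
    """
    kept = [h for h in hits if h.e_value <= max_e_value and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.e_value, -h.bit_score))


if __name__ == "__main__":
    # Invented hits with placeholder identifiers, not real DisProt results.
    hits = [
        BlastHit("hit_b", 60.0, 1e-8, 35.0, 40.0),
        BlastHit("hit_c", 30.0, 0.5, 28.0, 20.0),
        BlastHit("hit_a", 250.0, 1e-70, 98.0, 100.0),
    ]
    for hit in rank_hits(hits):
        print(hit.disprot_id, hit.e_value)
```

A lower E-value indicates a match that is less likely to have arisen by chance, so thresholds such as the one used here should be chosen according to the goal of the analysis.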

Support Protocol 1: DOWNLOADING OPTIONS

From the DisProt “Download” page (https://disprot.org/download), users can download a specific release of the database, datasets and annotated aspects, or a specific version of the IDP ontology (Fig. 4).

Figure 4. Download page. Users can download a specific release of the database, datasets, and annotated aspects, or a specific version of the IDP ontology, in different file formats.
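Once a TSV file has been downloaded, it can be read with standard tools. The Python sketch below groups annotated regions by entry using only the standard library. The column names (acc, start, end) and the sample rows are hypothetical placeholders chosen for illustration; they should be checked against the header line of the file actually downloaded and adjusted through the function arguments.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def regions_by_entry(tsv_text: str,
                     id_column: str = "acc",
                     start_column: str = "start",
                     end_column: str = "end") -> Dict[str, List[Tuple[int, int]]]:
    """Group (start, end) regions by entry identifier from tab-separated text.

    The default column names are ASSUMPTIONS; pass the names found in the
    header of the real file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for row in reader:
        grouped[row[id_column]].append((int(row[start_column]), int(row[end_column])))
    return dict(grouped)


if __name__ == "__main__":
    # Invented rows standing in for the content of a downloaded file.
    sample = "acc\tstart\tend\nentry_1\t1\t40\nentry_1\t180\t250\nentry_2\t10\t25\n"
    for entry, regions in regions_by_entry(sample).items():
        print(entry, regions)
```

For a file on disk, the text can be obtained with open(path, encoding="utf-8").read() before calling the function.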