Using ConSurf to Detect Functionally Important Regions in RNA

Maya Rubin, Nir Ben-Tal

Published: 2021-10-07 DOI: 10.1002/cpz1.270

Abstract

The ConSurf web server (https://consurf.tau.ac.il/) for using evolutionary data to detect functional regions is useful for analyzing proteins. The analysis is based on the premise that functional regions, which may for example facilitate ligand binding and catalysis, often evolve slowly. The analysis requires finding enough effective, i.e., non-redundant, sufficiently remote homologs. Indeed, the ConSurf pipeline, which is based on state-of-the-art protein sequence databases and analysis tools, is highly valuable for protein analysis. ConSurf also allows evolutionary analysis of RNA, but the analysis often fails due to insufficient data, particularly the inability of the current pipeline to detect enough effective RNA homologs. This is because the RNA search tools and databases offered are not as good as those used for protein analysis. Fortunately, ConSurf also allows importing external collections of homologs in the form of a multiple sequence alignment (MSA). Leveraging this, here we describe various protocols for constructing MSAs for successful ConSurf analysis of RNA queries. We report the level of success of these protocols on an exemplary set comprising a dozen RNA molecules of diverse structure and function. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol 1: Standard ConSurf evolutionary conservation analysis of an RNA query.

Basic Protocol 2: ConSurf evolutionary conservation analysis of an RNA query with external MSA.

Support Protocol 1: Construction of an MSA for an RNA query using other online servers.

Support Protocol 2: Construction of an MSA for an RNA query using nHMMER locally.

INTRODUCTION

In the sequences of macromolecules, the evolutionary rate per site, be it an amino acid position in a protein sequence or a nucleotide position in an RNA or DNA sequence, reflects a balance between a natural tendency of the position to mutate, i.e., ‘drift’, and natural selection. Rarely, the latter may lead to an accelerated evolutionary rate, as for example in the ligand-recognition regions of antibodies and other components of our immune system. However, most often, natural selection limits the evolutionary rates of binding and catalytic sites, as well as of other sites that are biologically important. Thus, a slow evolutionary rate is often a clear mark of functionally important regions in protein, DNA, and RNA molecules (Capra & Singh, 2007; Del Sol, Pazos, & Valencia, 2003; Gallet, Charloteaux, Thomas, & Brasseur, 2000; Huang & Golding, 2014, 2015; Innis, 2007; Landgraf, Xenarios, & Eisenberg, 2001; Lichtarge, Bourne, & Cohen, 1996a, 1996b; Lichtarge, Yamamoto, & Cohen, 1997; Mayrose, Graur, Ben-Tal, & Pupko, 2004; Valdar, 2002). ConSurf provides a reliable and easy-to-use way to exploit this principle (Ashkenazy et al., 2016; Ashkenazy, Erez, Martz, Pupko, & Ben-Tal, 2010; Celniker et al., 2013; Mayrose et al., 2004). Starting from the user-provided sequence or structure of a query protein/RNA/DNA, ConSurf automatically collects a set of effective homologs, aligns their sequences, builds a phylogenetic tree that represents their evolutionary relationships, and estimates the evolutionary rates of the amino acid or nucleotide positions using a statistically robust evolutionary model. An outline of the ConSurf pipeline is shown in Figure 1.

Figure 1. A flowchart of the analysis steps in ConSurf. Calculations can start from the sequence or structure of the query. Here we exploit the possibility to include an external alignment of homologs.

While ConSurf offers a pipeline for analyzing both proteins and RNA/DNA molecules, it is used mostly for protein analysis and only rarely for nucleic acids. This is presumably because the nucleic acid analysis pipeline offered by ConSurf frequently aborts when it fails to detect a large enough set of effective homologs of the query.

Here, we show how to improve the analysis of an RNA query by utilizing state-of-the-art sequence search tools in combination with the ConSurf pipeline. Basic Protocol 1 details an analysis that utilizes the MSA construction of ConSurf itself. This protocol, which often fails, is used mostly as a reference. Basic Protocol 2, the recommended alternative, details an analysis based on an externally constructed MSA. The Support Protocols provide guidance on constructing an MSA for an RNA query to be used with Basic Protocol 2.

Basic Protocol 1: STANDARD ConSurf EVOLUTIONARY CONSERVATION ANALYSIS OF AN RNA QUERY

This protocol provides guidance on using the ConSurf server to analyze the evolutionary conservation profile of an RNA query, given its 3D structure or nucleotide sequence.

Necessary Resources

Hardware

Computer with Internet connection, under Windows, Mac, or Linux

Software (recommended)

The PyMOL (Schrödinger, 2021), Chimera (Pettersen et al., 2004), or RasMol (Sayle & Milner-White, 1995) molecular visualizer

1. Upload the RNA query.

Note

Go to the ConSurf server web page (https://consurf.tau.ac.il/) and select the Nucleotides option. You will be redirected to a page asking if your query has a known structure. Selecting ‘YES’ will allow you to enter the PDB ID of the molecule (Fig. 2); press ‘Next’ and indicate the chain of interest if needed (Fig. 3). Alternatively, you may upload the coordinate file in PDB format. If ‘NO’ is selected, the server will ask whether you wish to provide an MSA. Selecting ‘NO’ again will allow you to enter the query sequence in FASTA format (Fig. 4).

Figure 2. Entering the PDB ID of the structure of the RNA query.

Figure 4. Entering the query sequence in FASTA format.
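When the query is supplied as a sequence, it helps to check the FASTA record before pasting it into the form. The following is a minimal sketch, not part of ConSurf: the function name `to_fasta`, the accepted alphabet, and the 60-character line width are all assumptions made for illustration, and the server may accept or reject characters differently.

```python
# Hypothetical helper (not part of ConSurf): build a FASTA record for an RNA query.
# Assumption: only the letters A, C, G, U, T, and N are expected in the sequence.
RNA_ALPHABET = set("ACGUTN")


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line followed by wrapped sequence lines."""
    # Remove whitespace and line breaks, and normalize to upper case.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    unexpected = set(seq) - RNA_ALPHABET
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Wrap the sequence into fixed-width lines.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example sequence, for illustration only.
    print(to_fasta("my_rna_query", "ggcgu aaugc ucagu"), end="")
```

Rejecting unexpected characters up front is a design choice of this sketch; it catches stray residue numbering or protein letters before a job is submitted.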

2. Select settings for the construction of a multiple sequence alignment (MSA). See Figure 5.

The server will ask if you wish to upload an MSA. Select NO, at which point ConSurf will allow you to select the homology search method and nucleotide database, as well as other parameters for generating an MSA:

Figure 5. An analysis page of an RNA query of known structure ready to be submitted. Default settings are used in all fields, and job title and e-mail address are added, as recommended.

Note

Homolog search algorithm–A choice between nBLAST (default) (Altschul, Gish, Miller, Myers, & Lipman, 1990; https://blast.ncbi.nlm.nih.gov/Blast.cgi) and nHMMER (Eddy, 2009; http://hmmer.org/).

Note

Nucleotides Database–Currently there is only one nucleotide database that can be searched, nr (NCBI Nucleotide Home page; see Internet Resources).

Note

BLAST E-value Cutoff–This value is set at 0.001 by default and can be increased to any value below 1. Increasing the cutoff may help in finding more homologs, but it may also admit non-homologous sequences.

Note

Select homologs for ConSurf analyses–There are two options to choose from, “automatically” or “manually.” Selecting “manually” will eventually require the user to mark sequences from the hits list, which could be valuable for users who are very familiar with their query. We highly recommend selecting “automatically.”

1. “automatically”: The user is asked to indicate the maximum number of homologs (150 is the default; selecting more than 300 would significantly slow the calculation), as well as the maximum and minimum sequence identity percentages (95 and 60 by default). In ConSurf, the hits (coming from the nBLAST or nHMMER search) are sorted by their E-values in ascending order, based on the principle that the lower the E-value, the more likely the hit is to be a true homolog. When selecting “automatically,” a predetermined number of hits are sampled evenly from the sorted list to create the final list of homologs of the query. The user is also asked to choose among three methods for aligning the selected homologs: MAFFT-L-INS-i (default), PRANK, or CLUSTALW.
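The automatic selection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ConSurf's actual code: the `Hit` record and `select_homologs` function are hypothetical names, both identity bounds are treated as simple inclusive filters on identity to the query, and "sampled evenly" is interpreted as taking equally spaced positions from the sorted list. The server's own implementation may differ on each of these points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one search hit (not a ConSurf data structure)."""
    name: str
    evalue: float
    identity: float  # percent identity to the query


def select_homologs(hits, max_homologs=150, min_identity=60.0,
                    max_identity=95.0, evalue_cutoff=0.001):
    """Filter hits, sort by ascending E-value, then sample evenly."""
    # Assumption: both identity bounds are inclusive filters against the query.
    eligible = [
        h for h in hits
        if h.evalue <= evalue_cutoff and min_identity <= h.identity <= max_identity
    ]
    # Lower E-value first: the most confident homologs lead the list.
    eligible.sort(key=lambda h: h.evalue)
    n = len(eligible)
    if n <= max_homologs:
        return eligible
    # Take equally spaced positions so that both close and remote homologs
    # are represented, rather than only the top of the list.
    return [eligible[(i * n) // max_homologs] for i in range(max_homologs)]


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    demo = [Hit(f"h{i}", 1e-30 * 10 ** i, 80.0) for i in range(10)]
    for hit in select_homologs(demo, max_homologs=5):
        print(hit.name, hit.evalue)
```

Sampling across the whole sorted list, instead of keeping only the best-scoring hits, is what keeps the more remote homologs in the final set.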

3. Select analysis methods (Fig. 5).

Note

The Calculation Method can be Bayesian (default) or Maximum Likelihood. For the Evolutionary Substitution Model there is a choice between “T92 model (Tamura, 1992),” “GTR: General Time Reversible,” “JC69 model (Jukes & Cantor, 1969),” “HKY85 model (Hasegawa, Kishino, & Yano, 1985),” and “Best model” (default).

Note

We recommend using the default settings.
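To give a sense of what a substitution model encodes, the sketch below computes the pairwise distance under JC69, the simplest model in the list, which assumes equal base frequencies and equal rates for all substitutions. It only illustrates the model's assumption and is not the per-site rate calculation that ConSurf performs; the function name `jc69_distance` and the gap handling are choices made for this example.

```python
import math


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """JC69-corrected distance (substitutions per site) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption of this sketch: columns with a gap in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    # p is the observed fraction of differing sites.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        # The correction is undefined at saturation (p >= 3/4).
        return math.inf
    # d = -(3/4) * ln(1 - (4/3) * p)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Made-up aligned pair, for illustration only.
    print(jc69_distance("ACGUACGU", "ACGUACGA"))
```

The corrected distance always exceeds the raw fraction of differences when the sequences differ, because the model accounts for multiple substitutions at the same site. The richer models (T92, HKY85, GTR) relax the equal-frequency and equal-rate assumptions.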

4. Run job entry (Fig. 5).

Note

There is an option to give a job title as well as an e-mail address. The latter is particularly convenient because, if used, an e-mail will be sent with a link to the results once the run has finished. Each job is given a number, regardless of whether a job title was entered, and the results are kept on the server for up to 3 months.

Basic Protocol 2: ConSurf EVOLUTIONARY CONSERVATION ANALYSIS OF AN RNA QUERY WITH EXTERNAL MSA

This protocol provides guidance on using the ConSurf server to analyze the evolutionary conservation profile of an RNA query, given its 3D structure or nucleotide sequence, using an externally provided MSA. Two Support Protocols for constructing an MSA for the query are provided further below.