Leveraging AI Advances and Online Tools for Structure-Based Variant Analysis

Francisco J. Guzmán-Vega, Ana C. González-Álvarez, Karla A. Peña-Guerra, Kelly J. Cardona-Londoño, Stefan T. Arold

Published: 2023-08-04 DOI: 10.1002/cpz1.857

AlphaFold

gene variants

protein structure

structural analysis for non-experts

variant analysis

Abstract

Understanding how a gene variant affects protein function is important in life science, as it helps explain traits or dysfunctions in organisms. In a clinical setting, this understanding makes it possible to improve and personalize patient care. Bioinformatic tools often only assign a pathogenicity score, rather than providing information about the molecular basis for phenotypes. Experimental testing can furnish this information, but it is slow and costly and requires expertise and equipment not available in a clinical setting. Conversely, mapping a gene variant onto the three-dimensional (3D) protein structure provides a fast molecular assessment free of charge. Before 2021, this type of analysis was severely limited by the availability of experimentally determined 3D protein structures. Advances in artificial intelligence algorithms now allow confident prediction of protein structural features from sequence alone. The aim of the protocols presented here is to enable non-experts to use databases and online tools to investigate the molecular effect of a genetic variant. The Basic Protocol relies only on two online resources, the AlphaFold Protein Structure Database and UniProt. Alternate Protocols document the usage of the Protein Data Bank, SWISS-MODEL, ColabFold, and PyMOL for structure-based variant analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol : 3D Mapping based on UniProt and AlphaFold

Alternate Protocol 1 : Using experimental models from the PDB

Alternate Protocol 2 : Using information from homology modeling with SWISS-MODEL

Alternate Protocol 3 : Predicting 3D structures with ColabFold

Alternate Protocol 4 : Structure visualization and analysis with PyMOL

INTRODUCTION

Genetic variations can result in both advantageous adaptations and detrimental diseases. Many changes that have an effect on an individual's phenotype are located in the protein-coding regions of genomes (Backman et al., 2021). Hence, understanding how a gene variant affects its protein product is crucial for comprehending both normal and abnormal biological processes. In medicine, this knowledge can facilitate personalized treatments based on an individual's genetic profile, leading to improved diagnoses, more effective treatments, reduced side effects, and better health outcomes. Despite significant progress made in linking specific genes to certain disorders, determining the variants and underlying biological mechanisms remains a challenge for many disease phenotypes. Consequently, cancer drivers may not be identified in time, and many patients with suspected rare genetic diseases either never receive a definitive diagnosis or do so only after a lengthy and exhausting “diagnostic odyssey,” during which they may experience irreversible damage (MacArthur et al., 2014).

Traditional methods for predicting variant pathogenicity typically employ various classification algorithms to generate a score indicating the likelihood of a variant being damaging. Among the most widely used in silico prediction tools are SIFT (Ng & Henikoff, 2001), PolyPhen-2 (Adzhubei et al., 2010, 2013), and CADD (Rentzsch et al., 2018). More recent methods utilize advanced deep-learning techniques (Frazer et al., 2021; Qi et al., 2021), including large language models (Brandes et al., 2022; Lin et al., 2023), to predict the pathogenicity of missense variants with greater accuracy. However, although predicted pathogenicity scores may aid in identifying a driver mutation, they do not elucidate how a variant impacts protein function.

A protein's function is dependent on its three-dimensional (3D) structural features. Several computational resources have been developed to predict or document the impact of amino acid substitutions on protein structures. For instance, Missense3D (Ittisoponpisan et al., 2019) predicts the structural damage resulting from a point mutation, and its associated Missense3D-DB database contains pre-calculated results for about 4 million known missense variants from the Humsavar, ClinVar, and gnomAD resources. VarSite (Laskowski et al., 2020) annotates known disease-associated variants in human genes with structural information derived from experimentally determined 3D structures in the Protein Data Bank (PDB). Although these resources are valuable for understanding the effects of mutations, they have limitations. Most notably, Missense3D-DB and VarSite only provide structural annotations for previously reported variants, and VarSite only annotates protein structures from the PDB, which contains (partial) structures of just 17% of human genes. Additionally, these tools currently lack important features for assessing the impact of a variant, such as information about proximity to protein sites involved in catalytic activity, regulation, or ligand binding. Finally, interpreting the features provided by Missense3D may be challenging without interactively visualizing the 3D structural context.

For these reasons, being able to view and study a novel mutated residue within its 3D structural context can be essential for understanding the causes and mechanisms of a disease. Before 2021, the ability to map an amino acid variant onto a 3D structure was severely limited due to the lack of reliable 3D structural information for more than 80% of human proteins. Homology modeling may infer the 3D structure of a human protein from known structures of similar nonhuman proteins, but the accuracy depends on the availability and sequence identity of structural templates.

In 2020, the AI-based method AlphaFold demonstrated its ability to predict the 3D structure of proteins from their amino acid sequence with an accuracy that can be on par with that of high-quality experimental structures (Jumper et al., 2021). In July 2021, AlphaFold became publicly available, and its Protein Structure Database now contains precalculated 3D structures for 200 million proteins, including all human proteins (Tunyasuvunakool et al., 2021; Varadi et al., 2021). This resource enables scientists and healthcare providers to quickly assess the impact of human gene variants. However, a step-by-step guide for structure-based variant analysis using these methods and resources is still needed.

We provide protocols to help non-experts, including clinicians and healthcare personnel, use these resources to quickly assess the molecular impact of a gene variant. The basic protocol relies only on online resources and allows non-experts to develop hypotheses about how a mutation affects protein function. Depending on the variant and protein, the information can be obtained within minutes to hours. Alternate protocols describe the use of additional programs and resources.

Understanding how a mutation affects a protein's structure and function is essential for linking phenotypes to gene variants and personalizing therapy. However, our protocol can be used to investigate the impact of mutations on any protein, including those from plants or bacteria.

STRATEGIC PLANNING

The Basic Protocol is the simplest approach, relying only on web-based tools. It uses information from the UniProt and AlphaFold databases and their online visualization tools. The protocol has three steps: Preparation, Mapping, and Analysis (Fig. 1). We also propose four Alternate Protocols for obtaining additional information. Alternate Protocol 1 (Using experimental models from the PDB) and Alternate Protocol 2 (Using information from homology modeling with SWISS-MODEL) provide information on ligand binding or protein-protein interactions. Alternate Protocol 3 (Predicting 3D structures with ColabFold) produces 3D models of protein sequences that are not precalculated, such as specific truncations, isoforms, or protein complexes. Alternate Protocol 4 (Structure visualization and analysis with PyMOL) provides a focused protocol for the visualization and analysis of variants in 3D protein structures.

Schematic overview of the different protocols presented in this manuscript. PTMs, post-translational modifications.

NOTE : All protocols involving animals must be reviewed and approved by the appropriate Animal Care and Use Committee and must follow regulations for the care and use of laboratory animals. Appropriate informed consent is necessary for obtaining and use of human study material.

Basic Protocol: 3D MAPPING BASED ON UniProt AND AlphaFold

The Basic Protocol is ideal for quickly evaluating standard protein forms precalculated by AlphaFold. The insights gained allow you to identify, or rule out, effects linked to protein stability and, in some cases, catalysis. If this protocol fails to yield conclusive results, we recommend the Alternate Protocols. As case examples for the Basic Protocol, we will analyze two protein variants (Arg799Cys and Arg918Trp) of the AGTPBP1 protein, implicated in infantile-onset neurodegeneration (Shashi et al., 2018).

Necessary Resources

Hardware

Computer with internet access

Software

Standard internet browser

Preparation

1. To identify the protein sequence of interest in UniProt, go to the UniProt website (https://www.uniprot.org/) and search for the name of its gene or transcript ID. Click on the entry for the correct species to access the entry page (Fig. 2).

UniProt start page for Q9UPW5 (AGTPBP1). Sections can be accessed by clicking on their titles in the left-side menu.

2. On the entry page for your protein, click on “Sequence & Isoforms” in the section menu displayed at top left in the UniProt window (Fig. 3).

Note

If different protein transcript/isoform sequences are available, the easiest approach is to take the sequence identified by UniProt, normally the first one listed, as the “canonical sequence” (Q9UPW5-1 in this example). A more rigorous approach is to take the reference sequence recommended by the MANE initiative (Matched Annotation from NCBI and EMBL-EBI; Morales et al., 2022), shown in Figure 3. MANE provides a set of high-confidence transcripts and corresponding proteins to serve as universal standards for variant reporting.

“Sequence & Isoforms” section. Select the isoform sequence and verify that the residues of interest are present with their correct numbers. The red boxes highlight Arg799 and Arg918. Below the sequences are cross-references to the IDs from different databases and their corresponding UniProt isoform IDs. The MANE-Select isoform is highlighted in red.

3. Verify that the chosen UniProt sequence contains the wild-type residue(s) at the correct position(s).

Note

In our example, Q9UPW5-1 correctly contains arginines at positions 799 and 918 (Fig. 3). If this is not the case, refer to the database and ID that were used to report the variant, for example the GenBank ID (https://www.ncbi.nlm.nih.gov/genbank/). To identify the corresponding UniProt sequence, you can search (using Ctrl+F) for the transcript ID and see if it is reported in this UniProt entry. If so, it is listed next to the corresponding UniProt ID (e.g., Q9UPW5-2 or Q9UPW5-3). Click on the UniProt ID to be taken to the amino acid sequence, and search again for your amino acid of interest. If your sequence ID is not located in this entry, search for it in the main UniProt search bar at the very top of the page to find the entry that is associated with it.
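For readers who prefer to script the check in step 3, the following is a minimal Python sketch, not part of the published protocol. The function names are hypothetical, and the UniProt REST address used in `fetch_uniprot_sequence` is an assumption that should be verified against the current UniProt documentation. The check itself only compares the reported wild-type residue with the residue found at that position in a sequence string.

```python
import re
from urllib.request import urlopen

# Three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_variant(variant):
    """Split a variant such as 'Arg799Cys' into ('R', 799, 'C')."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", variant)
    if match is None:
        raise ValueError(f"Unrecognized variant notation: {variant}")
    wild, position, mutant = match.groups()
    return THREE_TO_ONE[wild], int(position), THREE_TO_ONE[mutant]


def wild_type_matches(sequence, variant):
    """True if the sequence carries the reported wild-type residue at that position."""
    wild, position, _ = parse_variant(variant)
    return 1 <= position <= len(sequence) and sequence[position - 1] == wild


def fetch_uniprot_sequence(accession):
    """Download a FASTA record and return the bare sequence (needs internet access).

    The URL pattern is an assumption; check it against the UniProt help pages.
    """
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
    with urlopen(url) as response:
        lines = response.read().decode().splitlines()
    return "".join(line for line in lines if not line.startswith(">"))
```

With internet access, `wild_type_matches(fetch_uniprot_sequence("Q9UPW5"), "Arg799Cys")` would be one way to repeat the check described above for the canonical AGTPBP1 sequence. A `False` result suggests that the variant was reported against a different isoform.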

4. Gather general information on the gene from UniProt.

Note

UniProt contains a wealth of information that can help identify the effects of mutations. In addition to “Sequence & Isoforms,” its left-side menu offers ten other categories, including “Function,” “Disease & Variants,” “PTM/Processing,” and “Interaction.” Here we discuss those categories that are particularly relevant for variant analysis.

Note

The “Function” category provides a brief overview of the protein's functions, including catalytic activity, that may be affected by mutations. Additional information is available through diagrams, gene ontology (GO), and tables providing features such as residues involved in ligand binding or catalysis. In our example, AGTPBP1 has a zinc-binding site involving residues 920, 923, and 1017, and an active site at residue 970 (Fig. 4A). Note that the mutated Arg918 residue is close to the zinc-binding site, providing a first hint that its mutation to tryptophan may impair zinc binding.
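The "first hint" described above, that Arg918 lies close in sequence to the zinc-binding residues, can be sketched in a few lines of Python. This is an illustration only: the feature positions are those quoted in the text for AGTPBP1, whereas the function name and the five-residue window are arbitrary assumptions. Sequence proximity does not replace the 3D analysis described later in this protocol.

```python
# Feature positions for AGTPBP1 as quoted in the text above.
FEATURES = {
    "zinc binding": [920, 923, 1017],
    "active site": [970],
}


def nearby_features(position, features, window=5):
    """List (feature, site, separation) for annotated sites within `window` residues.

    The window size is an arbitrary illustrative choice, not a validated threshold.
    """
    hits = []
    for name, sites in features.items():
        for site in sites:
            separation = abs(site - position)
            if separation <= window:
                hits.append((name, site, separation))
    # Closest sites first.
    return sorted(hits, key=lambda hit: hit[2])
```

Residues that are far apart in sequence can still be neighbors in the folded protein, so an empty result does not rule out a functional role.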

Note

The “Disease & Variants” section summarizes diseases associated with previously reported gene variants. It includes information on affected residues and reported phenotypes. If a novel mutation is near known mutations, or in the same domain, its phenotypic consequences may be similar.

Note

The “Interaction” section lists protein-protein interactions. Variants can directly or indirectly affect binding sites, for example by destabilizing the protein domain or post-translational modification (PTM) site responsible for the interaction (Fig. 4B). In our example, an interaction with MYLK is suggested.

Note

The “PTM/Processing” category allows you to check whether your residue of interest is affected by or near a PTM site (Fig. 4C).

Note

We will discuss the “Structure” category in the Mapping section below.

Note

To assess the effect of a protein variant, it is important to determine if it is located in a functional 3D domain or a disordered region. Disordered regions are usually less sensitive to mutations. The “Family & Domains” section lists known domains, motifs, and unstructured regions. For an often more complete list and description of domains, visit the InterPro page link provided in the “Family and domain databases” subsection. In our example, the UniProt domain annotation is very rudimentary; however, the InterPro page shows that Arg799 is in the cytosolic carboxypeptidase N-terminal domain and Arg918 is in the zinc carboxypeptidase domain (Fig. 5). An additional “ARM-like” helical N-terminal domain is suggested.

Note

It is important to identify whether a variant targets a “Transmembrane Domain” in the corresponding section. Surface mutations may have different effects in transmembrane domains than in cytoplasmic proteins due to their hydrophobic environment. Mutations in transmembrane domains can also affect transport activities or destabilize or delocalize membrane proteins. The free internet tool Phobius (https://phobius.sbc.su.se/) can be used as an alternative approach. AGTPBP1 does not have transmembrane regions.

Additional sections providing functional information about a protein. (A) The “Features” section lists AGTPBP1 residues involved in catalysis and zinc binding (boxed in red). (B) Known protein-protein interactions are listed in the “Interaction” section. (C) The “PTM/Processing” section lists known phosphoserine sites for AGTPBP1.

“Family & Domains” section. Here you will find lists of domains and motifs annotated by UniProt in the sequence (red box above). Dedicated databases such as InterPro (red box below) can contain additional information (the dashed inset shows InterPro domains).

Mapping

After completing the preparation step, in which you identify the wild-type residue in the protein sequence and gather background information on the protein's function and features, the next step is mapping. In this step you will identify the wild-type residue of your variant in its 3D protein context. Below we describe the simplest way to do this by using pre-calculated AlphaFold structures and web-based visualization tools. Alternatively, you can obtain 3D structures from the PDB or through homology modeling and use other programs for structure visualization. These approaches are described in detail in Alternate Protocols 1-4.

5. To access AlphaFold structures on UniProt, scroll to the “Structure” section (Fig. 6). Below the interactive structure viewer, you will find a table with the available 3D structures for your protein.

The “Structure” section shows available 3D protein structures. For AGTPBP1, only an AlphaFold model is available. This structure is shown in the viewer colored by confidence score (pLDDT). Clicking on the “AlphaFold” link (red box) opens the corresponding page in the AlphaFold Protein Structure Database.

6. Look for the entry with “AlphaFold” in the “SOURCE” column and click on the hyperlink in the “LINKS” column. This will take you to the entry page for this model in the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/).

Note

The database was launched with AlphaFold predictions for the proteomes of human and 20 other model organisms and has since been expanded to the roughly 200 million proteins mentioned in the Introduction.

7. Map your residue onto the 3D structure by using the interactive AlphaFold Structure Viewer.

Note

The AlphaFold page displays protein information from UniProt at the top and has two interactive features below: The Structure Viewer and the Predicted Aligned Error (PAE) plot.

Note

The Structure Viewer shows the protein sequence and predicted 3D structure (Fig. 7A). By default, residues in the 3D structure are color-coded according to AlphaFold's predicted local distance difference test (pLDDT). The pLDDT estimates the confidence in the position and conformation of each residue on a scale from 0 to 100. The key to the color bands is shown on the left. Blue tones represent high confidence, with dark blue (pLDDT > 90) indicating correct side chain conformation. Yellow/orange may indicate low confidence in the 3D structure, or, more likely, a flexible/disordered region without a stable 3D structure (Tunyasuvunakool et al., 2021). The model for AGTPBP1 shows that the 3D-structured regions are modelled with confidence (light blue) or high confidence (dark blue). Extensive yellow and orange regions are a central flexible linker (residues 439-623) and C-terminal tail (residues 1142-1226) (Fig. 7B).
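The pLDDT values shown in the viewer can also be read from a downloaded model file. The Python sketch below assumes that the per-residue pLDDT is stored in the B-factor field of the PDB-format file, which is our understanding of how AlphaFold models are distributed. Only the 90 threshold is taken from the text above; the other band limits (70 and 50) follow the color key of the AlphaFold database as we understand it and should be treated as assumptions. The function names are hypothetical.

```python
def read_plddt(pdb_lines):
    """Map residue number to pLDDT, read from the B-factor field of C-alpha atoms.

    Assumes fixed-column PDB format and that the B-factor field holds the pLDDT.
    """
    plddt = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            plddt[int(line[22:26])] = float(line[60:66])
    return plddt


def confidence_band(score):
    """Translate a pLDDT score into a confidence label (thresholds below 90 are assumed)."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"
```

For example, after downloading the model (step 13), `read_plddt(open("model.pdb"))` returns a dictionary from which the scores of the residues of interest can be looked up; `model.pdb` is a placeholder file name.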

Structural analysis of the precalculated AlphaFold model for UniProt Q9UPW5-1 (AGTPBP1, isoform 1). (A) Default view, with the protein colored by pLDDT quality score (key on the left). (B) Flexible protein regions that have low pLDDT scores are circled in red. (C-E) Domains manually selected in the PAE plot (gray rectangle with white frame and “x” in top right corner) and highlighted green in structure. C and D represent folded domains, and E the flexible linker between them. The dark green off-diagonal PAE sections (red boxes in E) indicate that the two domains are stably bound to each other, despite the linker between them. (F) Ball-and-stick representation of Arg799 and surrounding residues. (G) Screenshot menu.

8. Assess the confidence in the relative positioning of residues and domains with the PAE plot.

Note

The PAE plot indicates the confidence in the position of a residue relative to other residues in the structure. Dark green corresponds to low positional error. Residues within well-folded domains form a dark green rectangle on the diagonal of the PAE plot. Two domains that stably associate with each other give rise to a dark green off-diagonal feature. Hence, the PAE can reveal the extent of domains and contacts between distant parts of a protein. More information on the PAE plot can be found in the “Predicted aligned error tutorial” provided below each entry of the AlphaFold database, and in Zhang et al. (2022). The PAE plot is interactive and can be used to visualize individual domains in the 3D model by selecting (by a mouse-click and drag) a region of interest. The protein residues corresponding to the selected portion of the PAE will be highlighted in green on the 3D model (Fig. 7C-E). The PAE plot of AGTPBP1 clearly shows two domains, visible as rectangles along the diagonal covering residues 16-438 and 624-1141 (Fig. 7C and D). The first domain corresponds to the ARM-like helical domain (see the discussion of “Family & Domains” in step 4). The second rectangle comprises the zinc carboxypeptidase domain and the cytosolic carboxypeptidase N-terminal domain, suggesting that they form one structural unit. The off-diagonal green imprint also strongly suggests that the N-terminal ARM-like domain and the C-terminal catalytic unit stably interact, even though they are separated by a large, flexible linker (residues 439-623; Fig. 7E).
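The visual reading of the PAE plot can be complemented by averaging the PAE values over the corresponding blocks of the matrix. The following Python sketch is one way to do this. The function names are hypothetical, and `load_pae` assumes that the downloadable PAE file is a JSON list whose single element holds a square matrix under the key `predicted_aligned_error`; this layout should be checked against the file you actually download.

```python
import json


def load_pae(path):
    """Read a PAE matrix from a JSON file (file layout is an assumption, see above)."""
    with open(path) as handle:
        return json.load(handle)[0]["predicted_aligned_error"]


def mean_pae(pae, rows, cols):
    """Average PAE over a rectangular block; residue ranges are 1-based and inclusive."""
    (r0, r1), (c0, c1) = rows, cols
    values = [pae[i][j] for i in range(r0 - 1, r1) for j in range(c0 - 1, c1)]
    return sum(values) / len(values)


def interdomain_pae(pae, domain_a, domain_b):
    """Average of both off-diagonal blocks, because the PAE matrix is not symmetric."""
    return (mean_pae(pae, domain_a, domain_b) + mean_pae(pae, domain_b, domain_a)) / 2
```

For AGTPBP1, `interdomain_pae(pae, (16, 438), (624, 1141))` would summarize the off-diagonal region discussed above. A value close to the within-domain averages would be consistent with two domains that are positioned confidently relative to each other, but no fixed cutoff is implied here.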

9. Identify the wild-type residue in the 3D structure.

Note

Hovering your cursor over residues in the protein structure or Sequence Viewer activates an information box in the bottom-right corner. This box displays information such as position, residue type, and pLDDT score (Fig. 7F).

10. Click on a residue in the AlphaFold Structure or Sequence Viewer to get a zoomed-in view of the amino acid and its intramolecular interactions.

Note

The interactive interface allows you to manipulate the view of the protein structure and focus on specific regions using your mouse or touchpad. You can zoom in and out using the scroll wheel and rotate the model by clicking and holding the left mouse button while moving the cursor. A right mouse click allows you to zoom out while keeping the residue atoms displayed, removing the narrow depth cueing.

11. Capture and save an image of the visualization. In the top right corner of the Structure Viewer, there are three icons representing the following options: top, to reset the view to the default settings; middle, to capture a screenshot of the current structure view (this can be copied or downloaded as a PNG file, with a transparent or white background; Fig. 7G); and bottom, to enable widescreen mode for a larger view of the model. Mouse over the icons to display their functions.

12. Assess whether the location of the residues of interest in the 3D model overlaps with a known functional feature.

Note

Through this approach, we see that both Arg799 and Arg918 are located close to each other in a folded structure, which we have identified above as the catalytic unit of AGTPBP1.
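The assignment of a residue to a structural region can be written down as a simple lookup. The Python sketch below uses the residue ranges for AGTPBP1 given in steps 7 and 8; the region labels and the function name are our own shorthand and are not database annotations.

```python
# Residue ranges for AGTPBP1 as read from the pLDDT coloring and the PAE plot above.
REGIONS = [
    ("N-terminal ARM-like domain", 16, 438),
    ("flexible linker", 439, 623),
    ("catalytic unit", 624, 1141),
    ("C-terminal tail", 1142, 1226),
]


def region_of(position, regions=REGIONS):
    """Return the name of the region containing `position`, or 'unassigned'."""
    for name, start, end in regions:
        if start <= position <= end:
            return name
    return "unassigned"
```

Calling `region_of(799)` and `region_of(918)` places both residues in the catalytic unit, in line with the conclusion above.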

13. Download the PDB model for further analysis. The AlphaFold model can be downloaded in the PDB file format to your local computer (“Download” > “PDB file”) and then be visualized with more versatile structure viewers, such as the PDB Mol* viewer (see Alternate Protocol 1, step 5) or PyMOL (see Alternate Protocol 4).
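If many models need to be downloaded, the step can be scripted. The Python sketch below builds download addresses from a UniProt accession. The file naming pattern and the version number are assumptions based on how the AlphaFold database named its files at the time of writing; both may change, so compare the result with the link behind the “Download” button before relying on it. The function names are hypothetical.

```python
from urllib.request import urlretrieve


def alphafold_file_urls(accession, version=4, fragment=1):
    """Build assumed download URLs for the model and its PAE file.

    The naming pattern and default version are assumptions, not a documented guarantee.
    """
    base = f"https://alphafold.ebi.ac.uk/files/AF-{accession}-F{fragment}"
    return {
        "pdb": f"{base}-model_v{version}.pdb",
        "pae": f"{base}-predicted_aligned_error_v{version}.json",
    }


def download(url, destination):
    """Save the file at `url` to the local path `destination` (needs internet access)."""
    urlretrieve(url, destination)
```

For example, `download(alphafold_file_urls("Q9UPW5")["pdb"], "AGTPBP1.pdb")` would save the AGTPBP1 model locally if the assumed address is still valid.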

Analysis

14. Assess the function of the wild-type residue in its 3D context.

Note

To understand the functional repercussions of a variant, it is necessary to first identify the role of the wild-type residue. Clicking on Arg799 in the Sequence Viewer zooms the structure model on this residue and shows all side chains in its vicinity (Fig. 7F). The ball-and-stick models use distinct colors for different atom types: nitrogen (blue), oxygen (red), sulfur (yellow), and carbon (gray). Dashed lines indicate hydrogen bonds (blue) and pi stacking interactions (green). Currently the AlphaFold Structure Viewer switches this color scheme to a pLDDT-only color scheme after a region has been selected in the PAE plot. Reloading the website reverts to the atom color view. Hovering the mouse over the residues reveals their identity. Clicking again on the highlighted residue will remove the ball-and-stick representation.

Note

In this representation, we can see that the side chain of Arg799 is mostly buried in the 3D fold, which is unusual for charged residues. Arg799 forms hydrogen bonds with the side chain of His847 and the backbones of Gln781 and Ile1118 (Fig. 7F). The backbone of Arg799 forms another hydrogen bond with Glu661. We can conclude that Arg799 plays an important role in stabilizing this structural part of the protein through hydrogen bonds. We can further infer that the mostly buried Arg799 is unlikely to be directly involved in ligand binding, catalysis, or PTMs.

Note

The side chain of Arg918 remains partly solvent accessible. It engages in hydrogen bonds with the side chain of Ser927 and the main chain of His920 and Pro921. Additional hydrogen bonds to Asn960 and Tyr1016 are formed through the backbone of Arg918. Thus, akin to Arg799, Arg918 plays an important role in stabilizing the catalytic unit. From our Preparation step, we further know that His920 is involved in coordinating a zinc ion, jointly with Glu923 and His1017, which are next to Arg918. Hence, Arg918 may be involved in stabilizing the binding site for a catalytic cofactor.
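The neighbors of a residue can also be listed from a downloaded PDB file. The following Python sketch reports all residues that have at least one atom within a distance cutoff of the residue of interest. It is a rough proxy only: it does not test hydrogen-bond geometry as the viewer does, the 4 Å cutoff is an arbitrary illustrative value, and the function names are hypothetical. It assumes a single-chain model in fixed-column PDB format.

```python
import math


def read_atoms(pdb_lines):
    """Return (residue number, residue name, atom name, (x, y, z)) for each ATOM record."""
    atoms = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((int(line[22:26]), line[17:20].strip(), line[12:16].strip(), xyz))
    return atoms


def residues_near(atoms, position, cutoff=4.0):
    """Residues with any atom within `cutoff` angstroms of any atom of residue `position`.

    Returns ((residue name, residue number), shortest distance) pairs, closest first.
    """
    target = [xyz for resseq, _, _, xyz in atoms if resseq == position]
    if not target:
        raise ValueError(f"Residue {position} not found in the model")
    nearest = {}
    for resseq, resname, _, xyz in atoms:
        if resseq == position:
            continue
        distance = min(math.dist(xyz, t) for t in target)
        if distance <= cutoff:
            key = (resname, resseq)
            nearest[key] = min(distance, nearest.get(key, distance))
    return sorted(nearest.items(), key=lambda item: item[1])
```

Applied to the AGTPBP1 model, `residues_near(read_atoms(open("AGTPBP1.pdb")), 918)` gives a candidate list that can be compared with the contacts described above; `AGTPBP1.pdb` is a placeholder file name. Sequence neighbors such as residues 917 and 919 will always appear in the list because they are covalently linked.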

15. Assess the effect of the substitution on the 3D structure. The final step in analyzing the effect of a variant is to determine whether the substituting residue can maintain the function of the wild-type residue. Although the AlphaFold Structure Viewer does not allow substitution of the wild-type with the mutant residues in the display, the severity of a substitution can often be estimated by comparing the size and stereochemistry of the wild-type and mutant residues (see Table 1).

Note

In our example, replacing the large, polar, positively charged, and flexible Arg799 with a small, nonpolar cysteine would eliminate all H-bonds formed by the Arg799 side chain (although backbone hydrogen bonds may be preserved). Additionally, the smaller cysteine would create a large gap in the structural fold. These combined effects are predicted to severely affect the structural integrity of the catalytic domain, destabilizing the protein fold and indirectly hampering catalytic activity.

Note

Substituting Arg918 with a nonpolar tryptophan, which has a large, rigid aromatic side chain, is likely to result in steric clashes with surrounding residues, including the zinc-binding residues His920, Glu923, and His1017. This mutation would perturb the structural integrity near the active site and significantly impair the catalytic function of the zinc carboxypeptidase domain, while introducing structural instability.

Note

In conclusion, both variants are predicted to impair catalytic function and overall protein stability through slightly different molecular mechanisms.

Table 1. Physicochemical Properties of Amino Acids

Side chain	Amino acid	Three-letter code	One-letter code	Size	Other
Negative	Aspartic acid	Asp	D	Medium large	Charged carboxylic acid group; often caps α-helices
Negative	Glutamic acid	Glu	E	Large, flexible	Charged moiety as in Asp, but longer carbon side chain
Positive	Arginine	Arg	R	Large, flexible	Charged guanidino group; can coordinate phosphate groups
	Lysine	Lys	K	Large, flexible	PTM of charged amine group is a major signal in epigenetics
	Histidine	His	H	Large	Aromatic imidazole group is partially protonated at physiological pH
Uncharged polar	Asparagine	Asn	N	Medium large	Like Asp, but with polar carboxamide
	Glutamine	Gln	Q	Large	Like Glu, but with polar carboxamide
	Serine	Ser	S	Small	Small; can be phosphorylated
	Threonine	Thr	T	Medium-small	Like Ser but with additional hydrophobic moiety; can be phosphorylated
	Tyrosine	Tyr	Y	Large	Aromatic, with hydroxy moiety that can be phosphorylated or form H-bond
Nonpolar	Alanine	Ala	A	Small	Rigid and small
	Glycine	Gly	G	Tiny	No side chain; flexible; can form sharp turns in backbone
	Valine	Val	V	Medium-small	Larger than Ala, but smaller than Ile or Leu
	Leucine	Leu	L	Medium	Can often be replaced by Ile
	Isoleucine	Ile	I	Medium	Can often be replaced by Leu
	Proline	Pro	P	Small	Rigidifies backbone; breaks α-helices and β-strands
	Phenylalanine	Phe	F	Large	Aromatic; Tyr without hydroxy substituent
	Methionine	Met	M	Medium	Long, thin, and flexible
	Tryptophan	Trp	W	Large	Aromatic indole moiety that can also make a H-bond
	Cysteine	Cys	C	Small	Can form disulfide bonds

Note that all residues except Ile, Leu, and Phe can be subject to PTMs. Only charged residues can form ionic bonds (also called salt bridges); charged and uncharged polar residues can form hydrogen bonds (H-bonds). Of the nonpolar residues, only tryptophan can form an H-bond.
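The comparison of wild-type and mutant residues with Table 1 can be sketched in Python as follows. The side chain classes and size labels are transcribed from Table 1. The ordinal ranking of the size labels, the wording of the messages, and the function name are assumptions made for illustration. The sketch ignores properties from the “Other” column, such as rigidity or aromaticity, so it is a first-pass aid rather than a replacement for inspecting the 3D context.

```python
# (side chain class, size) for each residue, transcribed from Table 1.
PROPERTIES = {
    "Asp": ("negative", "medium large"), "Glu": ("negative", "large"),
    "Arg": ("positive", "large"), "Lys": ("positive", "large"),
    "His": ("positive", "large"),
    "Asn": ("uncharged polar", "medium large"), "Gln": ("uncharged polar", "large"),
    "Ser": ("uncharged polar", "small"), "Thr": ("uncharged polar", "medium-small"),
    "Tyr": ("uncharged polar", "large"),
    "Ala": ("nonpolar", "small"), "Gly": ("nonpolar", "tiny"),
    "Val": ("nonpolar", "medium-small"), "Leu": ("nonpolar", "medium"),
    "Ile": ("nonpolar", "medium"), "Pro": ("nonpolar", "small"),
    "Phe": ("nonpolar", "large"), "Met": ("nonpolar", "medium"),
    "Trp": ("nonpolar", "large"), "Cys": ("nonpolar", "small"),
}

# Ordinal ranking of the size labels; the ranking itself is an assumption.
SIZE_RANK = {
    "tiny": 0, "small": 1, "medium-small": 2,
    "medium": 3, "medium large": 4, "large": 5,
}


def compare_substitution(wild, mutant):
    """Return short notes on class and size changes for a substitution."""
    wild_class, wild_size = PROPERTIES[wild]
    mutant_class, mutant_size = PROPERTIES[mutant]
    notes = []
    if wild_class != mutant_class:
        notes.append(f"side chain class changes from {wild_class} to {mutant_class}")
    delta = SIZE_RANK[mutant_size] - SIZE_RANK[wild_size]
    if delta < 0:
        notes.append(f"smaller side chain ({wild_size} to {mutant_size}); may leave a gap if buried")
    elif delta > 0:
        notes.append(f"larger side chain ({wild_size} to {mutant_size}); may cause steric clashes")
    return notes
```

For Arg799Cys, `compare_substitution("Arg", "Cys")` reports both a class change and a smaller side chain. For Arg918Trp, only the class change is reported because both residues are listed as large, which shows the limit of the sketch: the clash predicted above stems from the rigid aromatic ring of tryptophan, a property captured only in the “Other” column.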

16. Next steps: This Basic Protocol, in conjunction with Table 1, provides a straightforward approach to evaluating the structural impact of variants at the molecular level. The AlphaFold Structure Viewer is also useful for creating figures for presentations and publications. However, precalculated AlphaFold models do not include ligands or cofactors. The AlphaFill server (https://alphafill.eu) attempts to automatically add ligands to precalculated AlphaFold structures based on experimental data. For example, the server correctly identifies the substrate-binding site for AGTPBP1 (Q9UPW5) using 25% identity but incorrectly suggests another ligand. Additionally, many proteins form multimers or are part of macromolecular complexes with other proteins or nucleic acids, which may be important for comprehensive variant analysis. Alternate Protocols 1, 2, and 3 provide guidance on accessing this information. Alternate Protocol 4 outlines simple steps for using PyMOL as a more versatile alternative to the online AlphaFold Structure Viewer for displaying and analyzing structures.

Alternate Protocol 1: USING EXPERIMENTAL MODELS FROM THE PDB

Precalculated AlphaFold models do not include cofactors or macromolecular binding partners and present only a single conformation, even though many proteins alternate between multiple structural states. If experimentally determined 3D structures of the affected protein are available, they may provide additional information. These experimental structures are freely accessible in the Protein Data Bank (PDB; https://rcsb.org). If no experimental structures have been reported for the gene region of interest, you can try using the SWISS-MODEL service described in Alternate Protocol 2.

As an example, we will evaluate the mutation Arg1748Cys in the gene SETD1B (Weerts et al., 2021).