Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN

Laurel F. Kinman, Barrett M. Powell, Ellen D. Zhong, Bonnie Berger, Joseph H. Davis

Published: 2022-11-13 DOI: 10.1038/s41596-022-00763-x

Extended data

Extended Data Fig. 1 Assessing cryoDRGN input parsing.

Comparison of 10,000 cryoDRGN-parsed particles back-projected at D = 128 px (left) with the unsharpened map from cryoSPARC’s homogeneous refinement (right).

Extended Data Fig. 2 Assessing convergence of representative cryoDRGN density maps during network training.

a, Particle sets of interest A–J identified in epoch 49 by the ‘UMAP local maximum’ method are mapped to prior epochs’ UMAP embeddings. The on-data median latent value of each particle set is embedded into UMAP space and annotated for each epoch. Note that each annotated point maps to the same high-occupancy region of UMAP space following convergence. b, Corresponding volumes generated from each on-data median latent value at five-epoch intervals as shown in a. Note that the volumes’ gross morphology stabilizes by epochs 14–19, though some additional details in maxima I and J require 24–29 epochs of training. c, FSC plots correlating each local maximum volume at epoch j and at epoch j − 5.

Extended Data Fig. 3 Visualizing particle filtering.

a, Representative particles filtered by ind_keep.star, selected for further training, and corresponding 2D classification using default cryoSPARC parameters. b, Representative particles filtered by ind_bad.star, excluded from further training, and corresponding 2D classification using default cryoSPARC parameters. c, Three-way Venn diagram of ‘junk’ particles identified by one of the following methods: two classes from k = 6 Gaussian mixture model (GMM) latent-space classification (red, 35,421 particles); nine classes from k = 20 k-means latent-space classification (green, 29,080 particles); or latent encoding magnitude (z-norm) more than 0.5 standard deviations above the mean (blue, 30,879 particles). d, Corresponding cryoSPARC 2D-classification results using ‘junk’ particles identified through the GMM (top), k-means (middle) or z-norm (bottom) filtering approaches. e,f, UMAP embedding (e) or PCA projections of latent space (f) highlighting the location of junk particles identified by GMM (red), k-means (green) or z-norm (blue) methods.
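
The z-norm criterion in c can be illustrated with a minimal sketch. This is not the cryoDRGN implementation: the function name `z_norm_outliers`, the toy latent values and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
import statistics

def z_norm_outliers(latents, n_std=0.5):
    """Return indices of particles whose latent-vector magnitude exceeds
    mean + n_std * standard deviation of all magnitudes.

    Assumption: the population standard deviation is used; the caption
    does not say which estimator was applied.
    """
    norms = [math.sqrt(sum(v * v for v in z)) for z in latents]
    cutoff = statistics.mean(norms) + n_std * statistics.pstdev(norms)
    return [i for i, n in enumerate(norms) if n > cutoff]

# Toy 2D latent embeddings (hypothetical values, not from the dataset).
toy_latents = [(0.1, 0.2), (0.0, -0.3), (0.2, 0.1), (3.0, 4.0)]
flagged = z_norm_outliers(toy_latents)
```

In this toy case only the last particle, whose magnitude is far larger than the others, is flagged; the returned indices could then be written to a file of excluded particles.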

Extended Data Fig. 4 Training and assessing convergence of high-resolution training.

a, Representative plot of average total loss at each epoch. b, Median per-particle movement through latent space, characterized by vectors connecting each particle’s latent embedding in successive epochs. Resulting vector dot products (left), magnitude (center) and cosine distance (right) are shown. c, Identification of representative latent embeddings via the ‘UMAP local maximum’ method. The UMAP embedding of epoch 99 is binned into a 2D histogram and smoothed; the local maxima are then identified and overlaid. The on-data median UMAP location of each maximum and its neighboring eight bins is shown. Label order corresponds to decreasing particle count in each local maximum. d,e, Map–map correlation (d) and FSC (e) at Nyquist frequency calculated between representative volumes generated as defined in c at five-epoch intervals. Epochs for which the encoder network has not converged are noted with dotted lines.
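
The binning, smoothing and peak-finding steps described in c can be sketched as follows. This is a simplified illustration under stated assumptions, not the protocol's code: the function names, the 3 × 3 binomial smoothing kernel, the bin count and the toy coordinates are hypothetical, and peaks are ordered here by smoothed density as a stand-in for particle count.

```python
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # binomial weights, sum = 16

def histogram2d(points, bins, lo, hi):
    """Count points into a bins x bins grid spanning [lo, hi) on both axes."""
    grid = [[0.0] * bins for _ in range(bins)]
    width = (hi - lo) / bins
    for x, y in points:
        i = min(int((x - lo) / width), bins - 1)
        j = min(int((y - lo) / width), bins - 1)
        grid[i][j] += 1
    return grid

def smooth(grid):
    """Convolve the grid with KERNEL, treating out-of-range bins as zero."""
    n = len(grid)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n:
                        total += KERNEL[di + 1][dj + 1] * grid[a][b]
            out[i][j] = total / 16.0
    return out

def local_maxima(grid):
    """Return bins strictly higher than all eight neighbors, highest first."""
    n = len(grid)
    peaks = []
    for i in range(n):
        for j in range(n):
            neighbors = [grid[a][b]
                         for a in range(max(i - 1, 0), min(i + 2, n))
                         for b in range(max(j - 1, 0), min(j + 2, n))
                         if (a, b) != (i, j)]
            if grid[i][j] > 0 and all(grid[i][j] > v for v in neighbors):
                peaks.append((i, j))
    return sorted(peaks, key=lambda p: -grid[p[0]][p[1]])

# Toy UMAP coordinates: a dense cluster and a sparser one (hypothetical).
toy_points = [(2.5, 2.5)] * 5 + [(7.5, 7.5)] * 3
peaks = local_maxima(smooth(histogram2d(toy_points, 10, 0.0, 10.0)))
```

Particles falling in each peak bin and its eight neighbors would then form one particle set, and the median of their coordinates gives a representative location.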

Extended Data Fig. 5 Assessing convergence of representative cryoDRGN density maps during high-resolution training.

a, Particle sets A–J identified by the ‘UMAP local maximum’ method (Box 1) mapped to prior epochs as illustrated in Extended Data Fig. 2. b, Corresponding volumes generated from labeled positions in a. Note that the volumes’ gross morphology stabilizes by epochs 19–29, though maximum I stabilizes as a 70S ribosome around epoch 39. c, FSC plots between volumes from each local maximum offset by five epochs of training, as in Extended Data Fig. 2. The map-to-map FSC stabilizes by epoch 39.

Extended Data Fig. 6 Assessing results of high-resolution training.

a, The UMAP representation of the latent space resulting from 50 epochs of high-resolution training, colored by indicated imaging parameters. b, Angular and translational pose distributions. c, PCA of the latent space, colored by the 20 k-means cluster centers automatically generated by cryodrgn analyze. Numbered black dots indicate the locations in latent space of each k-means cluster center volume.

Extended Data Fig. 7 Sampled points from latent space used in subunit occupancy analysis.

UMAP representation of the latent space resulting from 50 epochs of high-resolution training, with contours colored with darker blues as particle density increases. Sampled points correspond to the centers of 500 k-means clusters and are indicated with white circles.

Extended Data Fig. 8 Confusion matrix of published class labels and classes assigned by subunit occupancy analysis.

The 500 k-means cluster center maps were assigned to 15 classes by subunit occupancy analysis. Particles within a given k-means cluster are assigned to the same subunit occupancy class as the cluster's center map. Published particle labels were drawn from ref. 16, and the fractional correspondence is plotted as a heatmap. Note that published classes A and F corresponded to 70S and 30S particles, respectively.
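
The label propagation and fractional correspondence described here can be sketched as below. The function name, the toy labels and the choice to normalize within each published class are assumptions for illustration; the caption does not state the normalization direction.

```python
from collections import Counter, defaultdict

def fractional_confusion(published, cluster_of, class_of_cluster):
    """For each published label, return the fraction of its particles that
    land in each subunit occupancy class.

    published        : published class label of each particle
    cluster_of       : k-means cluster index of each particle
    class_of_cluster : occupancy class assigned to each cluster center map
    Assumption: fractions are normalized per published class (row-wise).
    """
    counts = defaultdict(Counter)
    for label, cluster in zip(published, cluster_of):
        # Each particle inherits the class of its cluster's center map.
        counts[label][class_of_cluster[cluster]] += 1
    return {label: {cls: n / sum(row.values()) for cls, n in row.items()}
            for label, row in counts.items()}

# Toy data: five particles, three clusters, two occupancy classes
# (all names and values are hypothetical).
toy_published = ["B", "B", "B", "C", "C"]
toy_cluster_of = [0, 0, 1, 1, 2]
toy_class_of_cluster = {0: "occ1", 1: "occ2", 2: "occ2"}
frac = fractional_confusion(toy_published, toy_cluster_of, toy_class_of_cluster)
```

Each row of the resulting mapping sums to 1, so it can be rendered directly as one row of a heatmap.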

Extended Data Fig. 9 Graph traversal through latent space for the B→D1→D2→D3→D4→E3→E5 assembly pathway.

Centroid volumes from the subunit occupancy classes were aligned and compared with the assembly intermediate structures identified in ref. 16 to determine approximate equivalences between published classes and subunit occupancy classes. The volumes corresponding to intermediates B, D1, D2, D3, D4, E3 and E5 were provided to cryodrgn graph_traversal as anchor points; the resulting path through latent space is shown. Non-anchor points are indicated with white circles, whereas anchor points and their corresponding class ID are shown with colored circles. Volumes resulting from the complete graph traversal are shown in Supplementary Video 3.

Extended Data Fig. 10 Selection of particles corresponding to the C4 minor class.

Particles (1,149) in the C4 class were identified by subunit occupancy analysis and are highlighted in orange.

Supplementary information

Supplementary Information

Supplementary Protocols 1–6 and Supplementary Tables 1 and 2.

Supplementary Video 1

PC1 trajectory from high-resolution training. Density maps sampled along PC1 were automatically generated by the cryodrgn analyze command. Volumes are displayed at the same isosurface level and were generated from the 5th to the 95th PC1 value along the PC1 axis.

Supplementary Video 2

PC2 trajectory from high-resolution training. Density maps sampled along PC2 were automatically generated by the cryodrgn analyze command. Volumes are displayed at the same isosurface level and were generated from the 5th to the 95th PC2 value along the PC2 axis.

Supplementary Video 3

Graph traversal showing the B→D1→D2→D3→D4→E3→E5 assembly pathway. The graph traversal pathway was generated using the cryodrgn graph_traversal command as described in the protocol. The path taken by the traversal through latent space is shown in Extended Data Fig. 9. All volumes are displayed at the same isosurface level.

References

1. Lyumkis, D. Challenges and opportunities in cryo-EM single-particle analysis. J. Biol. Chem. 294, 5181–5197 (2019).
2. Wu, M. & Lander, G. C. Present and emerging methodologies in cryo-EM single-particle analysis. Biophys. J. 119, 1281–1289 (2020).
3. Serna, M. Hands on methods for high resolution cryo-electron microscopy structures of heterogeneous macromolecular complexes. Front. Mol. Biosci. 6, 33 (2019).
4. Dashti, A. et al. Retrieving functional pathways of biomolecules from single-particle snapshots. Nat. Commun. 11, 4734 (2020).
5. Dashti, A. et al. Trajectories of the ribosome as a Brownian nanomachine. Proc. Natl Acad. Sci. USA 111, 17492–17497 (2014).
6. Haselbach, D. et al. Long-range allosteric regulation of the human 26S proteasome by 20S proteasome-targeting cancer drugs. Nat. Commun. 8, 15578 (2017).
7. Gui, M. et al. Structures of radial spokes and associated complexes important for ciliary motility. Nat. Struct. Mol. Biol. 28, 29–37 (2021).
8. Zhong, E. D., Bepler, T., Berger, B. & Davis, J. H. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18, 176–185 (2021).
9. Punjani, A. & Fleet, D. J. 3D variability analysis: resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. J. Struct. Biol. 213, 107702 (2021).
10. Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife https://doi.org/10.7554/eLife.42166 (2018).
11. Grant, T., Rohou, A. & Grigorieff, N. cisTEM, user-friendly software for single-particle image processing. eLife https://doi.org/10.7554/eLife.35383 (2018).
12. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
13. Nakane, T., Kimanius, D., Lindahl, E. & Scheres, S. H. Characterisation of molecular motions in cryo-EM single-particle data by multi-body refinement in RELION. eLife https://doi.org/10.7554/eLife.36861 (2018).
14. Kingma, D. & Welling, M. Auto-encoding variational Bayes. 2nd International Conference on Learning Representations (2013).
15. Zhong, E. D., Bepler, T., Davis, J. H. & Berger, B. Reconstructing continuous distributions of 3D protein structure from cryo-EM images. Eighth International Conference on Learning Representations (2020).
16. Davis, J. H. et al. Modular assembly of the bacterial large ribosomal subunit. Cell 167, 1610–1622 e1615 (2016).
17. Rabuck-Gibbons, J. N., Lyumkis, D. & Williamson, J. R. Quantitative mining of compositional heterogeneity in cryo-EM datasets of ribosome assembly intermediates. Structure https://doi.org/10.1016/j.str.2021.12.005 (2022).
18. von Loeffelholz, O. et al. Focused classification and refinement in high-resolution cryo-EM structural analysis of ribosome complexes. Curr. Opin. Struct. Biol. 46, 140–148 (2017).
19. Zhong, E. D., Lerer, A., Davis, J. H. & Berger, B. CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images. IEEE/CVF International Conference on Computer Vision (2021).
20. Punjani, A. & Fleet, D. J. 3D flexible refinement: structure and motion of flexible proteins from cryo-EM. Preprint at bioRxiv https://doi.org/10.1101/2021.04.22.440893 (2021).
21. Ludtke, S. & Chen, M. Deep learning based mixed-dimensional GMM for characterizing variability in CryoEM. Nat. Methods 18, 930–936 (2021).
22. Zhong, E. D., Lerer, A., Davis, J. H. & Berger, B. Exploring generative atomic models in cryo-EM reconstruction. Preprint at arXiv https://arxiv.org/abs/2107.01331v1 (2021).
23. Rosenbaum, D. et al. Inferring a continuous distribution of atom coordinates from cryo-EM images using VAEs. Preprint at arXiv https://arxiv.org/abs/2106.14108v1 (2021).
24. Sekne, Z., Ghanim, G. E., van Roon, A. M. & Nguyen, T. H. D. Structural basis of human telomerase recruitment by TPP1-POT1. Science 375, 1173–1176 (2022).
25. Chaaban, S. & Carter, A. P. Structure of dynein-dynactin on microtubules shows tandem recruitment of cargo adaptors. Preprint at bioRxiv https://doi.org/10.1101/2022.03.17.482250 (2022).
26. Schoppe, J. et al. Flexible open conformation of the AP-3 complex explains its role in cargo recruitment at the Golgi. J. Biol. Chem. 297, 101334 (2021).
27. Punjani, A., Zhang, H. & Fleet, D. J. Non-uniform refinement: adaptive regularization improves single-particle cryo-EM reconstruction. Nat. Methods 17, 1214–1221 (2020).
28. Zivanov, J., Nakane, T. & Scheres, S. H. W. A Bayesian approach to beam-induced motion correction in cryo-EM single-particle analysis. IUCrJ 6, 5–17 (2019).
29. Scheres, S. H. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).
30. Cheng, Y., Grigorieff, N., Penczek, P. A. & Walz, T. A primer to single-particle cryo-electron microscopy. Cell 161, 438–449 (2015).
31. McInnes, L., Healy, J., Saul, N. & Grossberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 3, 861 (2018).
32. Davis, J. H. & Williamson, J. R. Structure and dynamics of bacterial ribosome biogenesis. Philos. Trans. R. Soc. B https://doi.org/10.1098/rstb.2016.0181 (2017).
33. Trabuco, L. G., Villa, E., Schreiner, E., Harrison, C. B. & Schulten, K. Molecular dynamics flexible fitting: a practical guide to combine cryo-electron microscopy and X-ray crystallography. Methods 49, 174–180 (2009).
34. Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).
