Optical Genome Mapping for Applications in Repeat Expansion Disorders

Bart van der Sanden, Kornelia Neveling, Andy Wing Chun Pang, Syukri Shukor, Michael D. Gallagher, Stephanie L. Burke, Erik-Jan Kamsteeg, Alex Hastie, Alexander Hoischen

Published: 2024-07-05 DOI: 10.1002/cpz1.1094

Abstract

Short tandem repeat (STR) expansions are associated with more than 60 genetic disorders. The size and stability of these expansions correlate with the severity and age of onset of the disease. Therefore, being able to accurately detect the absolute length of STRs is important. Current diagnostic assays include laborious lab experiments, including repeat-primed PCR and Southern blotting, that still cannot precisely determine the exact length of very long repeat expansions. Optical genome mapping (OGM) is a cost-effective and easy-to-use alternative to traditional cytogenetic techniques and allows the comprehensive detection of chromosomal aberrations and structural variants >500 bp in length, including insertions, deletions, duplications, inversions, translocations, and copy number variants. Here, we provide methodological guidance for preparing samples and performing OGM as well as running the analysis pipelines and using the specific repeat expansion workflows to determine the exact repeat length of repeat expansions expanded beyond 500 bp. Together these protocols provide all details needed to analyze the length and stability of any repeat expansion with an expected repeat size difference from the expected wild-type allele of >500 bp. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol 1: Genomic ultra-high-molecular-weight DNA isolation, labeling, and staining

Basic Protocol 2: Data generation and genome mapping using the Bionano Saphyr® System

Basic Protocol 3: Manual De Novo Assembly workflow

Basic Protocol 4: Local guided assembly workflow

Basic Protocol 5: EnFocus Fragile X workflow

Basic Protocol 6: Molecule distance script workflow

INTRODUCTION

Tandem repeats (TRs) constitute a very common class of variation, with a typical human genome containing more than 1 million TR loci (Gymrek, 2017). It is even estimated that TRs account for the majority of structural variants (SVs) larger than 50 bp (Dolzhenko et al., 2024; English et al., 2023). A subgroup of the TRs are the short tandem repeats (STRs). These STRs typically consist of repeat motifs of 2-6 bp in size and are scattered throughout the entire human genome, not limited to coding sequence (Tankard et al., 2018). To date, the expansion of at least 60 STR loci has been associated with disease (Tanudisastro et al., 2024). For many repeat expansion loci, there is a strong correlation between the size of the repeat expansion and the age of onset and severity of the associated disease, with a larger repeat size causing a younger age of onset and more severe phenotype (Depienne & Mandel, 2021). Also, later generations present with a younger age of onset and a more severe phenotype than the previous generations, a phenomenon known as anticipation. In addition, repeat expansions are mitotically unstable, meaning that they carry a risk of DNA replication error during each cell division. This somatic instability of known disease-causing STR loci is associated with age of onset and severity of the disease but also with disease progression, and the extent of somatic instability is correlated with the expansion size and age (Depienne & Mandel, 2021; Morales et al., 2012; Overend et al., 2019). Therefore, determining the exact repeat size and assessing the possibility of somatic repeat instability is of great importance for determining disease prognosis and allowing proper family counseling.

Even though repeat expansion disorders are well known to constitute an important group of rare human diseases, TR expansions remain very difficult to detect, mainly because of their repetitive nature (Depienne & Mandel, 2021; Read et al., 2023). Current standard of care (SOC) repeat expansion detection methods are time-consuming and costly because they must be performed on a per-gene level using Southern blotting, targeted PCR, and repeat-primed PCR (RP-PCR) methods. The phenotypic overlap between patients with repeat expansions in different genes makes it difficult to determine the most probable repeat expansion locus up front, which leads to extensive parallel testing (Tankard et al., 2018). Short-read exome and genome sequencing (SRS) efforts have contributed to improved genome-wide STR detection; however, these are limited by the read and total fragment lengths of SRS (Dolzhenko et al., 2017; van der Sanden et al., 2021). Long-read sequencing (LRS) can overcome this read-length limitation, but its high cost still hampers the introduction of this approach as a routine diagnostic test for repeat expansion disorders and other rare diseases (Kucuk et al., 2023). Targeted LRS may decrease the sequencing costs but will reintroduce the need for per-gene testing efforts. Overall, this means that all current repeat expansion detection methods still have their own limitations, the most important diagnostic ones being the inability to accurately determine the exact repeat length of (very) large repeats and the inability to assess the somatic instability of repeat expansions. Optical genome mapping (OGM) is another long-read technology, which can be used as a cost-effective and easy-to-use alternative for structural variant (SV) detection (Barseghyan et al., 2022, 2023) and is also capable of detecting STR expansions and contractions (Facchini et al., 2023; Guruju et al., 2023; Iqbal et al., 2023).
By scanning ultra-high-molecular-weight (UHMW) DNA that is labeled at CTTAAG sequence motifs and subsequently assembling genomes and mapping the assemblies against a reference genome, OGM can identify different types of SVs, including insertions, duplications, deletions, inversions, and translocations (Mantere et al., 2021; Neveling et al., 2021). Because of its independence of sequence context and its ultra-long molecules and genome-wide coverage, OGM can be used to interrogate even the most difficult regions of the genome, including repeat expansion and contraction loci (Facchini et al., 2023; Guruju et al., 2023; Iqbal et al., 2023). OGM therefore has great potential to become a go-to test for the detection of long repeat expansions, with no upper size limit expected and with the opportunity to generate high coverage.

Here, in Basic Protocol 1, we present the protocol for isolating UHMW DNA from blood and cell lines and preparing the DNA for OGM. In Basic Protocol 2, we provide the protocol for data generation and genome mapping using the Bionano Saphyr® system. Finally, Basic Protocols 3, 4, 5, and 6 explain how the output files of Basic Protocol 2, the .bnx molecule files, can be used in three different repeat expansion detection workflows: the Manual De Novo Assembly workflow, the local guided assembly (Local-GA) workflow or EnFocus Fragile X workflow, and the molecule distance script workflow.

STRATEGIC PLANNING

The manufacturer supports using peripheral blood, preserved cells, and cultured cells as sample types. Peripheral blood must be collected in EDTA tubes; samples in heparin tubes require immediate addition of DNA stabilizer (provided by the manufacturer) because heparin alone does not adequately stabilize samples for OGM applications. Samples in Vacutainer tubes can be stored at 4°C for up to 4 days. However, it is best to isolate samples as quickly as possible after collection or, alternatively, aliquot and freeze them at −80°C for future isolation. It is advisable to store at least two aliquots of each sample to ensure that a backup is available should the initial isolation attempt fail. Samples kept at −80°C remain stable for a minimum of 2 years, based on the longest duration tested, with an expectation of stability for several more years.

The initial steps of the DNA isolation process vary slightly with the sample source, but a uniform protocol follows once the cells have been pelleted and the ultra-high-molecular-weight (UHMW) DNA is extracted (Fig. 1). The initial steps are therefore outlined separately for each sample type. Multiple samples may be processed together, depending on the technician's proficiency.

Overview of the six Basic Protocols presented in this manuscript.

Basic Protocol 1: GENOMIC ULTRA-HIGH-MOLECULAR-WEIGHT DNA ISOLATION, LABELING, AND STAINING

In this first protocol, ultra-high-molecular-weight (UHMW) genomic DNA (gDNA) is first extracted from the desired sample. Using a Nanobind disk, the UHMW DNA is bound and separated from the mixture using a magnet. Following extraction, the sample undergoes quality control (QC) checks to verify that the DNA concentration is sufficient for further analysis. The next step involves labeling the DNA using DLE-1, which attaches the DL-Green fluorophore via covalent modification at each CTTAAG sequence motif. Next, the labeled DNA is stained using an intercalating dye that stains the phosphate backbone of the DNA molecule. Together, the labeling and staining make it possible to distinguish each molecule from the background (DNA stain) and to distinguish DLE-1 recognition sites (DL-Green). Subsequently, the labeled, stained DNA is prepared for visualization and mapping in the Bionano Saphyr® instrument as outlined in Basic Protocol 2. Additional QC checks are carried out on these prepared samples to assess their suitability for the upcoming loading and mapping procedures. When the protocols are executed correctly, the concentration of the final labeled DNA should be in the range of 4 to 12 ng/µl, with a coefficient of variation (CV) of <0.25 across the samples. Basic Protocol 1 describes the standard UHMW DNA extraction and subsequent DNA labeling and staining methods for OGM purposes as recommended by Bionano Genomics. Therefore, versions of Basic Protocol 1 (with slight variations depending on the cell types covered) have also been published in previous Current Protocols manuscripts that describe methods using Bionano OGM (Koppikar et al., 2023; Sahajpal et al., 2023). In addition, future Current Protocols manuscripts describing the Bionano OGM method will contain the same Basic Protocol 1 (Broeckel, 2024; Kanagal-Shamanna, 2024).

Materials

  • Sample: fresh or frozen peripheral blood (either stabilized with EDTA or stabilized with sodium heparin with immediate addition of Bionano Prep DNA Stabilizer) or aliquots of fresh or cryopreserved cells containing 1 million cells to 1.5 million cells

  • SP-G2 Blood & Cell Culture DNA Isolation Kit (Bionano Genomics, cat. no. 80060) containing 2.5× Wash Buffer 1 and 2 (WB1 and WB2) concentrates, Protein LoBind tubes, standard microcentrifuge tubes, Cell Buffer, RNase A, ultrapure nuclease-free water, RBC Lysis Buffer, Digestion Enhancer, DNA Stabilizer, Thermolabile Proteinase K, Lysis and Binding Buffer (LBB), Nanobind disks, disk retriever sheath, and Elution Buffer

  • TexQ disinfectant concentrate (VWR, cat. no. TWTX651)

  • Bleach

  • ≥99.5% isopropanol, molecular biology grade (ThermoFisher Scientific, cat. no. T036181000 or equivalent)

  • Qubit dsDNA Broad Range (BR) Assay Kit (ThermoFisher Scientific, cat. no. Q32853)

  • Qubit dsDNA High Sensitivity (HS) Assay Kit (ThermoFisher Scientific, cat. no. Q32854)

  • Direct label and stain (DLS) kit (Bionano Genomics, cat. no. 80005) containing 20× DL-Green, 5× DLE-1 buffer, 10× DLE-1, ultrapure nuclease-free water, DLS 24-well plate, 13-mm DLS membranes, DLS plate-sealing strips, 1 M dithiothreitol (DTT), 4× Flow Buffer, DNA stain, and round-bottom amber DLS tubes

  • Ice bucket and ice

  • −20°C and −80°C freezers

  • 4°C refrigerator

  • Hemocytometer and phase-contrast microscope or automated cell counter

  • 10-, 20-, 200-, and 1000-µl pipets and nuclease-free, filtered pipet tips

  • Water bath, 37°C

  • Refrigerated centrifuge with swinging-bucket rotor for 15-ml conical tubes

  • Benchtop vortexer (VWR, cat. no. 10153-838)

  • Mini benchtop microcentrifuge (2200 × g spin; LabNet cat. no. C1301B)

  • Microcentrifuge adapters for 0.2-, 0.5-, and 1.5-ml tubes

  • HulaMixer™ sample mixer (Thermo Fisher, cat. no. 05-408-138)

  • 50- and 15-ml conical polypropylene centrifuge tubes (Fisher Scientific, cat. nos. 05-539-12 and EW-17701-11)

  • 2-ml nuclease-free microcentrifuge tubes (Fisher Scientific, cat. no. 30119487)

  • 200-µl wide-bore pipet tips, filtered, aerosol (e.g., USA Scientific)

  • 5- and 10-ml disposable pipets, sterile

  • Pointed forceps (e.g., Electron Microscopy Sciences, cat. no. 1011-8810)

  • Thermoblock

  • DynaMag-2 magnetic tube rack (Thermo Fisher, cat. no. 12321D)

  • Extra-long 1000-µl pipet tips, sterile (VWR, cat. no. 16466-008)

  • Bionano Prep SP Magnetic Retriever (Bionano Genomics, cat. no. 80031)

  • Qubit fluorometer (Invitrogen, cat. no. Q32866)

  • Qubit™ Assay Tubes (Thermo Fisher, cat. no. Q33238)

  • Thin-wall PCR tube

  • Thermocycler with heated lid

  • For blood samples:

    • Vari-Mix Test Tube Rocker
    • Parafilm
    • HemoCue® WBC Analyzer (Fisher Scientific, cat. no. 22-601-017)
    • HemoCue® microcuvettes (Fisher Scientific, cat. no. 22-601-018)
    • Bionano Prep DNA stabilizer (Bionano Genomics, cat. no. 20398), optional

NOTE: Use nuclease-free, filtered pipet tips for all pipetting.

NOTE: Use bleach for blood disposal and TexQ disinfectant concentrate (VWR, cat. no. TWTX651) to treat potential biohazard waste material.

NOTE: Wash Buffers 1 and 2 are diluted in 100% ethanol (200-proof ethanol, molecular biology grade; Sigma-Aldrich, cat. no. E7023) per the manufacturer's instructions.

Day 1

Ultra-high-molecular-weight DNA isolation

For fresh peripheral blood, go to step 1a. For frozen peripheral blood, go to step 1b. For fresh cell culture, go to step 1c. For cryopreserved cells, go to step 1d.

Lysing and pelleting cells from fresh peripheral blood

1a. Label one 1.5-ml Protein LoBind tube and one 0.5-ml elution tube (both from the SP-G2 Blood & Cell Culture DNA Isolation Kit).

2a. Remove tube from 4°C refrigerator and either mix on Vari-Mix Test Tube Rocker for 15 min at room temperature or invert tube 15 times to ensure good uniformity.

3a. Immediately dispense 20 µl onto a piece of Parafilm and use HemoCue cuvette to measure WBCs.

4a. Record HemoCue reading.

5a. Perform the following calculations:

  1. Transfer Volume (µl) = 1500 ÷ HemoCue Reading
  2. RBC Lysis Solution (µl) = Transfer Volume × 3
  3. Removal Volume (µl) = (Transfer Volume – 40 µl) ÷ 2

The transfer volume is the volume containing 1.5 million cells, and the removal volume is the volume that will be discarded, leaving 40 µl for the WBC pellet.
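The calculations in step 5a can be sketched in a few lines of Python. This is a minimal illustration, not part of the manufacturer's protocol: the function name and the example reading are hypothetical, and the sketch assumes the HemoCue reading is expressed in 10^3 WBCs/µl, which is the unit implied by 1500 ÷ reading giving the volume that contains 1.5 million cells.

```python
def fresh_blood_volumes(hemocue_reading, pellet_volume_ul=40.0):
    """Compute the step 5a volumes (all in µl) for one fresh blood sample.

    hemocue_reading: WBC count, assumed to be in 10^3 cells/µl.
    pellet_volume_ul: volume left on the WBC pellet (40 µl in the protocol).
    """
    if hemocue_reading <= 0:
        raise ValueError("HemoCue reading must be positive")
    transfer = 1500.0 / hemocue_reading          # volume holding 1.5 million cells
    rbc_lysis = transfer * 3                     # 3 volumes of RBC Lysis Solution
    removal = (transfer - pellet_volume_ul) / 2  # discarded twice in step 10a
    return {
        "transfer_ul": transfer,
        "rbc_lysis_ul": rbc_lysis,
        "removal_ul": removal,
    }


# Hypothetical reading of 6.0 (i.e., 6000 WBCs/µl)
print(fresh_blood_volumes(6.0))
# {'transfer_ul': 250.0, 'rbc_lysis_ul': 750.0, 'removal_ul': 105.0}
```

Removing the removal volume twice from the transfer volume leaves the 40 µl that stays on the pellet (250 − 2 × 105 = 40 in this example). For frozen blood (step 5b), the same transfer and removal volumes apply and the RBC Lysis Solution volume is not needed.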

6a. Pipet the transfer volume into the previously labeled 1.5-ml Protein LoBind tube.

7a. Add 3 volumes of RBC Lysis Solution. Cap tube, invert ten times to mix, and place on HulaMixer at 5 rpm for 10 min at room temperature.

8a. Spin the tube in a microcentrifuge for 2 min at 2200 × g, room temperature. Inspect the bottom of the tube for WBC pellet.

9a. Using a 1000-µl pipet, remove the original volume of RBC Lysis Solution.

Note
The remaining volume equals the original transfer volume. Be careful not to dislodge the WBC pellet.

10a. Set 200-µl pipet at the removal volume and discard the supernatant twice.

Note
This will result in 40 µl remaining in the 1.5-ml Protein LoBind tube. Be careful not to dislodge the WBC pellet.

Note
Continue to step 14.

Lysing and pelleting cells from frozen peripheral blood

1b. Label one 1.5-ml Protein LoBind tube and one 0.5-ml elution tube (both from the SP-G2 Blood & Cell Culture DNA Isolation Kit).

2b. Remove one 650-µl aliquot of frozen blood from −80°C freezer and thaw immediately in 37°C water bath for 2 min using a floating tube rack. After 2 min, remove aliquots from the water bath and keep at room temperature.

Note
Freeze-thaw results in RBC lysis.

3b. Invert tube ten times to ensure good uniformity, and then immediately dispense 20 µl onto a piece of Parafilm and use a HemoCue cuvette to measure WBCs.

4b. Record HemoCue reading.

5b. Perform the following calculations:

  1. Transfer Volume (µl) = 1500 ÷ HemoCue reading
  2. Removal Volume (µl) = (Transfer Volume – 40 µl) ÷ 2

6b. Pipet the transfer volume into the previously labeled 1.5-ml Protein LoBind tube.

7b. Spin the tube for 2 min at 2200 × g, room temperature. Inspect the bottom of the tube for WBC pellet.

8b. Set 200-µl pipet at the removal volume and discard the supernatant twice.

Note
This will result in 40 µl remaining in the 1.5-ml Protein LoBind tube. Be careful not to dislodge the WBC pellet.

Note
Continue to step 14.

Pelleting cells from fresh cell culture

1c. Prepare 1200 µl Stabilizing Buffer (SB) for each sample by mixing 1176 µl Cell Buffer with 24 µl DNA Stabilizer (both from the SP-G2 Blood & Cell Culture DNA Isolation Kit). Vortex briefly and place on ice.

2c. Prepare 48 µl Stabilizing Buffer/RNase A Cocktail Master Mix for each sample by mixing 36 µl Stabilizing Buffer with 12 µl RNase A (from kit). Vortex briefly and place on ice.

3c. Label one 1.5-ml Protein LoBind tube and one 0.5-ml elution tube for each sample (both from kit). Place 1.5-ml Protein LoBind tubes on ice.

4c. Resuspend and count viable cells from cultures.

5c. Calculate the volumes of original stock cell cultures containing 1.5 million viable cells.

6c. Transfer appropriate volumes into prechilled 1.5-ml Protein LoBind tubes (or into 15-ml conical tubes if necessary to concentrate).

7c. Spin tubes for 5 min at 500 × g, 4°C. Inspect the bottoms of the tubes for cell pellets.

8c. Discard supernatants and add 1 ml cold SB to each pellet.

9c. Using the 1000-µl pipet, resuspend pellets to wash in SB.

10c. Spin tubes for 2 min at 2200 × g, 4°C. Inspect the bottoms of the tubes for cell pellets.

11c. Aspirate entire supernatants and discard. Keep pellets on ice until all supernatants have been removed.

12c. Add 40 µl cold SB onto each pellet.

Note
This will result in 40 µl SB and a cell pellet in each 1.5-ml Protein LoBind tube.

Note
Continue to step 14.

Pelleting cells from cryopreserved cells

1d. Before thawing the cells, estimate the volume of each frozen sample needed to obtain 1.5 million live cells for downstream DNA isolation. If cell count is unknown, determine an approximate volume containing at least 1.5 million live cells.

2d. Using the volumes of frozen sample determined in step 1d, calculate the required volume of Stabilizing Buffer (SB) needed to top-off each sample to 1.5 ml:

  1. 1.5 ml total volume – [volume for DNA isolation] = [volume of Stabilizing Buffer to add].
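As a worked illustration of steps 1d and 2d, the sketch below derives the sample volume and the Stabilizing Buffer top-off volume from a live-cell concentration. The function name and the example concentration are hypothetical, and the check that the sample fits within the 1.5 ml total is an assumption of this sketch rather than a rule stated in the protocol.

```python
def cryo_volumes(live_cells_per_ml, target_cells=1.5e6, total_volume_ml=1.5):
    """Return (sample volume, Stabilizing Buffer volume), both in ml.

    live_cells_per_ml: estimated live-cell concentration of the frozen aliquot.
    target_cells: live cells needed for DNA isolation (1.5 million in step 1d).
    total_volume_ml: volume each sample is topped off to (1.5 ml in step 2d).
    """
    if live_cells_per_ml <= 0:
        raise ValueError("cell concentration must be positive")
    sample_ml = target_cells / live_cells_per_ml
    if sample_ml > total_volume_ml:
        # Assumption of this sketch: a negative top-off volume is not meaningful.
        raise ValueError("sample volume exceeds the total volume")
    sb_ml = total_volume_ml - sample_ml  # step 2d: total volume minus sample volume
    return sample_ml, sb_ml


# Hypothetical aliquot frozen at 3 million live cells per ml
sample_ml, sb_ml = cryo_volumes(3.0e6)
print(sample_ml, sb_ml)  # 0.5 1.0
```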

3d. In addition to the SB volume calculated in step 2d, prepare another 1200 µl Stabilizing Buffer (SB) for each sample by mixing 1176 µl Cell Buffer with 24 µl DNA Stabilizer (both from the SP-G2 Blood & Cell Culture DNA Isolation Kit). Vortex briefly and place on ice.

4d. Prepare 48 µl Stabilizing Buffer/RNase A Cocktail Master Mix for each sample by mixing 36 µl Stabilizing Buffer with 12 µl RNase A. Vortex briefly and place on ice.

5d. Label one 1.5-ml Protein LoBind tube and one 0.5-ml elution tube for each sample (both from kit). Add volumes of SB calculated in step 2d to 1.5-ml Protein LoBind tubes and place on ice.

6d. Remove aliquots of cryopreserved cells from −80°C freezer and thaw immediately in 37°C water bath. As soon as an individual aliquot is fully thawed, remove it from the water bath and keep at room temperature.

7d. Resuspend cells with a 1000-µl pipet and transfer the volumes containing 1.5 million cells, as calculated in step 1d, into the Protein LoBind tubes with SB.

8d. Invert cells to mix and spin tubes for 10 min at 500 × g, 4°C. Inspect the bottoms of the tubes for cell pellets.

9d. Discard supernatants and add 1 ml cold SB to each pellet.

10d. Using the 1000-µl pipet, resuspend pellets to wash in SB.

11d. Spin tubes for 2 min at 2200 × g, 4°C. Inspect the bottoms of the tubes for cell pellets.

12d. Aspirate entire supernatants and discard. Keep pellets on ice until all supernatants have been removed.

13d. Add 40 µl cold SB onto each pellet.

Note
This will result in 40 µl SB and a cell pellet in each 1.5-ml Protein LoBind tube.

Note
Continue to step 14.

Lysis and digestion

14. Set 200-µl pipet to 35 µl and pipet sample five times to mix. Visually inspect to ensure that the pellet is completely dislodged and homogenized in the ∼40 µl solution.

15. Prepare Lysis and Digestion Cocktail Master Mix (from kit) in a 2.0-ml microcentrifuge tube for a batch size of three or fewer samples or in a conical tube for a larger batch size. In preparing the master mix, follow the order of component addition listed in Table 1. Cap the tube, mix by inverting 15 times, and place the tube on a tube rack at room temperature.

Note
Do not vortex. Always keep the thermolabile proteinase K (TLPK) on ice.

Table 1. Lysis and Digestion Cocktail Master Mix Preparation

| Reagent | Volume per sample | 20% excess master mix | Number of samples | Master mix total (µl) | Order of addition |
|---|---|---|---|---|---|
| Digestion enhancer | 270.0 µl | × 1.2 | × ___ | = ___ µl | 1 |
| Nuclease-free water | 66.25 µl | × 1.2 | × ___ | = ___ µl | 2 |
| LBB (a) | 80.0 µl | × 1.2 | × ___ | = ___ µl | 3 |
| DE detergent (a) | 3.75 µl | × 1.2 | × ___ | = ___ µl | 4 |
| TLPK (b) | 10.0 µl | × 1.2 | × ___ | = ___ µl | 5 |
| Total lysis and digestion cocktail master mix volume | 430.0 µl | | | ___ µl | |

  • (a) Pipet LBB and DE detergent slowly due to high viscosity and risk of bubble formation.

  • (b) Add just before use at step 11 in gDNA isolation.
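The blank columns of Table 1 can be filled in with a short calculation. The following Python sketch is only an illustration of that arithmetic (per-sample volume × 1.2 × number of samples); the function and variable names are hypothetical, and the batch size of four is an arbitrary example.

```python
# Per-sample volumes (µl) from Table 1, listed in the order of addition.
LYSIS_MIX_UL = [
    ("Digestion enhancer", 270.0),
    ("Nuclease-free water", 66.25),
    ("LBB", 80.0),
    ("DE detergent", 3.75),
    ("TLPK", 10.0),
]


def lysis_master_mix(n_samples, excess=1.2):
    """Return [(reagent, total µl)] for n_samples, including 20% excess."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    return [(name, round(vol * excess * n_samples, 2)) for name, vol in LYSIS_MIX_UL]


# Hypothetical batch of four samples
mix = lysis_master_mix(4)
for name, total in mix:
    print(f"{name}: {total} µl")
print("Total:", round(sum(total for _, total in mix), 2), "µl")
```

For four samples the total comes to 2064 µl, which is 4 × 1.2 × 430 µl and so leaves roughly 20% more than the 4 × 430 µl dispensed in the next step.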

16. Add 430 µl of Complete Lysis and Digestion Cocktail Master Mix to each sample. Cap the tube. Change tips between samples.

17. Rotate samples on HulaMixer for 15 min at 10 rpm, room temperature. Make sure the HulaMixer is set at “no” shaking/vibration.

18. During the rotation, return the TLPK to −20°C storage.

19. Pulse spin tube for 2 s to collect liquid at the bottom of the tube.

20. Incubate sample in a Thermoblock pre-set to 55°C for 10 min, with no shaking.

gDNA binding, washing, and elution

21. Using forceps, carefully transfer a single Nanobind disk (from the SP-G2 Blood & Cell Culture DNA Isolation Kit) to the lysate.

Note
Disks can sometimes stick together.

22. Add 480 µl of 100% isopropanol to all tubes. Cap and invert tubes five times to mix.

23. Rotate sample on HulaMixer for 15 min at 10 rpm, room temperature. The HulaMixer should be set to no shaking/vibration.

Note
Ensure that the Nanobind disk does not remain in the lid of the tube during initial rotations. If it does, turn off rotator and invert microcentrifuge tube until the Nanobind disk goes back into solution. Replace the tube on the HulaMixer and resume mixing.

24. Examine the tube to ensure that the gDNA is associated with the Nanobind disk.

Note
After Wash Buffer 1 is added, the DNA will precipitate and become visible. At this stage, the DNA is visible to the naked eye, given sufficient quantity and quality.

Note
If the DNA is not stuck to the disk, make sure when placing the tube in the magnetic tube rack that the gDNA attaches to the Nanobind disk and is not floating in the solution.

25. Place sample tubes into Dynamag clear magnetic tube rack and visually inspect all tubes in rack to ensure that gDNA is tethered to the Nanobind disk.

26. If gDNA strands are visibly hanging low, quickly invert 180° to bring the gDNA into closer association with the Nanobind disk.

27. Repeat the 180° inversions as many times as needed, until the gDNA association with the Nanobind disk appears unchanged.

28. Combine the clear rack with the magnetic base as outlined below:

  1. Invert magnetic tube rack and place upside down with sample lids touching the work surface. The tubes will be on the same row of the rack, and in the row furthest from you.

  2. Invert Dynamag magnetic base and lower onto clear rack.

  3. Tilt combined apparatus slowly 90° toward you while it continues to rest on surface. The tubes will now be horizontal and visible to you.

  4. Tilt combined apparatus slowly 90° toward you while it continues to rest on surface, so that it stands fully upright and tubes are facing you.

  5. Make sure that the Nanobind disk is held to the magnet near the top of the liquid level. If not, re-rack.

29. Remove supernatant as outlined below, being careful not to aspirate the gDNA:

  1. Angle entire rack at a 45° angle by holding in one hand (grasping the entire apparatus from below with tubes visible to you and lids toward your other hand).

  2. Wait 2 s for gDNA to lay on the Nanobind disk.

  3. Slowly remove all liquid with a 1000 µl extra-long tip angled away from the Nanobind disk and/or gDNA to avoid disruption.

30. Perform wash using WB1 (from kit):

  1. Dispense 700 µl WB1 directly onto the disks in the tubes and cap tubes.

  2. Lift clear tube rack to separate from magnetic base.

  3. Invert clear rack with tubes 180° four times to wash.

  4. Re-rack clear tube rack and tubes with magnetic base as described in step 28.

  5. Remove supernatant.

31. Perform wash using WB2 (from kit); repeat this step twice:

  1. Dispense 500 µl WB2 directly onto the disks in the tubes and cap.

  2. Lift clear rack to separate from magnetic base.

  3. Invert clear rack 180° ten times to wash.

  4. Re-rack clear tube rack and tubes with magnetic base as described in step 28.

  5. Remove supernatant.

32. Open tube lid fully (parallel to lab bench) and lift each tube apart from base.

33. In close proximity to a new Protein LoBind tube, transfer Nanobind disk to the elution tube (0.5 ml) using Bionano Prep SP Magnetic Retriever. Cap tube to prevent disk drying.

34. Spin the elution tube in benchtop microcentrifuge for 5 s.

35. Remove all residual liquid at the bottom of the tube using a 10 µl standard tip.

Note
It is necessary to displace the Nanobind disk using the tip to reach the liquid at the bottom of the tube. Move tip around in small circular motion to remove all residual liquid from bottom of tube.

36. Add 65 µl of Elution Buffer (from kit) to the elution tube.

37. Spin the tube on benchtop microcentrifuge for 5 s.

38. Using a 10 µl standard tip, gently nudge Nanobind disk towards the bottom of the tube, making sure that it is fully submerged in liquid. The disk should remain parallel to the bench surface.

39. Incubate submerged Nanobind disk in Elution Buffer at room temperature for 20 min.

40. Collect extracted gDNA by transferring eluate to the labeled 2.0-ml microcentrifuge tube with a standard 200-µl tip.

41. Spin the tube with the Nanobind disk in benchtop microcentrifuge for 5 s and transfer all of the remaining eluate containing viscous gDNA to the same labeled 2.0-ml microcentrifuge tube as in the previous step using a standard 200-µl tip. You may remove the disk before aspirating the remaining elution buffer.

Note
Almost all of the viscous gDNA comes off the Nanobind disk during the spin.

42. Place the 2.0-ml microcentrifuge tube containing gDNA in rack of HulaMixer Sample Mixer and rotate for 1 hr at 10 rpm, room temperature.

43. Remove tube from rack of HulaMixer and pulse-spin tube on benchtop microcentrifuge for 2 s to bring the gDNA to the bottom of the tube. Allow the gDNA to equilibrate overnight at room temperature (25°C) to homogenize.

Day 2

DNA quantitation

44. Refer to the Qubit dsDNA BR Assay Kit user manual for kit details.

Note
If the gDNA has been stored at 4°C, equilibrate at room temperature before moving to the next step.

45. Mix by slowly pipetting the entire gDNA volume into a 200 µl wide bore tip and then slowly dispensing the volume. Avoid creating bubbles. Repeat this process at least five times.

Note
If gDNA uptake stalls due to high viscosity, it may be necessary to stir gently while slowly releasing the plunger to withdraw the gDNA.

46. Prepare Qubit dsDNA BR Assay Kit working solution by diluting the Dye Assay Reagent into BR Dilution Buffer (1:200) per kit instructions.

47. Each sample will be quantified in triplicate. Add Qubit working solution to 0.5-ml Qubit Assay Tubes as follows:

  1. For each unique sample, add 48 µl each of Qubit working solution to three separate Qubit Assay Tubes for three replicates.

  2. For the Qubit Standards, add 40 µl each of Qubit working solution to two separate Qubit Assay Tubes for two replicates.

48. Prepare triplicate quantifications for each sample by removing a 2-µl aliquot from the left side of each sample and dispensing it into the sample's corresponding Qubit Assay Tube 1 with working solution from step 47 (substep 1), rinsing tip when dispensing. Repeat two more times by removing a 2-µl aliquot from the middle of the sample tube and placing it in Assay Tube 2 from step 47 (substep 1) and removing a 2-µl aliquot from the right side of the sample tube and placing it in Assay Tube 3 from step 47 (substep 1), rinsing tip when dispensing.

49. For the Qubit DNA standards, add 10 µl of standard 1 to Assay Tube 1 containing 40 µl of Qubit working solution from step 47 (substep 2), and add 10 µl of standard 2 to Assay Tube 2 from step 47 (substep 2). Place assay tubes in a floating rack and sonicate for 10 min.

Note
If a bath sonicator is not available, vortex for at least 10 s at maximum speed.

50. Once sonication/vortexing is complete, retrieve Assay Tubes and add 150 µl of working solution to each sonicated DNA aliquot and Qubit DNA Standard aliquot. Vortex for 5 s, incubate samples for at least 2 min, and read DNA concentration on the Qubit fluorometer.

51. The coefficient of variation (CV = standard deviation/mean) from three readings should be ≤0.30.

Note
If CV is >0.30, gently pipet-mix the entire volume of gDNA with five strokes (1 stroke = 1 up stroke + 1 down stroke) using a wide bore tip. Let the gDNA rest at least overnight at room temperature before repeating quantitation.

Note
Typical DNA concentrations range from 45 to 90 ng/µl.
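The CV check in step 51 can be reproduced with Python's standard library. The sketch below is illustrative only: the function names and readings are hypothetical, and it assumes the sample standard deviation (n − 1 denominator), since the protocol does not specify which estimator is meant.

```python
from statistics import mean, stdev


def qubit_cv(readings):
    """Coefficient of variation (standard deviation / mean) of replicate readings.

    Uses the sample standard deviation; this choice is an assumption of the sketch.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    return stdev(readings) / mean(readings)


def passes_homogeneity_qc(readings, max_cv=0.30):
    """True if the replicate readings meet the CV <= 0.30 criterion of step 51."""
    return qubit_cv(readings) <= max_cv


# Hypothetical triplicate readings in ng/µl (left, middle, right of the tube)
readings = [60.0, 66.0, 63.0]
print(round(qubit_cv(readings), 4), passes_homogeneity_qc(readings))
```

With these example readings the mean is 63 ng/µl and the standard deviation is 3 ng/µl, so the CV is about 0.048 and the sample would pass.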

Direct labeling and staining (DLS) of all sample types

52. Set up as described below:

  1. Thaw 20× DL-Green (from DLS kit). Vortex well, pulse spin, and hold on ice in 4°C aluminum block.

  2. Thaw 5× DLE-1 Buffer (from DLS kit). Vortex well, pulse spin. Hold at room temperature until use.

  3. Flick 10× DLE-1 Enzyme (from DLS kit) three times and pulse spin. Hold on bench in −20°C enzyme block.

  4. Label one thin-wall PCR tube (0.5 ml) per sample and one amber-color microcentrifuge tube (from DLS kit; 1.5 or 2 ml).

     The labeling reaction can be performed for 6-12 samples with ease.

53. In a thin-wall PCR tube, combine 750 ng gDNA and nuclease-free water to achieve a final volume of 19.5 µl. When pipetting, carefully draw viscous gDNA into a standard tip; it can take up to 30 s to fill the tip to the appropriate level. Releasing the plunger too quickly can produce a bubble in the tip, resulting in undersampling (start over if this occurs).

Note
If DNA concentration is between 39 and 150 ng/µl, use 750 ng gDNA as input. If concentration is ≥26 ng/µl and <39 ng/µl, use 500 ng as input.

Note
750 ng ÷ [gDNA concentration (ng/µl)] = µl of gDNA; or 500 ng ÷ [gDNA concentration (ng/µl)] = µl of gDNA

Note
19.5 µl – (µl of gDNA) = µl ultra-pure water
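The input rules and volume formulas in the notes to step 53 can be combined into one small helper. This Python sketch is an illustration under the ranges stated above; the function name and example concentrations are hypothetical, and raising an error for concentrations outside 26 to 150 ng/µl is a choice made here because the notes give no instruction for those cases.

```python
def labeling_input(conc_ng_per_ul, final_volume_ul=19.5):
    """Return (input ng, gDNA µl, water µl) for the DLS labeling reaction."""
    if 39 <= conc_ng_per_ul <= 150:
        input_ng = 750.0
    elif 26 <= conc_ng_per_ul < 39:
        input_ng = 500.0
    else:
        # The notes to step 53 do not cover concentrations outside 26-150 ng/µl.
        raise ValueError("concentration outside the ranges given in the protocol")
    gdna_ul = input_ng / conc_ng_per_ul   # ng ÷ (ng/µl) = µl of gDNA
    water_ul = final_volume_ul - gdna_ul  # 19.5 µl minus µl of gDNA
    return input_ng, gdna_ul, water_ul


# Hypothetical concentrations from the Qubit quantitation
print(labeling_input(75.0))  # (750.0, 10.0, 9.5)
input_ng, gdna_ul, water_ul = labeling_input(30.0)
print(input_ng, round(gdna_ul, 2), round(water_ul, 2))  # 500.0 16.67 2.83
```

At the lower bound of each range (39 and 26 ng/µl) the gDNA volume is about 19.2 µl, so the sample still fits within the 19.5 µl available.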

54. Prepare labeling master mix according to Table 2.

Note
After making the master mix, leave 5× DLE-1 Buffer at room temperature to use in step 61.

Table 2. Labeling Master Mix Preparation

| Reagent | Volume per sample | 20% excess master mix | Number of samples | Master mix total (µl) |
|---|---|---|---|---|
| 5× DLE-1 buffer | 6.0 µl | × 1.2 | × ___ | = ___ µl |
| 20× DL-Green | 1.5 µl | × 1.2 | × ___ | = ___ µl |
| 10× DLE-1 | 3.0 µl | × 1.2 | × ___ | = ___ µl |
| Total labeling master mix volume | 10.5 µl | | | ___ µl |
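As a quick arithmetic check on Table 2, the sketch below scales the labeling master mix to a batch and confirms that each reagent ends up at 1× in the 30 µl reaction (10.5 µl master mix plus 19.5 µl gDNA). The names are hypothetical, and reading the "5×", "20×", and "10×" labels as fold-concentrations relative to the working strength is an assumption of this sketch.

```python
# (reagent, stock fold-concentration, per-sample volume in µl) from Table 2
LABELING_MIX = [
    ("5x DLE-1 buffer", 5, 6.0),
    ("20x DL-Green", 20, 1.5),
    ("10x DLE-1", 10, 3.0),
]
GDNA_PLUS_WATER_UL = 19.5  # volume prepared per sample in step 53


def labeling_master_mix(n_samples, excess=1.2):
    """Return {reagent: total µl} for n_samples, including 20% excess."""
    return {name: round(vol * excess * n_samples, 2) for name, _, vol in LABELING_MIX}


def final_fold_concentrations():
    """Return {reagent: fold-concentration} in the complete labeling reaction."""
    reaction_ul = GDNA_PLUS_WATER_UL + sum(vol for _, _, vol in LABELING_MIX)
    return {name: stock * vol / reaction_ul for name, stock, vol in LABELING_MIX}


print(labeling_master_mix(6))       # hypothetical batch of six samples
print(final_fold_concentrations())  # each reagent works out to 1.0
```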

55. Add 10.5 µl labeling master mix to the thin-wall PCR tube containing 19.5 µl gDNA. Be sure not to touch the gDNA while adding the master mix.

56. Pulse spin and, using a standard pipet tip with the pipet set to 28 µl, mix sample slowly up and down five times. Pulse-spin tube for 2 s. Protect from light.

57. Incubate in a thermocycler using a heated lid set at 47°C (or “On” if no temperature choice is available) as follows:

  1. 1 hr at 37°C

  2. Hold at 4°C until next step. Proceed quickly to the next step.

58.Before proceeding, pulse spin briefly if any condensation is visible on the tube wall.

DL-Green cleanup, pre-stain reaction, and membrane adsorption in microplate

59.Dispense 5 µl proteinase K directly into the central bulk of the sample contained in the thin-wall PCR tube. To avoid inadvertently removing DNA that may adhere to the tip, do not mix.

60.Incubate in a thermocycler using a heated lid set at 60°C (or “On” if no temperature choice is available) as follows:

  1. 30 min at 50°C.

  2. Hold at 4°C until next step. Proceed quickly to the next step.

61.For each sample, prepare 30 µl of 1× DLE-1 Buffer (6 µl 5× DLE-1 Buffer + 24 µl H2O). Mix by vortexing.

62.Label the microplate (supplied by the manufacturer), two wells per sample.

63.Dispense 25 µl of 1× DLE-1 Buffer into the center of one well of the DLS Microplate.

64.Use forceps to place a DLS membrane on top of buffer.

Note
Membranes may be wetted up to 10 min before sample application. If not proceeding right away, seal wells immediately with a DLS Plate Sealing Strip to prevent evaporation until ready to proceed.

65.Perform DL-Green cleanup by dispensing labeled DNA sample onto the center of the wetted membrane:

  1. Using a standard pipet tip, dispense entire volume (∼35 µl) of labeled DNA onto the middle of the DLS Membrane.

  2. Seal membrane wells with DLS plate-sealing strip. While holding the microplate, apply pressure to secure the sealing strip to the top rim of the wells to prevent evaporation.

  3. Protect the microplate from light (cover) and incubate at room temperature for 1 hr.

66.After 1 hr, hold the plate securely and carefully remove the sealing strip.

67.Using an unfiltered standard pipet tip, with the pipet set to 70 µl, aspirate the entire labeled sample while making contact perpendicularly with the membrane and moving the tip across the DNA area while aspirating to collect the DNA. Transfer into a new labeled thin-wall PCR tube. Pulse spin for 2 s. Protect tubes from light.

68.Using a 200-µl pipet, dispense 20 µl of the labeled sample from the thin-wall PCR or 0.5-ml amber tube into the DLS round-bottom amber tube (2 ml). If sample volume recovered is <20 µl, make up the difference with the appropriate amount of 1× DLE-1 Buffer before proceeding to next step.

DNA staining and homogenization

69.Bring 1 M DTT, 4× Flow Buffer, and DNA Stain to room temperature, vortex well, and pulse spin briefly. Keep at room temperature to avoid crystallization of the DMSO in the DNA Stain.

70.Prepare staining master mix according to Table 3.

Table 3. Staining Master Mix Preparation
Reagent | Volume per sample | 20% excess | Number of samples | Master mix total (µl)
4× Flow Buffer | 15.0 µl | × 1.2 | × ___ | = ___ µl
10× DTT | 6.0 µl | × 1.2 | × ___ | = ___ µl
DNA stain | 3.5 µl | × 1.2 | × ___ | = ___ µl
Nuclease-free water | 15.5 µl | × 1.2 | × ___ | = ___ µl
Total staining master mix volume | 40.0 µl | | | ___ µl

71.For each labeled DNA, add 40 µl staining master mix on top of the labeled sample (20 µl) contained in the 2-ml DLS round-bottom amber tube and pulse spin. Do not touch the DNA and do not mix.

Note
Master mix is dispensed on top of solution in order to avoid inadvertently drawing out DNA that may stick to the pipet tip.

72.Place amber DLS tubes containing samples into HulaMixer with speed set to 5 rpm. Mix for 1 hr at room temperature with all options other than rotation turned off.

73.After mixing is complete, remove sample from the HulaMixer. Pulse spin to collect contents.

74.Proceed to step 75 or store at 4°C protected from light.

Quantification of labeled and stained DNA

75.Using a wide-bore tip on a 200-µl pipet set to 50 µl, mix labeled and stained DNA five times and pulse-spin.

76.Quantify the labeled DNA as per the above DNA quantification instructions (steps 45-50) using the Qubit HS (high-sensitivity) dsDNA Assay Kit.

Note
Each sample will be quantified in duplicate instead of triplicate.

Note
The labeled DNA concentration should ideally fall between 4 and 16 ng/µl, with a CV (standard deviation ÷ mean) between samplings of <0.25. If both samplings are outside the range of 4-16 ng/µl, see the Troubleshooting section. If one sampling is between 4 and 16 ng/µl and the other is outside this range, proceed as follows:

  1. If one sampling is between 4 and 16 ng/µl and the other is >16 ng/µl, proceed to load chip.
  2. If one sampling is between 4 and 16 ng/µl and the other is <4 ng/µl, repeat HulaMixer mixing for 30 min and repeat the quantitation.
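The acceptance rules in the note above can be written as a short decision function. This is a sketch under stated assumptions: the CV uses the sample standard deviation of the two samplings, the "high" out-of-range case is taken to mean above the 16 ng/µl upper bound, and the function name and returned strings are illustrative.

```python
from statistics import mean, stdev


def labeled_dna_qc(sampling_a, sampling_b, low=4.0, high=16.0, max_cv=0.25):
    """Classify two Qubit samplings (ng/µl) of labeled, stained DNA."""
    values = (sampling_a, sampling_b)
    cv = stdev(values) / mean(values)        # CV = standard deviation ÷ mean
    in_range = [low <= v <= high for v in values]
    if all(in_range):
        action = "within range"
    elif not any(in_range):
        action = "see Troubleshooting"
    elif max(values) > high:
        # One sampling in range, the other above the range.
        action = "proceed to load chip"
    else:
        # One sampling in range, the other below the range.
        action = "repeat HulaMixer mixing for 30 min and requantify"
    return {"cv": round(cv, 3), "cv_ok": cv < max_cv, "action": action}


print(labeled_dna_qc(8.0, 10.0))
print(labeled_dna_qc(10.0, 3.0))
```

The CV flag is reported separately from the action because the note states the CV target as an ideal rather than as a branching rule.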

Basic Protocol 2: DATA GENERATION AND GENOME MAPPING USING BIONANO SAPHYR® SYSTEM

In this second protocol, a Saphyr® Chip consumable is loaded with the extracted and quantified UHMW gDNA and subsequently put into the Bionano Saphyr® instrument for data generation. The data generation run can be customized with multiple configurations related to the number of chips, total run time, and data collection target. The metrics of the run can be continuously visualized. The Saphyr® instrument captures images of labeled DNA molecules and converts the images into molecule data files (.bnx files). Once the chip run is complete, the completed molecule data files are automatically imported into the Bionano Access web application. These molecule files can then be used to perform various bioinformatics operations in the next protocol.

Materials

  • Labeled and stained UHMW gDNA from Basic Protocol 1

  • Saphyr® Chip G3.3 (Bionano Genomics, cat. no. 20440)

  • Bionano Access Software (Bionano Genomics)

  • Saphyr® System with Bionano Access Server (Bionano Genomics, cat. no. 90023)

  • 200-, 20-, and 10-µl pipets (General Lab Supplier)

  • Saphyr® G3.3 Clip (Bionano Genomics)

Chip set-up and queue

1.Log into Bionano Access software.

2.Select “Chips” to open chip list.

3.Click “Add Chip.”

4.Give the chip a unique name.

5.Select the appropriate part number: “20440 - 3 flowcells.”

6.Add any optional information into the designated entry bars.

7.Click “Next.”

8.Enter the molecule set information: Throughput Target 800 Gbp, Molecule Job Name (optional), Sample name (from the drop-down menu, or create a new sample name), Label Green 01, Enzyme DLE-1, Reference hg38_DLE1_0kb_0labels_masked_YPARs.cmap or hg19_DLE1_0kb_0labels_masked_YPARs.cmap, and isolation and labeling kit lot numbers (optional).

Note
Recommended throughput for blood, cells and cell culture is 800 Gbp. An increase in data collection can be used for instances where greater sensitivity may be required. If applicable and desired, you may select an Auto pipeline submission to launch once the flowcell has completed: i.e., select Auto De Novo Analysis.

9.Click “add to flowcell 1.”

10.Repeat steps 8 and 9 for flowcells 2 and 3 if required.

Note
Each Saphyr® Chip has three flowcells, which makes it possible to run three different samples on each chip.

11.Confirm flowcell information is correct and select “Save Chip.”

Chip and instrument loading

12.Allow the labeled sample and Saphyr® Chip to equilibrate to room temperature for 30 min before opening.

Note
Opening the pouch with the chip before equilibration can lead to a decrease in performance.

13.Use the power switch to turn on the Saphyr® instrument.

14.Log into Windows.

Note
Username and password are provided by a Bionano representative.

15.Initialize the instrument by launching the Instrument Control Software using the icon on the desktop.

16.Log on to the instrument using the username and password.

17.Open the pouch containing the Saphyr® Chip and place on a clean laboratory bench.

18.With a 10-µl pipet, gently stir the sample.

19.Slowly aspirate 8.5 µl of sample and slowly dispense into the inlet well of flowcell 1.

Note
Pipet slowly. Use positive pressure to avoid bubble formation. The fluid level must remain flat. If the fluid overflows, the flowcell is no longer usable.

20.Allow liquid to travel through the flowcell nanochannels for 2 min.

21.Repeat for flowcells 2 and 3 if necessary.

22.Aspirate 11 µl of sample and dispense into the outlet well of flowcell 1. Repeat for flowcells 2 and 3 if necessary.

Note
Pipet slowly. Use positive pressure to avoid bubble formation. The fluid level must remain flat. If the fluid overflows, the flowcell is no longer usable.

23.Add 2 µl of nuclease-free water to the hydration reservoir on the outlet side of the chip.

24.Align the chip plug with the electrode facing down over flowcell 1 and insert it into the chip. Repeat for flowcells 2 and 3.

25.Place Saphyr® Clip seal on top of the chip plugs and press down firmly on the edges.

26.Align the Saphyr® Clip onto the Saphyr® Chip and attach by applying even pressure downward on both the right and left side of the clip.

27.Click “Insert Chip” from the home screen on the instrument.

Note
Before inserting the chip, check the bottom for dust and debris.

28.Lift the sample door and press the “Press” button to release bundle arm.

29.Insert one or two chips.

Note
One Saphyr® system can hold two chips at a time. Samples within a chip are run simultaneously, and chips within an instrument are run sequentially. In other words, the Saphyr can image three samples at a time, and when those three samples are completed, it will automatically move on to the next set of three samples in the queue.

Note
Keep note of which chip goes into which platform. Platform 1 is on the left and platform 2 is on the right.

30.Lower bundle arm and close sample door.

31.Click “Chip is inserted.”

Data collection

32.Click “Configure run” from the home screen. You will be prompted to configure each chip inserted into the machine.

Note
The chip inserted into platform 1 will be configured first.

33.Select the corresponding unique chip name from the drop-down list.

Note
Review the runtime and throughput parameters to confirm the experiment has been set up properly.

34.Repeat steps 32 and 33 for the second chip inserted if necessary.

35.Click “Accept” to start the run.

Note
If two chips were inserted, choose which chip to run first by clicking “Start Run” on the corresponding chip. The Saphyr® instrument goes through chip registration, DNA loading optimization, and scanning. If applicable, the second chip starts automatically when the first chip is finished.

36.Check the Molecule Quality Report on Bionano Access after run completion to confirm quality for data transfer.

Note
Upon run completion, the Molecule Quality Report is generated. Please see the Quality Control Metrics and Troubleshooting section for the Analytical QC parameters.

Basic Protocol 3: MANUAL DE NOVO ASSEMBLY WORKFLOW

In the third protocol, we demonstrate a workflow to assess the known repeat expansion loci in ATXN10 (Morato Torres et al., 2022), C9orf72 (Barseghyan et al., 2022), CNBP (van der Sanden et al., 2024), DMPK (Otero et al., 2021), FMR1 (Iqbal et al., 2023), FXN (Yu et al., 2024), NOP56 (Lam et al., 2023), RFC1 (Facchini et al., 2023), and STARD7 (van der Sanden et al., 2024). These loci were selected because OGM has previously been shown to detect expansions at each of them. The repeat expansion assessment described in this protocol uses the commercially available Bionano De Novo Assembly pipeline and is conducted entirely in Bionano Access (Fig. 2).

Diagram of three workflows using commercial (blue) and non-commercial (green) software to analyze repeat expansion genes: De Novo Assembly pipeline, local guided assembly or EnFocus Fragile X pipeline, and molecule distance script. The plus and minus symbols represent the results of the initial manual investigation for the presence of repeat expansion disorders and whether the repeat expansion is related to the FMR1 locus.

Materials

  • Bionano Access Software (Bionano Genomics)

Whole-genome analysis (de novo assembly) manual repeat expansion characterization

1.Generate Solve 3.7 De Novo Assembly with full coverage and hg38 DLE1 reference.

2.View De Novo Assembly results in Bionano Access Genome Browser. Navigate to the locus of interest. Use the .bed file search feature or paste the coordinates from Table 4 below directly into Genome Browser tab.

Table 4. Genomic Coordinates of Associated Loci for Manual De Novo Assembly Workflow
Gene | Chr | Start label ID | End label ID | Start (bp) | End (bp) | Size (bp)
ATXN10 | chr22 | 5045 | 5048 | 45,794,058 | 45,799,413 | 5355
C9ORF72 | chr9 | 6238 | 6239 | 27,570,140 | 27,573,946 | 3806
CNBP | chr3 | 26,243 | 26,246 | 129,169,450 | 129,181,839 | 12,389
DMPK | chr19 | 5926 | 5927 | 45,752,584 | 45,771,947 | 19,363
FMR1 | chrX | 29,071 | 29,074 | 147,910,189 | 147,927,661 | 17,472
FXN | chr9 | 10,548 | 10,549 | 69,031,102 | 69,050,405 | 19,303
NOP56 | chr20 | 460 | 465 | 2,639,624 | 2,664,964 | 25,340
RFC1 | chr4 | 7723 | 7724 | 39,343,732 | 39,350,590 | 6858
STARD7 | chr2 | 18,524 | 18,527 | 96,183,966 | 96,200,747 | 16,781
  • Reference hg38 start and end label IDs, with corresponding base pair coordinates and the normal genomic distance between label IDs, for repeat expansion genes. These label IDs were chosen as input for the De Novo workflow. Size (bp) is the genomic distance between the two reference labels, that is, the interval length in the absence of a repeat expansion.

3.Ensure labels surrounding the label interval of interest are well aligned and useful for calculating expansion size. Label intervals that are too wide decrease sensitivity in molecule distance estimation. Additionally, adjacent labels must be well spaced from each other to minimize optical errors from collapsed labels, as two or more labels in proximity can be read as one extra-bright signal.

4.On the reference, hover cursor over the interval's start and end labels. Record the base pair positions of both reference labels. Reference distance is the base pair distance between start and end reference label coordinates.

5.Align maps to the reference start label of interval of interest. Hover cursor on the reference start label and press “L.”

6.Hover cursor over the interval start and end labels of all maps visualized. Record the start and end base pair positions for each map. Each map's label distance is the base pair distance between its start and end label coordinates (Map 1 and Map 2).

Note
Take note of strand direction, or negative values will be calculated.

7.Repeat expansion size is calculated as the base pair difference between reference distance and maps’ label distance.

8.Repeat units are counted by dividing the repeat expansion size by number of base pairs per repeat.

9.To examine maps’ molecule support, change the “Molecules” option in the Genome Browser from “Pack” to “Label Distance.”

10.Right click on one of the maps and select “Show Molecules.”

11.Align molecule labels to the map's start label of interval of interest. Hover cursor on the map's start label and press “L.”

12.Sort molecules by insert size. Select labels of interest by pressing the Space bar while hovering over start label, then repeat with the end label.

13.Assign alleles to maps. For example, a longer map, presumably containing a repeat expansion, will be assigned to allele 1, while the shorter map similar in length to reference will be assigned to allele 2.
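The arithmetic in steps 4 to 8 above can be sketched as follows. The function name is illustrative, and the map coordinates in the example are invented for demonstration; only the CNBP reference coordinates come from Table 4. Absolute differences are used so that strand direction does not produce negative distances.

```python
def repeat_expansion(ref_start, ref_end, map_start, map_end, motif_bp):
    """Return (expansion size in bp, number of repeat units).

    Expansion size = map label distance - reference label distance.
    Repeat units   = expansion size / base pairs per repeat.
    """
    ref_distance = abs(ref_end - ref_start)
    map_distance = abs(map_end - map_start)
    expansion_bp = map_distance - ref_distance
    return expansion_bp, expansion_bp / motif_bp


# CNBP reference interval from Table 4 (12,389 bp) and a hypothetical map
# whose flanking labels sit 20,389 bp apart; CNBP has a 4-bp (CCTG) motif.
size_bp, units = repeat_expansion(129_169_450, 129_181_839,
                                  1_000_000, 1_020_389, motif_bp=4)
print(size_bp, units)
```

A negative result indicates a contraction relative to the reference rather than an expansion.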

Basic Protocol 4: LOCAL GUIDED ASSEMBLY WORKFLOW

In this fourth protocol, we analyze the same known repeat expansion loci as in Basic Protocol 3, but applying the local guided assembly workflow to size the repeat expansions (Fig. 2). This workflow was developed as part of the study by van der Sanden et al. (2024). This protocol is not commercially available and is conducted in the command line.

NOTE : The Conda environment and Solve have to be installed in the Linux operating system. Documentation for installing Solve on the command line can be found in 30182 Bionano Solve Installation Guide and in the GitHub repository. The latest version of Solve can be obtained from Bionano Software Downloads.

The Local-GA script (local_guided_assembly.sh script) must be run on command line as it utilizes multiple Solve pipelines. The script first aligns molecules to reference, and then the local assembly portion builds maps at the loci of interest. More information on the specific pipelines run can be found in Bionano document 30205, Guidelines for Running Solve on the Command Line, and 30194, How to Align a BNX to Reference.

Materials

Local guided assembly (Local-GA) on various repeat expansion genes

1.Download bnx.gz file from Bionano Access.

2.Run Solve 3.7 Local-GA on the command line using the local_guided_assembly.sh script from the seed files. The script takes two arguments: (1) the path to the .bnx file and (2) the repeat expansion gene of interest. For example: “sh ~/scripts/local_guided_assembly.sh ~/*.gz ~.”

Note
The script references a user-specified path to the Solve installation. Assemblies use the hg38 reference .cmap. The repeat gene target references the .csv table of repeat expansion gene coordinates matching the gene argument of interest, which can be found in the molecule distance repository under guided_enfocus_scripts/coo_csvs.

3.Output files can be found in the output/repeat_report/ directory. An analysis summary can also be found, which lists the ID and calculated repeat expansion counts for each map reported on Bionano Access. This table will be used for assigning alleles to maps.

4.A more comprehensive output can be found in the file output/repeat_report/repeat_analysis.zip:

1.This .zip file can be uploaded into Bionano Access as an EnFocus Fragile X object. Once imported, results can be viewed by clicking on “View Maps” or “Maps to Reference with SV.”

In Bionano Access, despite the .zip file being imported as a Fragile X pipeline object, the Genome Browser shows only regions queried by gene-specific repeat coordinates .csv (e.g., chr 3 for CNBP). Local guided assembly for Fragile X (FMR1) is a part of the commercial package. See EnFocus Fragile X Analysis section below.

2.The .zip file also contains molecule support for Local-GA maps, which can be used to assign alleles to maps. Unzip the file and navigate to the following directory: /output/contigs/exp_refineFinal1/alignmol/merge

These .xmap and .cmap files contain information on molecule support for each Local-GA map. Molecule ID information (CmapID) in query cmap (q.cmap) files can be traced back to the guided assembly, allowing mapping of alleles.

5.In the Genome Browser, visualize maps assembled at the region of interest.

6.Click on the magnifying glass icon to show all maps.

7.Align maps to the reference start label of interval of interest. Hover cursor on the reference start label and press “L.”

8.Assess stability of maps and molecules in Access for the presence of mosaicism. Click on the magnifying glass icon to “Show All Maps.” Zoom in/out, and click “Ctrl” and scroll on maps, to visualize all maps aligning to the repeat expansion region.

Note
The presence of repeat expansion regions with multiple aligned maps and with various insertion lengths might indicate unstable expansion of variable lengths.

9.Right-click on specific map and select “Show Molecules” to visualize all molecules covering the specific repeat expansion locus and align molecule labels to the map's start label of the interval of interest by hovering cursor on the map's start label and press “L.”

Note
If the molecules present a high variability in inter-label distances, this might indicate repeat instability.

10.Cross-compare with de novo results to assign alleles to maps. Maps will have varying label distances between repeat expansion label intervals. In general, a longer map, presumably containing repeat expansion, will be assigned to allele 1, while the shorter map similar in length to reference will be assigned to allele 2.

Note
A repeat analysis report table can contain maps with short or zero repeat expansion counts as well as maps with large counts. Maps are assigned to alleles based on repeat counts. For example, maps with large repeat counts will be assigned to allele 1, while maps with short repeat counts will be assigned to allele 2. There are also cases where all maps contain similar repeat counts, in which case the sample might be homozygous. For repeat report maps with ambiguous repeat counts, the global mean of repeat counts is used as a cutoff value for assigning alleles 1 or 2. Maps with “-1” repeat counts are excluded since the repeat counts are unknown. Maps that are viewable in Access under “View Maps” but are not reported in the repeat analysis summary table did not pass repeat analysis during local assembly. However, these maps might reveal molecule support for mosaicism.
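The global-mean cutoff described in the note above can be sketched in a few lines. The function name, the input layout (map ID to repeat count), and the choice to place counts equal to the cutoff in allele 2 are assumptions made for illustration.

```python
from statistics import mean


def assign_alleles(repeat_counts):
    """Assign maps to allele 1 (longer) or allele 2 (shorter).

    repeat_counts maps a map ID to its reported repeat count;
    maps with a count of -1 (unknown) are excluded.
    Returns (assignments, cutoff).
    """
    known = {m: c for m, c in repeat_counts.items() if c != -1}
    if not known:
        return {}, None
    cutoff = mean(known.values())            # global mean of repeat counts
    assignments = {m: (1 if c > cutoff else 2) for m, c in known.items()}
    return assignments, cutoff


# Hypothetical report: two expanded maps, one reference-like map, one unknown.
alleles, cutoff = assign_alleles({"map1": 1200, "map2": 5, "map3": -1, "map4": 1180})
print(alleles, cutoff)
```

When all maps carry similar counts, a mean cutoff still splits them into two groups, so a possible homozygous case needs to be recognized by inspecting the counts rather than the assignments.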

Basic Protocol 5: EnFocus FRAGILE X WORKFLOW

In this fifth protocol, we demonstrate a workflow to assess repeat expansions in FMR1, which cause Fragile X syndrome. The FMR1 repeat expansion assessment described in this protocol uses the commercially available Bionano EnFocus Fragile X pipeline and is conducted entirely in Bionano Access (Fig. 2).

NOTE : More information on the Fragile X pipeline can be found in Bionano document 30457, Bionano Solve Theory of Operation Bionano EnFocus Fragile X Analysis.

Materials

  • Bionano Access Software (Bionano Genomics)

EnFocus Fragile X assembly analysis

1.Generate Solve 3.7 EnFocus Fragile X assembly with full coverage and hg38 DLE1 reference.

2.Select the alignment file on Bionano Access and click “View Results” in the Options section on the right-hand side of the screen.

Note
The user will automatically be taken to the FMR1 region on chromosome X. This view will have the region already analyzed for repeat number.

3.Confirm that the alignment of the generated maps to the reference sequence is true and accurate.

Note
User can also confirm molecule support from this view by right clicking on the blue maps and clicking “Show Molecules.”

4.View the calculated repeat number in the table at the bottom of the screen under the “Repeat” tab.

5.Assess stability of maps and molecules on Access for presence of mosaicism. Click on the magnifying glass icon to “Show All Maps.” Zoom in/out, and click “Ctrl” and scroll on maps, to visualize all maps aligning to repeat expansion region.

Note
The presence of repeat expansion regions with multiple aligned maps and with various insertion lengths might indicate unstable expansion of variable lengths.

6.Right-click on the specific map and select “Show Molecules” to visualize all molecules covering the Fragile X repeat expansion locus, and align molecule labels to the label of the map to the left of the interval of interest by hovering cursor on the label and pressing “L.”

Note
If the molecules present a high variability in inter-label distances, this might indicate repeat instability.

Basic Protocol 6: MOLECULE DISTANCE SCRIPT WORKFLOW

In this sixth protocol, we use the molecule distance script on the same known repeat expansion loci as in Basic Protocols 3 and 4. This workflow was also developed as part of the study by van der Sanden et al. (2024) and is used to visualize the distance between two labels of interest in each of the molecules in order to identify evidence suggestive of somatic repeat instability (Fig. 2). This protocol is not commercially available and is conducted in the command line.

Materials

Molecule distance script on various repeat expansion genes

Map files (.xmap, q.cmap, and r.cmap) from local guided assembly were used as input for the molecule distance script—specifically, the alignment of molecules to reference from the alignmolvref step.

1.Navigate to the guided assembly output directory:

GA_local_output/contigs/alignmolvref/merge

Note
These are alignmolvref .xmap and .cmap files, which are needed as input for the molecule distance script.

2.The molecule distance script is a downstream analysis tool written in R. The script takes six arguments as input parameters:

  1. Alignmolvref reference file (r.cmap)

  2. Alignmolvref alignment map file (.xmap)

  3. Alignmolvref query file (q.cmap)

  4. Start Label ID

  5. End Label ID

  6. Output directory

        The q.cmap file contains molecules that map to labels in the reference (GRCh38) r.cmap file. Alignments between q.cmap and r.cmap are annotated in .xmap files. More information on the relationship between .xmap and .cmap files can be found in Bionano documents 30040, XMAP File Format Specification Sheet, and 30039, CMAP File Format Specification Sheet.
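To make the file relationship concrete, the sketch below reads label positions from a CMAP-style file into a lookup keyed by molecule ID and label ID. It is written in Python for illustration and is not the authors' R script. It assumes a tab-delimited file whose column names are given on a line starting with "#h" and which contains CMapId, SiteID, and Position columns; consult the CMAP File Format Specification Sheet for the authoritative layout.

```python
import io


def read_cmap_positions(handle):
    """Return {(CMapId, SiteID): Position} from a CMAP-style text stream."""
    header = None
    positions = {}
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#h"):
            header = line[2:].split()        # column names follow "#h"
        elif line and not line.startswith("#"):
            if header is None:
                raise ValueError("no '#h' header line found before data rows")
            fields = dict(zip(header, line.split("\t")))
            key = (int(fields["CMapId"]), int(fields["SiteID"]))
            positions[key] = float(fields["Position"])
    return positions


# Tiny in-memory example with one molecule (ID 101) carrying two labels.
example = io.StringIO(
    "#h CMapId\tContigLength\tNumSites\tSiteID\tLabelChannel\tPosition\n"
    "#f int\tfloat\tint\tint\tint\tfloat\n"
    "101\t250000.0\t2\t1\t1\t50000.0\n"
    "101\t250000.0\t2\t2\t1\t60355.0\n"
)
print(read_cmap_positions(example))
```

For a real file, the same function can be called on an opened file handle instead of the in-memory example.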

3.Start and end labels are determined to flank repeat expansion genes to accurately capture genomic expansion size (Table 5). Adjacent labels must be well spaced from each other to minimize optical resolution errors from collapsed labels, as two or more labels in proximity can be read as one extra-bright signal.

Table 5. Genomic Coordinates of Associated Loci for Molecule Distance Script
Gene | Chr | Start label ID | End label ID | Start (bp) | End (bp) | Size (bp)
ATXN10 | chr22 | 5045 | 5048 | 45,794,058 | 45,799,413 | 5355
C9ORF72 | chr9 | 6238 | 6239 | 27,570,140 | 27,573,946 | 3806
CNBP | chr3 | 26,242 | 26,247 | 129,168,220 | 129,186,501 | 18,281
DMPK | chr19 | 5925 | 5930 | 45,740,976 | 45,829,452 | 88,476
FMR1 | chrX | 29,071 | 29,074 | 147,910,189 | 147,927,661 | 17,472
FXN | chr9 | 10,548 | 10,549 | 69,031,102 | 69,050,405 | 19,303
NOP56 | chr20 | 460 | 466 | 2,639,624 | 2,682,950 | 43,326
RFC1 | chr4 | 7722 | 7725 | 39,339,156 | 39,362,887 | 23,731
STARD7 | chr2 | 18,524 | 18,530 | 96,183,966 | 96,205,283 | 21,317
  • Reference hg38 start and end label IDs, with corresponding base pair coordinates and the normal genomic distance between label IDs, for repeat expansion loci. These label IDs were chosen as input for the molecule distance workflow and are intended to encompass more of the region of interest. Size (bp) is the genomic distance between the two reference labels, that is, the interval length in the absence of a repeat expansion.

4.The script reads in query molecules (the q.cmap file), reference molecules (the r.cmap file), and the alignment of maps (the .xmap file). The computational task is: Given a set of chromosome labels in the r.cmap file, find all molecules aligning to the label set and record each genomic distance between label sets.

5.The input .xmap contains the column “Alignments,” with reference and query molecules alignments expressed as tuple strings. Each tuple has a reference label ID and query label ID. Alignment tuples containing reference chromosome, reference start label ID, and reference end label IDs are parsed. These filtered alignments contain information on the aligned query molecules—query molecule ID, query start label ID, and query end label ID.

6.Then, the input q.cmap file is filtered using the query molecule information. The script aggregates rows of the q.cmap file with CMapID and SiteIDs matching the aligned query molecule ID and query label IDs, respectively.

7.Data in the Position column are also recorded, as these denote the corresponding base pair coordinates. For example: In a q.cmap file, 160 rows belong to a query CMapID. Out of the 160, two rows were found, each with a SiteID matching the aligned query molecule label start or label end.

8.Each molecule aligned to both the user-specified reference labels is then aggregated into a table. Molecules not aligned to both labels are excluded.

9.After molecules aligning to user-specified reference interval have been filtered and compiled, genomic distances are calculated using the base pair difference between start and end coordinates.

10.Besides genomic distances, repeat expansion (or contraction) sizes are also calculated by subtracting the reference genomic distance from query genomic distances (map(s) label distance – GRCh38 label distance = expansion size).

11.One of the main output files is a table of filtered molecules (complete_data.csv) containing both specified reference labels of interest. Relevant information in this complete data table includes query molecule IDs, query interval label IDs (label and base pair coordinates), distances between query labels, distance between reference labels, and expansion size.

12.This table is used as input for visualizations: e.g., histogram plots, violin plots, and Gaussian mixture model (GMM) auto-clustering plots.

  1. Molecule distance bar plots visualize the distance between specified label intervals.

  2. Auto-clustering uses the Mclust R package to determine clusters of molecule distances using a maximum likelihood method. The fitted bell curves are then overlaid on a histogram of molecule distance frequency, showing the distribution of molecule distances and the allele to which each cluster is assigned.

        This workflow is written in R. More information on workflow scripts and resources can be found in this repository.
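Steps 4 to 10 above can be illustrated with a compact sketch. It is written in Python for illustration and is not the authors' R script. It assumes that each .xmap row has already been read into a dictionary with QryContigID, RefContigID, and Alignment fields, that the Alignment string is a series of "(reference label ID,query label ID)" tuples, and that q.cmap positions are available as a dictionary keyed by (molecule ID, label ID). All function and variable names are hypothetical.

```python
import re

PAIR = re.compile(r"\((\d+),(\d+)\)")


def parse_alignment(alignment):
    """'(5045,4)(5048,5)' -> {5045: 4, 5048: 5} (reference label -> query label)."""
    return {int(ref): int(qry) for ref, qry in PAIR.findall(alignment)}


def molecule_distances(xmap_rows, qcmap_pos, chrom, start_label, end_label,
                       ref_distance):
    """Collect label distances for molecules aligned to both reference labels."""
    table = []
    for row in xmap_rows:
        if row["RefContigID"] != chrom:
            continue
        pairs = parse_alignment(row["Alignment"])
        if start_label not in pairs or end_label not in pairs:
            continue                          # molecule does not span both labels
        mol = row["QryContigID"]
        q_start = qcmap_pos[(mol, pairs[start_label])]
        q_end = qcmap_pos[(mol, pairs[end_label])]
        distance = abs(q_end - q_start)
        table.append({"molecule": mol,
                      "distance": distance,
                      "expansion": distance - ref_distance})
    return table


# Hypothetical molecules at the ATXN10 interval from Table 5
# (labels 5045 to 5048 on chr22, reference distance 5355 bp).
rows = [
    {"QryContigID": 101, "RefContigID": 22, "Alignment": "(5044,3)(5045,4)(5048,5)"},
    {"QryContigID": 102, "RefContigID": 22, "Alignment": "(5045,1)(5046,2)"},
]
positions = {(101, 4): 50000.0, (101, 5): 60355.0}
print(molecule_distances(rows, positions, 22, 5045, 5048, 5355))
```

If a reference label is paired with more than one query label in an alignment string, this sketch keeps only the last pairing, which is a simplification.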

COMMENTARY

Background Information

Optical genome mapping (OGM) was first described in 1993 as a method to provide restriction maps of Saccharomyces cerevisiae chromosomes (Schwartz et al., 1993). Since then, the technology has developed continually, resulting in the Irys system, a nanochannel-based automatable technology that was for a long time used mainly for scaffolding plant and animal genomes and for validating NGS-based sequence assemblies (Luo et al., 2016). Current applications of OGM combine microfluidics, high-resolution microscopy, and automated image analysis, allowing genome-wide imaging and de novo assembly (Barseghyan et al., 2023; Bocklandt et al., 2019; Chan et al., 2018; Mantere et al., 2021). Since the launch of the Saphyr® system, with the potential to extract ultra-high-molecular-weight DNA directly from human tissues such as blood, one of the main uses of the OGM technology has been the detection of large copy number variants (CNVs) and structural variants (SVs) in human genomes. A coverage-based CNV-detection algorithm allows the detection of large, unbalanced aberrations such as aneuploidies or large terminal gains or losses. A distinct SV algorithm instead makes use of split-read-like molecule analysis, comparing the fluorescent labels and label distances of the de novo assembly of a sample with those of a reference genome map. This allows for genome-wide detection of SVs, including insertions, duplications, deletions, inversions, and (balanced and unbalanced) translocations, with a resolution down to ∼500 bp (Mantere et al., 2021). To date, OGM has mainly been used to complement classical cytogenetic tests, including karyotyping, fluorescence in situ hybridization (FISH), and CNV microarrays, because each of these tests has limitations that OGM has been shown to overcome (Neveling et al., 2021). In addition, it has been applied to identify SVs that previously remained undetected by other genomic technologies (Brakta et al., 2023; Broeckel et al., 2024; Fadaie et al., 2021; Iqbal et al., 2023; Sabatella et al., 2021; Sahajpal et al., 2021; Soler et al., 2023).

The diagnostic identification of repeat expansions also suffers from technical limitations. Current routine testing includes PCR, repeat-primed PCR, and Southern blotting, which are time consuming and gene specific (Tankard et al., 2018). Short-read sequencing is limited by its 100- to 150-bp read length and short total fragment length, which makes detecting large repeat expansions nearly impossible. Long-read sequencing allows interrogation of large repeat expansions, but its costs are still too high, or its coverage too limited, for it to be implemented as a first-tier test for samples with suspected repeat expansions (Chaisson et al., 2023; Kucuk et al., 2023; Owusu & Savarese, 2023). Most of these limitations can be overcome by OGM. It has been shown that OGM can accurately detect SVs larger than 500 bp (Mantere et al., 2021; Neveling et al., 2021) and that it can identify specific repeat expansions and contractions, either directly or indirectly (Barseghyan et al., 2022; Facchini et al., 2023; Guruju et al., 2023; Iqbal et al., 2023). Since OGM allows genome-wide de novo assemblies, the entire genome can be analyzed simultaneously rather than the analysis being limited to one specific repeat locus. In addition, OGM uses native, non-amplified DNA, which allows characterization of a “true” representation of the DNA molecules within a sample. Finally, OGM is the first method that can visualize and semi-quantitatively assess somatically unstable repeats. However, OGM cannot confirm the nucleotide context of the repeat, and hence does not enable detection of the actual repeat motif or of any interruptions of one or multiple motifs. This also means that an insertion in the predicted label interval must be assumed to be due to the repeat expansion. In addition, the resolution of OGM is 500 bp, which limits the technology in characterizing shorter repeat expansions.

Critical Parameters

Maintaining high data quality in OGM technology relies on acquiring ultra-high molecular weight (UHMW) DNA from the intended samples with specific handling and storage requirements. Under optimal conditions, peripheral blood should be collected using EDTA tubes and stored at 4°C for a maximum of 4 days. For later analysis, aliquots of 650 µl should be preserved in screw-cap vials at −80°C (Koppikar et al., 2023; Sahajpal et al., 2023).

When isolating UHMW DNA from cultured cells, it is recommended that this process be performed starting from fresh cultures containing at least 1 million cells. If storage is necessary, cells can be stored either as dry pellets (1.5 million cells) or mixed with 40 µl of stabilizing buffer at −80°C. If stored as dry pellets, the stabilizing buffer should be added immediately atop the pellet before it thaws (Koppikar et al., 2023; Sahajpal et al., 2023).

Quality Control Metrics and Troubleshooting

All quality control metrics and troubleshooting details are presented in Tables 6 and 7.

Table 6. Pre-Analytical QC Parameters
Parameters Recommended Potential reasons for deviation and possible corrections
Sample type
Blood 1.5 M cells Low sample volume: If the sample was properly handled and stored, and the cell count is ≥0.4 M, DNA isolation should be performed with care ensuring the DNA is visible during precipitation and attaches to the Nanobind disk during washing. The elution volume should be reduced to ∼40-50 µl (making sure the disc is submerged in the elution buffer for 2 hr before proceeding to the next step).
Inappropriate sample storage and/or handling: If a new sample draw is not possible, the maximum amount of sample should be used as starting material, following the above recommendations.
Cell culture 1.5 M cells Fresh cells harvested at inadequate confluency: For a suspension culture, it is recommended that cells be counted before harvesting. A small amount of medium can be taken out for counting, and the cells should be harvested only if a minimum of 1 M viable cells are available. If >30% dead cells are present, a low-speed spin (300 × g) for 10 min at 4°C should be performed. The supernatant (which contains a high percentage of dead cells) should be discarded using a pipet, with care taken to avoid aspirating the pellet at the bottom (enriched with viable cells).
Frozen cells improperly handled and/or stored: Count the cells and calculate the number of viable cells. Perform a low speed spin as described above to target at least 1 M viable cells (minimum 0.4 M cells). Perform DNA isolation with care and elute in ∼40-50 µl elution buffer as described above.
DNA quantification
DNA concentration 39-150 ng/µl Low sample input: See above for each sample type. The DNA can be eluted in a smaller volume of elution buffer.
DNA not precipitated during isolation: In certain cases, DNA might not sufficiently precipitate during the precipitation step (isopropanol step). Ensure that the Nanobind disk remains in solution during the 15-min mixing with isopropanol on the HulaMixer. For challenging samples for which DNA is not visible after the 15-min mixing, the sample can be mixed for an additional 15 min on the HulaMixer.
DNA mass lost during washing: Ensure that the Nanobind disk remains in solution during the precipitation step. During the washing steps with Wash Buffers A and B, ensure that the DNA remains bound to the Nanobind disk. If the DNA falls off the disk, ensure that the DNA sits on top of the disk and is not discarded during pipetting.
DNA not homogenized: If DNA is visible during quantification and the eluate is not viscous, leave the vial for another day to let the DNA homogenize. The sample can be mounted on a HulaMixer for 1 hr at 10 rpm before being left for a day for homogenization.
DNA viscosity Low sample input: DNA should be eluted in a smaller volume of elution buffer. Most often, more concentrated DNA is more viscous (e.g., DNA eluted in 50 µl of elution buffer will be more viscous than DNA eluted in 65 µl of buffer).
Compromised sample: Elute the sample in a smaller volume of buffer.
Labeled DNA quantification
DNA concentration 4-16 ng/µl Low labeled DNA recovered during clean-up: Assuming that the unlabeled DNA quantification was within range, the step at which DNA is most likely lost is the DL-Green clean-up. During this process, it is recommended that the pipet be set to 50-75 µl and that the DNA be aspirated at once in a rotating motion at the strip to ensure that all DNA is aspirated.
  • Table adapted from Koppikar et al. (2023) and Sahajpal et al. (2023).
Table 7. Analytical QC Parameters
Parameters Recommended Potential reasons for deviation and possible corrections
N50 (≥150 kbp) 230 kbp Poor sample quality: See Table 6 for troubleshooting a poor-quality sample.

Processing steps that can potentially lead to low N50 values:

i. Basic Protocol 1, step 39: The manufacturer protocol specifies a 20-min incubation of the Nanobind disk attached to the DNA in Elution Buffer, followed by transfer of the eluate to a fresh 2-ml microcentrifuge tube, controlled shearing (homogenization) of the eluate using a 200-µl pipet (five times for slow mixing of the DNA), and finally placement of the tube on the HulaMixer. However, for laboratories generally seeing low N50 values or for samples that result in low N50 values, the 20-min incubation with elution buffer should be extended to 2 hr, and the eluate should be transferred to a 2-ml microcentrifuge tube and the tube directly placed on the HulaMixer (omitting the mixing step with the 200-µl pipet).

ii. Basic Protocol 1, Materials: The sample input can be increased to increase the number of viable cells, which results in a greater number of larger DNA molecules. Care must be taken not to exceed 3 million cells, as this may cause improper cleaning during wash steps; an additional wash with wash buffer B can be included.

iii. Basic Protocol 1, step 36: The DNA can be eluted in a smaller amount of elution buffer, as more concentrated DNA may be better protected from shearing in the subsequent pipetting steps.

Map rate ≥70% Improper washing: Ensure that the DNA is properly washed during the washing steps. The DNA should be inspected visually in each washing step and should be compared to DNA from other samples being processed in the same batch (helps to identify if a specific DNA is not being properly washed). If required, an additional wash with the wash buffer B should be performed.
Insufficient enzyme: For a sample that has appropriate N50 but was below the minimum thresholds for map rate and labeling density, an extra 0.5 µl of enzyme can be added.
Lack of labeling: The labeling reaction incubation for 1 hr at 37°C can be extended to 2-4 hr.
Average label density 14-17/100 kbp Map rate and labeling density parameters are usually correlated (directly proportional); thus, the three points above are also relevant to samples with an average label density below the minimum threshold.
High average label density and low map rate: This combination generally indicates a flow cell failure. For a sample that has >220 kbp N50, the experiment should be repeated on another flow cell.
Effective coverage ≥75× Protease treatment of labeled DNA: For a sample or sample type that is not yielding the intended data, the 30 min proteinase K digestion step at 50°C should be extended to 1 hr.
Controlled shearing of labeled DNA before loading: If a sample has failed and the observed N50 value was too high (>350 kbp), controlled shearing of the labeled DNA can be performed before loading the sample on the flow cell. The labeled DNA should be slowly pipetted up and down two to five times using a standard 200-µl pipet.
  • Table adapted from Koppikar et al. (2023) and Sahajpal et al. (2023).
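
The analytical thresholds in Table 7 lend themselves to an automated check before interpretation. The following is a minimal sketch, not part of Bionano Access or any Bionano tool: the function name and argument names are hypothetical, and it assumes that the recommended N50 of 230 kbp (listed in Table 7 without a comparator) is meant as a minimum.

```python
def qc_failures(n50_kbp, map_rate_pct, label_density, effective_coverage):
    """Return the Table 7 parameters that fall outside the recommended values.

    n50_kbp            -- N50 of molecules >=150 kbp, in kbp
    map_rate_pct       -- map rate, in percent
    label_density      -- average label density, in labels per 100 kbp
    effective_coverage -- effective coverage (fold)
    """
    failures = []
    if n50_kbp < 230:                      # assumption: 230 kbp is a minimum
        failures.append("N50")
    if map_rate_pct < 70:                  # Table 7: map rate >=70%
        failures.append("map rate")
    if not 14 <= label_density <= 17:      # Table 7: 14-17 labels per 100 kbp
        failures.append("average label density")
    if effective_coverage < 75:            # Table 7: effective coverage >=75x
        failures.append("effective coverage")
    return failures


# Illustrative values only; these are not measurements from the protocol.
print(qc_failures(n50_kbp=250, map_rate_pct=85,
                  label_density=15.2, effective_coverage=120))
print(qc_failures(n50_kbp=180, map_rate_pct=60,
                  label_density=18.0, effective_coverage=50))
```

Each returned name points to the corresponding row of Table 7 for troubleshooting.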

Understanding Results

De Novo Assembly pipeline

Using the De Novo Assembly pipeline, assembled genome maps constructed from long molecules are directly aligned with a reference human genome assembly. Structural variations (SVs) such as insertions, duplications, deletions, as well as balanced and unbalanced events like inversions and translocations, are identified based on discrepancies in label alignment between the sample and reference assembly. Additionally, a coverage-based algorithm facilitates the detection of copy number variations (CNVs) and aneuploidies.

The De Novo Assembly pipeline is a tool for whole-genome assembly and analysis. This is the tool traditionally used in repeat expansion analysis protocols, and it is described here in depth. A newly released version of the Bionano Access Software (1.8) is accompanied by whole-genome guided assembly capabilities, providing an alternative whole-genome assembly and analysis tool. The following interpretation remains relevant regardless of which whole-genome assembly tool is used.

In the Bionano Access software, assembly, SV, and CNV data are visualized in Circos plots (Fig. 3) that offer a comprehensive overview of all identified SVs within the sample. In the context of repeat expansion testing, SVs in known repeat expansion loci are of most interest. A repeat expansion locus or loci of interest can be located through a search of the genomic coordinates or gene name in a relevant .bed file (Fig. 4). Figure 5 shows examples of both short and long repeat expansions within the CNBP gene (shown in the teal shade), which were identified using the gene name search using a .bed file containing all hg38 genes. In the De Novo Assembly workflow, the size of the repeat expansion can be determined by using the distance between two specific labels within the gene (see Table 4 for the specific labels) and determining the distance difference between the assembly maps and the reference. Confidence and support in the generated maps can be visualized by investigating the individual molecule support for the OGM map (Fig. 6).
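
Outside the Bionano Access interface, the same gene-name lookup can be approximated with a few lines of code. The sketch below assumes a tab-delimited .bed file with the gene name in the fourth column; the function name and the example records (GENE_A, GENE_B, and their coordinates) are hypothetical placeholders, not real annotations.

```python
def find_gene(bed_lines, gene_name):
    """Return (chrom, start, end) for every BED record named gene_name."""
    hits = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column, so the record cannot be matched
        chrom, start, end, name = fields[:4]
        if name == gene_name:
            hits.append((chrom, int(start), int(end)))
    return hits


# Hypothetical records for illustration only.
example_bed = [
    "chr1\t1000\t5000\tGENE_A",
    "chr2\t20000\t26000\tGENE_B",
]
print(find_gene(example_bed, "GENE_B"))
```

The returned coordinates can then be entered in the Genome Browser coordinate search.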

Figure 3. Circos plot visualization in Bionano Access software.
Figure 4. Genome Browser view in which a region of interest can be searched by genomic coordinates (orange) or gene name by clicking the binocular icon (red) and searching for the gene name (blue) within the selected .bed file (green).
Figure 5. De Novo Assembly maps (blue) aligned to the reference (green) in the CNBP gene region (orange). Top, short repeat expansion; bottom, long repeat expansion.
Figure 6. De Novo Assembly map with CNBP repeat expansion and supporting molecules sorted by label distance. Top, short repeat expansion; bottom, long repeat expansion.

Local guided assembly and EnFocus Fragile X

After Local-GA or EnFocus Fragile X analysis is run, a repeat analysis summary is generated (Table 8). This table is compared to the de novo results to assign alleles to maps. For example, in Figure 7, the shorter map will be assigned to allele 1, while the longer map will be assigned to allele 2.

Table 8. Example Data Output for Guided Assembly Maps Along CNBP Gene
Map ID | Query contig length | Repeat spanning coverage | Repeat count | Allele
32 | 10504975 | 54 | 8 | 1
9421 | 11073852 | 28 | 723 | 2
Figure 7. Local-GA assembly maps along RFC1 gene denoting the reference (top) and the maps associated with allele 1 (middle) and allele 2 (bottom).

Subsequently, the molecules assigned to each specific Local-GA or EnFocus pipeline map can be visualized and used to find evidence of somatic instability (Fig. 8). This analysis requires the assessment of the variability in inter-label distance between the start label of the interval of interest and the subsequent labels in the map. A high inter-label distance variability indicates potential somatic instability for that specific repeat expansion.
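
The variability described above can be summarized numerically. The following sketch computes the standard deviation and coefficient of variation of inter-label distances for the molecules assigned to one map. The function name and the example distances are hypothetical, and the protocol does not define a numerical cutoff for instability, so the values are meant for comparison between alleles or samples, alongside visual inspection of the molecules.

```python
from statistics import mean, stdev


def distance_spread(distances_bp):
    """Summarize the variability of inter-label distances (bp) across molecules."""
    mu = mean(distances_bp)
    sd = stdev(distances_bp)  # sample standard deviation; needs >=2 molecules
    return {"mean_bp": mu, "sd_bp": sd, "cv": sd / mu}


# Hypothetical distances for illustration only.
stable_allele = [21000, 21150, 20900, 21050, 20980]
unstable_allele = [21000, 24500, 27800, 31200, 35900]

print(distance_spread(stable_allele))
print(distance_spread(unstable_allele))
```

Some spread is expected from measurement noise alone, so a stable expanded allele from the same run is a useful baseline.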

Figure 8. Local-GA assembly maps and corresponding molecules along an RFC1 and CNBP repeat indicating the difference for an expansion with and without evidence of somatic instability. Top, an RFC1 expansion without evidence of somatic instability because there is limited variability in the inter-label distance between the start label of the interval of interest (red label) and the subsequent label (black label) between the different molecules. Bottom, a CNBP expansion with evidence of somatic instability because of the high variability in inter-label distance between the start label of the interval of interest (red label) and the subsequent label (black label) between the different molecules. The scale is the same for both panels.

Molecule distance script

When data are processed using the molecule distance script, the major output file is the filtered molecules table (Table 9). This table contains the distances between the specified reference labels of interest for the repeat expansion locus being investigated. The relevant information in the table has been highlighted in the example (Table 9). This table is also used as input for visualizations (e.g., bar plots, histograms, violin plots, and Gaussian mixture model [GMM] auto-clustering plots). Molecule distance bar plots are used to visualize the distance between the specified labels for each molecule (Fig. 9A, C, and E). Auto-clustering can then be used to determine clusters of molecules with similar distances using a maximum-likelihood method. Bell curves overlaid on a histogram of molecule distance frequency show the distribution of molecule distances and the classified alleles (Fig. 9B, D, and F). Both plot types can also indicate potential somatic repeat instability. For samples without somatic instability, the bar plot shows minimal size differences within each allele (Fig. 9A and C), and the GMM shows two separate normal distributions (Fig. 9B and D). In contrast, for a sample with potential somatic instability, one or both alleles show a “stairway” pattern in the bar plot (Fig. 9E), and the GMM shows a flatter distribution for the somatically unstable repeat allele because the molecules all have different distances between the two labels of interest (Fig. 9F).
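
To illustrate the idea behind the auto-clustering step, the sketch below fits a two-component, one-dimensional Gaussian mixture by expectation-maximization, a standard maximum-likelihood procedure. It is a hand-written stand-in for explanation only: it is not the code used by the molecule distance script, the function names are hypothetical, and the example distances are invented.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_allele_gmm(values, n_iter=100, min_sd=1.0):
    """Fit a two-component 1-D Gaussian mixture to molecule distances (bp).

    Returns (weights, means, sds, labels); labels[i] is the more probable
    component (0 = shorter allele, 1 = longer allele) for values[i].
    """
    # Initialize one component at each extreme of the data.
    means = [min(values), max(values)]
    spread = max((max(values) - min(values)) / 4.0, min_sd)
    sds = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each molecule.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and SD of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # empty component; keep previous parameters
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), min_sd)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, sds, labels


# Invented distances (bp) for two well-separated alleles.
distances = [18250, 18310, 18190, 18400, 18280, 18150,
             21100, 21350, 20950, 21200, 21420, 21050]
weights, means, sds, labels = fit_two_allele_gmm(distances)
print(means, sds, labels)
```

In this framing, a somatically unstable allele appears as a component with a large fitted standard deviation relative to the other allele, which corresponds to the flatter bell curve described above.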

Table 9. Example Output Data of Molecule Distance Script Over CNBP Repeat
Molecule number | Query contig ID | Query label x | Query position x | Query label y | Query position y | Query distance | Reference distance | Repeat size (bp) | Repeat count
1 | 1455 | 46 | 378048 | 49 | 395738 | 17691 | 18281 | −591 | −148
2 | 136273 | 5 | 38545 | 2 | 20473 | 18072 | 18281 | −209 | −52
3 | 139627 | 16 | 99752 | 19 | 118511 | 18759 | 18281 | 478 | 119
4 | 154996 | 20 | 149117 | 16 | 129857 | 19260 | 18281 | 979 | 245
5 | 219679 | 14 | 94152 | 17 | 111830 | 17678 | 18281 | −603 | −151
6 | 397510 | 27 | 178169 | 30 | 197185 | 19016 | 18281 | 735 | 184
7 | 403100 | 18 | 126579 | 15 | 108017 | 18562 | 18281 | 281 | 70
8 | 471655 | 34 | 261341 | 31 | 242066 | 19275 | 18281 | 994 | 249
9 | 539627 | 22 | 161224 | 19 | 143968 | 17256 | 18281 | −1025 | −256
10 | 773682 | 34 | 218347 | 38 | 236611 | 18264 | 18281 | −17 | −4
  • Only the first ten molecules are presented in the table. Values have been rounded to the nearest whole number for readability; tables generated by the molecule distance script may present numbers with decimal values.
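
The relationship between the columns of Table 9 can be reproduced with simple arithmetic. The sketch below is an illustration rather than the molecule distance script itself: the function name is hypothetical, and the motif length of 4 bp is an assumption based on the CNBP tetranucleotide repeat and on the ratio between the "Repeat size" and "Repeat count" columns.

```python
def repeat_estimate(pos_x, pos_y, reference_distance, motif_length):
    """Derive query distance, repeat size, and repeat count for one molecule.

    pos_x, pos_y       -- positions (bp) of the two labels on the molecule
    reference_distance -- distance (bp) between the same labels in the reference
    motif_length       -- repeat unit length in bp (assumed, e.g., 4 for CNBP)
    """
    # Molecules can align in either orientation, so take the absolute distance.
    query_distance = abs(pos_y - pos_x)
    # Size difference relative to the reference; negative values mean the
    # molecule is shorter than the reference over this interval.
    repeat_size = query_distance - reference_distance
    repeat_count = repeat_size / motif_length
    return query_distance, repeat_size, repeat_count


# Molecule 4 from Table 9: label positions 149117 and 129857, reference 18281 bp.
print(repeat_estimate(149117, 129857, 18281, motif_length=4))
```

For molecule 4, this yields a query distance of 19260 bp, a repeat size of 979 bp, and 244.75 repeat units, which rounds to the 245 shown in Table 9. Small differences of about 1 bp for other rows arise because the table values were rounded.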
Figure 9. The molecule distances between CNBP labels 26242 and 26247. Shown are results for a sample with a short repeat expansion without somatic instability (A and B); a sample with a long repeat expansion without somatic instability (C and D); and a sample with a long repeat expansion with somatic instability (E and F). Molecule distance bar plots are used to visualize distance between specified label intervals (left) and a histogram of molecule distance frequency with automatic allele assignment (right).

The number of repeats can be determined using the De Novo Assembly workflow, the local guided assembly or EnFocus Fragile X workflow, and the molecule distance script workflow. The three workflows will result in slightly different repeat lengths, and these values can subsequently be used at the discretion of the investigator to make interpretations or draw conclusions based on supporting evidence and information provided by clinical guidelines and published literature.

Time Consideration

The total time required for sample preparation is <48 hr, with data ready for analysis within 72-96 hr.

Acknowledgments

We would like to acknowledge colleagues from the diagnostic division of the Radboudumc (Genome Diagnostics Nijmegen) as well as the Radboud Genomics Technology Center for their support. In particular, we would like to thank Maartje Pennings, Eveline Kamping and Ronald van Beek for their technical support. We would also like to thank Joyce Lee and Jillian Burke for their technical support.

We thank Dr. Mark Corbett for providing data for one sample to also optimize our protocol for STARD7 repeat expansions.

Dr. Hoischen was supported by the Solve-RD project. The Solve-RD project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 779257. This research was part of the Netherlands X-omics Initiative and partially funded by NWO (Dutch Research Council, 184.034.019).

Author Contributions

Bart van der Sanden : Formal analysis; investigation; visualization; writing—original draft; writing—review and editing. Kornelia Neveling : Formal analysis; investigation; writing—original draft; writing—review and editing. Andy Wing Chun Pang : Data curation; formal analysis; investigation; methodology; software; supervision; validation; writing—original draft; writing—review and editing. Syukri Shukor : Data curation; formal analysis; investigation; methodology; software; validation; writing—original draft; writing—review and editing. Michael D. Gallagher : Data curation; formal analysis; investigation; methodology; software; validation. Stephanie L. Burke : Methodology; project administration; visualization; writing—original draft; writing—review and editing. Erik-Jan Kamsteeg : Resources; writing—review and editing. Alex Hastie : Conceptualization; funding acquisition; methodology; project administration; resources; software; supervision; writing—review and editing. Alexander Hoischen : Conceptualization; funding acquisition; project administration; resources; supervision; writing—review and editing.

Conflict of Interest

AWCP, SS, MDG, SLB, and AHa are employees and shareholders of Bionano Genomics, a company commercializing an optical genome mapping technology. The remaining authors declare that they have no competing interests.

Open Research

Data Availability Statement

All relevant data have been provided.

Local Guided Assembly script and accompanying files are available at:

https://github.com/bionanogenomics/local_guided_assembly/

Molecule distance script is available at:

https://github.com/bionanogenomics/molecule_distance/

Literature Cited

  • Barseghyan, H., Pang, A. W. C., Clifford, B., Serrano, M. A., Chaubey, A., & Hastie, A. R. (2023). Comparative benchmarking of optical genome mapping and chromosomal microarray reveals high technological concordance in CNV identification and additional structural variant refinement. Genes (Basel), 14(10), 1868. https://doi.org/10.3390/genes14101868
  • Barseghyan, H., Pang, A. W. C., Zhang, Y., Sahajpal, N. S., Delpu, Y., Lai, C.-Y. J., Lee, J., Tessereau, C., Oldakowski, M., Kolhe, R. B., Houlden, H., Nagy, P. L., Bossler, A. D., Chaubey, A., & Hastie, A. R. (2022). Neurogenetic variant analysis by optical genome mapping for structural variation detection-balanced genomic rearrangements, copy number variants, and repeat expansions/contractions. In C. Proukakis (Ed.), Genomic structural variants in nervous system disorders (pp. 155–172). Springer. https://doi.org/10.1007/978-1-0716-2357-2_9
  • Bocklandt, S., Hastie, A., & Cao, H. (2019). Bionano genome mapping: High-throughput, ultra-long molecule genome analysis system for precision genome assembly and haploid-resolved structural variation discovery. Advances in Experimental Medicine and Biology, 1129, 97–118. https://doi.org/10.1007/978-981-13-6037-4_7
  • Brakta, S., Hawkins, Z. A., Sahajpal, N., Seman, N., Kira, D., Chorich, L. P., Kim, H. G., Xu, H., Phillips, J. A., 3rd., Kolhe, R., & Layman, L. C. (2023). Rare structural variants, aneuploidies, and mosaicism in individuals with Mullerian aplasia detected by optical genome mapping. Human Genetics, 142(4), 483–494. https://doi.org/10.1007/s00439-023-02522-8
  • Broeckel, U. (2024). Optical genome mapping for constitutional disorder applications. Current Protocols (in preparation).
  • Broeckel, U., Iqbal, M. A., Levy, B., Sahajpal, N., Nagy, P. L., Scharer, G., Rodriguez, V., Bossler, A., Stence, A., Skinner, C., Skinner, S. A., Kolhe, R., & Stevenson, R. (2024). Detection of constitutional structural variants by optical genome mapping: A multisite study of postnatal samples. The Journal of Molecular Diagnostics, 26(3), 213–226. https://doi.org/10.1016/j.jmoldx.2023.12.003
  • Chaisson, M. J. P., Sulovari, A., Valdmanis, P. N., Miller, D. E., & Eichler, E. E. (2023). Advances in the discovery and analyses of human tandem repeats. Emerging Topics in Life Sciences, 7(3), 361–381. https://doi.org/10.1042/etls20230074
  • Chan, S., Lam, E., Saghbini, M., Bocklandt, S., Hastie, A., Cao, H., Holmlin, E., & Borodkin, M. (2018). Structural variation detection and analysis using Bionano optical mapping. Methods in Molecular Biology, 1833, 193–203. https://doi.org/10.1007/978-1-4939-8666-8_16
  • Depienne, C., & Mandel, J. L. (2021). 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? American Journal of Human Genetics, 108(5), 764–785. https://doi.org/10.1016/j.ajhg.2021.03.011
  • Dolzhenko, E., English, A., Dashnow, H., De Sena Brandine, G., Mokveld, T., Rowell, W. J., Karniski, C., Kronenberg, Z., Danzi, M. C., Cheung, W. A., Bi, C., Farrow, E., Wenger, A., Chua, K. P., Martínez-Cerdeño, V., Bartley, T. D., Jin, P., Nelson, D. L., Zuchner, S., … Eberle, M. A. (2024). Characterization and visualization of tandem repeats at genome scale. Nature Biotechnology. https://doi.org/10.1038/s41587-023-02057-3
  • Dolzhenko, E., van Vugt, J., Shaw, R. J., Bekritsky, M. A., van Blitterswijk, M., Narzisi, G., Ajay, S. S., Rajan, V., Lajoie, B. R., Johnson, N. H., Kingsbury, Z., Humphray, S. J., Schellevis, R. D., Brands, W. J., Baker, M., Rademakers, R., Kooyman, M., Tazelaar, G. H. P., van Es, M. A., … Eberle, M. A. (2017). Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Research, 27(11), 1895–1903. https://doi.org/10.1101/gr.225672.117
  • English, A., Dolzhenko, E., Jam, H. Z., Mckenzie, S., Olson, N. D., Coster, W. D., Park, J., Gu, B., Wagner, J., Eberle, M. A., Gymrek, M., Chaisson, M. J. P., Zook, J. M., & Sedlazeck, F. J. (2023). Benchmarking of small and large variants across tandem repeats. bioRxiv. https://doi.org/10.1101/2023.10.29.564632
  • Facchini, S., Dominik, N., Manini, A., Efthymiou, S., Currò, R., Rugginini, B., Vegezzi, E., Quartesan, I., Perrone, B., Kutty, S. K., Galassi Deforie, V., Schnekenberg, R. P., Abati, E., Pichiecchio, A., Valente, E. M., Tassorelli, C., Reilly, M. M., Houlden, H., Bugiardini, E., & Cortese, A. (2023). Optical genome mapping enables detection and accurate sizing of RFC1 repeat expansions. Biomolecules, 13(10), 1546. https://doi.org/10.3390/biom13101546
  • Fadaie, Z., Neveling, K., Mantere, T., Derks, R., Haer-Wigman, L., den Ouden, A., Kwint, M., O'Gorman, L., Valkenburg, D., Hoyng, C. B., Gilissen, C., Vissers, L., Nelen, M., Cremers, F. P. M., Hoischen, A., & Roosing, S. (2021). Long-read technologies identify a hidden inverted duplication in a family with choroideremia. HGG Advances, 2(4), 100046. https://doi.org/10.1016/j.xhgg.2021.100046
  • Guruju, N. M., Jump, V., Lemmers, R., Van Der Maarel, S., Liu, R., Nallamilli, B. R., Shenoy, S., Chaubey, A., Koppikar, P., Rose, R., Khadilkar, S., & Hegde, M. (2023). Molecular diagnosis of facioscapulohumeral muscular dystrophy in patients clinically suspected of FSHD using optical genome mapping. Neurology Genetics, 9(6), e200107. https://doi.org/10.1212/nxg.0000000000200107
  • Gymrek, M. (2017). A genomic view of short tandem repeats. Current Opinion in Genetics & Development, 44, 9–16. https://doi.org/10.1016/j.gde.2017.01.012
  • Iqbal, M. A., Broeckel, U., Levy, B., Skinner, S., Sahajpal, N. S., Rodriguez, V., Stence, A., Awayda, K., Scharer, G., Skinner, C., Stevenson, R., Bossler, A., Nagy, P. L., & Kolhe, R. (2023). Multisite assessment of optical genome mapping for analysis of structural variants in constitutional postnatal cases. The Journal of Molecular Diagnostics, 25(3), 175–188. https://doi.org/10.1016/j.jmoldx.2022.12.005
  • Kanagal-Shamanna, R. (2024). Optical genome mapping for cell line and gene editing quality control applications (working title). Current Protocols (in preparation).
  • Koppikar, P., Shenoy, S., Guruju, N., & Hegde, M. (2023). Testing for facioscapulohumeral muscular dystrophy with optical genome mapping. Current Protocols, 3(1), e629. https://doi.org/10.1002/cpz1.629
  • Kucuk, E., van der Sanden, B., O'Gorman, L., Kwint, M., Derks, R., Wenger, A. M., Lambert, C., Chakraborty, S., Baybayan, P., Rowell, W. J., Brunner, H. G., Vissers, L., Hoischen, A., & Gilissen, C. (2023). Comprehensive de novo mutation discovery with HiFi long-read sequencing. Genome Medicine, 15(1), 34. https://doi.org/10.1186/s13073-023-01183-6
  • Lam, T., Rocca, C., Ibanez, K., Dalmia, A., Tallman, S., Hadjivassiliou, M., Hensiek, A., Nemeth, A., Facchini, S., Genomics England Research Consortium, Wood, N., Cortese, A., Houlden, H., & Tucci, A. (2023). Repeat expansions in NOP56 are a cause of spinocerebellar ataxia Type 36 in the British population. Brain Communications, 5(5), fcad244. https://doi.org/10.1093/braincomms/fcad244
  • Luo, M. C., Deal, K. R., Murray, A., Zhu, T., Hastie, A. R., Stedman, W., Sadowski, H., & Saghbini, M. (2016). Optical nano-mapping and analysis of plant genomes. Methods in Molecular Biology, 1429, 103–117. https://doi.org/10.1007/978-1-4939-3622-9_9
  • Mantere, T., Neveling, K., Pebrel-Richard, C., Benoist, M., van der Zande, G., Kater-Baats, E., Baatout, I., van Beek, R., Yammine, T., Oorsprong, M., Hsoumi, F., Olde-Weghuis, D., Majdali, W., Vermeulen, S., Pauper, M., Lebbar, A., Stevens-Kroef, M., Sanlaville, D., Dupont, J. M., … El Khattabi, L. (2021). Optical genome mapping enables constitutional chromosomal aberration detection. American Journal of Human Genetics, 108(8), 1409–1422. https://doi.org/10.1016/j.ajhg.2021.05.012
  • Morales, F., Couto, J. M., Higham, C. F., Hogg, G., Cuenca, P., Braida, C., Wilson, R. H., Adam, B., del Valle, G., Brian, R., Sittenfeld, M., Ashizawa, T., Wilcox, A., Wilcox, D. E., & Monckton, D. G. (2012). Somatic instability of the expanded CTG triplet repeat in myotonic dystrophy type 1 is a heritable quantitative trait and modifier of disease severity. Human Molecular Genetics, 21(16), 3558–3567. https://doi.org/10.1093/hmg/dds185
  • Morato Torres, C. A., Zafar, F., Tsai, Y. C., Vazquez, J. P., Gallagher, M. D., McLaughlin, I., Hong, K., Lai, J., Lee, J., Chirino-Perez, A., Romero-Molina, A. O., Torres, F., Fernandez-Ruiz, J., Ashizawa, T., Ziegle, J., Jiménez Gil, F. J., & Schüle, B. (2022). ATTCT and ATTCC repeat expansions in the ATXN10 gene affect disease penetrance of spinocerebellar ataxia type 10. HGG Advances, 3(4), 100137. https://doi.org/10.1016/j.xhgg.2022.100137
  • Neveling, K., Mantere, T., Vermeulen, S., Oorsprong, M., van Beek, R., Kater-Baats, E., Pauper, M., van der Zande, G., Smeets, D., Weghuis, D. O., Stevens-Kroef, M., & Hoischen, A. (2021). Next-generation cytogenetics: Comprehensive assessment of 52 hematological malignancy genomes by optical genome mapping. American Journal of Human Genetics, 108(8), 1423–1435. https://doi.org/10.1016/j.ajhg.2021.06.001
  • Otero, B. A., Poukalov, K., Hildebrandt, R. P., Thornton, C. A., Jinnai, K., Fujimura, H., Kimura, T., Hagerman, K. A., Sampson, J. B., Day, J. W., & Wang, E. T. (2021). Transcriptome alterations in myotonic dystrophy frontal cortex. Cell Reports, 34(3), 108634. https://doi.org/10.1016/j.celrep.2020.108634
  • Overend, G., Légaré, C., Mathieu, J., Bouchard, L., Gagnon, C., & Monckton, D. G. (2019). Allele length of the DMPK CTG repeat is a predictor of progressive myotonic dystrophy type 1 phenotypes. Human Molecular Genetics, 28(13), 2245–2254. https://doi.org/10.1093/hmg/ddz055
  • Owusu, R., & Savarese, M. (2023). Long-read sequencing improves diagnostic rate in neuromuscular disorders. Acta Myologica, 42(4), 123–128. https://doi.org/10.36185/2532-1900-394
  • Read, J. L., Davies, K. C., Thompson, G. C., Delatycki, M. B., & Lockhart, P. J. (2023). Challenges facing repeat expansion identification, characterisation, and the pathway to discovery. Emerging Topics in Life Sciences, 7(3), 339–348. https://doi.org/10.1042/etls20230019
  • Sabatella, M., Mantere, T., Waanders, E., Neveling, K., Mensenkamp, A. R., van Dijk, F., Hehir-Kwa, J. Y., Derks, R., Kwint, M., O'Gorman, L., Tropa Martins, M., Gidding, C. E., Lequin, M. H., Küsters, B., Wesseling, P., Nelen, M., Biegel, J. A., Hoischen, A., Jongmans, M. C., & Kuiper, R. P. (2021). Optical genome mapping identifies a germline retrotransposon insertion in SMARCB1 in two siblings with atypical teratoid rhabdoid tumors. Journal of Pathology, 255(2), 202–211. https://doi.org/10.1002/path.5755
  • Sahajpal, N. S., Barseghyan, H., Kolhe, R., Hastie, A., & Chaubey, A. (2021). Optical genome mapping as a next-generation cytogenomic tool for detection of structural and copy number variations for prenatal genomic analyses. Genes (Basel), 12(3), 398. https://doi.org/10.3390/genes12030398
  • Sahajpal, N. S., Mondal, A. K., Hastie, A., Chaubey, A., & Kolhe, R. (2023). Optical genome mapping for oncology applications. Current Protocols, 3(10), e910. https://doi.org/10.1002/cpz1.910
  • Schwartz, D. C., Li, X., Hernandez, L. I., Ramnarain, S. P., Huff, E. J., & Wang, Y. K. (1993). Ordered restriction maps of Saccharomyces cerevisiae chromosomes constructed by optical mapping. Science, 262(5130), 110–114. https://doi.org/10.1126/science.8211116
  • Soler, G., Ouedraogo, Z. G., Goumy, C., Lebecque, B., Aspas Requena, G., Ravinet, A., Kanold, J., Véronèse, L., & Tchirkov, A. (2023). Optical genome mapping in routine cytogenetic diagnosis of acute leukemia. Cancers (Basel), 15(7), 2131. https://doi.org/10.3390/cancers15072131
  • Tankard, R. M., Bennett, M. F., Degorski, P., Delatycki, M. B., Lockhart, P. J., & Bahlo, M. (2018). Detecting expansions of tandem repeats in cohorts sequenced with short-read sequencing data. American Journal of Human Genetics, 103(6), 858–873. https://doi.org/10.1016/j.ajhg.2018.10.015
  • Tanudisastro, H. A., Deveson, I. W., Dashnow, H., & MacArthur, D. G. (2024). Sequencing and characterizing short tandem repeats in the human genome. Nature Reviews Genetics, 25, 460–475. https://doi.org/10.1038/s41576-024-00692-3
  • van der Sanden, B., Corominas, J., de Groot, M., Pennings, M., Meijer, R. P. P., Verbeek, N., van de Warrenburg, B., Schouten, M., Yntema, H. G., Vissers, L., Kamsteeg, E. J., & Gilissen, C. (2021). Systematic analysis of short tandem repeats in 38,095 exomes provides an additional diagnostic yield. Genetics in Medicine, 23(8), 1569–1573. https://doi.org/10.1038/s41436-021-01174-1
  • van der Sanden, B., Neveling, K., Shukor, S., Gallagher, M. D., Lee, J., Burke, S. L., Pennings, M., van Beek, R., Oorsprong, M., Kater-Baats, E., Kamping, E., Tieleman, A., Voermans, N., Scheffer, I. E., Gecz, J., Corbett, M. A., Vissers, L. E. L. M., Pang, A. W. C., Hastie, A., … Hoischen, A. (2024). Optical genome mapping enables accurate repeat expansion testing. bioRxiv, 2024.04.19.590273. https://doi.org/10.1101/2024.04.19.590273
  • Yu, J., Gallagher, M., Shukor, S., Hastie, A., & Chaubey, A. (2024). P573: Genome-wide short tandem repeat expansion screening using optical genome mapping. Genetics in Medicine Open, 2, 101479. https://doi.org/10.1016/j.gimo.2024.101479
