Overview of Genotyping Technologies and Methods
Ingrid Kockum, Jesse Huang, Pernilla Stridh
Abstract
Genetics is a cornerstone of molecular biology, and there have been significant developments in genotyping technologies during the last decades. Genotyping can be used for a wide range of applications, such as genealogy, assessing risks and causes for common diseases and health conditions, animal and human research, and forensic investigations. So how do you perform a genetic study? This overview covers key concepts in genetics, the development of common genotyping methods, and a comparison of several techniques, including PCR, microarrays, and sequencing. The general steps involved in genotyping, from DNA preparation to quality control, are described with relevant protocols referenced. Different types of DNA variants are illustrated, including mutations, SNPs, insertions, deletions, microsatellites, and copy number variations, with examples of their involvement in disease. We discuss the uses of genotyping in areas such as medical genetics, genome-wide association studies (GWAS), and forensic science. We also provide tips for quality control, analysis, and results interpretation to help the reader design and perform a genetic study or scrutinize such studies from the literature. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
INTRODUCTION
Genetics is a keystone of molecular biology and an essential tool in unraveling the complexity of biological processes. This has led to significant investments and technological developments in genetics for academic and industrial applications. The advancement of genetics has led to significant research findings and clinical applications, improving our understanding of diseases and traits. This has also aided treatment and drug development, paving the way for personalized medicine based on genetics. This paper presents a comprehensive overview of genotyping to provide a foundation for academics (undergraduates/graduates) and industry professionals. Genotyping is the process of determining an individual's genetic makeup (genotype). We here cover key genetic concepts, the development of genotyping technologies, and a comparison of their advantages and disadvantages. We briefly review common real-world applications for genotyping in research, including genome-wide association studies (GWAS), along with broad commercial applications. Lastly, we detail the biological background and mechanisms that form the basis of commonly used genotyping methods, along with standard methods for data preprocessing and analysis. By the end, the reader will have a theoretical and practical introduction to genotyping and be able to further assess technologies for their studies and those published by others.
OVERVIEW OF GENETICS
DNA structure
Understanding how heritable information is transmitted, stored, and used provides the foundation for selecting an appropriate genotyping platform, correctly interpreting the results into genotypes, and understanding the limitations of the approach. Heritable information is stored in deoxyribonucleic acid (DNA) molecules, organized into chromosomes. Each species has a set number of autosomal chromosome pairs, sex chromosomes, and mitochondrial DNA. Francis Crick, who co-discovered the structure of DNA, stated that "the central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred back from protein to either protein or nucleic acid" (Crick, 1970). The part of the sequence translated into proteins or other functional molecules is carried in genes. Each gene comprises regulatory regions, exons that are retained in the mature messenger ribonucleic acid (mRNA), introns that provide buffer zones and splicing information, and upstream and downstream untranslated regions. However, most of the sequence lies outside of genes and is non-coding but essential for maintaining the structural and functional integrity of the genome. An allele is one of two or more different forms of a DNA variant (Fig. 1), which is a permanent change in the DNA sequence. If such a change occurs in the coding sequence of a gene, the change may affect the amino acid sequence and function of the protein product. Since sequence changes do not always cause disease or occur within genes, the term variant is preferred over gene mutation. A genotype is a combination of alleles in an individual for a given DNA variant, with one allele inherited from each parent.

Genetic variation
A strict definition of polymorphism is a site of sequence variation for which all alleles are present in at least 1% of the population. However, rare variants (<1%) are also often called polymorphisms. Polymorphisms include a range of variations, from those only affecting a single base pair (bp) to large sequence insertions or deletions. In addition, there can be variations in the number of chromosomes (e.g., chromosome 21 in Down syndrome). Definitions for common genetic terms are listed in Table 1. Single nucleotide polymorphisms (SNPs) or single nucleotide variants (SNVs) are variations at a single position in the genome (Fig. 1). There can be up to four different variants (A, C, T, or G), but for many SNPs, only two alleles are observed. In addition, there are several classes of insertion or deletion polymorphisms where a short (from 1 bp) to a long sequence of DNA has been inserted or deleted. These are called copy number variations (CNVs) when they consist of many base pairs and indels when they refer to short insertions or deletions. Variable numbers of tandem repeats (VNTRs) are a special class where a sequence motif is repeated in tandem (next to each other), and different individuals have different numbers of repeats. An example is the VNTR in the insulin gene, a 14-15 bp repeat, where the number of repeats affects the susceptibility to type 1 diabetes (Barratt et al., 2004). Depending on the length of the repeated sequence, VNTRs are referred to as either microsatellites or minisatellites. Microsatellites consist of 2-6 bp motifs and are also called simple sequence length polymorphisms (SSLPs) or simple sequence repeats (SSRs). A classic example is the number of CAG repeats within the Huntingtin (HTT) gene that determines the severity of Huntington's disease (McColgan & Tabrizi, 2018). Minisatellites have longer motifs, up to a few hundred bp, while CNVs consist of long stretches of DNA, sometimes up to several genes. An example is a CNV on chromosome 1 (1q21.1), where deletion leads to microcephaly and duplication leads to macrocephaly. The genetic variants that are most easily characterized and, therefore, most well-studied are SNPs/SNVs, which is why we will focus on these polymorphisms in this review.
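To make the idea of a repeat-number polymorphism concrete, the following minimal Python sketch counts the longest uninterrupted run of a motif in a sequence. The function name `longest_tandem_repeat` and the two example sequences are hypothetical and chosen only for illustration; real repeat genotyping is based on fragment sizing or sequencing reads, not on clean strings like these.

```python
import re


def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    sequence = sequence.upper()
    motif = motif.upper()
    # One or more consecutive copies of the motif form a single run.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Two hypothetical alleles that differ only in the number of CAG repeats.
allele_short = "ATG" + "CAG" * 5 + "CCGCCA"
allele_long = "ATG" + "CAG" * 9 + "CCGCCA"

print(longest_tandem_repeat(allele_short, "CAG"))
print(longest_tandem_repeat(allele_long, "CAG"))
```

The two alleles have identical flanking sequence and differ only in repeat count, which is the property a microsatellite assay measures.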
Term | Definition |
---|---|
Admixture | Genetic admixture occurs when previously diverged or isolated genetic lineages/populations mix. |
Allele | Variants within the gene sequence. Alleles can be characterized by their frequency in a given population (minor/major allele) or if they match a reference genome (reference/alternative allele). |
Autosomal | A gene/polymorphism that exists in the same number of copies in both sexes. For humans, a gene/polymorphism on chromosomes 1-22. |
Bottleneck | A genetic bottleneck occurs when a population is greatly reduced in size, limiting the genetic diversity of the species. |
Copy number variation (CNV) | A polymorphism consisting of different numbers of copies of a particular stretch of DNA. |
Crossover | The exchange of genetic material between two homologous chromosomes during sexual reproduction that results in recombinant chromosomes, i.e., chromosomes with part of the information coming from one parent and part from the other. |
DNA variant | A DNA variant is one of several versions of a particular DNA sequence |
Drift | Genetic drift is an evolutionary mechanism characterized by random fluctuations in the frequency of a particular version of a gene (allele) in a population. It primarily affects small, isolated populations. The effects of genetic drift can be strong, sometimes causing traits to become overwhelmingly frequent or to disappear from a population. |
Genome-wide association study (GWAS) | An investigation of genetic association between polymorphisms throughout the genome and a particular trait, i.e., an unbiased genetic association. |
Genotyping success rate | The proportion of genotyped individuals that have a genotype call in an experimental run. |
Haplotype | A combination of allele variants that tend to be inherited together. Haplotype estimation, known as phasing, is the determination of individual haplotypes typically used for genetic imputation. |
Hardy-Weinberg Equilibrium (HWE) | Hardy-Weinberg equilibrium states that the genetic variation in a population will remain constant from one generation to the next. When mating is random in a large population with no disruptive circumstances, HWE predicts that both genotype and allele frequencies will remain constant from one generation to the next because they are in equilibrium. |
Heterozygosity rate | The frequency of heterozygous loci/markers in an individual. Low heterozygosity indicates inbreeding or sample quality issues. |
Heterozygous | When an individual carries two different alleles at a particular position in the genome. |
Homozygous | When an individual carries two identical alleles at a particular position in the genome. |
Hybridization | When two single-stranded complementary DNA molecules associate via hydrogen bonding. |
Identical by descent (IBD) | An identity by state segment is identical by descent (IBD) in two or more individuals if they have inherited it from a common ancestor without recombination; that is, the segment has the same ancestral origin in these individuals. |
Identical by state (IBS) | A DNA segment is identical by state if two or more individuals share the same DNA sequence in this segment. |
Indel | Insertion or deletion of bases in the genome of an organism usually measuring from 1 to 10,000 base pairs in length |
Intron/Exons | Sections of a gene sequence that are either removed (introns, non-coding) or kept (exons, coding) during splicing prior to translation. |
Linkage disequilibrium (LD) | Association between genetic loci due to characteristics of genetic recombination. |
Locus | The physical site or location of a specific gene on a chromosome. |
Microsatellite | A set of short repeated DNA sequences at a particular locus on a chromosome, which vary in number in different individuals. Also called short tandem repeat polymorphisms (STRPs). |
Minor allele frequency (MAF) | The minor allele frequency is the frequency at which the least frequent allele occurs in a population. |
Oligonucleotides | Short DNA or RNA sequences, usually <20 bp in length. They are often used as probes for detecting complementary DNA or RNA sequences. |
Population selection | The selection of individuals who carry genes that make them more suited to survive and reproduce in their environment. Certain genes (and therefore characteristics) are passed on to new generations. There may be inter-population differences in frequency because they exist in different environments. |
Population stratification | Differences in allele frequencies between cases and controls due to systematic differences in ancestry rather than gene-disease association. This occurs, for example, when one ethnic group is overrepresented among cases compared to controls. A false-positive association can be observed for genes when the allele frequency differs between groups. Also known as population admixture. |
Principal component analysis (PCA) | A statistical method for reducing the dimensionality of large datasets containing a high number of dimensions (variables). This is accomplished by linearly transforming the data to a new coordinate system that still captures most of the variation in the data but in fewer dimensions than the initial data. Used to capture differences between populations and to cluster individuals from different populations. |
Quantile-quantile (Q-Q) plot | A probability plot used to compare two probability distributions by plotting their quantiles against each other. If the distributions are similar, the points in the plot will lie on the identity line (y = x). It is used in GWAS to compare the significance of the observed association to that of the expected association. If the points do not fall close to the identity line, one should suspect systematic differences between the compared groups. |
Quencher | A molecule that suppresses an effect such as luminescence. |
$r^2$ | Used to describe the correlation between alleles at two markers or genes and provides a measure of linkage disequilibrium between them. An $r^2$ of 80% means that 80% of the variability in one of the markers can be explained by the genotypes in the other. |
Reference allele | Present in the reference genome. It is often, but not always, the most common allele. |
Restriction fragment length polymorphism (RFLP) | DNA sequence variations at sites recognized by restriction enzymes. Such variation results in different DNA fragment lengths after enzymatic digestion. |
Sex-linked | A trait is controlled by a gene or polymorphism on one of the sex chromosomes. |
Single nucleotide polymorphism (SNP) | Polymorphism (or variant) of a single base position in a gene sequence, defined by its position in a chromosome or RSID number. |
Strand | The double helix of DNA consists of two strands intertwined like a spiral staircase. |
tag SNP | A tag SNP, or tagging SNP, is a representative SNP in a region of the genome with high linkage disequilibrium that represents a group of SNPs called a haplotype. A tag SNP can be used to characterize the DNA variation in that region of the genome instead of genotyping all SNPs in the region. |
Whole exome sequencing (WES) | Next-generation sequencing of all the exons in a genome (the exome). The exons are enriched using probes complementary to exon sequences, either on a microarray or on magnetic beads in solution. Exons are attached to the probes and retained for the sequencing steps while the rest of the genome is removed. |
Whole genome sequencing (WGS) | Next-generation sequencing of the whole genome. |
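Several of the quality metrics defined in Table 1 reduce to simple counting once genotypes are stored as allele counts. The sketch below assumes a hypothetical encoding in which each genotype is 0, 1, or 2 copies of one allele and `None` marks a failed call; the function names and example values are illustrative only and are not taken from any particular software package.

```python
from typing import List, Optional, Sequence

# Hypothetical encoding: copies of one chosen allele (0, 1, 2); None = no call.
Genotype = Optional[int]


def _called(calls: Sequence[Genotype]) -> List[int]:
    """Keep only successful genotype calls."""
    called = [g for g in calls if g is not None]
    if not called:
        raise ValueError("no successful genotype calls")
    return called


def genotyping_success_rate(marker_calls: Sequence[Genotype]) -> float:
    """Proportion of individuals with a genotype call for one marker."""
    return len(_called(marker_calls)) / len(marker_calls)


def minor_allele_frequency(marker_calls: Sequence[Genotype]) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = _called(marker_calls)
    counted_allele_freq = sum(called) / (2 * len(called))
    return min(counted_allele_freq, 1 - counted_allele_freq)


def heterozygosity_rate(individual_calls: Sequence[Genotype]) -> float:
    """Proportion of called markers that are heterozygous in one individual."""
    called = _called(individual_calls)
    return sum(g == 1 for g in called) / len(called)


# One marker across ten individuals, and one individual across eight markers.
marker = [0, 1, 0, None, 1, 0, 0, 2, 0, 0]
individual = [0, 1, 1, 2, None, 0, 1, 0]

print(genotyping_success_rate(marker))
print(minor_allele_frequency(marker))
print(heterozygosity_rate(individual))
```

Note that the success rate and MAF are computed per marker across individuals, whereas the heterozygosity rate is computed per individual across markers.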
Effect of genetic variation
The effect of a genetic variation depends on its location and can broadly be divided into coding and non-coding. A coding variant occurs in an exon and can affect the amino acid sequence and potentially the protein function. There are several different types of non-coding variants. Those that occur in introns can affect splicing, determining which exons are translated, resulting in different protein properties, such as membrane-bound or soluble. Variations in the promoter or enhancer regions can affect the expression level of the gene itself or the expression of several genes, usually those located physically close to the variant on the same chromosome. Lastly, variations in small RNA molecules that are not translated could also affect the efficiency of translation to protein.
Concepts in genetic inheritance
Mendel's first law, the law of segregation, states that a pair of genes define each trait, and these genes are randomly separated into sex cells so that each sex cell only contains one from the pair. Offspring inherit one allele of the gene from each parent. Mendel's law of independent assortment states that genes for different traits are sorted into sex cells and inherited independently. This is true for genes coded on different chromosomes or far apart on the same chromosome. However, genes that are close to each other on the same chromosome are often inherited together, resulting in some traits being closely linked (Fig. 2A).

Not all alleles are inherited independently. SNPs close to each other on the same chromosome tend to have stronger associations with each other than those further away. This phenomenon is characterized as linkage disequilibrium (LD) (Fig. 2). LD between SNPs is directly affected by genetic recombination, the exchange of genetic material between the chromosome pairs during meiosis, the rate of which is generally proportional to the distance between the loci. Differences in genetic recombination also lead to regions of strong association between alleles of several SNPs, also known as LD or haplotype blocks. Within these blocks, particular alleles are usually inherited in set combinations, also called haplotypes, such as K-A-b-L or k-a-B-l (K/k, A/a, B/b, and L/l denote different alleles). LD and haplotypes can also differ between populations, particularly ethnic groups and genetic isolates. LD is a common issue in genetic studies, with phenotypes often associated with multiple variants within the same LD block. This complicates the identification of the causal variant because the phenotype will associate with several variants within the haplotype (K, A, b, and L), and it is impossible to distinguish the one responsible for the phenotype (K) from the rest (A, b, and L).
A common example is the human leukocyte antigen (HLA) family of genes, which encode antigen-presenting membrane proteins important for regulating immune responses in infectious and autoimmune diseases. Due to its genetic variability and importance in immune function, many genetic studies have focused on HLA allele variants. However, strong LD between certain HLA loci often hampers the identification of the causal genes or variants within an associated HLA haplotype. This may be overcome by utilizing different haplotypes more common in other ethnic populations, although differences in population characteristics may limit comparability.
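The strength of LD between two biallelic SNPs is commonly summarized by r2 (see Table 1). As a minimal sketch, assuming phased haplotypes are already available as two-character strings in the K/k and A/a notation used above, r2 can be computed from the deviation D between the observed haplotype frequency and the frequency expected under independence. The function name `ld_r_squared` and the haplotype counts are hypothetical.

```python
from typing import Sequence


def ld_r_squared(haplotypes: Sequence[str], allele_1: str, allele_2: str) -> float:
    """r^2 between two biallelic loci from phased two-locus haplotypes.

    Each haplotype is a two-character string: the allele at the first locus
    followed by the allele at the second locus (e.g. "KA" or "ka").
    """
    n = len(haplotypes)
    p1 = sum(h[0] == allele_1 for h in haplotypes) / n   # frequency of allele_1
    p2 = sum(h[1] == allele_2 for h in haplotypes) / n   # frequency of allele_2
    p12 = sum(h[0] == allele_1 and h[1] == allele_2 for h in haplotypes) / n
    d = p12 - p1 * p2                                    # deviation from independence
    denominator = p1 * (1 - p1) * p2 * (1 - p2)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


# Hypothetical sample of 100 phased haplotypes.
sample = ["KA"] * 40 + ["ka"] * 40 + ["Ka"] * 10 + ["kA"] * 10
print(ld_r_squared(sample, "K", "A"))
```

When only the K-A and k-a haplotypes occur, the two SNPs carry the same information and r2 equals 1, which is the situation in which a causal variant cannot be separated statistically from its neighbors.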
A genetic variant with two alleles (K or k) can result in three genotypes, depending on whether the alleles inherited from mother and father are the same (KK and kk) or different (Kk, Fig. 2A). Hardy-Weinberg equilibrium (HWE) can be used to evaluate the expected and observed distributions of genotypes in a population. According to the HWE, the frequency of the three genotypes is determined by the binomial relationship $(p + q)^2 = p^2 + 2pq + q^2 = 1$, where p and q represent the frequencies of the two alternative alleles (K and k), so that $p^2$, $2pq$, and $q^2$ are the expected frequencies of the KK, Kk, and kk genotypes (Fig. 2B). This relationship is only true for a randomly breeding population of sufficient size. Situations that violate this relationship include selection of individuals carrying a particular allele (i.e., selection bias) and genetic drift resulting from fluctuations in allele frequency in small populations, such as genetic isolates. An allele can be introduced by migration (i.e., admixture) and spread as the population grows. Also, a population that is changing dramatically in size may experience drastic changes in allele frequency (e.g., genetic bottlenecks). Comparing the distribution of observed genotypes with the expected distribution may help detect variants with genotyping errors in a population that follows HWE (Kockum, Huang, and Stridh, manuscript in preparation).
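The comparison between observed and expected genotype distributions can be sketched as a chi-square goodness-of-fit test with one degree of freedom. This is one common way to test HWE and is shown here as an assumption, not as the procedure used by the authors; exact tests are often preferred when genotype counts are small. The function name and the genotype counts are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_KK: int, n_Kk: int, n_kk: int) -> Tuple[Tuple[float, float, float], float, float]:
    """Return (expected genotype counts, chi-square statistic, p-value)."""
    n = n_KK + n_Kk + n_kk
    p = (2 * n_KK + n_Kk) / (2 * n)   # frequency of allele K
    q = 1 - p                         # frequency of allele k
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic; HWE cannot be tested")
    # Expected counts under HWE: p^2, 2pq, and q^2 times the sample size.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_KK, n_Kk, n_kk)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


# Hypothetical marker whose genotype counts match HWE proportions.
print(hwe_chi_square(360, 480, 160))
# Hypothetical marker with no heterozygotes, as a genotyping artifact might produce.
print(hwe_chi_square(500, 0, 500))
```

A marker with a strong excess or deficit of heterozygotes yields a large statistic and a small p-value, which in quality control is usually taken as a reason to inspect the genotype calls for that marker.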
APPLICATIONS OF GENOTYPING
Disease and trait association
Genomic data provides an important analytical tool for assessing gene/protein functions and plays an important role in clinical and experimental studies. GWAS is a standard method to detect genetic susceptibilities to traits or diseases by assessing the association to a broad set of genetic variants over the genome. Although such studies often measure anywhere from 100,000 to 2,000,000 variants, this only characterizes a fraction of the entire genome. However, due to LD, a select set of markers may be used to identify genetic effects even if the causal variant is not identified. LD also makes it difficult to identify the causal variants, as SNPs in LD are often associated in blocks. Therefore, associations with genetic variants do not typically indicate their causative nature.
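At the level of a single variant, a case-control association test can be sketched as a comparison of allele counts between the two groups; a GWAS repeats such a test for every genotyped variant. The example below uses an allelic chi-square test on a 2x2 table, which is only one of several possible tests and is an assumption made here for illustration. The function name and genotype counts are hypothetical.

```python
import math
from typing import Tuple

GenotypeCounts = Tuple[int, int, int]  # (n_KK, n_Kk, n_kk)


def allele_counts(genotypes: GenotypeCounts) -> Tuple[int, int]:
    """Convert genotype counts to counts of the K and k alleles."""
    n_KK, n_Kk, n_kk = genotypes
    return 2 * n_KK + n_Kk, 2 * n_kk + n_Kk


def allelic_association(cases: GenotypeCounts, controls: GenotypeCounts) -> Tuple[float, float, float]:
    """Return (chi-square statistic, p-value, odds ratio for allele K)."""
    a, b = allele_counts(cases)      # K and k alleles among cases
    c, d = allele_counts(controls)   # K and k alleles among controls
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical genotype counts for 100 cases and 100 controls.
print(allelic_association((30, 50, 20), (20, 50, 30)))
```

Because many variants are tested and neighboring SNPs are correlated through LD, a small p-value for one SNP marks an associated region, not necessarily the causal variant.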
Genetic association studies have also proven instrumental in addressing causality, a major limitation of many epidemiological and clinical observational studies due to challenges discerning the temporal relationship between exposure and outcome. Using the principles of Mendelian randomization, genetic variants that modify an exposure are used as instruments to estimate the causal effect of that exposure on an outcome, bypassing potential confounding relationships. For example, Mendelian randomization has shown that a substantial part of the association between education level and cardiovascular traits is explained by BMI, systolic blood pressure, and smoking behavior (Carter et al., 2019). However, this method may be limited by the strength and reliability of available genetic instruments for assessing exposures of interest. In addition, interpretation may be affected by pleiotropy, where genetic instruments significantly influence multiple biological pathways.
Medical genetics
Genetics has played an important role in clinical practice since the latter half of the 20th century, starting with HLA tissue typing being used to assess histocompatibility during organ transplantation. However, with more accessible genotyping technologies, applied genetics has become more prominent in healthcare settings. Specific genes or polymorphisms may be assessed in clinical genetics to diagnose well-known hereditary diseases, such as Huntington's disease or sickle cell anemia. For more exploratory cases, whole exome and genome trio sequencing (trio referring to an individual and their biological parents) may be performed to identify the root causes of suspected genetic disorders, such as point mutations. Identifying the causal genetic mutation, particularly the dysfunctional protein and associated biological mechanism, facilitates disease management and assessment of treatment options.
Recently, commercial entities such as 23andMe and SelfDecode have promoted the use of genetic testing kits to provide self-assessment of genetic susceptibilities to disease. These often include well-established cancer risk genes such as BRCA1/BRCA2 and numerous hereditary diseases such as multiple sclerosis, cystic fibrosis, and sickle cell anemia. However, ethical considerations have been raised due to the need to properly disseminate results, particularly with less established measures associated with traits and common diseases (e.g., weight, diabetes).
Experimental models
Studies of human traits and diseases are often constrained by natural heterogeneity in the population, lack of control over exposure to risk factors, and limited access to relevant tissues. Furthermore, human experimentation is often limited due to potential risks to participants and other ethical concerns. Therefore, human studies are often complemented by experimental models of human diseases, where the environment and disease induction can be controlled, and the system can be manipulated to determine the effect of the factor under study, including genetic variance. The central tenet of such studies is that large portions of the genome and its functions are conserved between species. Species commonly used to model genetic diseases include mice, rats, flies, fish, and worms. Species selection depends on similarities in genetics, anatomy, and physiology of the disease-related factors. The correlation between genotype and phenotype is assessed by genotyping the animals. Identified associations need to be translated back and tested in the human genome.
An added advantage of model organisms is that the genome can be manipulated to create gene knock-outs and knock-ins, transgenics, and conditional gene modifications. Recent advancements in genome editing techniques with the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9 system have provided a tool for accurate, timely, and cost-effective gene editing. The genotyping techniques are the same between species, but the specific sequences of primers and probes are often species-specific.
Genealogy and forensic science
Genetics is also important for identifying and establishing relatedness for genealogy, which has become particularly popular with the availability of commercial genotyping kits. It has also become an important tool for identification in criminal justice and forensic science. The specificity of genotypes provides a unique “fingerprint” for associating individuals with biological evidence. However, genetic testing is not always irrefutable, as inadequate handling and sample contamination can affect the reliability of DNA evidence. Contamination can also be problematic in medical or research settings, where it is usually identified by higher genotyping failure and heterozygosity rates. Deficiencies in sample handling or testing may be used to justify the dismissal of DNA evidence or as grounds for appeal. Furthermore, DNA may falsely implicate innocent individuals, either unintentionally or through malicious intent.
THE DEVELOPMENT OF GENOTYPING TECHNOLOGIES
An individual's genotype is determined by the combination of alleles inherited from both parents. The genotype can be presented as carriage or dosage. Carriage is the presence of a given allele regardless of the number of copies (0 or 1), while dosage specifies the number of copies of the allele (0, 1, or 2). Genotyping is the process of determining the DNA sequence at a specific position in the genome, i.e., the allele.
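The distinction between carriage and dosage can be illustrated with a short Python sketch. It assumes a hypothetical encoding in which a genotype is a two-character string such as "Kk", with upper and lower case denoting the two alleles; the function names are illustrative only.

```python
def dosage(genotype: str, allele: str) -> int:
    """Number of copies (0, 1, or 2) of `allele` in a two-allele genotype."""
    if len(genotype) != 2 or len(allele) != 1:
        raise ValueError("expected a two-character genotype and a one-character allele")
    # str.count is case-sensitive, so "K" and "k" are treated as different alleles.
    return genotype.count(allele)


def carriage(genotype: str, allele: str) -> int:
    """1 if the individual carries at least one copy of `allele`, otherwise 0."""
    return 1 if dosage(genotype, allele) > 0 else 0


for genotype in ("KK", "Kk", "kk"):
    print(genotype, "dosage of K:", dosage(genotype, "K"),
          "carriage of K:", carriage(genotype, "K"))
```

Carriage collapses the KK and Kk genotypes into a single carrier group, whereas dosage keeps them apart, so the choice between the two determines which genetic model (e.g., dominant or additive) an analysis can represent.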
Introduction to genotyping technologies
Historically, a genotype was inferred by its expressed phenotype, illustrated by spotty yellow peas and wrinkly plain green peas in Figure 2. This later advanced into tissue typing when the advent of organ transplantation necessitated the identification of the individuals’ HLA antigens to ensure host compatibility. DNA sequencing methods were pioneered during the 1970s, enabling the development of tools that could determine genotypes. Variations in the DNA sequence recognized by bacterial restriction enzymes cause the DNA to be cleaved at different locations, resulting in differences in DNA fragment length. These restriction fragment length polymorphisms (RFLPs) were the first markers used for genotyping, during the 1970s and 1980s. These early techniques were labor-intensive and time-consuming, with cumbersome protocols often spanning several days, including gel electrophoresis and hybridization with radioactively labeled probes to visualize the results. The later development of the polymerase chain reaction (PCR) in 1985 (Mullis et al., 1986) made it possible to amplify a particular stretch of DNA into a virtually unlimited number of copies, revolutionizing the field of genetics and medicine by enabling DNA comparison, diagnosis of genetic disorders, and detection of viruses in human cells.
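The principle behind RFLP genotyping can be sketched in a few lines of Python: a single base change that destroys a restriction site changes the fragment lengths obtained after digestion. The example uses the EcoRI recognition sequence GAATTC, which is cut after the first G; the function name and both allele sequences are hypothetical, and only one strand of a linear molecule is modeled.

```python
from typing import List


def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Return fragment lengths after cutting `sequence` at every `site`.

    `cut_offset` is the position of the cut within the recognition site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - start)   # fragment ending at this cut
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(len(sequence) - start)  # final fragment after the last cut
    return fragments


# Two hypothetical alleles that differ by a single base within the site.
allele_with_site = "A" * 10 + "GAATTC" + "T" * 20
allele_without_site = "A" * 10 + "GAGTTC" + "T" * 20

print(digest(allele_with_site))
print(digest(allele_without_site))
```

On a gel, the first allele would appear as two shorter bands and the second as one longer band, so a heterozygous individual would show all three.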
Genotyping resolution dramatically increased with the development of DNA microarray technology. Microarrays are used to query a large number of variants simultaneously. Oligonucleotides with specific DNA sequences, known as probes, bind to the target DNA to detect sequence variants. Traditional solid-phase microarrays consist of probes physically spotted on a chip, with each location representing a specific sequence. More recently developed bead arrays allow for an extremely large number of probes to which DNA is hybridized on coded microscopic polystyrene beads.
Clinical whole genome sequencing (WGS) was introduced in 2014. WGS determines the entire DNA sequence of a genome in a single experiment, including mitochondrial and, for plants, chloroplastic DNA. This is the highest resolution of genotyping. WGS is used to pinpoint causal variants from association studies to help predict disease susceptibility and drug response. A simplified and less expensive version is whole exome sequencing (WES), whereby only 1% to 2% of the genome, representing the coding regions expressed into proteins, is sequenced. Sequencing whole genomes has recently become more accessible through the rapid development and wide application of next-generation sequencing (NGS), which offers massively parallel analysis, high throughput, and reduced costs.
Factors affecting the choice of technologies
The selection of genotyping techniques depends on parameters such as the amount (i.e., the number of genetic markers) and type (i.e., precision) of genetic information needed, the size of the cohort (i.e., the number of individuals), the computational capacity available (e.g., pre-processing, post-processing), and the financial limitations.
The genetic information required for large-scale discovery studies in which many genetic variants are queried in large cohorts drastically differs from a targeted study of a particular gene or set of loci. In diagnosing cystic fibrosis, it is important to identify pathogenic variants in the CFTR gene (http://www.genet.sickkids.on.ca/), an example of a targeted investigation. An example of a large-scale study to discover variants affecting risk for common diseases is the identification of more than 200 risk SNPs for multiple sclerosis (International Multiple Sclerosis Genetics, 2019). The increase in genome-wide screens, where the entire genome is genotyped or sequenced, for common diseases in recent decades has promoted genotyping arrays, where several hundred thousand SNPs are assessed at once. This technique generates genotypes for SNPs representing a particular chunk of the genome. Although the sequence is unknown, the genetic region represented by the SNP can be tested for association with diseases. This level of information often represents a good balance between coverage and detail. The array data requires additional computational processing and quality assurance; however, standard methods and tools are available to perform such analyses (see Data Processing for Microarray Data). The costs calculated per genotype are low, but the total cost is higher than most targeted assays due to the many variants.
A targeted study is better served by allele-specific PCR or TaqMan PCR for low-resolution inquiries, such as genetic stratification, while pyrosequencing or NGS is better suited if the genetic characterization of the loci for each individual is needed. Genetic stratification compares carriers and non-carriers of a particular variant or haplotype (e.g., the HBB variant that causes sickle cell anemia) (Carlice-Dos-Reis et al., 2017). The genetic risk of breast cancer is associated with several variants in the BRCA1 and BRCA2 genes, and therefore pyrosequencing and NGS are the methods of choice to investigate individual risk (Yoshimura et al., 2022). Sequencing (site-specific, exome, or whole genome) generates the most detailed genetic information, but it is also more expensive and computationally demanding, which makes it harder for novice investigators to implement. This level of detail is unnecessary for many studies. Pyrosequencing is advantageous for genotyping highly polymorphic loci and is also often used for epigenetic studies, but the technology is quite labor-intensive and time-consuming. Pyrosequencing and other genotyping technologies that involve manual processing of samples and inspection of results, including allele-specific PCR and TaqMan, work well for smaller cohorts but have limited use when genotyping larger cohorts.
To select the most effective and resource-efficient genotyping method, one must first define the scope of the study (local or genome-wide), the level of detail wanted (grouping variable or DNA sequence), and then identify the technique that provides the appropriate information based on available resources (personnel hours, equipment, money, and skill) (Table 2).
Name | Cost | # markers | Pro | Con | Ref. |
---|---|---|---|---|---|
PCR-RFLP | + | 1 | Easy to run in any lab, fast, flexible | Time consuming, manual inspection | (Saiki et al., 1985) |
Allele-specific PCR | + | 1 | Easy to run in any lab, fast, flexible | Time consuming, manual inspection | (Gaudet et al., 2009) |
TaqMan PCR | + | 1 | Standardized, more accurate | Manual inspection, requires specific equipment | (Hui et al., 2008) |
Microsatellite | + | 1 | Robust, does not require specific equipment, flexible | Low resolution, manual inspection, time consuming | (Weber & May, 1989) |
Pyrosequencing | ++ | 1 | Captures all potential alleles | Time consuming, manual inspection, requires specific equipment | (Kreutz et al., 2013) |
iPLEX | ++ | 40 | Multiplex assay | Manual inspection, requires specific equipment | (Gabriel et al., 2009; Tang et al., 1999) |
Multiplex TaqMan | ++ | up to 100 | Multiplex assay | Requires specific equipment | (Martínez-Cruz et al., 2011) |
Genotyping arrays | +++ | 50k-2 mil | Multiplex assay | Requires specific equipment and expertise | (Verlouw et al., 2021) |
NGS-Exome | ++++ | 25000 | Captures all coding variants | Requires specific equipment and expertise, demanding processing | (Seaby et al., 2016) |
NGS-Whole Genome | +++++ | up to 40 mil | Captures all variants | Requires specific equipment and expertise, demanding processing | (Slatko et al., 2018) |
- Rough relative cost per genotype is indicated by +, since prices vary over time.
GENERAL GENOTYPING METHODS
DNA preparation
Genomic DNA isolation involves extracting chromosomal DNA from the cellular nuclei with detergent and mechanical shearing, then removing proteins and cell debris to yield a purified DNA sample. Several methods can be used to extract DNA from biological samples, including blood, saliva, or paraffin-embedded biopsy tissues. Methods for DNA isolation have been described elsewhere; see the regularly updated Current Protocols collection DNA Preparation and Analysis (Table 3) for details. Organic extraction of genomic DNA involves separating the DNA and proteins into different phases by adding phenol/chloroform: DNA remains in the aqueous phase, while proteins partition into the organic phase. This method is inexpensive and requires very little equipment. Commercial kits provide a faster and easier method that employs filter columns to isolate the DNA. These are more costly and more difficult to troubleshoot, since the buffer compositions are proprietary. Magnetic beads can also be used to isolate and purify DNA in one step, with commercial kits available from several companies.
Hybridization
Hybridization between genomic DNA and the matching probe is the core principle behind many genotyping techniques, including PCR, microarray, and NGS. Hybridization is the process of bonding two complementary single-stranded DNA fragments to form a double-stranded molecule, which depends on base pair matching across the fragments. Tight noncovalent bonding between the strands is achieved when many base pairs are complementary, and nonspecific sequences are washed away before subsequent steps.
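The base-pair matching that drives hybridization can be illustrated with a short sketch. It assumes equal-length strands and uses the matched fraction as a crude stand-in for hybridization stringency; the function names, the threshold, and the 12-mer probe are all illustrative assumptions, and real hybridization also depends on temperature, salt concentration, and sequence composition.

```python
# Watson-Crick pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs with `seq`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def matched_fraction(probe, target):
    """Fraction of probe positions that base-pair with an equal-length target.

    Both strands are given 5'->3', so the probe is compared against the
    reverse complement of the target.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    partner = reverse_complement(target)
    matches = sum(p == q for p, q in zip(probe.upper(), partner))
    return matches / len(probe)


def hybridizes(probe, target, min_fraction=1.0):
    """Hypothetical stringency rule: require a minimum matched fraction."""
    return matched_fraction(probe, target) >= min_fraction


# Made-up probe with a perfectly matching target and a one-mismatch target.
probe = "ACGTTGCAGGTC"
perfect_target = reverse_complement(probe)   # "GACCTGCAACGT"
mismatch_target = "GACCTACAACGT"             # differs at one position

print(hybridizes(probe, perfect_target))     # stringent conditions: binds
print(hybridizes(probe, mismatch_target))    # stringent conditions: washed away
```

Lowering `min_fraction` mimics less stringent conditions, under which the mismatched target would also be retained.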
PCR-based methods
The introduction of PCR for amplifying a given stretch of DNA has led to rapid automation of genotyping methods. PCR is a cell-free, rapid, and sensitive method for generating many copies of a DNA sequence but relies on knowing the sequence surrounding the DNA of interest. There are three steps to the reaction, which are repeated in several cycles (rounds). The first step is denaturing, when the temperature is increased so DNA strands melt and become single-stranded. The second step is annealing, when the temperature is decreased to allow binding of short single-stranded stretches of DNA, called primers, complementary to the sequence of interest. The third step is DNA synthesis, when heat-stable DNA polymerase extends the DNA sequence from the primers along the target DNA. These steps are repeated several times, resulting in many copies of the region of interest (White et al., 1989). If the primers include the variation of interest, it is possible to design the conditions for amplification such that amplification only occurs for the allele of interest and not for other alleles in so-called allele-specific PCR. Another way to use PCR for genotyping is PCR-RFLP (Saiki et al., 1985), where the region of interest is PCR-amplified, then digested with a restriction enzyme chosen to recognize a DNA sequence that is only present in one of the alleles; thus, the size of the resulting products would distinguish the different alleles. PCR can easily detect microsatellites or short tandem repeat polymorphisms (STRPs) as the length of the amplified fragment will vary depending on how many times a microsatellite is repeated (Weber & May, 1989). Although allele-specific PCR and PCR-RFLP were established many years ago, they are still used in certain settings.
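The PCR-RFLP principle described above can be sketched as an in silico digest. The example assumes the enzyme EcoRI (recognition site GAATTC, cut after the first G); the amplicon sequences and function names are made up for illustration, and only the top strand is considered.

```python
def digest(amplicon, site, cut_offset):
    """Return fragment lengths after cutting `amplicon` at every `site`.

    `cut_offset` is the number of bases into the recognition site at which
    the strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    seq = amplicon.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    edges = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:])]


def call_genotype(observed_bands, cut_pattern, uncut_pattern):
    """Call a genotype from the set of band sizes seen on a gel."""
    bands = set(observed_bands)
    cut, uncut = set(cut_pattern), set(uncut_pattern)
    if bands == cut:
        return "homozygous, cut allele"
    if bands == uncut:
        return "homozygous, uncut allele"
    if bands == cut | uncut:
        return "heterozygous"
    return "no call"


# Made-up amplicons: one allele carries the EcoRI site, the other has a
# single-base change (A>G) that destroys it.
LEFT = "ATGCCGTAGCTAG"
RIGHT = "TTAGCCGATCGATTACG"
allele_with_site = LEFT + "GAATTC" + RIGHT
allele_without_site = LEFT + "GAGTTC" + RIGHT

cut_pattern = digest(allele_with_site, "GAATTC", 1)
uncut_pattern = digest(allele_without_site, "GAATTC", 1)
print(cut_pattern, uncut_pattern)
print(call_genotype([36, 14, 22], cut_pattern, uncut_pattern))
```

A heterozygous sample shows both the uncut band and the two digestion products, which is why all three band sizes map to the heterozygous call.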
TaqMan PCR is a widely used genotyping method for determining the genotype of candidate SNPs (Lee et al., 1993) (Fig. 3A). A region of 100-150 bp surrounding the SNP of interest is PCR-amplified in the presence of two allele-specific probes, one for each alternative allele. The probe sequence contains the SNP. These probes have different fluorescent labels at their 5′ end and a quencher bound to the 3′ end so that there is no signal from the probe while in solution. The probe binds DNA that contains the complementary allele. When the DNA polymerase reaches the bound probe during extension, its 5′ nuclease activity cleaves the probe, releasing the fluorescent label into solution, where it is separated from the quencher and produces detectable fluorescence. As each allele is assessed with its own probe and fluorescent label, it is possible to detect two alternative alleles simultaneously with the potential to multiplex up to 100 different genetic loci on a single array (Fig. 3A) (Martínez-Cruz et al., 2011). For further details and protocols, see Hui et al. (2008). Because TaqMan PCR is a quantitative method, it can be used to assess copy number variation by measuring the number of generated copies of a target during PCR and comparing it to a reference with a known number of copies (Hosono et al., 2009).
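Because each allele has its own fluorescent label, a genotype can be read from the two end-point signals. The following is a minimal sketch of such an allelic discrimination call, assuming normalized signals for the two probes; the angle windows and the minimum signal are illustrative assumptions, not instrument defaults, and vendor software typically clusters all samples on a plate together.

```python
import math


def call_taqman_genotype(signal_a, signal_b, min_total=1.0, het_window=(30.0, 60.0)):
    """Call a biallelic genotype from two end-point fluorescence signals.

    The angle of the point (signal_a, signal_b) separates three clusters:
    near 0 degrees only the allele A probe was cleaved, near 90 degrees only
    the allele B probe, and in between both probes gave signal.
    """
    if math.hypot(signal_a, signal_b) < min_total:
        return "no call"  # too little signal, e.g., failed amplification
    angle = math.degrees(math.atan2(signal_b, signal_a))
    low, high = het_window
    if angle < low:
        return "A/A"
    if angle > high:
        return "B/B"
    return "A/B"


# Made-up signals for four wells.
for well, (a, b) in {"w1": (5.0, 0.3), "w2": (0.4, 4.8),
                     "w3": (3.0, 3.2), "w4": (0.1, 0.2)}.items():
    print(well, call_taqman_genotype(a, b))
```

Points that fall between clusters or near the origin are the ones that usually need the manual inspection mentioned in Table 2.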

iPLEX
The iPLEX genotyping method (Gabriel et al., 2009; Tang et al., 1999) is based on the commercially available Sequenom MassARRAY platform. The region of interest is PCR-amplified, and locus-specific primers are then annealed immediately upstream of the SNP in the presence of mass-modified dideoxynucleotide terminators. Each primer is extended by a single base determined by the target sequence, so the mass of the extended primer varies with the SNP allele. This difference in mass is detected using MALDI-TOF mass spectrometry. Custom software is used to translate the masses into genotypes. iPLEX has applications similar to those of TaqMan PCR but requires more specialized and expensive equipment and has therefore been replaced in some cases by microarray methods.
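The translation from masses to genotypes can be sketched as a nearest-mass lookup. This is only an illustration of the idea, not the platform's software: the expected masses below are placeholder values rather than real oligonucleotide masses, and the tolerance and function name are assumptions.

```python
def call_iplex_genotype(peak_masses, expected_masses, tolerance=1.0):
    """Assign alleles to observed mass peaks for one extended primer.

    `expected_masses` maps each of the two alleles to the expected mass (Da)
    of the primer after single-base extension. A peak is assigned to an
    allele when it lies within `tolerance` Da of that expected mass.
    """
    alleles = set()
    for peak in peak_masses:
        for allele, mass in expected_masses.items():
            if abs(peak - mass) <= tolerance:
                alleles.add(allele)
    if not alleles:
        return "no call"
    if len(alleles) == 1:
        allele = next(iter(alleles))
        return allele + "/" + allele
    return "/".join(sorted(alleles))


# Placeholder masses for a hypothetical C/T SNP assay.
expected = {"C": 5500.0, "T": 5580.0}
print(call_iplex_genotype([5500.3], expected))
print(call_iplex_genotype([5499.8, 5580.4], expected))
print(call_iplex_genotype([6000.0], expected))
```

In a multiplexed assay, the same lookup would be repeated per primer, with primers designed so that their mass windows do not overlap.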
Pyrosequencing
Pyrosequencing is a method used for short stretches of DNA. In this genotyping method, new DNA is synthesized from the template by adding nucleotides (dNTPs) one at a time (Ronaghi et al., 1998) (Fig. 3B). The incorporation of a dNTP into the new DNA strand results in the release of pyrophosphate, which is converted to ATP by ATP sulfurylase. Luciferase then uses this ATP, together with luciferin and oxygen, to produce light, which can be detected. The target DNA is immobilized on beads, one dNTP (dATP, dTTP, dGTP, or dCTP) is added at a time, and incorporation is detected by light emission. Apyrase is used to degrade any unincorporated dNTP before adding the next one. A more detailed protocol for pyrosequencing can be found in Kreutz et al. (2013). Pyrosequencing is currently used for targeted studies but has partly been replaced by newer methods due to its complicated protocol and need for specialized equipment.
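The relationship between the dispensation order and the light signal can be sketched with a small simulation. It assumes an idealized reaction in which peak height is exactly proportional to the number of bases incorporated; the sequence, dispensation order, and function names are made up for illustration.

```python
def pyrogram(synthesized, dispensation_order, n_dispensations):
    """Simulate idealized light peaks while a known strand is synthesized.

    `synthesized` is the sequence of the new strand (5'->3'). Each dispensed
    nucleotide is incorporated for as long as it matches the next base, so a
    homopolymer run gives one proportionally taller peak.
    """
    position = 0
    peaks = []
    for i in range(n_dispensations):
        nucleotide = dispensation_order[i % len(dispensation_order)]
        incorporated = 0
        while position < len(synthesized) and synthesized[position] == nucleotide:
            incorporated += 1
            position += 1
        peaks.append((nucleotide, incorporated))
    return peaks


def decode(peaks):
    """Read the sequence back from (nucleotide, peak height) pairs."""
    return "".join(nucleotide * height for nucleotide, height in peaks)


peaks = pyrogram("GATTC", "ACGT", 10)
print(peaks)
print(decode(peaks))
```

In a heterozygous sample, the two alleles would contribute roughly half-height peaks at the dispensations where they differ, which is how the method captures all alleles at a locus.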
Microarray methods for genotyping
One of the most commonly used methods for genotyping is microarrays. A microarray is a solid surface on which microscopic spots of synthesized DNA have been applied. These can be used for genotyping by designing the DNA spots to contain stretches of DNA that overlap the targeted SNPs. Proper hybridization conditions ensure that the target DNA only hybridizes if it is complementary with the DNA in a particular spot. There are several types of genotyping microarrays, the most common being the Affymetrix GeneChip microarray (Ragoussis & Elvidge, 2006) and the Illumina bead arrays (Fan et al., 2006). In the bead array technology introduced by Illumina (Fig. 3C), bead libraries are randomly assembled into an etched microwell substrate. Each bead contains multiple copies of oligonucleotides targeting a specific locus as well as a mapping oligonucleotide designed not to have homology to any sequence from the species under study. The mapping oligonucleotide is used to determine where on the array a particular bead has ended up, through a series of hybridization steps carried out during the manufacture of the chip. On each array, several hundred thousand to over a million genotypes can be assayed for one individual. During the genotyping step, the DNA being genotyped binds to the beads carrying complementary DNA, which ends one base before the position of the SNP. A single-base extension incorporating one of four labeled nucleotides then takes place. The incorporated nucleotide is identified from the signal its label emits when excited by a laser. The signal intensity provides information about the allelic ratio at a particular locus.
An alternative method based on the Affymetrix GeneChip involves sequentially adding one nucleotide at a time to a quartz wafer while masking different sites to generate synthetic 25-mer DNA with different sequences in different positions. Target DNA is labeled with biotin and hybridized to the GeneChip, and binding is detected by staining with phycoerythrin-streptavidin antibody complex followed by high-resolution scanning.
Next-generation sequencing
NGS, also referred to as second-generation sequencing, is rapidly replacing older methods as it allows massively parallel sequencing of millions of DNA fragments. First, DNA (or RNA) is broken down into random short fragments (Slatko et al., 2018) (Fig. 3D). During sample preparation, adaptors, motifs used for sample identification (indices), sequencing bridges, and sequences complementary to the flow cell oligos are added to the DNA fragments. Each DNA fragment is isothermally amplified on the flow cell, a glass substrate containing millions of nanowells at fixed locations. Each well contains DNA probes used to capture DNA strands during hybridization to allow amplification during cluster generation. The flow cell contains two attached oligos binding the adaptors on each end of the target DNA. As the sample DNA passes over the flow cell, it hybridizes to one of the oligos, and a polymerase synthesizes a copy of the target DNA starting from the oligo. Bridge amplification is used to clonally amplify the DNA fragment. The target DNA is first washed away, then the newly synthesized copy bends over and the second adaptor end binds to the second type of oligo on the flow cell. New copies of the targeted DNA are generated with each round of amplification. Reverse strands are washed away, leaving only forward strands. Sequence detection utilizes sequencing-by-synthesis (SBS) technology, which monitors the addition of fluorescently labeled nucleotides, which emit unique signals for each nucleotide as they are incorporated. The newly synthesized read product is washed away and an index primer is added to sequence the index sequence next to one of the adaptors. Together, these two sequences constitute read 1. The sequencing process is repeated for the reverse strand in the same fashion, generating the second read at the other end of the target sequence. This process is carried out for millions of DNA fragments in parallel, generating a large amount of sequence data.
The index sequences can be used to separate DNA sequences from different samples. This is followed by mapping the fragments to a reference genome, or de novo assembly if sequencing a new species or samples from an unknown species. For genotyping purposes, differences in the DNA sequence between samples at the same position are identified. NGS methods can be applied to targeted regions, the whole exome, or the whole genome. They can also quantify gene expression if the sample is RNA rather than DNA. An advantage of this genotyping method is that it allows the identification of novel polymorphisms/mutations, which is not possible with methods that only assay predefined variants (e.g., microarrays) (Weissenkampen et al., 2019).
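Two of the steps above, separating reads by index and comparing bases at the same position, can be sketched as follows. The sketch assumes exact index matches and reads that a mapper has already aligned without gaps; the index sequences, sample names, and reads are made up, and production pipelines rely on dedicated demultiplexing, alignment, and variant-calling tools.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads by sample using the index sequence attached to each read.

    `reads` is an iterable of (index_sequence, read_sequence) pairs. Reads
    whose index is not in `index_to_sample` go to "undetermined".
    """
    by_sample = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(read_seq)
    return dict(by_sample)


def allele_counts(aligned_reads, position):
    """Count the bases observed at one reference position.

    `aligned_reads` holds (start, sequence) pairs, where `start` is the
    0-based reference coordinate of the first base of the read.
    """
    counts = defaultdict(int)
    for start, seq in aligned_reads:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return dict(counts)


index_to_sample = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = [("ACGT", "GATTACA"), ("TGCA", "GATTGCA"),
         ("ACGT", "TTACAGG"), ("NNNN", "CCCC")]
print(demultiplex(reads, index_to_sample))

# Three made-up reads covering reference position 104.
print(allele_counts([(100, "GATTACA"), (102, "TTGCAGG"), (103, "TACAG")], 104))
```

Seeing two different bases at appreciable counts at one position, as in the last call, is the raw evidence from which a heterozygous genotype would be inferred.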
DATA PROCESSING FOR MICROARRAY DATA
Due to potential technological and sample handling issues, genetic datasets must be quality controlled by identifying and filtering out inaccurate genotypes to ensure the reliability of findings. These include potential problems beginning from sample quality to the quality of genotype calling, where raw intensities for each genotyping marker are often clustered automatically by calling algorithms to determine the alleles carried by each sample. Although such protocols are explained extensively elsewhere (Anderson et al., 2010), an overview of standard quality control steps is provided here, including common assessments by markers and individuals.
Markers are often filtered for low call rates (<95%-98%), which are indicative of poor assay quality. Differences in call rates or missing genotypes between cases and controls are usually assessed to prevent study biases resulting from assay or sample preparation. Markers with a low minor allele frequency (MAF, <2%-5%) may be difficult to call accurately and have lower statistical power for association analyses. Rare variants can result in false positives due to their dependency on a few individuals and may be more prone to handling/processing issues and sampling bias. Therefore, these markers often require an even higher quality cutoff for call rates or, even when properly called, are usually excluded during quality control. Lastly, deviation from Hardy-Weinberg equilibrium (HWE), the expected relationship between homozygous and heterozygous genotype frequencies, is assessed, as discrepancies are indicative of genotyping/calling failures. However, this may not always be the case, particularly if population selection favors one genotype over the others, or in the presence of a strong genetic association to the trait under study. Therefore, population controls are often used when assessing HWE, with cutoffs ranging from P < 10⁻³ to P < 10⁻⁶.
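The three marker-level checks can be sketched for a single biallelic marker as follows. The genotype coding (0, 1, or 2 copies of the alternative allele, `None` for a missing call), the function names, and the specific cutoffs (picked from within the ranges given above) are assumptions for illustration. The HWE check uses a simple 1-degree-of-freedom chi-square test; dedicated tools often use an exact test instead.

```python
import math


def marker_qc(genotypes):
    """Call rate, minor allele frequency, and HWE P value for one marker."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return {"call_rate": 0.0, "maf": 0.0, "hwe_p": float("nan")}
    call_rate = n / len(genotypes)
    n_hom_ref, n_het, n_hom_alt = called.count(0), called.count(1), called.count(2)
    alt_freq = (2 * n_hom_alt + n_het) / (2 * n)
    maf = min(alt_freq, 1 - alt_freq)

    # Hardy-Weinberg expectation: p^2, 2pq, q^2 times the number of samples.
    p, q = 1 - alt_freq, alt_freq
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_hom_ref, n_het, n_hom_alt]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}


def passes_qc(stats, min_call_rate=0.98, min_maf=0.02, min_hwe_p=1e-6):
    """Apply illustrative cutoffs to the output of marker_qc."""
    return (stats["call_rate"] >= min_call_rate
            and stats["maf"] >= min_maf
            and stats["hwe_p"] >= min_hwe_p)


# Made-up marker in HWE (49/42/9) and one where every sample is heterozygous,
# a pattern that suggests a calling failure.
good_marker = [0] * 49 + [1] * 42 + [2] * 9
bad_marker = [1] * 100
print(marker_qc(good_marker), passes_qc(marker_qc(good_marker)))
print(marker_qc(bad_marker), passes_qc(marker_qc(bad_marker)))
```

In a case-control study, the HWE check would typically be run on the control samples only, for the reasons given above.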
The overall genotyping quality of individual samples must also be assessed to remove those with poor quality (e.g., low DNA concentration) or potential mishandling (e.g., contamination, misidentification). This begins by filtering out those with a high missingness or failure rate (>2%-5%), which is the percentage of markers with missing allele calls for an individual. This indicates poor quality and/or low DNA sample concentration. This may also be indicated by the proportion of heterozygous genotypes, i.e., one minus the observed homozygous genotypes over the total non-missing genotypes. High heterozygosity rate and sex discrepancies with X-chromosome imputed sex indicate sample contamination and/or misidentification. Although these samples are often excluded, these issues may also indicate plating and sample mishandling issues that can affect other neighboring samples on the plate. Relatedness between participants is usually assessed with identity-by-state (IBS), to remove potential duplicate samples and familial overrepresentation within population-based studies. Lastly, population stratification is assessed using a principal component analysis (PCA) to identify sub-population structures due to ethnic distributions within the sample population and potential outliers commonly attributable to immigrants. Population outliers are excluded from further analysis to prevent study bias. Secondary clusters may be further assessed through stratification or correction, although this may depend on the sampled population and study question (Kockum, Huang, and Stridh, manuscript in preparation).
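The per-sample missingness and heterozygosity rates defined above can be computed as in the following sketch. The genotype coding (0, 1, or 2 alternative alleles, `None` for missing) and the rule of flagging samples more than three standard deviations from the mean heterozygosity are assumptions for illustration; the appropriate cutoff depends on the dataset.

```python
import statistics


def sample_qc(genotypes):
    """Missingness and heterozygosity rate for one individual."""
    called = [g for g in genotypes if g is not None]
    missingness = 1 - len(called) / len(genotypes)
    n_hom = sum(g in (0, 2) for g in called)
    # One minus the homozygous fraction of the non-missing genotypes.
    het_rate = 1 - n_hom / len(called) if called else float("nan")
    return missingness, het_rate


def flag_heterozygosity_outliers(het_rates, n_sd=3.0):
    """Return sample IDs whose heterozygosity lies > n_sd SDs from the mean."""
    values = list(het_rates.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sorted(s for s, h in het_rates.items() if abs(h - mean) > n_sd * sd)


# One made-up individual typed at ten markers.
print(sample_qc([0, 1, 2, None, 1, 0, 0, 2, 1, 0]))

# Made-up cohort: 19 typical samples and one with excess heterozygosity.
het_rates = {"s%02d" % i: 0.30 for i in range(19)}
het_rates["s_suspect"] = 0.45
print(flag_heterozygosity_outliers(het_rates))
```

Flagged samples are worth checking against their plate positions, since contamination can also affect neighboring wells.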
ANALYSIS AND INTERPRETATION OF GENETIC FINDINGS
Once genetic data has been quality controlled, the association between genotype and phenotype can be investigated. The appropriate statistical analysis depends on the study question. For single-marker/single-locus analysis, the cohort can be stratified based on genotype and the phenotypes compared between groups with t-test, analysis of variance (ANOVA), or regression. This is the most common analysis in experimental design, including comparisons of knock-out or transgenic animals to wild type.
The majority of human studies involve case-control design, where groups with and without the trait are compared. In the case of GWAS, the frequency of the reference allele of each genotyped marker, usually captured by SNP microarrays, is compared between groups using logistic regression. An allele that is significantly more common among cases than controls confers a higher risk of developing the trait, while a significantly less common allele is deemed protective. Quantitative phenotypes that encompass more information on inter-individual trait variability can be a more powerful approach. For example, body mass index (BMI) can be analyzed as a continuous trait instead of the absence or presence of obesity (BMI ≥30). Linear regression tests the association between genotype and normally distributed continuous phenotypes. Multiple regression can be used to correct the model for the effect of confounding and modifying variables. Population structure and heterogeneity can confound the genetic effect estimates, which can be corrected for by principal components (see Data Processing for Microarray Data). The advantages of association studies are that they are relatively standardized, quick, and easy to perform. However, they cannot prove causality, randomization is not possible, and finding an appropriate control group can be challenging.
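The comparison of allele frequencies between cases and controls can be illustrated with a 2x2 allelic chi-square test and an odds ratio. Note that this is a simpler test than the logistic regression described above and does not adjust for covariates such as principal components; the allele counts and function name are made up for illustration.

```python
import math


def allelic_association(case_counts, control_counts):
    """Chi-square test on a 2x2 table of allele counts.

    Each argument is (n_tested_allele, n_other_allele) summed over the group;
    every genotyped individual contributes two alleles.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 degree of freedom
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, chi2, p_value


# Made-up counts: 50 cases and 50 controls, i.e., 100 alleles per group.
odds_ratio, chi2, p_value = allelic_association((60, 40), (40, 60))
print(odds_ratio, chi2, p_value)
```

An odds ratio above 1 corresponds to an allele that is more common among cases (risk), and an odds ratio below 1 to an allele that is less common among cases (protective).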
The results of genome-wide association analyses are usually displayed in a Manhattan plot, where the chromosome and location are displayed on the x-axis, and the significance of association, indicated by the -log10 of the p-value, is displayed on the y-axis (Fig. 4A). The association signals detected by GWAS, apart from the small number of truly associated SNPs, are expected to follow the distribution of the null hypothesis (no association). If the allele frequencies are contaminated by cryptic relatedness, population stratification, or genotyping errors, the association signals will be inflated across the genome (Cardon & Palmer, 2003; Marchini et al., 2004). Assessing inflation with lambda statistics and visualizing the expected and observed association signals in a quantile-quantile (Q-Q) plot (Fig. 4D) should be conducted as part of the quality assessment of GWAS studies. If present, inflation should be corrected before continuing. The functions of associated variants can then be investigated to elucidate the underlying biology of the trait. The objective of many GWAS studies is to understand the pathogenesis of the disease. Since most associated variants, except those identified from WES data, are located outside of genes and known functional units, it can be challenging to identify the mechanisms of associated variants. One approach to understanding the overall impact of such findings is to use pathway analysis.
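The lambda statistic and the points of a Q-Q plot can be computed from a list of association p-values as sketched below. The sketch assumes the common definition of lambda as the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.455); the function names and the simulated p-values are illustrative.

```python
import math
import statistics

NORMAL = statistics.NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Lambda: median observed chi-square statistic over its null median."""
    # A two-sided p-value maps to a 1-df chi-square via the normal quantile.
    chi2 = [NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def qq_points(p_values):
    """Pairs of expected and observed -log10(p) for a Q-Q plot."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i + 0.5) / n) for i in range(n))
    return list(zip(expected, observed))


# Idealized null p-values (evenly spread) and an artificially inflated set.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
inflated_p = [p ** 2 for p in null_p]
print(round(genomic_inflation(null_p), 3))
print(round(genomic_inflation(inflated_p), 3))
```

A lambda close to 1 indicates no inflation, whereas values clearly above 1 suggest that stratification, relatedness, or genotyping errors are affecting the test statistics across the genome.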

Genetic data can be analyzed with most statistical software programs. However, several open source software products have been specifically developed to analyze this type of data. We recommend using PLINK (https://www.cog-genomics.org/plink/, Table 3) for quality assurance of genetic data and GWAS analysis of case-control and quantitative traits. Another helpful tool for statistical analysis of group differences is R (https://www.r-project.org/, Table 3), which can also be used to plot results. There are also numerous databases and browsers with information regarding LD blocks, genes, variants, and how these are involved in traits and diseases, including NCBI (https://www.ncbi.nlm.nih.gov/, Table 3) and Ensembl (https://www.ensembl.org/index.html, Table 3).
Short name | Link | Description |
---|---|---|
Current Protocols DNA Preparation and Analysis | https://currentprotocols.onlinelibrary.wiley.com/doi/toc/10.1002/(ISSN)1934-3647.DNAPreparationandAnalysis | A set of lab protocols for DNA preparation and DNA analysis |
PLINK | https://www.cog-genomics.org/plink/ | Download and manual for Shaun Purcell's PLINK command-line program for genetic association analysis |
R | https://www.r-project.org/ | Download, manual, and links to tools developed in R, a free software environment for statistical computing and graphics |
NCBI | https://www.ncbi.nlm.nih.gov/ | Database of genetic variation, LD, gene and protein function, and much more. Similar information is found on the Ensembl homepage. |
Ensembl | https://www.ensembl.org/index.html | A resource of information on genetic variation, LD, gene, protein function, and much more. Similar information is found on the NCBI homepage. |
CONCLUSION
Genotyping can be used for a wide range of applications, including personal ancestry, health risks, research, and forensic investigations of crimes. The genetic information required for large-scale characterization or discovery studies is drastically different from a targeted study of a particular set of genetic variants, such as genetic stratification or forensic “fingerprinting”. Parameters to consider when choosing the genotyping method for a project are the number of markers and precision of genetic information needed, the number of individuals to be typed, the capacity for pre-processing, post-processing, and analysis, and finally any financial constraints. Sequencing generates the most detailed genetic information but is more expensive and computationally demanding, while low-resolution information is captured by less expensive allele-specific PCR or TaqMan PCR. The selected method should provide the best balance between coverage and detail that fits the purpose of the genetic study.
Other applications of genetics include making individualized predictions regarding prognosis, acceleration of disease, and best course of treatment, so-called personalized or precision medicine, with the ultimate goal of achieving prevention and/or improved outcomes for common health conditions.
AUTHOR CONTRIBUTIONS
Ingrid Kockum: Conceptualization, funding acquisition, investigation, visualization, original draft writing, review, and editing; Jesse Huang: Conceptualization, investigation, visualization, original draft writing, review, and editing; Pernilla Stridh: Conceptualization, investigation, project administration, visualization, original draft writing, review, and editing.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
DATA AVAILABILITY STATEMENT
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
LITERATURE CITED
- Anderson, C. A., Pettersson, F. H., Clarke, G. M., Cardon, L. R., Morris, A. P., & Zondervan, K. T. (2010). Data quality control in genetic case-control association studies. Nature Protocols, 5(9), 1564–1573. https://doi.org/10.1038/nprot.2010.116
- Barratt, B. J., Payne, F., Lowe, C. E., Hermann, R., Healy, B. C., Harold, D., Concannon, P., Gharani, N., McCarthy, M. I., Olavesen, M. G., McCormack, R., Guja, C., Ionescu-Tîrgovişte, C., Undlien, D. E., Rønningen, K. S., Gillespie, K. M., Tuomilehto-Wolf, E., Tuomilehto, J., Bennett, S. T., … Todd, J. A. (2004). Remapping the insulin gene/IDDM2 locus in type 1 diabetes. Diabetes, 53(7), 1884–1889. https://doi.org/10.2337/diabetes.53.7.1884
- Cardon, L. R., & Palmer, L. J. (2003). Population stratification and spurious allelic association. Lancet, 361(9357), 598–604. https://doi.org/10.1016/S0140-6736(03)12520-2
- Carlice-Dos-Reis, T., Viana, J., Moreira, F. C., Cardoso, G. L., Guerreiro, J., Santos, S., & Ribeiro-Dos-Santos, A. (2017). Investigation of mutations in the HBB gene using the 1,000 genomes database. PLoS ONE, 12(4), e0174637. https://doi.org/10.1371/journal.pone.0174637
- Carter, A. R., Gill, D., Davies, N. M., Taylor, A. E., Tillmann, T., Vaucher, J., Wootton, R. E., Munafò, M. R., Hemani, G., Malik, R., Seshadri, S., Woo, D., Burgess, S., Davey Smith, G., Holmes, M. V., Tzoulaki, I., Howe, L. D., & Dehghan, A. (2019). Understanding the consequences of education inequality on cardiovascular disease: Mendelian randomisation study. BMJ, 365, l1855. https://doi.org/10.1136/bmj.l1855
- Crick, F. (1970). Central dogma of molecular biology. Nature, 227(5258), 561–563. https://doi.org/10.1038/227561a0
- Fan, J. B., Gunderson, K. L., Bibikova, M., Yeakley, J. M., Chen, J., Wickham Garcia, E., Lebruska, L. L., Laurent, M., Shen, R., & Barker, D. (2006). Illumina universal bead arrays. Methods in Enzymology, 410, 57–73. https://doi.org/10.1016/S0076-6879(06)10003-8
- Gabriel, S., Ziaugra, L., & Tabbaa, D. (2009). SNP genotyping using the Sequenom MassARRAY iPLEX platform. Current Protocols in Human Genetics, 60, 2.12.1–2.12.18. https://doi.org/10.1002/0471142905.hg0212s60
- Gaudet, M., Fara, A. G., Beritognolo, I., & Sabatti, M. (2009). Allele-specific PCR in SNP genotyping. Methods in Molecular Biology, 578, 415–424. https://doi.org/10.1007/978-1-60327-411-1_26
- Hosono, N., Kato, M., Kiyotani, K., Mushiroda, T., Takata, S., Sato, H., Amitani, H., Tsuchiya, Y., Yamazaki, K., Tsunoda, T., Zembutsu, H., Nakamura, Y., & Kubo, M. (2009). CYP2D6 genotyping for functional-gene dosage analysis by allele copy number detection. Clinical Chemistry, 55(8), 1546–1554. https://doi.org/10.1373/clinchem.2009.123620
- Hui, L., DelMonte, T., & Ranade, K. (2008). Genotyping using the TaqMan assay. Current Protocols in Human Genetics, 56, 2.10.1–2.10.8. https://doi.org/10.1002/0471142905.hg0210s56
- International Multiple Sclerosis Genetics Consortium. (2019). Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science, 365(6460), eaav7188. https://doi.org/10.1126/science.aav7188
- Kreutz, M., Hochstein, N., Kaiser, J., Narz, F., & Peist, R. (2013). Pyrosequencing: Powerful and quantitative sequencing technology. Current Protocols in Molecular Biology, 104, 7.15.11–17.15.23. https://doi.org/10.1002/0471142727.mb0715s104
- Lee, L. G., Connell, C. R., & Bloch, W. (1993). Allelic discrimination by nick-translation PCR with fluorogenic probes. Nucleic Acids Research, 21(16), 3761–3766. https://doi.org/10.1093/nar/21.16.3761
- Marchini, J., Cardon, L. R., Phillips, M. S., & Donnelly, P. (2004). The effects of human population structure on large genetic association studies. Nature Genetics, 36(5), 512–517. https://doi.org/10.1038/ng1337
- Martínez-Cruz, B., Ziegle, J., Sanz, P., Sotelo, G., Anglada, R., Plaza, S., Comas, D., & Genographic Consortium. (2011). Multiplex single-nucleotide polymorphism typing of the human Y chromosome using TaqMan probes. Investigative Genetics, 2, 13. https://doi.org/10.1186/2041-2223-2-13
- McColgan, P., & Tabrizi, S. J. (2018). Huntington's disease: A clinical review. European Journal of Neurology, 25(1), 24–34. https://doi.org/10.1111/ene.13413
- Mullis, K., Faloona, F., Scharf, S., Saiki, R., Horn, G., & Erlich, H. (1986). Specific enzymatic amplification of DNA in vitro: The polymerase chain reaction. Cold Spring Harbor Symposia on Quantitative Biology, 51(Pt 1), 263–273. https://doi.org/10.1101/sqb.1986.051.01.032
- Ragoussis, J., & Elvidge, G. (2006). Affymetrix GeneChip system: Moving from research to the clinic. Expert Review of Molecular Diagnostics, 6(2), 145–152. https://doi.org/10.1586/14737159.6.2.145
- Ronaghi, M., Uhlen, M., & Nyren, P. (1998). A sequencing method based on real-time pyrophosphate. Science, 281(5375), 363–365. https://doi.org/10.1126/science.281.5375.363
- Saiki, R. K., Scharf, S., Faloona, F., Mullis, K. B., Horn, G. T., Erlich, H. A., & Arnheim, N. (1985). Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science, 230(4732), 1350–1354. https://doi.org/10.1126/science.2999980
- Seaby, E. G., Pengelly, R. J., & Ennis, S. (2016). Exome sequencing explained: A practical guide to its clinical application. Briefings in Functional Genomics, 15(5), 374–384. https://doi.org/10.1093/bfgp/elv054
- Slatko, B. E., Gardner, A. F., & Ausubel, F. M. (2018). Overview of Next-Generation Sequencing Technologies. Current Protocols in Molecular Biology, 122(1), e59. https://doi.org/10.1002/cpmb.59
- Tang, K., Fu, D. J., Julien, D., Braun, A., Cantor, C. R., & Koster, H. (1999). Chip-based genotyping by mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America, 96(18), 10016–10020. https://doi.org/10.1073/pnas.96.18.10016
- Verlouw, J. A. M., Clemens, E., De Vries, J. H., Zolk, O., Verkerk, A. J. M. H., Am Zehnhoff-Dinnesen, A., Medina-Gomez, C., Lanvers-Kaminsky, C., Rivadeneira, F., Langer, T., Van Meurs, J. B. J., Van Den Heuvel-Eibrink, M. M., Uitterlinden, A. G., & Broer, L. (2021). A comparison of genotyping arrays. European Journal of Human Genetics, 29(11), 1611–1624. https://doi.org/10.1038/s41431-021-00917-7
- Weber, J. L., & May, P. E. (1989). Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. American Journal of Human Genetics, 44(3), 388–396.
- Weissenkampen, J. D., Jiang, Y., Eckert, S., Jiang, B., Li, B., & Liu, D. J. (2019). Methods for the analysis and interpretation for rare variants associated with complex traits. Current Protocols in Human Genetics, 101(1), e83. https://doi.org/10.1002/cphg.83
- White, T. J., Arnheim, N., & Erlich, H. A. (1989). The polymerase chain reaction. Trends in Genetics, 5(6), 185–189. https://doi.org/10.1016/0168-9525(89)90073-5
- Yoshimura, A., Imoto, I., & Iwata, H. (2022). Functions of breast cancer predisposition genes: Implications for clinical management. International Journal of Molecular Sciences, 23(13), 7481. https://doi.org/10.3390/ijms23137481