Inferring human genomes at a fraction of the cost promises to boost biomedical research

January 13, 2021

Thousands of genetic markers have already been robustly associated with complex human traits, such as Alzheimer's disease, cancer, obesity, or height. To discover these associations, researchers need to compare the genomes of many individuals at millions of genetic locations, or markers, and therefore require cost-effective genotyping technologies. A new statistical method, developed by Olivier Delaneau's group at the SIB Swiss Institute of Bioinformatics and the University of Lausanne (UNIL), offers game-changing possibilities. For less than $1 in computational cost, GLIMPSE is able to statistically infer a complete human genome from a very small amount of sequencing data. The method offers the first realistic alternative to current approaches that rely on a predefined set of genetic markers, and thus allows wider inclusion of underrepresented populations. The study, which suggests a paradigm shift for data generation in biomedical research, is published in Nature Genetics.

A cost-effective approach to probing genetic markers

Low-coverage whole genome sequencing (LC-WGS) followed by genotype imputation is a method by which a whole genome can be inferred statistically from a very low sequencing effort. It has been proposed as a less biased and more powerful alternative to SNP arrays (see box), but its high computational cost has prevented it from becoming widely used. The team of scientists led by Olivier Delaneau, Group Leader at SIB and UNIL, has developed open-source software called GLIMPSE that finally overcomes this barrier. "GLIMPSE provides a framework that is 10 to 1,000 times faster, and thus cheaper, than other LC-WGS methods, while being much more accurate for rare genetic markers," explains Olivier Delaneau. "GLIMPSE is able to greatly enhance a low-coverage genome at millions of markers for less than $1 in computational cost, making it the first real alternative to SNP arrays."

From unbiased data to unbiased healthcare

Genome-wide association studies (GWAS) have so far mostly focused on Europeans: 80% of all GWAS participants are individuals of European descent, yet these make up only 16% of the world population. This raises an important ethical issue for healthcare inclusiveness and equitable access to the benefits of biomedical research, because the way genetic markers contribute to disease susceptibility varies across human populations. LC-WGS naturally circumvents the bias inherent to pre-established sets of genetic markers (SNP arrays). It can thus be applied successfully to underrepresented populations, as this study shows with a proof-of-concept analysis of an African-American population. "In addition to breaking down the financial barrier to enable GWAS studies based on LC-WGS, what is really exciting about this approach is that it enables researchers to efficiently uncover associations in understudied populations," says Simone Rubinacci, Postdoctoral Researcher in Olivier Delaneau's Group and first author of the paper.

Taking advantage of genomes already sequenced

"Our original thinking was: can we make use of the wealth of sequenced genomes to improve those that are newly sequenced? In other words, more for less: this is exactly what GLIMPSE does," explains Diogo Ribeiro, Postdoctoral Researcher in Olivier Delaneau's Group and co-author of the paper. How does it work? It builds on the idea that we all share relatively recent common ancestors, from whom we inherit small portions of our DNA. Briefly, GLIMPSE mines large collections of human genomes that have been very accurately sequenced (high-coverage WGS) to identify portions of DNA that are shared with newly sequenced genomes. In this way, GLIMPSE can reliably fill in the gaps in the low-coverage data.
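To make the "fill in the gaps" idea concrete, here is a deliberately simplified sketch in Python. It is not GLIMPSE's algorithm or interface: the function name `impute_haplotype`, the fixed window size and the toy reference panel are all hypothetical, and the sketch assumes a single haploid sequence with certain 0/1 allele calls, whereas real low-coverage data consist of uncertain, diploid read evidence and GLIMPSE relies on a far more elaborate statistical model. Within each window of sites, the sketch picks the reference haplotype that best agrees with the alleles actually observed and copies its alleles into the unobserved positions.

```python
from typing import List, Optional


def impute_haplotype(
    target: List[Optional[int]],
    panel: List[List[int]],
    window: int = 4,
) -> List[int]:
    """Hypothetical toy imputation (not GLIMPSE's method or API).

    Fills unobserved alleles (None) in `target` by copying them from the
    reference haplotype that best matches the observed alleles within each
    fixed-size window of sites.
    """
    n_sites = len(target)
    if window < 1:
        raise ValueError("window must be at least 1")
    if not panel or any(len(hap) != n_sites for hap in panel):
        raise ValueError("panel must be non-empty and cover the same sites as target")

    imputed: List[int] = []
    for start in range(0, n_sites, window):
        end = min(start + window, n_sites)
        observed = [i for i in range(start, end) if target[i] is not None]

        # Score each reference haplotype by its mismatches at the observed sites
        # of this window; min() keeps the earliest haplotype when scores tie.
        best = min(panel, key=lambda hap: sum(hap[i] != target[i] for i in observed))

        # Keep observed alleles; copy missing ones from the best-matching segment.
        imputed.extend(target[i] if target[i] is not None else best[i] for i in range(start, end))
    return imputed


# Hypothetical panel of three fully sequenced reference haplotypes (0/1 alleles at 8 sites).
panel = [
    [0, 0, 1, 0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 0],
]
# Low-coverage target: None marks sites where no sequencing read was observed.
target = [1, None, None, 1, 1, None, 0, None]
print(impute_haplotype(target, panel))  # → [1, 1, 0, 1, 1, 1, 0, 1]
```

In this toy run the first four sites are filled in from the second reference haplotype and the last four from the first, so the result is a mosaic of reference segments. That mirrors, in a very crude way, the idea that each genome is made of small pieces of DNA shared with other people through common ancestors.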

A new paradigm for future genomic studies with far-ranging applications

Made available as part of an open-source suite of tools, GLIMPSE paves the way for wide adoption of low-coverage WGS, promoting a paradigm shift in data generation for future genomic studies. Since the software was first released alongside a preprint in April 2020, ongoing research projects have already started to use the tool, for instance to reconstruct the genomes of people who lived thousands of years ago from ancient DNA, or the genomes of COVID-19 patients from SARS-CoV-2 nasopharyngeal swabs as part of a GWAS.

Box: Genotyping and genetic association studies

Genetic markers are very short DNA sequences in the genome, such as single-nucleotide polymorphisms (SNPs), that are known to vary between individuals. The procedure for determining them in an individual is called genotyping. So far, genotyping has mainly relied on SNP array technology, which targets predefined panels of markers. Such predefined sets of markers are routinely used to find associations between genetic markers and complex traits in genome-wide association studies (GWAS), which combine medical records and genetic data from thousands of participants. However, although SNP arrays are relatively fast and inexpensive, they have major drawbacks: new or rare variants, such as those present in understudied populations, can go undetected.

Swiss Institute of Bioinformatics
