NHGRI researchers generate complete human X chromosome sequence

July 14, 2020

Researchers at the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), have produced the first end-to-end DNA sequence of a human chromosome. The results, published today in the journal Nature, show that generating a precise, base-by-base sequence of a human chromosome is now possible, and will enable researchers to produce a complete sequence of the human genome.

"This accomplishment begins a new era in genomics research," said Eric Green, M.D., Ph.D., NHGRI director. "The ability to generate truly complete sequences of chromosomes and genomes is a technical feat that will help us gain a comprehensive understanding of genome function and inform the use of genomic information in medical care."

After nearly two decades of improvements, the reference sequence of the human genome is the most accurate and complete vertebrate genome sequence ever produced. However, it still contains hundreds of gaps: stretches of DNA whose sequence is unknown.

These gaps most often contain repetitive DNA segments that are exceptionally difficult to sequence. Yet, these repetitive segments include genes and other functional elements that may be relevant to human health and disease.

Because a human genome is incredibly long, consisting of about 6 billion bases, DNA sequencing machines cannot read all the bases at once. Instead, researchers chop the genome into smaller pieces, then analyze each piece to yield sequences of a few hundred bases at a time. Those shorter DNA sequences must then be put back together.

Senior author Adam Phillippy, Ph.D., of NHGRI compared this issue to solving a puzzle.

"Imagine having to reconstruct a jigsaw puzzle. If you are working with smaller pieces, each contains less context for figuring out where it came from, especially in parts of the puzzle without any unique clues, like a blue sky," he said. "The same is true for sequencing the human genome. Until now, the pieces were too small, and there was no way to put the hardest parts of the genome puzzle together."
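The jigsaw analogy can be made concrete with a toy example. The sketch below is illustrative only and is not the study's software: the "genome" string, the read coordinates and the helper name count_placements are all made up for this example. It counts how many places a read fits in a sequence that contains a repeated segment.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every position where the read matches the genome exactly."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        # Step forward by one base so overlapping matches are also counted.
        start = genome.find(read, start + 1)
    return count


# Hypothetical toy genome: unique flanks around a 7-base unit repeated 5 times.
genome = "ACGTTGCA" + "GATTACA" * 5 + "CCGTAAGT"

# A short read taken from inside the repeat (6 bases).
short_read = genome[10:16]

# A long read that spans the whole repeat plus some of both unique flanks.
long_read = genome[4:48]

print("short read placements:", count_placements(genome, short_read))
print("long read placements:", count_placements(genome, long_read))
```

In this toy case the short read fits in several places, like a piece of blue sky, while the long read reaches the unique sequence on both sides of the repeat and fits in only one place.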

Of the 24 human chromosomes (including X and Y), study authors Phillippy and Karen Miga, Ph.D., at the University of California, Santa Cruz, chose to complete the X chromosome sequence first, due to its link with a myriad of diseases, including hemophilia, chronic granulomatous disease and Duchenne muscular dystrophy.

Humans have two sets of chromosomes, one set from each parent. For example, biologically female humans inherit two X chromosomes, one from their mother and one from their father. However, those two X chromosomes are not identical and will contain many differences in their DNA sequences.

In this study, researchers did not sequence the X chromosome from a normal human cell. Instead, they used a special cell type - one that has two identical X chromosomes. Such a cell provides more DNA for sequencing than a male cell, which has only a single copy of an X chromosome. It also avoids sequence differences encountered when analyzing two X chromosomes of a typical female cell.

The authors and their colleagues capitalized on new technologies that can sequence long segments of DNA. Instead of preparing and analyzing small pieces of DNA, they used a method that leaves DNA molecules largely intact. These large DNA molecules were then analyzed by two different instruments. Each of them generates very long DNA sequences - something previous instruments could not accomplish.

After analyzing the human X chromosome in this fashion, Phillippy and his team used their newly developed computer program to assemble the many segments of generated sequences. Miga's group led the effort to close the largest remaining sequence gap on the X chromosome, the roughly 3 million bases of repetitive DNA found at the middle portion of the chromosome, called the centromere.

There is no "gold standard" for researchers to critically evaluate the accuracy of assembling such highly repetitive DNA sequences. To help confirm the validity of the generated sequence, Miga and her collaborators performed several validation steps.

"We have never actually seen these sequences before in our genome, and do not have many tools to test if the predictions we are making are correct. This is why it is important to have specialists in the genomics community weigh in and ensure the final product is high-quality," Miga said.

The effort is part of a broader initiative by the Telomere-to-Telomere (T2T) consortium, partially funded by NHGRI. The consortium aims to generate a complete reference sequence of the human genome.

The T2T consortium is continuing its efforts with the remaining human chromosomes, aiming to generate a complete human genome sequence in 2020.

"We don't yet know what we'll find in the newly uncovered sequences. It is the exciting unknown of discovery. This is the era of complete genome sequences, and we are embracing it wholeheartedly," Phillippy said.

Potential challenges remain. Chromosomes 1 and 9, for example, have repetitive DNA segments that are much larger than the ones encountered on the X chromosome.

"We know these previously uncharted sites in our genome are very different among individuals, but it is important to start figuring out how these differences contribute to human biology and disease," Miga said. Both Phillippy and Miga agree that enhancing sequencing methods will continue to create new opportunities in human genetics and genomics.

NIH/National Human Genome Research Institute
