
New method reveals high similarity between gorilla and human Y chromosome

March 02, 2016

A new, faster, and less expensive method has been developed and used to determine the DNA sequence of the male-specific Y chromosome in the gorilla. The technique will allow better access to the genetic information of the Y chromosome of any species and thus can be used to study male infertility disorders and male-specific mutations. It can also aid conservation genetics efforts by helping to trace paternity and to track how males move within and between populations of endangered species, such as gorillas.

A paper describing the method and the discovery resulting from its use in comparing the sequence of the gorilla Y chromosome to the sequences of the human and chimpanzee Y chromosomes will be published on March 2, 2016 in the Advance Online edition of the journal Genome Research. The article also will be published in the April 2016 print issue of the journal.

"Surprisingly, we found that in many ways the gorilla Y chromosome is more similar to the human Y chromosome than either is to the chimpanzee Y chromosome," said Kateryna Makova, the Francis R. and Helen M. Pentz Professor of Science at Penn State and one of two corresponding authors of the paper. "In regions of the chromosome where we can align all three species, the sequence similarity fits with what we know about the evolutionary relationships among the species -- humans are more closely related to chimpanzees. However, the chimpanzee Y chromosome appears to have undergone more changes in the number of genes and contains a different amount of repetitive elements compared to the human or gorilla. Moreover, a greater proportion of the gorilla Y sequences can be aligned to the human than to the chimpanzee Y chromosome."

The Y chromosome of mammals is incredibly difficult to sequence for a number of reasons. One reason is that the Y chromosome is present in only one copy and makes up only about one to two percent of the total genetic material found in a cell of a male. To reduce this difficulty, the researchers used an experimental technique called flow-sorting to preferentially select the Y chromosome for sequencing based on the chromosome's size and genetic content.

"Flow-sorting increased the amount of the Y chromosome in our dataset to about thirty percent," said Paul Medvedev, assistant professor of computer science and engineering and of biochemistry and molecular biology at Penn State, the other corresponding author of the paper. "To further enrich our data for the Y chromosome, we developed a computational technique -- called RecoverY -- to sort the data into Y and non-Y sequences based on how frequently similar sequences appeared in our data."
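The idea of sorting reads by how often similar sequences appear can be illustrated with a toy example. The sketch below is not the RecoverY implementation. It only assumes that sequences from an enriched chromosome show up more often in the data, counts short substrings (k-mers) across all reads, and keeps reads whose typical k-mer is abundant. The function names, the value of k, and the threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k in the read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def split_by_kmer_abundance(reads, k=5, threshold=3):
    """Split reads into (abundant, rare) by the median count of their k-mers.

    k and threshold are illustrative values, not parameters from the paper.
    """
    # Count every k-mer across the whole dataset.
    counts = Counter(km for read in reads for km in kmers(read, k))
    abundant, rare = [], []
    for read in reads:
        read_counts = sorted(counts[km] for km in kmers(read, k))
        if not read_counts:
            # Reads shorter than k carry no evidence; treat them as rare.
            rare.append(read)
            continue
        median = read_counts[len(read_counts) // 2]
        (abundant if median >= threshold else rare).append(read)
    return abundant, rare


# Four copies of one read stand in for enriched sequence; one read is unique.
toy_reads = ["ACGTACGTAC"] * 4 + ["TTGACCAGGT"]
enriched, background = split_by_kmer_abundance(toy_reads)
print(len(enriched), background)
```

Using the median rather than the mean keeps a single very common k-mer, such as one from a repeat shared with other chromosomes, from pulling an otherwise rare read into the abundant group.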

The Y chromosome, like all DNA, is composed of a series of molecules called "bases" that are represented by the letters A, T, C, and G. Current genetic sequencing technologies produce "reads" of sequence that are much shorter than the entire length of the chromosome. These reads need to be placed in order and pieced together by finding places where they overlap into longer and longer chunks. The research team used two different sequencing technologies to help with this assembly of the DNA sequence of the Y chromosome.
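Piecing reads together by their overlaps can be sketched in a few lines. This is a deliberately simplified illustration and not the assembler used by the research team: it assumes error-free reads on the same strand and joins two reads when the end of one exactly matches the start of the other. The function names and the minimum overlap of three bases are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads at their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep all of `left` and append only the part of `right` past the overlap.
    return left + right[size:]


print(merge_reads("ATTCGGA", "CGGATTA"))
```

Real assemblers must also tolerate sequencing errors and decide among many competing overlaps, which is where repeated sequences cause trouble.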

One sequencing technology used by the researchers produces massive amounts of very short reads -- about 150 to 250 bases in length. Using this method, the researchers sequenced enough reads to cover the entire length of the Y chromosome about 450 times. The researchers assembled these short reads into longer chunks that they then further connected using the second sequencing technology that produces longer reads -- about seven thousand bases in length on average.

"By reducing non-Y chromosome reads from our data with flow sorting and the RecoverY technique that we developed, and by using this combination of sequencing technologies, we were able to assemble the gorilla Y chromosome so that more than half of the sequence data was in chunks longer than about 100,000 bases in length," said Medvedev.
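The statistic in this quote, the chunk length above which more than half of the assembled sequence lies, matches what assemblers commonly report as N50. Treating it as N50 is an assumption here, since the release does not name the metric. A minimal sketch of the calculation, with made-up chunk lengths:

```python
def n50(lengths):
    """Return the length L such that chunks of length >= L hold at least half the total bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Hypothetical chunk lengths in bases, not data from the study.
chunks = [100, 80, 50, 30, 20, 10, 10]
print(n50(chunks))
```

In this toy set the total is 300 bases, and the two longest chunks (100 and 80) already cover more than half of it, so the function returns 80.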

Another reason that determining the genetic sequence of the Y chromosome is so difficult is that it is composed of an unusually high number of repeated sequences -- regions where the sequence of As, Ts, Cs, and Gs is identical, or nearly identical, for thousands or millions of bases in a row. Many of these repeats, including some genes, appear as back-to-back series of the same repeated sequence or as long palindromes which, like the word "racecar," read the same forward and backward. The researchers used an experimental technique -- "droplet digital polymerase chain reaction" -- to determine the number of copies of the genes that appear in these series.
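The "racecar" comparison is an analogy. In double-stranded DNA, a palindrome is usually understood as a sequence that reads the same on one strand as on the opposite strand read in the reverse direction, that is, a sequence equal to its own reverse complement. The short check below illustrates that definition; the function names are hypothetical, and it handles only exact palindromes made of uppercase A, C, G, and T.

```python
# Map each base to its pairing partner on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in the reverse direction."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


print(is_dna_palindrome("GAATTC"), is_dna_palindrome("GATTACA"))
```

Palindromes on the Y chromosome are far longer than this six-base example and are typically only nearly identical between their two arms, so a practical search would need to allow mismatches.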

"Sequencing the Y chromosome is like trying to put together a jigsaw puzzle, without knowing the final picture, from a pile of pieces where only about one out of every hundred is useful, and most of the pieces you do need look identical," said Makova. "We've developed a pipeline for sequencing the Y chromosome that is more efficient than previous methods and reduces a number of the difficulties associated with determining the genetic sequence of the Y chromosome. Our method will open the door for studying the Y chromosome for more labs, more species, and more individuals within those species."

To demonstrate the utility of the gorilla Y chromosome sequence they generated, the researchers designed genetic markers that can be used to differentiate the genetic relatedness among male gorillas and thus to aid in conservation genetics efforts targeted at preserving this endangered species.

In addition to Makova and Medvedev, the research team includes Marta Tomaszkiewicz, Samarth Rangavittal, Monika Cechova, Rebeca Campos-Sanchez, Howard W. Fescemyer, Robert Harris, Danling Ye, and Rayan Chikhi at Penn State; Malcolm A. Ferguson-Smith and Patricia C. M. O'Brien at the University of Cambridge in the United Kingdom; and Oliver Ryder at the San Diego Zoo.

The research was funded by the National Science Foundation (award numbers DBI-ABI 0965596, DBI-1356529, IIS-1453527, IIS-1421908, and CCF-1439057); the Penn State Clinical and Translational Sciences Institute; the Pennsylvania Department of Health; the Computation, Bioinformatics, and Statistics Predoctoral Training Program funded by the National Institutes of Health and Penn State; the John and Beverly Stauffer Foundation; the Alice B. Tyler Charitable Trust; and the Leverhulme Trust.


Kateryna Makova: (+1) 814-863-1619

Barbara Kennedy (PIO): (+1) 814-863-4682


A photo to illustrate this story is available for download at

PHOTO CAPTION: Jim (on the right), whose Y chromosome was sequenced, together with Dolly, his mother, and Binti, his sister.

PHOTO CREDIT: San Diego Zoo Global


After the journal's news embargo lifts, this press release will be archived at

Penn State
