Third-party genetic genealogy site is vulnerable to compromised data, impersonations

October 29, 2019

DNA testing services like 23andMe and MyHeritage are making it easier for people to learn about their ethnic heritage and genetic makeup. People can also use genetic testing results to connect with potential relatives through third-party sites, like GEDmatch, where they can compare their DNA sequences with those of other users who have uploaded test results.

But sharing genetic data this way also carries risks. Researchers at the University of Washington have found that GEDmatch is vulnerable to multiple kinds of security risks. An adversary needs only a small number of comparisons to extract someone's sensitive genetic markers. A malicious user could also construct a fake genetic profile to impersonate someone's relative.

The team posted its findings Oct. 29. The research has also been accepted at the Network and Distributed System Security Symposium, and the team will present the results in February in San Diego.

"People think of genetic data as being personal -- and it is. It's literally part of their physical identity," said lead author Peter Ney, a postdoctoral researcher in the UW Paul G. Allen School of Computer Science & Engineering. "This makes the privacy of genetic data particularly important. You can change your credit card number but you can't change your DNA."

The mainstream use of genetic testing results for genealogy is a relatively recent phenomenon. The initial benefits may have obscured some underlying risks, the researchers say.

"When we have a new technology, whether it is smart automobiles or medical devices, we as a society start with 'What can this do for us?' Then we start looking at it from an adversarial perspective," said co-author Tadayoshi Kohno, a professor in the Allen School. "Here we're looking at this system and asking: 'What are the privacy issues associated with sharing genetic data online?'"

To look for security issues, the team created a research account on GEDmatch. The researchers uploaded experimental genetic profiles that they created by mixing and matching genetic data from multiple databases of anonymous profiles. GEDmatch assigned these profiles an ID that people can use to do one-to-one comparisons with their own profiles.

For the one-to-one comparisons, GEDmatch produces graphics that show how much of the two profiles match. One graphic shows a bar for each of the 22 non-sex chromosomes. Each bar changes length depending on how similar the two profiles are for that chromosome. A longer bar shows that there are more matching regions, while a series of shorter bars means that there are short regions of similarity interspersed with areas that are different.

The team wanted to know if an adversary could use that bar to find out a specific DNA sequence within one region of a target's profile, such as whether or not the target has a mutation that makes them susceptible to a disease. For this search, the team designed four "extraction profiles" that they could use for one-to-one comparisons with a target profile they created. Based on whether the bar stayed in one piece -- indicating that the extraction profile and the target matched -- or split into two bars -- indicating no match -- the team was able to deduce the target's specific sequence for that region.
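The bar-splitting idea can be illustrated with a toy simulation. The sketch below is not GEDmatch's matching algorithm or the researchers' actual extraction profiles. It assumes a simplified model in which every marker has two possible alleles, "A" and "B", and a bar breaks wherever the two profiles share no allele. The names `matching_segments`, `make_probe` and `infer_genotype` are hypothetical, and this toy needs only two probes per marker rather than the four profiles the team designed.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two alleles at one marker, e.g. ("A", "B")


def shares_allele(x: Genotype, y: Genotype) -> bool:
    """True if the two genotypes have at least one allele in common."""
    return bool(set(x) & set(y))


def matching_segments(p: List[Genotype], q: List[Genotype]) -> List[Tuple[int, int]]:
    """Toy stand-in for the bar graphic: runs of consecutive matching markers.

    Returns (start, end) index pairs, with end exclusive.
    """
    segments, start = [], None
    for i, (x, y) in enumerate(zip(p, q)):
        if shares_allele(x, y):
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(p)))
    return segments


def make_probe(length: int, site: int, allele: str) -> List[Genotype]:
    """Probe that matches anything everywhere except at `site`.

    A heterozygous ("A", "B") marker always shares an allele with the target,
    so only the homozygous marker at `site` can break the bar.
    """
    probe: List[Genotype] = [("A", "B")] * length
    probe[site] = (allele, allele)
    return probe


def bar_breaks_at(segments: List[Tuple[int, int]], site: int) -> bool:
    """True if no matching segment covers the probed marker."""
    return not any(start <= site < end for start, end in segments)


def infer_genotype(target: List[Genotype], site: int) -> Genotype:
    """Deduce the target's genotype at `site` from two comparisons.

    The attacker only looks at the segments, never at `target` directly.
    """
    n = len(target)
    breaks_with_aa = bar_breaks_at(matching_segments(make_probe(n, site, "A"), target), site)
    breaks_with_bb = bar_breaks_at(matching_segments(make_probe(n, site, "B"), target), site)
    if breaks_with_aa:
        return ("B", "B")  # an all-A probe failed, so the target has no A here
    if breaks_with_bb:
        return ("A", "A")  # an all-B probe failed, so the target has no B here
    return ("A", "B")      # both probes matched, so the target carries both
```

In this model, each comparison answers one yes-or-no question about the target ("does the bar stay in one piece?"), which is why a handful of carefully built profiles is enough to pin down a single marker.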

"Genetic information correlates to medical conditions and potentially other deeply personal traits," said co-author Luis Ceze, a professor in the Allen School. "Even in the age of oversharing information, this is most likely the kind of information one doesn't want to share for legal, medical and mental health reasons. But as more genetic information goes digital, the risks increase."

Next the researchers wondered if an adversary could use a similar technique to acquire a target's entire profile. The team focused on another GEDmatch graphic that describes how well the profiles match by showing a line of colored pixels that mark how well each DNA segment in the query matches the target: green for a complete match, yellow for a half match -- when one strand of DNA matched but not the other -- and red for no match.

Then the team played a game of 20 questions: They created 20 extraction profiles that they used for one-to-one comparisons on a target profile that they created. Based on how the pixel colors changed, they were able to pull out information about the target sequence. For five test profiles, the researchers extracted about 92% of a test profile's unique sequences with about 98% accuracy.
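A rough sketch shows why a small, fixed number of comparisons can cover an entire profile: every pixel in the image answers a question at the same time. The model below is an assumption made for illustration, not GEDmatch's rendering rule or the researchers' method. It supposes that each pixel summarizes a fixed-size bin of markers, that the pixel takes the color of the worst-matching marker in its bin, and that markers have only two alleles, "A" and "B". The function names and the `bin_size` parameter are hypothetical.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two alleles at one marker
GREEN, YELLOW, RED = 2, 1, 0  # complete match, half match, no match


def match_level(x: Genotype, y: Genotype) -> int:
    """Color for a single marker under the toy model."""
    if sorted(x) == sorted(y):
        return GREEN
    if set(x) & set(y):
        return YELLOW
    return RED


def pixel_row(query: List[Genotype], target: List[Genotype], bin_size: int) -> List[int]:
    """One color per bin of markers: the worst match level inside the bin."""
    levels = [match_level(x, y) for x, y in zip(query, target)]
    return [min(levels[i:i + bin_size]) for i in range(0, len(levels), bin_size)]


def make_probe(length: int, bin_size: int, offset: int, allele: str) -> List[Genotype]:
    """Probe that tests one marker per bin (the one at `offset`) in parallel.

    All other markers are heterozygous, so they can never turn a pixel red.
    """
    probe: List[Genotype] = [("A", "B")] * length
    for i in range(offset, length, bin_size):
        probe[i] = (allele, allele)
    return probe


def extract_profile(target: List[Genotype], bin_size: int) -> List[Genotype]:
    """Recover every genotype using 2 * bin_size comparisons.

    Only the pixel colors are inspected, never the target itself.
    """
    n = len(target)
    recovered: List[Genotype] = [("A", "B")] * n  # default if no probe turns red
    for offset in range(bin_size):
        for allele, opposite in (("A", "B"), ("B", "A")):
            row = pixel_row(make_probe(n, bin_size, offset, allele), target, bin_size)
            for pixel, color in enumerate(row):
                site = pixel * bin_size + offset
                if site < n and color == RED:
                    # A red pixel means the target shares nothing with the
                    # probed allele, so it is homozygous for the other one.
                    recovered[site] = (opposite, opposite)
    return recovered
```

In this toy, the number of comparisons depends only on how many markers share a pixel, not on the length of the profile, which mirrors the fixed budget of extraction profiles described above.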

"So basically, all the adversary needs to do is upload these 20 profiles and then make 20 one-to-one comparisons to the target," Ney said. "They could write a program that automatically makes these comparisons, downloads the data and returns the result. That would take 10 seconds."

Once someone's profile is exposed, the adversary can use that information to create a profile for a false relative. The team tested this by creating a fake child for one of their experimental profiles. Because children receive half their DNA from each parent, the fake child's profile had their DNA sequences half matching the parent profile. When the researchers did a one-to-one comparison of the two profiles, GEDmatch estimated a parent-child relationship.
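The construction of a false child can be sketched in a few lines. This is a minimal illustration under assumed conditions, not the team's procedure: profiles are lists of two-allele genotypes, the "other parent" is any second profile the adversary has on hand, and `forge_child` and `half_matches_everywhere` are hypothetical names.

```python
import random
from typing import List, Tuple

Genotype = Tuple[str, str]  # the two alleles at one marker


def forge_child(parent: List[Genotype], other: List[Genotype],
                rng: random.Random) -> List[Genotype]:
    """Build a profile that inherits one allele per marker from `parent`.

    The second allele comes from `other`, an arbitrary stand-in for the
    second parent.
    """
    return [(rng.choice(p), rng.choice(o)) for p, o in zip(parent, other)]


def half_matches_everywhere(a: List[Genotype], b: List[Genotype]) -> bool:
    """True if the two profiles share at least one allele at every marker."""
    return all(set(x) & set(y) for x, y in zip(a, b))
```

Because the forged profile shares at least one allele with the parent at every marker, a comparison tool that looks for unbroken half-matching DNA would see the pattern expected of a parent and child.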

An adversary could generate any false relationship they wanted by changing the fraction of shared DNA, the team said.
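To make the link between shared DNA and apparent relationship concrete, the sketch below maps a shared fraction to the closest textbook relationship. The fractions are the standard expected averages for autosomal DNA. The lookup itself is a simplification assumed for illustration: genealogy services typically estimate relationships from the total length of matching segments rather than from a simple fraction, and `nearest_relationship` and `markers_to_copy` are hypothetical helpers.

```python
# Expected average fraction of autosomal DNA shared between two relatives.
EXPECTED_SHARING = {
    "parent/child or full sibling": 0.50,
    "grandparent, half sibling, or aunt/uncle": 0.25,
    "first cousin": 0.125,
    "unrelated": 0.0,
}


def nearest_relationship(shared_fraction: float) -> str:
    """Return the relationship whose expected sharing is closest."""
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    return min(EXPECTED_SHARING,
               key=lambda label: abs(EXPECTED_SHARING[label] - shared_fraction))


def markers_to_copy(total_markers: int, relationship: str) -> int:
    """How many markers a forged profile would need to share with the target."""
    return round(total_markers * EXPECTED_SHARING[relationship])
```

For example, under these assumptions a forged profile that shares about a quarter of its markers with the target would look most like a grandparent, half sibling, aunt or uncle.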

"If GEDmatch users have concerns about the privacy of their genetic data, they have the option to delete it from the site," Ney said. "The choice to share data is a personal decision, and users should be aware that there may be some risk whenever they share data. Security is a difficult problem for internet companies in every industry."

Prior to publishing their results, the researchers shared their findings with GEDmatch, which has been working to resolve these issues, according to the GEDmatch team. The UW researchers are not affiliated with GEDmatch, however, and can't comment on the details of any fixes.

"We're only beginning to scratch the surface," Kohno said. "These discoveries are so fundamental that people might already be doing this and we don't know about it. The responsible thing for us is to disclose our findings so that we can engage a community of scientists and policymakers in a discussion about how to mitigate this issue."

This research was funded in part by the University of Washington Tech Policy Lab, which receives support from the William and Flora Hewlett Foundation, the John D. and Catherine T. MacArthur Foundation, Microsoft, and the Pierre and Pamela Omidyar Fund at the Silicon Valley Community Foundation. This research also was funded by a grant from the Defense Advanced Research Projects Agency Molecular Informatics Program.

For more information, contact the team at

University of Washington
