
Health records and genetic data from more than 100,000 Californians power medical research

June 19, 2015

By volunteering to mail saliva to researchers working with their health care provider, thousands of people in California have helped build one of the nation's most powerful medical research tools. The researchers have now published the first reports describing these volunteers' genetic characteristics, how their self-reported ethnicity relates to genetic ancestry, and details of the innovative methods that allowed them to complete DNA analysis within 14 months. The articles are published in the journal GENETICS.

"This is an incredible treasure trove of data. The information collected during medical care is much more comprehensive than the isolated measurements we would make in a traditional research study," says project co-principal investigator Neil Risch, University of California, San Francisco (UCSF). "By linking these clinical records with genomic data from each person, we now have the power to track down many genetic and environmental contributions to disease."

The data have already been used to investigate many diseases. For example, researchers have pinpointed genetic variants linked to prostate cancer, allergies, glaucoma, macular degeneration, diabetes, high cholesterol, and many more. "No matter which disease we've looked at, we found genetic variants that influence it. And the beauty of this dataset is that it covers countless diseases and traits, and the medical records are constantly being updated as the cohort grows older," says Risch.

The Genetic Epidemiology Research on Adult Health and Aging (GERA) resource was created in 2009 by a collaboration between the Kaiser Permanente Northern California Research Program on Genes, Environment, and Health (RPGEH) and the Institute for Human Genetics at UCSF.

The RPGEH is an ongoing study of more than 200,000 members of the Kaiser Permanente Medical Care Plan who have consented to share data from their electronic medical records with researchers, along with answers to survey questions on their behavior and background. The records include clinical, pharmacy, and laboratory test information. Participants also contributed saliva samples, and more than 100,000 of these samples were selected for genetic analyses performed at UCSF. These participants form the GERA cohort.

Because the average age of participants in the GERA cohort is 63, the GERA research team is focusing their efforts on aging-related diseases.

The data have been available to researchers through an application and review process managed by the RPGEH. Last year, the genetic data were also made available to the research community via the NIH program dbGaP. "The goal was to create a resource that many research groups could mine for genetic insights into a broad range of diseases," says Catherine Schaefer, GERA co-principal investigator and executive director of the RPGEH at Kaiser Permanente.

The new publications present crucial details of the methodology used to create the comprehensive genetic information on participants, including their genetic ancestry and the length of their telomeres, the chromosome caps that influence age-related diseases.

In one article, the team describes how they were able to process more than 100,000 samples, characterizing 70 billion genetic variants, all within the 14 months dictated by their funding. "In 2009 this was a huge task, it hadn't been done this fast before," says co-author Pui-Yan Kwok, of the Institute for Human Genetics, UCSF. "The assays ran 24/7, so we had to develop new processes for analyzing data in real time to alert us to any problems as soon as they happened. We also had to boost the analysis quality to make best use of the data."

Because approximately 20% of people in the study were from minority groups, the researchers improved the analysis by developing four separate ethnicity-specific gene analysis arrays, or "gene chips." Each chip was tailored to the genetic variants common in one of four groups: non-Hispanic whites, African Americans, East Asians, or Latinos.

Part of the reason for ensuring the study group was ethnically diverse was to redress the traditional overrepresentation of people with European ancestry in genomic studies. One of the new articles presents a detailed genetic ancestry study, including the relationship between the genetic results and self-reported ethnicity in the cohort.

Participants indicated their race or ethnicity by selecting from as many as 23 different race/ethnicity/nationality categories in a questionnaire. Across all possible combinations of these categories, over 50 different race/ethnicity identities were represented in the study.

"We were particularly interested in those who checked off more than one box," says Risch. "More and more people are identifying as multi-ethnic, which can pose some technical challenges for genomic studies. At the same time, it also presents opportunities for analyzing genetic and social contributions to disease differences between groups."

People who identified as multi-ethnic were younger on average than those who chose a single ethnicity, which likely reflects increasing intermarriage and social change. People who identified as a different ethnicity than their genetic siblings also tended to report a multi-ethnic identity.

The researchers have also published in GENETICS the methods they developed for automated measurement of telomere lengths. Telomeres are protective bundles of DNA and protein that cap the ends of chromosomes. Telomere DNA tends to erode with age, which leaves the chromosomes vulnerable to damage, and some disease risks have been linked with shorter telomere length.

The telomere work was led by the UCSF research group of Elizabeth Blackburn, who shared the 2009 Nobel Prize in Physiology or Medicine for discovering how chromosomes are protected by telomeres and the enzyme telomerase.

The very large volume of samples to be processed meant the team had to develop a high-throughput robotic system that completed the laboratory tests in four months.

"This is the largest telomere length database ever constructed from a single study population," says Blackburn. "At the start, some were skeptical that we could get reliable data from saliva. But we had a 96 percent success rate, and the results are in fact highly consistent with conclusions from studies of blood."

The analysis confirmed that telomere lengths tended to be longer in women than in men and to decline with age. And a remarkable surprise emerged: among those over 75, older people tended to have longer, not shorter, telomeres. This suggests that in people older than 75, longer telomeres are associated with longer life. The team is also examining correlations between telomere length and disease, as well as behavioral and environmental factors.

"This project is one of the earliest examples of precision medicine in the US, an approach that takes into account differences in the genes, environment, and lifestyles of individuals and leverages large clinical datasets to identify these individual risk factors," says Schaefer. "The powerful GERA resource is just a taste of what is to come."

Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort

Yambazi Banda, Mark N Kvale, Thomas J Hoffmann, Stephanie E Hesselson, Dilrini Ranatunga, Hua Tang, Chiara Sabatti, Lisa A Croen, Brad P Dispensa, Mary Henderson, Carlos Iribarren, Eric Jorgenson, Lawrence H Kushi, Dana Ludwig, Diane Olberg, Charles P Quesenberry Jr, Sarah Rowell, Marianne Sadler, Lori C Sakoda, Stanley Sciortino, Ling Shen, David Smethurst, Carol P Somkin, Stephen K Van Den Eeden, Lawrence Walter, Rachel A Whitmer, Pui-Yan Kwok, Catherine Schaefer, and Neil Risch (2015). Genetics. Early Online June 19, 2015. doi: 10.1534/genetics.115.178616

Genotyping Informatics and Quality Control for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort

Mark N Kvale, Stephanie Hesselson, Thomas J Hoffmann, Yang Cao, David Chan, Sheryl Connell, Lisa A Croen, Brad P Dispensa, Jasmin Eshragh, Andrea Finn, Jeremy Gollub, Carlos Iribarren, Eric Jorgenson, Lawrence H Kushi, Richard Lao, Yontao Lu, Dana Ludwig, Gurpreet K Mathauda, William B McGuire, Gangwu Mei, Sunita Miles, Michael Mittman, Mohini Patil, Charles P Quesenberry Jr, Dilrini Ranatunga, Sarah Rowell, Marianne Sadler, Lori C Sakoda, Michael Shapero, Ling Shen, Tanu Shenoy, David Smethurst, Carol P Somkin, Stephen K Van Den Eeden, Lawrence Walter, Eunice Wan, Teresa Webster, Rachel A Whitmer, Simon Wong, Chia Zau, Yiping Zhan, Catherine Schaefer, Pui-Yan Kwok, and Neil Risch (2015). Genetics. Early Online June 19, 2015. doi: 10.1534/genetics.115.178905

Automated assay of telomere length measurement and informatics for 100,000 subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort

Kyle Lapham, Mark N Kvale, Jue Lin, Sheryl Connell, Lisa A Croen, Brad P Dispensa, Lynn Fang, Stephanie Hesselson, Thomas J Hoffmann, Carlos Iribarren, Eric Jorgenson, Lawrence H Kushi, Dana Ludwig, Tetsuya Matsuguchi, William B McGuire, Sunita Miles, Charles P Quesenberry Jr, Sarah Rowell, Marianne Sadler, Lori C Sakoda, David Smethurst, Carol P Somkin, Stephen K Van Den Eeden, Lawrence Walter, Rachel A Whitmer, Pui-Yan Kwok, Neil Risch, Catherine Schaefer, and Elizabeth H Blackburn (2015). Genetics. Early Online June 19, 2015. doi: 10.1534/genetics.115.178624

Founded in 1931, the Genetics Society of America (GSA) is the professional scientific society for genetics researchers and educators. The Society's more than 5,000 members worldwide work to deepen our understanding of the living world by advancing the field of genetics, from the molecular to the population level. GSA promotes research and fosters communication through a number of GSA-sponsored conferences, including regular meetings that focus on particular model organisms. GSA publishes two peer-reviewed, peer-edited scholarly journals: GENETICS, which has published high quality original research across the breadth of the field since 1916, and G3: Genes|Genomes|Genetics, an open-access journal launched in 2011 to disseminate high quality foundational research in genetics and genomics. The Society also has a deep commitment to education and fostering the next generation of scholars in the field. For more information about GSA, please visit

Genetics Society of America
