Crowdsourced family tree yields new insights about humanity

March 01, 2018

Thanksgiving gatherings could get bigger -- a lot bigger -- as science uncovers the familial bonds that bind us. From millions of interconnected online genealogy profiles, researchers have amassed the largest scientifically vetted family tree to date, which, at 13 million people, is slightly larger than the population of Cuba or Belgium. Published in the journal Science, the new dataset offers fresh insights into the last 500 years of marriage and migration in Europe and North America, and the role of genes in longevity.

"Through the hard work of many genealogists curious about their family history, we crowdsourced an enormous family tree and boom, came up with something unique," said the study's senior author, Yaniv Erlich, a computer scientist at Columbia University and Chief Science Officer at MyHeritage, a genealogy and DNA testing company that owns Geni.com, the platform that hosts the data used in the study. "We hope that this dataset can be useful to scientists researching a range of other topics."

The researchers downloaded 86 million public profiles from Geni.com, one of the world's largest collaborative genealogy websites, and used mathematical graph theory to clean and organize the data. What emerged among other smaller family trees was a single tree of 13 million people spanning an average of 11 generations. Theoretically, they'd need to go back another 65 generations to converge on one common ancestor and complete the tree. Still, the dataset represents a milestone by moving family-history searches from newspaper obituaries and church archives into the digital era, making population-level investigations possible. The researchers also make it easy to overlay other datasets to study a range of socioeconomic trends at scale.
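The release does not spell out the graph-theory step, so the following is only a minimal sketch of one plausible piece of it: treating profiles as nodes and family links as edges, then picking out the largest connected group. The function name `largest_family_tree`, the sample names, and the union-find approach are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def largest_family_tree(profiles, links):
    """Return the set of profile IDs in the largest connected component.

    profiles: iterable of hashable profile IDs (hypothetical sample data).
    links: iterable of (id_a, id_b) pairs, e.g. parent-child or spouse links.
    """
    profiles = list(profiles)
    parent = {p: p for p in profiles}

    def find(x):
        # Follow parent pointers to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every linked pair.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Count members per component and keep the biggest one.
    sizes = Counter(find(p) for p in profiles)
    biggest_root, _ = sizes.most_common(1)[0]
    return {p for p in profiles if find(p) == biggest_root}


# Hypothetical toy data: one tree of three people, one of two, one singleton.
profiles = ["ann", "bob", "cal", "dee", "eve", "fay"]
links = [("ann", "bob"), ("bob", "cal"), ("dee", "eve")]
print(sorted(largest_family_tree(profiles, links)))
```

On real data the same idea would separate the single 13-million-person tree from the many smaller trees the researchers mention; cleaning steps such as removing impossible links would have to come first.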

"It's an exciting moment for citizen science," said Melinda Mills, a demographer at University of Oxford who was not involved in the study "It demonstrates how millions of regular people in the form of genealogy enthusiasts can make a difference to science. Power to the people!"

The dataset details when and where each individual was born and died, and mirrors the demographics of Geni.com individuals, with 85 percent of profiles originating from Europe and North America. The researchers verified that the dataset was representative of the general U.S. population's education level by cross-checking a subset of Vermont Geni.com profiles against the state's detailed death registry.

"The reconstructed pedigrees show that we are all related to each other," said Peter Visscher, a quantitative geneticist at University of Queensland who was not involved in the study. "This fact is known from basic population history principles, but what the authors have achieved is still very impressive."

Marriage, Migration and Genetic Relatedness

Industrialization profoundly altered work and family life, and these trends coincide with shifting marriage choices in the data. Before 1750, most Americans found a spouse within six miles (10 kilometers) of where they were born, but for those born in 1950, that distance had stretched to about 60 miles (100 kilometers), the researchers found. "It became harder to find the love of your life," Erlich jokes.

Before 1850, marrying within the family was common: spouses were, on average, fourth cousins, compared to seventh cousins today, the researchers found. Curiously, between 1800 and 1850, people traveled farther than ever to find a mate -- nearly 12 miles (19 kilometers) on average -- but were more likely to marry a fourth cousin or closer. Changing social norms, rather than rising mobility, may have led people to shun close kin as marriage partners, they hypothesize.

In a related observation, they found that women in Europe and North America have migrated more than men over the last 300 years, but when men did migrate, they traveled significantly farther on average.

Genes and Longevity

To try to untangle the role of nature and nurture in longevity, the researchers built a model and trained it on a dataset of 3 million relatives born between 1600 and 1910 who had lived past the age of 30. They excluded twins and individuals who died in the U.S. Civil War, World Wars I and II, or a natural disaster (inferred if relatives died within 10 days of each other).
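The 10-day rule for inferring disaster deaths can be illustrated with a short check. This is a sketch under stated assumptions: the function name `died_close_together`, the sample dates, and the reading of "within 10 days" as a gap of 10 days or fewer are all hypothetical, and the study's actual filter may differ.

```python
from datetime import date


def died_close_together(death_dates, window_days=10):
    """Return True if any two deaths fall within window_days of each other.

    After sorting, the closest pair of dates is always adjacent, so it is
    enough to compare each date with the next one.
    """
    ordered = sorted(death_dates)
    return any(
        (later - earlier).days <= window_days
        for earlier, later in zip(ordered, ordered[1:])
    )


# Hypothetical family: two relatives died four days apart.
flagged = died_close_together([date(1900, 9, 8), date(1931, 1, 1), date(1900, 9, 12)])
# Hypothetical family: deaths a month apart, so not flagged.
not_flagged = died_close_together([date(1900, 1, 1), date(1900, 2, 1)])
print(flagged, not_flagged)
```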

They compared each individual's lifespan to that of their relatives and their degree of separation and found that genes explained about 16 percent of the longevity variation seen in their data -- on the low end of previous estimates which have ranged from about 15 percent to 30 percent.

The results indicate that good longevity genes can extend someone's life by an average of five years, said Erlich. "That's not a lot," he adds. "Previous studies have shown that smoking takes 10 years off of your life. That means some life choices could matter a lot more than genetics."

Significantly, the study also shows that the genes that influence longevity act independently rather than interacting with each other. Such interaction between genes is called epistasis. Some scientists have used epistasis to explain why large-scale genomic studies have so far failed to find the genes that encode complex traits like intelligence or longevity.

If some genetic variants acted together to influence longevity, the researchers would have seen a greater correlation among closely related individuals who share more DNA, and thus more genetic interactions. However, they found a linear link between longevity and genetic relatedness, ruling out widespread epistasis.
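The reasoning can be made concrete with the textbook approximation for resemblance between relatives: additive genetic effects contribute in proportion to relatedness r, while pairwise gene-gene interactions contribute in proportion to r squared. The sketch below is not the study's model. It ignores dominance and shared environment, uses the 16 percent figure reported above as the additive share, and the 10 percent epistatic share is a purely hypothetical value chosen to show the curvature.

```python
def expected_correlation(relatedness, additive_share, epistatic_share=0.0):
    """Expected lifespan correlation between two relatives.

    relatedness: expected fraction of shared DNA (0.5 for parent and child).
    additive_share: fraction of variation from independently acting genes.
    epistatic_share: fraction from pairwise interactions (hypothetical here).
    """
    return relatedness * additive_share + relatedness ** 2 * epistatic_share


RELATIVES = {
    "parent and child": 0.5,
    "grandparent and grandchild": 0.25,
    "first cousins": 0.125,
}

for label, r in RELATIVES.items():
    additive_only = expected_correlation(r, 0.16)
    with_epistasis = expected_correlation(r, 0.16, epistatic_share=0.10)
    print(f"{label}: additive only {additive_only:.4f}, with epistasis {with_epistasis:.4f}")
```

With additive effects alone, halving relatedness halves the expected correlation, which is the linear pattern the researchers report. Adding an interaction term inflates the correlation most for the closest relatives, which is the excess they did not see.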

"This is important in the field because epistasis has been proposed as a source of 'missing heritability,'" said the study's lead author, Joanna Thornycroft, a former graduate student at the Whitehead Institute for Biomedical Research, now at Wellcome Sanger Institute.

Adds Visscher: "This is entirely in line with theory and previous inference from SNP [variant] data, yet for some reason many researchers in human genetics and epidemiology continue to believe that there is a lot of non-additive genetic variation for common diseases and quantitative traits."

The dataset is available for academic research via FamiLinx.org, a website created by Erlich and his colleagues. Though FamiLinx data is anonymized, curious readers can check Geni.com to see if a family member may have added them there. If so, there is a good chance that they may have made it into the 13 million-person family tree.

In addition to his position at MyHeritage, a company that allows consumers to discover their family history through genetic tests and its genealogy platform, Erlich is a computer science professor at Columbia Engineering, a member of Columbia's Data Science Institute, and an adjunct core member of the New York Genome Center (NYGC).

Other study authors are Assaf Gordon, of NYGC and the Whitehead Institute; Tal Shor, of MyHeritage and Technion; Omer Weissbrod of Israel's Weizmann Institute of Science; Dan Geiger of Technion; Mary Wahl of Whitehead Institute, NYGC and Harvard; Michael Gershovits, Barak Markus and Mona Sheikh of Whitehead Institute; Melissa Gymrek of University of California at San Diego; and Gaurav Bhatia, Daniel MacArthur and Alkes Price of Harvard and the Broad Institute.
-end-
Study: Quantitative analysis of population-scale family trees with millions of relatives.

Media contact

Kim Martineau klm32@columbia.edu 646-717-0134

Scientist contact

Yaniv Erlich ye2148@columbia.edu

About Columbia University

Among the world's leading research universities, Columbia University in the City of New York continuously seeks to advance the frontiers of scholarship and foster a campus community deeply engaged in the complex issues of our time through teaching, research, patient care and public service. The University is comprised of 16 undergraduate, graduate and professional schools, and four affiliated colleges and seminaries in Manhattan, and a wide array of research institutes and global centers around the world. More than 40,000 students, award-winning faculty and professional staff define the University's underlying values and commitment to pursuing new knowledge and educating informed, engaged citizens. Founded in 1754 as King's College, Columbia is the fifth oldest institution of higher learning in the United States. http://www.columbia.edu

About MyHeritage

MyHeritage is the leading global destination for family history and DNA. As technology thought leaders, MyHeritage has transformed family history into an activity that is accessible and instantly rewarding. Its global user community enjoys access to a massive library of historical records, the most internationally diverse collection of family trees and groundbreaking search and matching technologies. Through MyHeritage DNA, the company offers technologically advanced, affordable DNA tests that reveal users' ethnic origins and previously unknown relatives. Trusted by millions of families, MyHeritage provides an easy way to find new family members, discover ethnic origins, share family stories, past and present, and treasure them for generations to come. MyHeritage is available in 42 languages. http://www.myheritage.com

About the New York Genome Center

The New York Genome Center (NYGC) is an independent, nonprofit academic research institution at the forefront of transforming biomedical research and clinical care. Founded as a collaborative venture by the region's premier academic, medical and industry leaders, the New York Genome Center's goal is to translate genomic research into new diagnostics, therapeutics and treatments for human disease. NYGC member organizations and partners are united in this unprecedented collaboration of technology, science and medicine, designed to harness the power of innovation and discoveries to advance genomic services. Their shared objective is the acceleration of medical genomics and precision medicine to benefit patients around the world.

Member institutions include: Albert Einstein College of Medicine, American Museum of Natural History, Cold Spring Harbor Laboratory, Columbia University, Hospital for Special Surgery, The Jackson Laboratory, Memorial Sloan Kettering Cancer Center, Icahn School of Medicine at Mount Sinai, New York-Presbyterian Hospital, The New York Stem Cell Foundation, New York University, Northwell Health, Princeton University, The Rockefeller University, Roswell Park Cancer Institute, Stony Brook University, Weill Cornell Medicine and IBM. For more information on the NYGC, please visit http://www.nygenome.org.

About Columbia Engineering

Columbia Engineering, based in New York City, is one of the top engineering schools in the U.S. and one of the oldest in the nation. Also known as The Fu Foundation School of Engineering and Applied Science, the School expands knowledge and advances technology through the pioneering research of its more than 200 faculty, while educating undergraduate and graduate students in a collaborative environment to become leaders informed by a firm foundation in engineering. The School's faculty are at the center of the University's cross-disciplinary research, contributing to the Data Science Institute, Earth Institute, Zuckerman Mind Brain Behavior Institute, Precision Medicine Initiative, and the Columbia Nano Initiative. Guided by its strategic vision, "Columbia Engineering for Humanity," the School aims to translate ideas into innovations that foster a sustainable, healthy, secure, connected, and creative humanity. http://engineering.columbia.edu/

About Columbia's Data Science Institute

The Data Science Institute at Columbia University is training the next generation of data scientists and developing innovative technology to serve society. With more than 250 affiliated faculty working in a wide range of disciplines, the Institute seeks to foster collaboration in advancing techniques to gather and interpret data, and to address the urgent problems facing society. The Institute works closely with industry to bring promising ideas to market. http://datascience.columbia.edu/

Columbia University School of Engineering and Applied Science
