Using digitized books as 'cultural genome,' researchers unveil quantitative approach to humanities

December 16, 2010

CAMBRIDGE, Mass. -- Researchers have created a powerful new approach to scholarship, using approximately 4 percent of all books ever published as a digital "fossil record" of human culture. By tracking the frequency with which words appear in books over time, scholars can now precisely quantify a wide variety of cultural and historical trends.

The four-year effort, led by Harvard University's Jean-Baptiste Michel and Erez Lieberman Aiden, is described this week in the journal Science.

The team, comprising researchers from Harvard, Google, Encyclopaedia Britannica, and the American Heritage Dictionary, has already used its approach -- dubbed "culturomics," by analogy with genomics -- to gain insight into topics as diverse as humanity's collective memory, the adoption of technology, the dynamics of fame, and the effects of censorship and propaganda.

"Interest in computational approaches to the humanities and social sciences dates to the 1950s," says Michel, a postdoctoral researcher based in Harvard's Department of Psychology and Program for Evolutionary Dynamics. "But attempts to introduce quantitative methods into the study of culture have been hampered by the lack of suitable data. We now have a massive dataset, available through an interface that is user-friendly and freely available to anyone."

Google will release a new online tool to accompany the paper: a simple interface that enables users to type in a word or phrase and immediately see how its usage frequency has changed over the past few centuries.
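
As a rough illustration of the idea behind such a tool, the following Python sketch computes a word's relative usage frequency per year from a toy corpus. The sample data, the function name `yearly_frequency`, and the simple whitespace tokenization are all assumptions made for illustration; this is not the team's or Google's actual pipeline.

```python
from collections import Counter

def yearly_frequency(corpus, word):
    """Return {year: occurrences of `word` / total words published that year}.

    `corpus` is an iterable of (year, text) pairs; this layout is hypothetical.
    """
    hits = Counter()    # occurrences of the target word, per year
    totals = Counter()  # total word count, per year
    target = word.lower()
    for year, text in corpus:
        tokens = text.lower().split()  # naive tokenization (an assumption)
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Normalize by yearly volume so years with more books are comparable.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

# Toy data standing in for digitized books (invented for this example).
corpus = [
    (1900, "the telephone rang and the operator answered"),
    (1900, "a letter arrived by post"),
    (1950, "the telephone and the television shared the room"),
]
freq = yearly_frequency(corpus, "telephone")
for year, value in freq.items():
    print(year, round(value, 4))
```

Dividing by the total number of words per year, rather than reporting raw counts, is what makes frequencies comparable across periods in which very different numbers of books were published.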

"Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena in the social sciences and humanities," says Aiden, a junior fellow in Harvard's Society of Fellows and principal investigator of the Laboratory-at-Large, part of Harvard's School of Engineering and Applied Sciences. "While browsing this cultural record is fascinating for anyone interested in what's mattered to people over time, we hope that scholars of the humanities and social sciences will find this to be a useful and powerful tool."

This dataset, which is available for download, is thousands of times larger than any previous historical corpus. It is based on the full text of about 5.2 million books, with more than 500 billion words in total. About 72 percent of its text is in English, with smaller amounts in French, Spanish, German, Chinese, Russian, and Hebrew.

It is the largest data release in the history of the humanities, the authors note, a sequence of letters 1,000 times longer than the human genome. If written in a straight line, it would reach to the moon and back 10 times over.

"Now that a significant fraction of the world's books have been digitized, it's possible for computer-aided analysis to reveal undiscovered trends in history, culture, language, and thought," says Jon Orwant, engineering manager for Google Books.

The paper describes the development of this new approach and surveys a vast range of applications, focusing on the past two centuries.
Michel, Aiden, and Orwant's co-authors are Aviva Presser Aiden, Adrian Veres, Steven Pinker, and Martin A. Nowak at Harvard; Google's Matthew K. Gray, Dan Clancy, Peter Norvig, and the Google Books Team; Yuan Kui Shen at the Massachusetts Institute of Technology; Joseph P. Pickett, executive editor of the American Heritage Dictionary; and Dale Hoiberg, editor-in-chief of Encyclopaedia Britannica.

The work was funded by Google, a Foundational Questions in Evolutionary Biology Prize Fellowship, Harvard Medical School, the Harvard Society of Fellows, a Fannie and John Hertz Foundation Graduate Fellowship, a National Defense Science and Engineering Graduate Fellowship, a National Science Foundation Graduate Fellowship, the National Space Biomedical Research Institute, the National Human Genome Research Institute, the Templeton Foundation, the National Institutes of Health, and the Bill and Melinda Gates Foundation.

Links to the data and browser are available at www.culturomics.org.

Harvard University
