'Cybertools' project receives $2 million NSF grant

September 28, 2005

ITHACA, N.Y. -- A team of Cornell University researchers has been awarded a $2 million National Science Foundation (NSF) grant to develop advanced Web tools for social sciences research.

Ultimately intended to assist in the detailed statistical and observational study of social and information networks, the project will involve a team of computer scientists and social scientists developing the means -- dubbed "cybertools" -- to extract and analyze information from vast collections of data.

The project's primary source of data will be the Internet Archive (http://www.archive.org), which is supported by the NSF and the Library of Congress, among others. One of the first steps in the project, which is funded through 2007, will be to transfer 30 percent, or 200 terabytes, of the massive archive to a computer server at Cornell for use by researchers.

Founded by Brewster Kahle in 1996 and based at the Presidio in San Francisco, the archive comprises more than 40 billion Web pages. "This archive is the only copy that has been saved of how the Web has developed over the years," Cornell computer scientist William Arms said. In addition to archived Web pages, it includes text, audio, moving images and software.

"Faculty in computer science and the social sciences have been working together for many years at Cornell," said Michael W. Macy, sociology department chair and the project's principal investigator. "Cornell has the potential to be one of the leaders in computational social science; we have all of the pieces of the puzzle here."

Other principals in the cybertools project are sociologist David Strang and computer scientists Dan Huttenlocher and Jon Kleinberg, who was recently awarded a MacArthur Foundation Fellowship.

The Cornell project was among the finalists for funding when Huttenlocher made the cybertools presentation to the NSF in Washington on Aug. 1. Macy, who was in Japan at the time, also participated via speakerphone. The project proposal's official title is "Very Large Semi-Structured Datasets for Social Science Research."

"The Web is this amazing potential resource for data for social sciences work, but that takes some social scientists willing to be kind of guinea pigs and computer scientists willing to set aside their own interests," said Huttenlocher, who teaches technology management in the Johnson Graduate School of Management.

The computational social sciences research will include studies of the diffusion of innovation, which covers the spread of new technologies, social and business practices, markets, fads and fashions, as well as norms, opinions and urban legends.

"In 1972, the NSF began the General Social Survey, which became a mainstay of social science research," Macy said. "It is a very powerful tool. We see the tools we are building as having a similar impact in that they will open up to social scientists a wide array of ways to study social life we've never had access to in the past."

Web logs (personal online diaries also known as "blogs") on services such as LiveJournal and interactive community databases such as the student directory Facebook will also provide data because, unlike in offline communities, every interaction in these online communities is recorded.

"Social life is remarkably difficult to study," Macy said. "We have reams and reams of statistics, but what we don't have -- and what it has been hard to get access to -- is interaction between the participants."

Professor of communication Geri Gay, who recently joined the cybertools team, has two undergraduate communication students who have already begun to collect data from LiveJournal.

"It's not only tracking what everybody posts, but information about the poster -- age, gender, interests, lists of all their friends," Macy said. "Of course, we don't know how truthful people are being, but we do know how others in the network are perceiving these demographic profiles, and that is also going to be very interesting to study as we map the opinion dynamics over time."

Among the areas of study the cybertools project will touch on are the evolution of social norms and polarization of opinion in evolving networks -- "seeing how network structure affects opinions among friends and enemies and how opinions in turn shape an evolving network structure," Macy said.

The cybertools research is part of "Getting Connected: Social Science in the Age of Networks," the 2005-08 interdisciplinary theme project of Cornell's Institute for the Social Sciences (ISS). Theme projects such as the current "Evolving Family" effort involve research projects, courses, events such as lectures by guest speakers and the engagement of constituencies both on and off campus.

"The NSF said they really did like the idea that we were making a commitment to studying networks, and that this was an interdisciplinary project over a long period of time," said David Harris, ISS executive director.

Macy also helped to write the networks proposal chosen for the ISS theme project and is the leader of its 10-member team, which involves scholars in disciplines including sociology, economics, mathematics, psychology and communication.

"We really tried to maximize the interdisciplinary nature of the group, as well as schools they were in, the kinds of things they were studying and the quality of the research they brought in," said ISS Director Elizabeth Mannix, who is in charge of the networks project.

"In the intersection of the social sciences community and the information sciences community, there's a very technical side and a very social side that really need to start talking to each other," Mannix said. "We are in a unique position at Cornell to do that."

Cornell University