Team developing new ways to handle data deluge

September 24, 1999

The fountain of information at the heart of science has become a fire hose, and an increase to river-like volumes is on the way.

The CERN particle collider in Geneva, Switzerland, for instance, currently produces more than 1 petabyte, or about 1,000,000,000,000,000 bytes, of information every year. The words and other text in all the books in the Library of Congress, in contrast, add up to only about one-thousandth of that information, or one terabyte (1 trillion bytes). And CERN is just one example of the tremendous information-generating powers of modern science.

"Our current ways of doing science are very much based on the concept that our data sets are so small that we can sort of ?eyeball' the whole thing and locate the interesting data," says Alexander Szalay, Alumni Centennial Professor of Physics and Astronomy at The Johns Hopkins University. "And with the data sets we are getting in an increasing number of areas of science, this is just not going to be feasible. So we have to do something drastically different."

Szalay leads an interdisciplinary team of researchers developing new ways to store, access and search large volumes of data. Participants in the Hopkins-led collaborative include scientists from Caltech, the U.S. Department of Energy's Fermilab and Microsoft Corp. They have been working together for several years already; this month they will receive the first formal support for their efforts in a three-year, $2.5 million grant from the National Science Foundation.

"This problem is of course much bigger than astronomy or particle physics," Szalay says. "I think this is actually becoming more a problem for the whole society. We are choking on information, and we have to sort out the relevant from the irrelevant. So I think what we're doing is a very interesting test bed for experimenting with new technologies that could have broader applications elsewhere."

Particle physicists were among the first to have to deal with huge quantities of information. Their work to manage that information led to the development of tools and techniques that found uses beyond the realm of the physics lab, notes Aihud Pevsner, Jacob P. Hain Professor of Physics and Astronomy at Johns Hopkins and a member of the collaborative.

"To help work with large data sets at CERN, Tim Berners-Lee invented in 1989 what later became the World Wide Web," says Pevsner. "He did it because the tools that they had at the time were inadequate for the distribution of the data sets they were working with."

Pevsner, a particle physicist, will be one of 500 American physicists working at the Large Hadron Collider (LHC) at CERN, the world's most powerful particle collider. The LHC is expected to produce 100-petabyte data sets.

Szalay is a researcher for the Sloan Digital Sky Survey (SDSS), an effort he calls the "cosmic genome project," which will map everything visible in several large chunks of the northern and southern sky. The SDSS begins next year, and Szalay estimates that by the time it is complete it will have produced 40 terabytes of data, with a 2-terabyte catalog.

Such a high volume of data reduces the chances that astronomers will miss gathering important information, but it also makes it harder to find that information among what's been gathered. "When you have so much data that it chokes you, you have to keep breaking it up into smaller chunks until it no longer chokes you," Szalay says.

Developing better ways to break down large quantities of information is the first major component of research under the NSF grant. The SDSS information, for example, might be broken up both by the area of the sky that the data comes from and by the color of the objects observed in the sky. The challenge, though, is to make sure that this process of partitioning the data improves the scientists' abilities to see important patterns and irregularities in the data.

"We want to try to make it possible for data that will be of interest to the same kinds of queries to be 'located' close together so they are easier to find," says Ethan Vishniac, director of the Johns Hopkins Center for Astrophysical Sciences, also a collaborative member.

Another concern is that these huge chunks of information will probably be stored at geographically different locations. Some next-generation science projects involve so much information, according to Szalay, that it cannot be brought to researchers across computer networks. Arranging ways to simultaneously access data in these different locations without ever bringing it together in one database, a technique called "distributed processing," is the second major component of research supported by the NSF grant.
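
A toy version of distributed processing can be sketched as follows. The site names and record layout are invented; the key idea is that the query travels to where the data lives and only the matching rows, not the full data sets, cross the network.

```python
# Hypothetical sketch of a distributed query: each "site" filters its own
# data locally, and only the matches are combined into one answer.
SITES = {
    "site_a": [{"id": 1, "mag": 14.2}, {"id": 2, "mag": 19.8}],
    "site_b": [{"id": 3, "mag": 12.7}],
    "site_c": [{"id": 4, "mag": 21.3}],
}

def query_site(site_name, predicate):
    """Run the filter where the data lives; return only the matches."""
    return [rec for rec in SITES[site_name] if predicate(rec)]

def distributed_query(predicate):
    """Combine partial results from every site into one result set."""
    results = []
    for site in SITES:
        results.extend(query_site(site, predicate))
    return results

# Smaller magnitude means brighter, so this finds the bright objects.
bright = distributed_query(lambda rec: rec["mag"] < 15)
```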

The third major component of the NSF-funded research will improve a technique called "parallel" querying, which involves searching different locations at the same time, not unlike sending an army of librarians to search several large libraries at once. Researchers will strive to make these search agents smarter and more independent by improving the software they use.
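
The "army of librarians" image maps directly onto a thread pool: one worker searches each partition concurrently, and the partial answers are merged at the end. The partitions and object records below are invented for illustration.

```python
# Hypothetical sketch of parallel querying: each worker ("librarian")
# searches one partition at the same time as the others.
from concurrent.futures import ThreadPoolExecutor

PARTITIONS = [
    [{"name": "M31", "type": "galaxy"}, {"name": "Vega", "type": "star"}],
    [{"name": "M13", "type": "cluster"}, {"name": "M51", "type": "galaxy"}],
    [{"name": "Sirius", "type": "star"}],
]

def search_partition(partition, wanted_type):
    """Scan one partition for objects of the requested type."""
    return [obj["name"] for obj in partition if obj["type"] == wanted_type]

def parallel_search(wanted_type):
    # One worker per partition, all searching simultaneously.
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        futures = [pool.submit(search_partition, p, wanted_type)
                   for p in PARTITIONS]
        hits = []
        for f in futures:  # results collected in submission order
            hits.extend(f.result())
    return hits

galaxies = parallel_search("galaxy")  # ["M31", "M51"]
```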

To test their efforts at dealing with these challenges, researchers will use data from the SDSS, from the CERN particle collider and from GALEX, a sky-mapping survey that covers the same areas as SDSS but measures different forms of radiation.

"Data sets that are astronomical in every sense of that word are great test beds for computer scientists to experiment with to develop novel techniques for visualizing, organizing, and querying information," says Michael Goodrich, Hopkins professor of computer science and a member of the collaborative.

Additional collaborators include physicist Harvey Newman, research scientist Julian Bunn and astronomer Chris Martin of Caltech; physicist Thomas Nash of Fermilab; computer scientist Jim Gray of Microsoft; and astronomers Ani Thakar and Peter Kunszt of Hopkins.

The $2.5 million NSF grant is one of 31 announced by NSF as part of a new effort to support "knowledge and distributed intelligence" projects. The grants are focused on efforts to apply new computer technology across multidisciplinary areas in science and engineering.

Johns Hopkins University
