A New, Better Way To Search The Web

April 17, 1998

ITHACA, N.Y. -- The World Wide Web is an endless source of information, but with literally millions of pages posted by everyone from governments, universities and corporations to sixth-graders and conspiracy theorists, it's getting harder and harder to find precisely the right information.

Now a Cornell University researcher has come up with a method of searching the web that can return a list of the most valuable sites on a given topic, as well as a list of sites that index the subject. Early tests of the method have produced highly focused lists of sites on many topics, often comparable to lists carefully compiled by web search experts.

The method was developed by Jon Kleinberg, Cornell professor of computer science. An evaluation of the method was presented at the seventh International World Wide Web Conference held April 14-18 in Brisbane, Australia, in a paper by Kleinberg, David Gibson of the Department of Computer Science, University of California at Berkeley, and several IBM researchers.

Popular web-searching tools known as search engines, such as Yahoo! and AltaVista, work by hunting for keywords in the text of web pages. On some topics this can return hundreds or even thousands of pages. The algorithm (a set of rules specifying how to solve the problem) developed by Kleinberg instead works by analyzing the way web pages are linked to one another. The assumption behind this is that the most authoritative pages on a given subject will be those that are most often pointed to by other pages.

The web is annotated with "precisely the type of human judgment we need to identify authority," Kleinberg explains. "It almost says something about the way the web has evolved. I think it's about the way people link information in general, not just on the web."

Kleinberg's method does more than just identify pages with useful information about a topic, which he calls "authorities." The method also looks for pages that contain many links to pages with useful information on the topic, which he calls "hubs."

The best authorities, Kleinberg says, will be those that are pointed to by the best hubs, and the best hubs will be the ones that point to the best authorities. Kleinberg prevents this from becoming a circular definition by recalculating the relationship several times, each time moving closer to the ideal result.

He has written a search program called HITS (for Hyperlink-Induced Topic Search) that uses this technique. HITS begins by conducting an ordinary text-based search on a topic using a search engine such as AltaVista. This collects a "root set" of about 200 pages that contain the entered keywords. It then expands the set to include all the pages linked to by pages in the root set. The expanded set might include from 1,000 to 3,000 pages.
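The expansion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kleinberg's program: the function name expand_root_set, the out_links dictionary (page -> pages it links to) and the toy page names are all hypothetical, and the root set is supplied by hand rather than by a real text search.

```python
def expand_root_set(root_set, out_links):
    """Return the root set plus every page that a root page links to.

    out_links is an assumed in-memory link table: page -> iterable of
    pages it links to. A real system would have to fetch these links.
    """
    expanded = set(root_set)
    for page in root_set:
        # Follow links one step only, starting from root pages.
        expanded.update(out_links.get(page, ()))
    return expanded


# Hypothetical toy link table (stand-in for real web pages).
toy_links = {
    "a": ["x", "y"],
    "b": ["y", "z"],
    "x": ["q"],  # not followed, because "x" is not in the root set
}

expanded = expand_root_set({"a", "b"}, toy_links)
print(sorted(expanded))
```

Only links leaving root pages are followed, so "q" stays outside the expanded set in this toy example.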

From there on, text is ignored, and the application only looks at the way pages in the expanded set are linked to one another. The first time through, it identifies the pages that are pointed to most often by other pages, and assigns them a score, or "weight," indicating that they are more likely to be authorities. At the same time it notes the pages that contain more links to other pages and gives them more weight as hubs.

This calculation is repeated several times. Each time the program gives more authority weight to sites that are linked to by sites with more hub weight, and more hub weight to sites that link to sites with more authority weight. Ten repetitions, Kleinberg says, are enough to return surprisingly focused lists of authorities and hubs.
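The repeated calculation can be illustrated with a short Python sketch. It is a minimal version under stated assumptions rather than the HITS program itself: the function names, the toy link table, and the rescaling of the weights after each pass (so the numbers stay bounded) are choices made for this example and are not described in the text above.

```python
import math


def normalize(weights):
    """Rescale weights so their squares sum to 1 (an assumed convention)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {page: w / norm for page, w in weights.items()}


def hits(links, iterations=10):
    """Return (authority, hub) weights for a link table page -> linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages pointing to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub weight: sum of authority weights of the pages it points to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, ())) for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


# Hypothetical toy link table: three link pages pointing at three sites.
toy_links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1", "a3"],
}

auth, hub = hits(toy_links)
print(max(auth, key=auth.get))  # page with the highest authority weight
```

In this toy table, "a1" is pointed to by all three link pages and ends up with the highest authority weight, while "h1" and "h2" outrank "h3" as hubs because they point to the better-connected "a2" rather than "a3".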

The system overcomes several of the problems frequently identified with text-based searches. For example, at one time a text-based search for "Gates" didn't return the Microsoft Corp. home page because Microsoft chairman Bill Gates wasn't mentioned on the opening page. (He still isn't, but now his biography can be found by following the link "About Microsoft.") A search for "jaguar" returns a jumble of pages about cars, animals, the Jacksonville Jaguars NFL team, and the obsolete but still much-discussed Atari Jaguar computer.

In a case where a word represents more than one topic, Kleinberg's method automatically separates sites into "communities" of hubs and authorities, each representing one of the possible topics. Thus a HITS search on "jaguar" first lists a community of sites related to the Jaguar computer, because web sites on this subject predominate. Further down, it lists communities relating to the football team and the car. Finally, it finds sparse information relating to the animal, because this topic is simply not well represented on the web, Kleinberg says.

Communities also form when a topic is polarized: A search on "abortion" returns separate communities of pro-life and pro-choice sites, because the sites within each community link more densely to one another than to sites advocating an opposing view.

One disadvantage of the method, Kleinberg says, is that it doesn't always work for sharply focused queries. A search for "Netscape 4.04," for example, returns a general list of sites about web browsers.

The paper being presented in Brisbane is titled "Automatic Resource List Compilation by Analyzing Hyperlink Structure and Associated Text." Another paper by Kleinberg, "Authoritative Sources in a Hyperlinked Environment," was published in the Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. A related paper, "Inferring Web Communities from Link Topology," by Kleinberg, Gibson and Prabhakar Raghavan of the IBM Almaden Research Center, appears in the Proceedings of the 9th ACM Conference on Hypertext and Hypermedia, 1998.

The texts of these papers can be found on Kleinberg's web page at http://www.cs.cornell.edu/home/kleinber/.

Kleinberg developed the method while working as a visiting scientist at IBM's Almaden Research Center, on leave from Cornell. IBM has applied for a patent on the algorithm.
-end-


Cornell University
