Nav: Home

Does my algorithm work? There's no shortcut for community detection

May 03, 2017

Community detection is an important tool for scientists studying networks. It provides descriptions of the large-scale network by dividing its nodes into related communities. To test community detection algorithms, researchers run the algorithm on known data from a real-world network and check to see if their results match up with existing node labels -- metadata -- from that network.

But a new paper published this week in Science Advances calls that approach into question.

Real-world networks are large and complex. Food webs, social networks, or genetic relationships may consist of hundreds, or even millions, of nodes. To understand the overarching layout of a large network, scientists design algorithms to divide the network's nodes into significant groups, which make the network easier to understand. In other words, community detection allows a researcher to zoom out, seeing big patterns in the forest, instead of being caught up in the trees. In the past, researchers have used metadata as a sort of answer key or "ground truth" to verify that their community detection algorithms are performing well.

"Unfortunately, tempting as this practice is, with real-world data, there is no answer key, no ground truth," explains Daniel Larremore, one of two lead authors of the paper and an Omidyar Fellow at the Santa Fe Institute. "Our research rigorously shows that using metadata as ground truth to validate algorithms is fundamentally problematic and introduces biases without telling us what we really need to know: does my algorithm work?"

When scientists use metadata to validate algorithms, they limit the types of communities they can validate. Larremore likens this to a teacher leading a class discussion, and only responding to students who raise points the teacher is already familiar with.

"If we want creative algorithms that can handle all kinds of challenges, then restricting the answers to one set of "ground truth" metadata means we're pushing our algorithms through this bottleneck of low diversity, and low creativity," he says. "We'll only ever get algorithms that solve a small and restricted set of problems."

Having exposed the shortcomings of metadata as a test for community detection, Larremore and co-authors Leto Peel (Université Catholique de Louvain) and Aaron Clauset (SFI, CU Boulder) go on to quash any hope of creating a universal algorithm for detecting communities by their network structures. The paper mathematically proves the first No Free Lunch Theorem for community detection: any algorithm that's exceptionally good at finding communities in one type of network must be exceptionally bad at finding communities in another.

David Wolpert, also of the Santa Fe Institute, first posited a No Free Lunch Theorem for machine learning algorithms in 1997.

The authors hope that by mathematically proving the futility of universal detection algorithms, they can, according to Larremore "free people up to work on specialist algorithms."

The new paper curbs enthusiasm for finding any single, universally optimal approach to understanding complex network datasets. Still, the authors do see a constructive side to their findings. In the final section of their paper, they reverse the usual script. Instead of using metadata to validate an algorithm's performance, as in the past, they introduce two new statistical approaches that use metadata in conjunction with the network itself to probe the more fundamental questions of network science: what are the deeper patterns between the nodes, links, and metadata alike, and how can we use these to learn about the system that the network represents?

Santa Fe Institute

Related Algorithm Articles:

Algorithm personalizes which cancer mutations are best targets for immunotherapy
As tumor cells multiply, they often spawn tens of thousands of genetic mutations.
Universal algorithm set to boost microscopes
EPFL scientists have developed an algorithm that can determine whether a super-resolution microscope is operating at maximum resolution based on a single image.
Algorithm designed to map universe, solve mysteries
Cornell University researchers have developed an algorithm designed to visualize models of the universe in order to solve some of physics' greatest mysteries.
Algorithm tells robots where nearby humans are headed
A new tool for predicting a person's movement trajectory may help humans and robots work together in close proximity.
Algorithm to transform investment banking with higher returns
A University of Bath researcher has created an algorithm which aims to remove the elements of chance, bias or emotion from investment banking decisions, a development which has the potential to reduce errors in financial decision making and improve financial returns in global markets.
Algorithm provides customized caffeine strategy for alertness
A web-based caffeine optimization tool successfully designs effective strategies to maximize alertness while avoiding excessive caffeine consumption, according to preliminary results from a new study.
New algorithm optimizes quantum computing problem-solving
Tohoku University researchers have developed an algorithm that enhances the ability of a Canadian-designed quantum computer to more efficiently find the best solution for complicated problems, according to a study published in the journal Scientific Reports.
Machine learning algorithm helps in the search for new drugs
Researchers have designed a machine learning algorithm for drug discovery which has been shown to be twice as efficient as the industry standard, which could accelerate the process of developing new treatments for disease.
Researchers create algorithm to predict PEDV outbreaks
Researchers from North Carolina State University have developed an algorithm that could give pig farms advance notice of porcine epidemic diarrhea virus (PEDV) outbreaks.
New algorithm provides a more detailed look at urban heat islands
Urban areas are warmer than the adjacent undeveloped land, a phenomenon known as the urban heat island effect.
More Algorithm News and Algorithm Current Events

Top Science Podcasts

We have hand picked the top science podcasts of 2019.
Now Playing: TED Radio Hour

Why do we revere risk-takers, even when their actions terrify us? Why are some better at taking risks than others? This hour, TED speakers explore the alluring, dangerous, and calculated sides of risk. Guests include professional rock climber Alex Honnold, economist Mariana Mazzucato, psychology researcher Kashfia Rahman, structural engineer and bridge designer Ian Firth, and risk intelligence expert Dylan Evans.
Now Playing: Science for the People

#540 Specialize? Or Generalize?
Ever been called a "jack of all trades, master of none"? The world loves to elevate specialists, people who drill deep into a single topic. Those people are great. But there's a place for generalists too, argues David Epstein. Jacks of all trades are often more successful than specialists. And he's got science to back it up. We talk with Epstein about his latest book, "Range: Why Generalists Triumph in a Specialized World".
Now Playing: Radiolab

Dolly Parton's America: Neon Moss
Today on Radiolab, we're bringing you the fourth episode of Jad's special series, Dolly Parton's America. In this episode, Jad goes back up the mountain to visit Dolly's actual Tennessee mountain home, where she tells stories about her first trips out of the holler. Back on the mountaintop, standing under the rain by the Little Pigeon River, the trip triggers memories of Jad's first visit to his father's childhood home, and opens the gateway to dizzying stories of music and migration. Support Radiolab today at