Nav: Home

Does my algorithm work? There's no shortcut for community detection

May 03, 2017

Community detection is an important tool for scientists studying networks. It provides descriptions of the large-scale network by dividing its nodes into related communities. To test community detection algorithms, researchers run the algorithm on known data from a real-world network and check to see if their results match up with existing node labels -- metadata -- from that network.

But a new paper published this week in Science Advances calls that approach into question.

Real-world networks are large and complex. Food webs, social networks, or genetic relationships may consist of hundreds, or even millions, of nodes. To understand the overarching layout of a large network, scientists design algorithms to divide the network's nodes into significant groups, which make the network easier to understand. In other words, community detection allows a researcher to zoom out, seeing big patterns in the forest, instead of being caught up in the trees. In the past, researchers have used metadata as a sort of answer key or "ground truth" to verify that their community detection algorithms are performing well.

"Unfortunately, tempting as this practice is, with real-world data, there is no answer key, no ground truth," explains Daniel Larremore, one of two lead authors of the paper and an Omidyar Fellow at the Santa Fe Institute. "Our research rigorously shows that using metadata as ground truth to validate algorithms is fundamentally problematic and introduces biases without telling us what we really need to know: does my algorithm work?"

When scientists use metadata to validate algorithms, they limit the types of communities they can validate. Larremore likens this to a teacher leading a class discussion, and only responding to students who raise points the teacher is already familiar with.

"If we want creative algorithms that can handle all kinds of challenges, then restricting the answers to one set of "ground truth" metadata means we're pushing our algorithms through this bottleneck of low diversity, and low creativity," he says. "We'll only ever get algorithms that solve a small and restricted set of problems."

Having exposed the shortcomings of metadata as a test for community detection, Larremore and co-authors Leto Peel (Université Catholique de Louvain) and Aaron Clauset (SFI, CU Boulder) go on to quash any hope of creating a universal algorithm for detecting communities by their network structures. The paper mathematically proves the first No Free Lunch Theorem for community detection: any algorithm that's exceptionally good at finding communities in one type of network must be exceptionally bad at finding communities in another.

David Wolpert, also of the Santa Fe Institute, first posited a No Free Lunch Theorem for machine learning algorithms in 1997.

The authors hope that by mathematically proving the futility of universal detection algorithms, they can, according to Larremore "free people up to work on specialist algorithms."

The new paper curbs enthusiasm for finding any single, universally optimal approach to understanding complex network datasets. Still, the authors do see a constructive side to their findings. In the final section of their paper, they reverse the usual script. Instead of using metadata to validate an algorithm's performance, as in the past, they introduce two new statistical approaches that use metadata in conjunction with the network itself to probe the more fundamental questions of network science: what are the deeper patterns between the nodes, links, and metadata alike, and how can we use these to learn about the system that the network represents?
-end-


Santa Fe Institute

Related Algorithm Articles:

Scientists use algorithm to peer through opaque brains
A new algorithm helps scientists record the activity of individual neurons within a volume of brain tissue.
Algorithm generates origami folding patterns for any shape
A new algorithm generates practical paper-folding patterns to produce any 3-D structure.
New algorithm tracks neurons in bendy brain of freely crawling worm
Scientists at Princeton University have developed a new algorithm to track neurons in the brain of the worm Caenorhabditis elegans while it crawls.
Does my algorithm work? There's no shortcut for community detection
Community detection is an important tool for scientists studying networks, but a new paper published in Science Advances calls into question the common practice of using metadata for ground truth validation.
'Cyclops' algorithm spots daily rhythms in cells
Humans, like virtually all other complex organisms on Earth, have adapted to their planet's 24-hour cycle of sunlight and darkness.
An algorithm that knows when you'll get bored with your favorite mobile game
Researchers from the Tokyo-based company Silicon Studio, led by Spanish data scientist África Periáñez, have developed a new algorithm that predicts when a user will leave a mobile game.
Algorithm identified Trump as 'not-married'
Scientists from Russia and Singapore created an algorithm that predicts user marital status with 86% precision using data from three social networks instead of one.
A novel positioning algorithm based on self-adaptive algorithm
Much attention has been paid to the Taylor series expansion (TSE) method these years, which has been extensively used for solving nonlinear equations for its good robustness and accuracy of positioning.
Algorithm can create a bridge between Clinton and Trump supporters
The article that received the best student-paper award in the Tenth International Conference on Web Search and Data Mining (WSDM 2017) builds algorithmic techniques to mitigate the rising polarization by connecting people with opposing views -- and evaluates them on Twitter.
Deep learning algorithm does as well as dermatologists in identifying skin cancer
In hopes of creating better access to medical care, Stanford researchers have trained an algorithm to diagnose skin cancer.

Related Algorithm Reading:

Best Science Podcasts 2019

We have hand picked the best science podcasts for 2019. Sit back and enjoy new science podcasts updated daily from your favorite science news services and scientists.
Now Playing: TED Radio Hour

Moving Forward
When the life you've built slips out of your grasp, you're often told it's best to move on. But is that true? Instead of forgetting the past, TED speakers describe how we can move forward with it. Guests include writers Nora McInerny and Suleika Jaouad, and human rights advocate Lindy Lou Isonhood.
Now Playing: Science for the People

#527 Honey I CRISPR'd the Kids
This week we're coming to you from Awesome Con in Washington, D.C. There, host Bethany Brookshire led a panel of three amazing guests to talk about the promise and perils of CRISPR, and what happens now that CRISPR babies have (maybe?) been born. Featuring science writer Tina Saey, molecular biologist Anne Simon, and bioethicist Alan Regenberg. A Nobel Prize winner argues banning CRISPR babies won’t work Geneticists push for a 5-year global ban on gene-edited babies A CRISPR spin-off causes unintended typos in DNA News of the first gene-edited babies ignited a firestorm The researcher who created CRISPR twins defends...