Obtaining high-quality labels by crowdsourcing

November 08, 2015

In the era of big data, data can often be obtained abundantly and cheaply, but providing labels for such large-scale data remains a challenge because labeling is expensive and time-consuming. For example, to train a model that annotates images automatically, learning algorithms need many images with known annotations as training data. Although images are abundant on the internet, specialists must be hired to annotate them for use as training data.

Crowdsourcing has become an effective and efficient paradigm for labeling large-scale data: users (known as taskmasters) post "micro-tasks" on the internet to be completed by voluntary workers in exchange for small monetary payments. Once the tasks are posted, thousands of workers with internet access can take them up, and the taskmaster can collect labels for these tasks in a short period of time.

Not all voluntary workers in the crowd are reliable; some may provide wrong labels. To improve quality and reliability, the common wisdom is to add redundancy to the labels: each task is presented to multiple workers, and the ground-truth label is inferred from these multiple labels by aggregation algorithms. Now, Wei Wang and Zhi-Hua Zhou at Nanjing University have presented a theoretical analysis of label quality in crowdsourcing and derived an upper bound on the error rate of the inferred labels. They also analyzed workers based on their completed tasks and provided criteria for evaluating worker quality. These theoretical results can help eliminate low-quality workers from the crowd, improve label quality, and reduce labeling cost.
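As a minimal illustration of the redundancy idea (a sketch only, not the authors' analyzed algorithm), the simplest aggregation scheme is majority voting: each task is labeled by several workers, and the most frequent label is taken as the inferred ground truth. The task names and labels below are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Infer a single label from multiple worker labels by majority vote."""
    counts = Counter(labels)
    # most_common(1) returns [(label, count)] for the most frequent label
    return counts.most_common(1)[0][0]

# Each task receives redundant labels from several (possibly unreliable) workers.
worker_labels = {
    "image_1": ["cat", "cat", "dog"],
    "image_2": ["dog", "dog", "dog"],
}
inferred = {task: majority_vote(votes) for task, votes in worker_labels.items()}
print(inferred)  # {'image_1': 'cat', 'image_2': 'dog'}
```

Even this naive scheme shows why redundancy helps: an occasional wrong label is outvoted, provided most workers are better than random. The paper's contribution is to make this intuition precise with error-rate bounds and worker-quality criteria.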

This research was published in the November 2015 issue of SCIENCE CHINA Information Sciences.
See the article:

WANG Wei, ZHOU Zhi-Hua*. Crowdsourcing label quality: a theoretical analysis. SCIENCE CHINA Information Sciences, 2015, 58(11): 112103(12) http://link.springer.com/article/10.1007/s11432-015-5391-x

Science China Press
