Tool helps clear biases from computer vision

October 01, 2020

Researchers at Princeton University have developed a tool that flags potential biases in sets of images used to train artificial intelligence (AI) systems. The work is part of a larger effort to remedy and prevent the biases that have crept into AI systems that influence everything from credit services to courtroom sentencing programs.

Although the sources of bias in AI systems are varied, one major cause is stereotypical images contained in large sets of images collected from online sources that engineers use to develop computer vision, a branch of AI that allows computers to recognize people, objects and actions. Because the foundation of computer vision is built on these data sets, images that reflect societal stereotypes and biases can unintentionally influence computer vision models.

To help stem this problem at its source, researchers in the Princeton Visual AI Lab have developed an open-source tool that automatically uncovers potential biases in visual data sets. The tool allows data set creators and users to correct issues of underrepresentation or stereotypical portrayals before image collections are used to train computer vision models. In related work, members of the Visual AI Lab published a comparison of existing methods for preventing biases in computer vision models themselves, and proposed a new, more effective approach to bias mitigation.

The first tool, called REVISE (REvealing VIsual biaSEs), uses statistical methods to inspect a data set for potential biases or issues of underrepresentation along three dimensions: object-based, gender-based and geography-based. A fully automated tool, REVISE builds on earlier work that involved filtering and balancing a data set's images in a way that required more direction from the user. The study was presented Aug. 24 at the virtual European Conference on Computer Vision.

REVISE takes stock of a data set's content using existing image annotations and measurements such as object counts, the co-occurrence of objects and people, and images' countries of origin. Among these measurements, the tool exposes patterns that differ from median distributions.
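The co-occurrence measurement described above can be sketched in a few lines of Python. This is a toy illustration with hypothetical annotations, not REVISE's actual code or data format:

```python
from collections import Counter

# Toy annotations: each image lists the objects it contains and a
# perceived-gender label. (Hypothetical data; a real tool would read a
# data set's existing annotations.)
images = [
    {"objects": ["flower", "person"], "gender": "female"},
    {"objects": ["flower", "person"], "gender": "female"},
    {"objects": ["car", "person"], "gender": "male"},
    {"objects": ["flower", "person"], "gender": "male"},
    {"objects": ["car", "person"], "gender": "male"},
    {"objects": ["car", "person"], "gender": "female"},
]

def cooccurrence_rates(images, obj):
    """Fraction of each gender group's images that contain `obj`."""
    totals, with_obj = Counter(), Counter()
    for im in images:
        totals[im["gender"]] += 1
        if obj in im["objects"]:
            with_obj[im["gender"]] += 1
    return {g: with_obj[g] / totals[g] for g in totals}

# A lopsided rate across groups flags the object for human review:
# here "flower" appears in 2/3 of female-labeled images vs 1/3 of
# male-labeled ones.
rates = cooccurrence_rates(images, "flower")
print(rates)
```

A real data set audit would compute such rates for every object category and surface the largest disparities, leaving a person to judge whether each one is innocuous or harmful.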

For example, in one of the tested data sets, REVISE showed that images including both people and flowers differed between males and females: Males more often appeared with flowers in ceremonies or meetings, while females tended to appear in staged settings or paintings. (The analysis was limited to annotations reflecting the perceived binary gender of people appearing in images.)

Once the tool reveals these sorts of discrepancies, "then there's the question of whether this is a totally innocuous fact, or if something deeper is happening, and that's very hard to automate," said Olga Russakovsky, an assistant professor of computer science and principal investigator of the Visual AI Lab. Russakovsky co-authored the paper with graduate student Angelina Wang and Arvind Narayanan, an associate professor of computer science.

For example, in one of the data sets, REVISE revealed that objects including airplanes, beds and pizzas tended to occupy a larger portion of their images than a typical object does. Such an issue might not perpetuate societal stereotypes, but it could be problematic for training computer vision models. As a remedy, the researchers suggest collecting images of airplanes that also include the labels mountain, desert or sky -- images in which airplanes appear small and distant.

The underrepresentation of regions of the globe in computer vision data sets, however, is likely to lead to biases in AI algorithms. Consistent with previous analyses, the researchers found that for images' countries of origin (normalized by population), the United States and European countries were vastly overrepresented in data sets. Beyond this, REVISE showed that for images from other parts of the world, image captions were often not in the local language, suggesting that many of them were captured by tourists and potentially leading to a skewed view of a country.
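The population normalization mentioned above amounts to comparing each country's share of a data set's images against its share of world population. A hedged sketch, with made-up image counts that are not the study's data:

```python
# Illustrative image counts per country (hypothetical, not the study's
# data) alongside approximate 2020 populations in millions.
image_counts = {"US": 5000, "India": 300, "Nigeria": 50}
populations = {"US": 331, "India": 1380, "Nigeria": 206}

total_images = sum(image_counts.values())
total_pop = sum(populations.values())

# Ratio > 1 means a country is overrepresented relative to its
# population; ratio < 1 means it is underrepresented.
ratios = {
    c: (image_counts[c] / total_images) / (populations[c] / total_pop)
    for c in image_counts
}
for country, ratio in ratios.items():
    print(f"{country}: {ratio:.2f}x its population share")
```

With these toy numbers, the US comes out heavily overrepresented while India and Nigeria are underrepresented, mirroring the skew the researchers reported for real data sets.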

Researchers who focus on object detection may overlook issues of fairness in computer vision, said Russakovsky. "However, this geography analysis shows that object recognition can still be quite biased and exclusionary, and can affect different regions and people unequally," she said.

"Data set collection practices in computer science haven't been scrutinized that thoroughly until recently," said co-author Angelina Wang, a graduate student in computer science. She said images are mostly "scraped from the internet, and people don't always realize that their images are being used [in data sets]. We should collect images from more diverse groups of people, but when we do, we should be careful that we're getting the images in a way that is respectful."

"Tools and benchmarks are an important step ... they allow us to capture these biases earlier in the pipeline and rethink our problem setup and assumptions as well as data collection practices," said Vicente Ordonez-Roman, an assistant professor of computer science at the University of Virginia who was not involved in the studies. "In computer vision there are some specific challenges regarding representation and the propagation of stereotypes. Works such as those by the Princeton Visual AI Lab help elucidate and bring to the attention of the computer vision community some of these issues and offer strategies to mitigate them."

A related study from the Visual AI Lab examined approaches to prevent computer vision models from learning spurious correlations that may reflect biases, such as overpredicting activities like cooking in images of women, or computer programming in images of men. Visual cues such as the fact that zebras are black and white, or basketball players often wear jerseys, contribute to the accuracy of the models, so developing effective models while avoiding problematic correlations is a significant challenge in the field.

In research presented in June at the virtual Conference on Computer Vision and Pattern Recognition, electrical engineering graduate student Zeyu Wang and colleagues compared four different techniques for mitigating biases in computer vision models.

They found that a popular technique known as adversarial training, or "fairness through blindness," harmed the overall performance of image recognition models. In adversarial training, the model cannot consider information about the protected variable -- in the study, the researchers used gender as a test case. A different approach, known as domain-independent training, or "fairness through awareness," performed much better in the team's analysis.

"Essentially, this says we're going to have different frequencies of activities for different genders, and yes, this prediction is going to be gender-dependent, so we're just going to embrace that," said Russakovsky.

The technique outlined in the paper mitigates potential biases by considering the protected attribute separately from other visual cues.
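One way to picture "fairness through awareness" is a model that keeps a separate classifier head per protected-attribute group and combines the heads' scores at inference. The sketch below is a minimal illustration of that idea under assumed names and random weights, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 4                    # e.g. activities: cooking, programming, ...
features = rng.normal(size=16)   # shared image features (hypothetical)

# One weight matrix (head) per group, in practice trained on that
# group's images; random weights here stand in for trained ones.
heads = {
    "female": rng.normal(size=(n_classes, 16)),
    "male": rng.normal(size=(n_classes, 16)),
}

def predict(features):
    # Sum the per-group scores, so the final activity prediction draws
    # on every head rather than routing through a single inferred
    # gender -- the attribute is modeled explicitly but kept separate
    # from the other visual cues.
    scores = sum(head @ features for head in heads.values())
    return int(np.argmax(scores))

print(predict(features))  # an activity class index in [0, n_classes)
```

By contrast, adversarial training would try to scrub the protected attribute out of `features` entirely, which the study found harmed overall recognition performance.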

"How we really address the bias issue is a deeper problem, because of course we can see it's in the data itself," said Zeyu Wang. "But in the real world, humans can still make good judgments while being aware of our biases" -- and computer vision models can be set up to work in a similar way, he said.

In addition to Zeyu Wang and Russakovsky, other co-authors of the paper on strategies for bias mitigation were computer science graduate students Klint Qinami and Kyle Genova; Ioannis Christos Karakozis, an undergraduate from the Class of 2019; Prem Nair of the Class of 2018; and Kenji Hata, who earned a Ph.D. in computer science in 2017.

The work was supported in part by the U.S. National Science Foundation, Google Cloud, and a Yang Family Innovation Research Grant awarded by the Princeton School of Engineering and Applied Science.

Princeton University, Engineering School
