Context reduces racial bias in hate speech detection algorithms

July 06, 2020

Understanding what makes something harmful or offensive can be hard enough for humans, never mind artificial intelligence systems.

So, perhaps it's no surprise that social media hate speech detection algorithms, designed to stop the spread of hateful speech, can actually amplify racial bias by blocking inoffensive tweets by black people or other minority group members.

In fact, one previous study showed that AI models were 1.5 times more likely to flag tweets written by African Americans as "offensive" (in other words, a false positive) compared with other tweets.

Why? Because the current automatic detection models miss out on something vital: context. Specifically, hate speech classifiers are oversensitive to group identifiers like "black," "gay," or "transgender," which are only indicators of hate speech when used in some settings.
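This oversensitivity is easy to reproduce in miniature. The sketch below (invented texts and labels, not the study's data or code) shows how a naive word-association model trained on a hate-heavy sample ends up treating a neutral group identifier as a hate cue:

```python
from collections import Counter

# Toy training set: hateful examples dominate, as they would in data
# scraped from extremist forums. All texts and labels are invented.
train = [
    ("i hate black people", 1),
    ("black people do not belong here", 1),
    ("get out of my country", 1),
    ("proud to be a black woman", 0),
]

hate, total = Counter(), Counter()
for text, label in train:
    for word in set(text.split()):
        total[word] += 1
        hate[word] += label

def p_hate(word):
    """Fraction of training examples containing `word` that are labeled hateful."""
    return hate[word] / total[word]

print(p_hate("black"))  # 2 of the 3 "black" examples are hateful -> ~0.67
print(p_hate("proud"))  # 0.0
```

Because hateful examples dominate the sample, the identifier "black" looks strongly predictive of hate even though it is neutral on its own; a model that keys on that statistic will flag inoffensive posts like "proud to be a black woman."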

Now, a team of USC researchers has created a hate speech classifier that is more context-sensitive, and less likely to mistake a post containing a group identifier as hate speech.

To achieve this, the researchers programmed the algorithm to consider two additional factors: the context in which the group identifier is used, and whether specific features of hate speech are also present, such as dehumanizing and insulting language.
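One way to picture the approach, in the spirit of the paper's title ("post-hoc explanation"): measure how much a group identifier by itself drives a prediction, then penalize the model for relying on it. The sketch below is a simplified illustration with assumed names, not the study's code; importance is approximated by occlusion, i.e., the score with the word minus the score without it:

```python
import math

# Hypothetical identifier list for illustration.
GROUP_IDENTIFIERS = {"black", "gay", "transgender"}

def occlusion_importance(predict, tokens, i):
    """Drop in the model's hate score when token i is removed from the input."""
    return predict(tokens) - predict(tokens[:i] + tokens[i + 1:])

def regularized_loss(predict, tokens, label, alpha=0.1):
    """Cross-entropy plus a penalty on importance assigned to identifiers."""
    p = predict(tokens)
    ce = -math.log(p) if label == 1 else -math.log(1.0 - p)
    penalty = sum(
        occlusion_importance(predict, tokens, i) ** 2
        for i, tok in enumerate(tokens)
        if tok in GROUP_IDENTIFIERS
    )
    return ce + alpha * penalty

# A stand-in model that leans on the identifier alone -- exactly the
# behavior the penalty term discourages during training.
toy_predict = lambda tokens: 0.9 if "black" in tokens else 0.5
tokens = ["proud", "to", "be", "a", "black", "woman"]
print(regularized_loss(toy_predict, tokens, label=0))
```

Under this penalty, a model that calls an innocuous sentence hateful purely because it contains "black" pays extra loss, pushing it to look for genuine hate-speech features (dehumanizing or insulting language) in the surrounding context instead.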

"We want to move hate speech detection closer to being ready for real-world application," said Brendan Kennedy, a computer science PhD student and co-lead author of the study, published at ACL 2020, July 6.

"Hate speech detection models often 'break,' or generate bad predictions, when introduced to real-world data, such as social media or other online text data, because they are biased by the data on which they are trained to associate the appearance of social identifying terms with hate speech."

Additional authors of the study, titled "Contextualizing Hate Speech Classifiers with Post-Hoc Explanation," are co-lead author Xisen Ji, a USC computer science PhD student, and co-authors Aida Mostafazadeh Davani, a computer science PhD student; Xiang Ren, an assistant professor of computer science; and Morteza Dehghani, who holds joint appointments in psychology and computer science.

Why AI bias happens

Hate speech detection is part of the ongoing effort against oppressive and abusive language on social media, using complex algorithms to flag racist or violent speech faster and better than human beings alone. But machine learning models are prone to learning human-like biases from the training data that feeds these algorithms.

For instance, algorithms struggle to determine if group identifiers like "gay" or "black" are used in offensive or prejudiced ways because they're trained on imbalanced datasets with unusually high rates of hate speech (white supremacist forums, for instance). As a result, the models find it hard to generalize to real-world applications.

"It is key for models to not ignore identifiers, but to match them with the right context," said Professor Xiang Ren, an expert in natural language processing.

"If you teach a model from an imbalanced dataset, the model starts picking up weird patterns and blocking users inappropriately."

To test the systems, the researchers accessed a large, random sample of text from "Gab," a social network with a high rate of hate speech, and "Stormfront," a white supremacist website. The text had been hand-flagged by humans as prejudiced or dehumanizing.

They then measured the state-of-the-art model's tendency, versus their own model's, to inappropriately flag non-hate speech, using 12,500 New York Times articles free of hate speech except in quotations. State-of-the-art models achieved 77% accuracy at identifying hate versus non-hate speech; the USC model boosted this to 90%.

"This work by itself does not make hate speech detection perfect, that is a huge project that many are working on, but it makes incremental progress," said Kennedy.

"In addition to preventing social media posts by members of protected groups from being inappropriately censored, we hope our work will help ensure that hate speech detection does not do unnecessary harm by reinforcing spurious associations of prejudice and dehumanization with social groups."
-end-


University of Southern California
