
UW security researchers show Google's anti-internet troll AI platform is easily deceived

March 01, 2017

University of Washington researchers have shown that Google's new machine learning-based system for identifying toxic comments in online discussion forums can be bypassed simply by misspelling abusive words, such as "idiot" or "moron," or by adding unnecessary punctuation to them.

Perspective is a project by Google's technology incubator Jigsaw, which uses artificial intelligence to combat internet trolls and promote more civil online discussion by automatically detecting online insults, harassment and abusive speech. The company launched a demonstration website on Feb. 23 that allows anyone to type in a phrase and see its "toxicity score" -- a measure of how rude, disrespectful or unreasonable a particular comment is.

In a paper posted Feb. 27 on the e-print repository arXiv, the UW electrical engineers and security experts demonstrated that the early-stage system can be deceived with common adversarial tactics. They showed that a phrase that receives a high toxicity score can be subtly modified so that it contains the same abusive language but receives a low toxicity score.

Given that news platforms such as The New York Times and other media companies are exploring how the system could help curb harassment and abuse in online comment areas or social media, the UW researchers evaluated Perspective in adversarial settings. They showed that the system is prone to two kinds of error: missing incendiary language and falsely blocking non-abusive phrases.

"Machine learning systems are generally designed to yield the best performance in benign settings. But in real-world applications, these systems are susceptible to intelligent subversion or attacks," said senior author Radha Poovendran, chair of the UW electrical engineering department and director of the Network Security Lab. "We wanted to demonstrate the importance of designing these machine learning tools in adversarial environments. Designing a system with a benign operating environment in mind and deploying it in adversarial environments can have devastating consequences."

To solicit feedback and invite other researchers to explore the strengths and weaknesses of using machine learning as a tool to improve online discussions, Perspective developers made their experiments, models and data publicly available along with the tool itself.

The examples in Graphic 1 cover the hot-button topics of climate change, Brexit and the recent U.S. election, and were taken directly from the Perspective API website. The UW team simply misspelled the offending words or added extraneous punctuation or spaces to them, which yielded much lower toxicity scores. For example, changing "idiot" to "idiiot" reduced the toxicity score of an otherwise identical phrase from 84 percent to 20 percent.
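
The three kinds of modification described above (misspelling, extra punctuation, extra spaces) can be illustrated with a short Python sketch. This is not the researchers' code and it does not call the Perspective API; the function names and the sample phrase are hypothetical and serve only to show how small the changes are.

```python
def duplicate_letter(word, index):
    """Misspell a word by repeating the character at position `index`."""
    return word[:index] + word[index] + word[index:]


def insert_separator(word, separator="."):
    """Put a separator (punctuation or a space) between every letter."""
    return separator.join(word)


def perturb_phrase(phrase, target, replacement):
    """Swap the target word for its perturbed form; the rest is unchanged."""
    return phrase.replace(target, replacement)


# Hypothetical sample phrase, not taken from the paper or the demo site.
phrase = "only an idiot would say that"
variants = [
    perturb_phrase(phrase, "idiot", duplicate_letter("idiot", 2)),
    perturb_phrase(phrase, "idiot", insert_separator("idiot", ".")),
    perturb_phrase(phrase, "idiot", insert_separator("idiot", " ")),
]
for variant in variants:
    print(variant)
```

A human reader still recognizes the insult in every variant, which is why this kind of change counts as an evasion rather than a rewording.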

In the examples in Graphic 2, the researchers also showed that the system fails to assign a low toxicity score to a negated version of an abusive phrase, so a comment that rejects the insult can still be scored as toxic.

The researchers also observed that the deceptive changes often transfer among different phrases: once an intentionally misspelled word was given a low toxicity score in one phrase, it was also given a low score in another phrase. That means an adversary could build a "dictionary" that maps each abusive word to a modified form known to score low, and significantly simplify the attack process.
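
A minimal Python sketch of such a "dictionary" follows. The table and the `apply_dictionary` helper are hypothetical, and the "mo.ron" entry is made up for illustration; only the "idiot" to "idiiot" change comes from the example above. The point is that once the table exists, evading the detector becomes a plain find-and-replace.

```python
import re

# Hypothetical substitution table: abusive word -> modified form that an
# attacker has already found to score low in some other phrase.
SUBSTITUTIONS = {
    "idiot": "idiiot",
    "moron": "mo.ron",  # made-up entry, for illustration only
}


def apply_dictionary(phrase, substitutions=SUBSTITUTIONS):
    """Replace every whole-word match with its stored modified form."""
    alternatives = "|".join(re.escape(word) for word in substitutions)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: substitutions[m.group(0).lower()], phrase)


print(apply_dictionary("What an idiot"))
print(apply_dictionary("Only a moron or an Idiot says that"))
```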

"There are two metrics for evaluating the performance of a filtering system like a spam blocker or toxic speech detector; one is the missed detection rate, and the other is the false alarm rate," said lead author and UW electrical engineering doctoral student Hossein Hosseini. "Of course scoring the semantic toxicity of a phrase is challenging, but deploying defensive mechanisms both in algorithmic and system levels can help the usability of the system in real-world settings."
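
The two metrics Hosseini mentions can be computed as in the following Python sketch, which uses the standard definitions: the missed detection rate is the share of truly toxic comments that were not flagged, and the false alarm rate is the share of benign comments that were flagged. The function name and the sample labels are hypothetical and are not data from the paper.

```python
def filter_error_rates(is_toxic, was_flagged):
    """Return (missed_detection_rate, false_alarm_rate) for a filter.

    is_toxic    -- ground-truth labels, True if the comment really is toxic
    was_flagged -- filter decisions, True if the filter flagged the comment
    """
    if len(is_toxic) != len(was_flagged):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(is_toxic, was_flagged))
    toxic_total = sum(1 for toxic, _ in pairs if toxic)
    benign_total = len(pairs) - toxic_total
    missed = sum(1 for toxic, flagged in pairs if toxic and not flagged)
    false_alarms = sum(1 for toxic, flagged in pairs if not toxic and flagged)
    # Guard against empty classes so the sketch never divides by zero.
    missed_rate = missed / toxic_total if toxic_total else 0.0
    false_alarm_rate = false_alarms / benign_total if benign_total else 0.0
    return missed_rate, false_alarm_rate


# Made-up labels: 4 toxic comments (1 missed), 4 benign comments (2 flagged).
truth = [True, True, True, True, False, False, False, False]
flags = [True, True, True, False, True, True, False, False]
print(filter_error_rates(truth, flags))
```

In these terms, a misspelled insult that slips through raises the missed detection rate, while a negated phrase that is flagged anyway raises the false alarm rate.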

The research team suggests several techniques to improve the robustness of toxic speech detectors, including applying a spellchecking filter prior to the detection system, training the machine learning algorithm with adversarial examples and blocking suspicious users for a period of time.
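
The first of these defenses, a spellchecking filter in front of the detector, might look like the Python sketch below. It is a toy under stated assumptions, not the researchers' implementation: the vocabulary is a tiny hypothetical word list, the similarity cutoff of 0.8 is arbitrary, and `score_fn` stands in for whatever detector is being protected. It strips punctuation from each word and then maps the result to the closest known word using the standard library's `difflib`.

```python
import difflib
import re

# Hypothetical stand-in vocabulary; a real filter would use a full dictionary.
VOCABULARY = ["idiot", "moron", "climate", "change", "people", "are", "they"]


def normalize_token(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Strip non-letters, then snap the token to the closest known word."""
    letters = re.sub(r"[^a-z]", "", token.lower())
    if not letters:
        return token
    if letters in vocabulary:
        return letters
    matches = difflib.get_close_matches(letters, vocabulary, n=1, cutoff=cutoff)
    # Unknown words with no close match are passed through unchanged.
    return matches[0] if matches else token


def normalize_phrase(phrase):
    """Normalize each whitespace-separated token of the phrase."""
    return " ".join(normalize_token(token) for token in phrase.split())


def score_with_filter(phrase, score_fn):
    """Run the (hypothetical) detector `score_fn` on the cleaned phrase."""
    return score_fn(normalize_phrase(phrase))


print(normalize_phrase("you are an idiiot"))
print(normalize_phrase("what a m.o.r.o.n"))
```

This sketch does not rejoin letters separated by spaces, and an aggressive cutoff could "correct" innocent words into abusive ones, so a filter like this trades one kind of error for another and would need tuning.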

"Our Network Security Lab research is typically focused on the foundations and science of cybersecurity," said Poovendran, the lead principal investigator of a recently awarded MURI grant, of which adversarial machine learning is a significant component. "But our expanded focus includes developing robust and resilient systems for machine learning and reasoning systems that need to operate in adversarial environments for a wide range of applications."
Co-authors include UW electrical engineering assistant professors Sreeram Kannan and Baosen Zhang.

The research is funded by the National Science Foundation, the Office of Naval Research and the Army Research Office.

For more information, contact Poovendran at chair@ee.washington.edu.

University of Washington
