Seeing how computers 'think' helps humans stump machines and reveals AI weaknesses

August 06, 2019

One of the ultimate goals of artificial intelligence is a machine that truly understands human language and interprets meaning from complex, nuanced passages. When IBM's Watson computer beat famed "Jeopardy!" champion Ken Jennings in 2011, it seemed as if that milestone had been met. However, anyone who has tried to have a conversation with virtual assistant Siri knows that computers have a long way to go to truly understand human language. To get better at understanding language, computer systems must train using questions that challenge them and reflect the full complexity of human language.

Researchers from the University of Maryland have figured out how to reliably create such questions through a human-computer collaboration, developing a dataset of more than 1,200 questions that, while easy for people to answer, stump the best computer answering systems today. A system that learns to master these questions will have a better understanding of language than any system currently in existence. The work is described in an article published in 2019 in the journal Transactions of the Association for Computational Linguistics.

"Most question-answering computer systems don't explain why they answer the way they do, but our work helps us see what computers actually understand," said Jordan Boyd-Graber, associate professor of computer science at UMD and senior author of the paper. "In addition, we have produced a dataset to test on computers that will reveal if a computer language system is actually reading and doing the same sorts of processing that humans are able to do."

Most current work to improve question-answering programs uses either human authors or computers to generate questions. The inherent challenge in these approaches is that when humans write questions, they don't know what specific elements of their question are confusing to the computer. When computers write the questions, they either write formulaic, fill-in-the-blank questions or make mistakes, sometimes generating nonsense.

To develop their novel approach of humans and computers working together to generate questions, Boyd-Graber and his team created a computer interface that reveals what a computer is "thinking" as a human writer types a question. The writer can then edit his or her question to exploit the computer's weaknesses.

In the new interface, a human author types a question while the computer's guesses appear in ranked order on the screen, and the words that led the computer to make its guesses are highlighted.

For example, if the author writes "What composer's Variations on a Theme by Haydn was inspired by Karl Ferdinand Pohl?" and the system correctly answers "Johannes Brahms," the interface highlights the words "Ferdinand Pohl" to show that this phrase led it to the answer. Using that information, the author can edit the question to make it more difficult for the computer without altering the question's meaning. In this example, the author replaced the name of the man who inspired Brahms, "Karl Ferdinand Pohl," with a description of his job, "the archivist of the Vienna Musikverein," and the computer was unable to answer correctly. However, expert human quiz game players could still easily answer the edited question correctly.
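The two outputs the interface shows the writer, ranked guesses and the question words that most influenced the top guess, can be illustrated with a toy sketch. This is not the authors' actual system; it uses a hypothetical bag-of-words scorer and leave-one-out word importance purely to make the idea concrete:

```python
# Hypothetical sketch of the interface's two displays: (1) candidate
# answers ranked by a toy keyword-overlap score, and (2) per-word
# importance, measured by how much the top guess's score drops when
# a word is removed (leave-one-out). Illustration only.

def score(question_words, evidence):
    """Score a candidate answer by keyword overlap with its evidence text."""
    return sum(1 for w in question_words if w in evidence)

def rank_guesses(question, knowledge):
    """Return (answer, score) pairs ranked best first."""
    words = question.lower().split()
    scored = [(ans, score(words, ev)) for ans, ev in knowledge.items()]
    return sorted(scored, key=lambda pair: -pair[1])

def word_importance(question, knowledge):
    """For each question word, how much does dropping it hurt the top guess?"""
    words = question.lower().split()
    top_answer, base = rank_guesses(question, knowledge)[0]
    evidence = knowledge[top_answer]
    return {w: base - score([x for x in words if x != w], evidence)
            for w in words}

# Toy knowledge base mapping answers to evidence keywords (made up).
kb = {
    "johannes brahms": "composer variations theme haydn ferdinand pohl",
    "franz schubert": "composer symphony unfinished vienna",
}

question = "composer variations haydn ferdinand pohl"
guesses = rank_guesses(question, kb)        # ranked guesses for display
importance = word_importance(question, kb)  # words to highlight
```

A real system would use a trained question-answering model and a model-appropriate interpretation method, but the writer's workflow is the same: inspect which words drive the top guess, then rephrase those words (for example, replacing "Ferdinand Pohl" with a description of his job) while keeping the question's meaning intact.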

By working together, humans and computers reliably developed 1,213 computer-stumping questions that the researchers tested during a competition pitting experienced human players--from junior varsity high school trivia teams to "Jeopardy!" champions--against computers. Even the weakest human team defeated the strongest computer system.

"For three or four years, people have been aware that computer question-answering systems are very brittle and can be fooled very easily," said Shi Feng, a UMD computer science graduate student and a co-author of the paper. "But this is the first paper we are aware of that actually uses a machine to help humans break the model itself."

The researchers say these questions will serve not only as a new dataset for computer scientists to better understand where natural language processing fails, but also as a training dataset for developing improved machine learning algorithms. The questions revealed six different language phenomena that consistently stump computers.

These six phenomena fall into two categories. In the first category are linguistic phenomena: paraphrasing (such as saying "leap from a precipice" instead of "jump from a cliff"), distracting language or unexpected contexts (such as a reference to a political figure appearing in a clue about something unrelated to politics). The second category includes reasoning skills: clues that require logic and calculation, mental triangulation of elements in a question, or putting together multiple steps to form a conclusion.

"Humans are able to generalize more and to see deeper connections," Boyd-Graber said. "They don't have the limitless memory of computers, but they still have an advantage in being able to see the forest for the trees. Cataloguing the problems computers have helps us understand the issues we need to address, so that we can actually get computers to begin to see the forest through the trees and answer questions in the way humans do."

There is a long way to go before that happens, added Boyd-Graber, who also has co-appointments at the University of Maryland Institute for Advanced Computer Studies (UMIACS) as well as UMD's College of Information Studies and Language Science Center. But this work provides an exciting new tool to help computer scientists achieve that goal.

"This paper is laying out a research agenda for the next several years so that we can actually get computers to answer questions well," he said.
Video of this work is available online at:

Additional co-authors of the research paper from UMD include computer science graduate student Pedro Rodriguez and Eric Wallace (B.S. '18, computer engineering).

The paper, "Trick Me if You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples," by Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada and Jordan Boyd-Graber, was published on July 25, 2019, in Transactions of the Association for Computational Linguistics.

University of Maryland
College of Computer, Mathematical, and Natural Sciences
2300 Symons Hall
College Park, Md. 20742

About the College of Computer, Mathematical, and Natural Sciences

The College of Computer, Mathematical, and Natural Sciences at the University of Maryland educates more than 9,000 future scientific leaders in its undergraduate and graduate programs each year. The college's 10 departments and more than a dozen interdisciplinary research centers foster scientific discovery with annual sponsored research funding exceeding $175 million.
