AI fails classic attention test

06.02.26 | PNAS Nexus

Giving AI a classic psychological test reveals an inherent weakness in the decision-making abilities of large language models (LLMs). Suketu Patel and colleagues explored how transformer-based machine attention differs from human attention by testing AI models on the “Stroop task,” in which words for colors are printed in colored ink, and participants are asked to name the ink color of each word while ignoring its meaning. The task is used clinically to assess executive control, especially a person’s ability to inhibit an automatic response. Humans generally take longer to answer correctly when words and colors are mismatched than when they match, but they still perform stably and with high accuracy, even on long word lists.

The authors found that when the word and ink color did not match, LLMs performed well with a list of five words. But as the list of words grew longer, AI performance degraded dramatically. GPT-4o dropped from 91% accuracy at 5 words to 57% accuracy at 10 words and 15% accuracy at 40 words. Claude 3.5 Sonnet was stable through 20 words, but crashed to 24% accuracy at 40 words. In trials with a list of words in both matching and mismatched colors, LLM performance was even worse, dropping to near 0% accuracy for the mismatched items. Similar results were found with GPT-5, Claude Opus 4.1, and Gemini 2.5. LLMs struggled to stay focused on naming the color rather than defaulting to word reading. As with humans, LLMs are better trained on word reading than on color naming, yet humans can suppress word reading in long lists and maintain focus on the task at hand. According to the authors, the performance collapse of LLMs suggests fundamental limitations compared with biological attention.
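
For readers who want to picture the setup, the following is a minimal sketch of how a mismatched Stroop list of a given length might be built and scored. It is not the authors' code: the color set, the `(word, ink)` representation, and the function names `make_stroop_list` and `score` are all assumptions made for illustration, and the article does not say how the stimuli were actually presented to the models.

```python
import random

# Hypothetical color set; the study's actual colors are not given in the article.
COLORS = ["red", "green", "blue", "yellow"]


def make_stroop_list(n_items, congruent=False, seed=0):
    """Build a list of (word, ink) pairs.

    congruent=False yields mismatched items (word differs from ink color);
    congruent=True yields matching items.
    """
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        ink = rng.choice(COLORS)
        if congruent:
            word = ink
        else:
            # Pick any color word other than the ink color.
            word = rng.choice([c for c in COLORS if c != ink])
        items.append((word, ink))
    return items


def score(items, responses):
    """Return (accuracy, word_reading_rate).

    accuracy: fraction of responses that name the ink color.
    word_reading_rate: fraction of responses that repeat the printed word
    on mismatched items, i.e. the error the task is designed to provoke.
    """
    n = len(items)
    correct = sum(resp == ink for (_, ink), resp in zip(items, responses))
    word_reads = sum(
        resp == word and word != ink
        for (word, ink), resp in zip(items, responses)
    )
    return correct / n, word_reads / n


if __name__ == "__main__":
    # List lengths mentioned in the article: 5, 10, 20, and 40 words.
    for length in (5, 10, 20, 40):
        trial = make_stroop_list(length)
        # A responder that always reads the word instead of naming the ink.
        word_reader = [word for word, _ in trial]
        print(length, score(trial, word_reader))
```

In this sketch, a responder that always reads the word scores zero accuracy on a mismatched list regardless of length, which is the failure mode the authors describe the models drifting toward as lists grow longer.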

PNAS Nexus

Deficient executive control in transformer attention

2-Jun-2026

Article Information

Contact Information

Jin Fan
The City University of New York
jin.fan@qc.cuny.edu

How to Cite This Article

APA:
PNAS Nexus. (2026, June 2). AI fails classic attention test. Brightsurf News. https://www.brightsurf.com/news/1ZZYQ0D1/ai-fails-classic-attention-test.html
MLA:
"AI fails classic attention test." Brightsurf News, Jun. 2 2026, https://www.brightsurf.com/news/1ZZYQ0D1/ai-fails-classic-attention-test.html.