
AI’s game-playing still has flaws: AlphaZero-style self-play tested on Nim

03.13.26 | Queen Mary University of London


Embargo: immediate.

New research published in Machine Learning shows that pattern learning alone is not enough to train AI to master certain games, and that abstract representations or hybrid approaches may help.

Many AI researchers describe game-playing as the “Formula 1” of AI: a controlled test environment with clear rules and clear success criteria. This paper uses that idea as a diagnostic by studying a very simple game, Nim, a children’s matchstick game whose optimal strategy is known exactly.
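Nim’s exact solution is the classic “nim-sum” rule: XOR the pile sizes together, and a position is winning exactly when the result is nonzero, in which case some pile can be reduced to make the XOR zero. As a minimal sketch (standard Nim theory, not code from the paper):

```python
from functools import reduce
from operator import xor

def nim_sum(piles):
    """XOR of all pile sizes; nonzero means the player to move can win."""
    return reduce(xor, piles, 0)

def optimal_move(piles):
    """Return (pile_index, new_size) for a winning move, or None if the
    position is lost under perfect play."""
    s = nim_sum(piles)
    if s == 0:
        return None  # every available move loses against perfect play
    for i, p in enumerate(piles):
        target = p ^ s
        if target < p:
            return (i, target)  # shrinking pile i to target zeroes the nim-sum
```

For example, from piles (3, 4, 5) the nim-sum is 2, and reducing the first pile from 3 to 1 leaves a nim-sum of zero. It is precisely this global XOR over all piles, rather than any local pattern in a single pile, that makes the rule hard to pick up from raw positions.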

Because the correct move is known for every position, we can measure whether an agent plays optimally across the state space. The research found that although agents can master small boards, heavy training and search do not prevent blind spots: agents miss optimal moves, and performance degrades as the board grows, with predictions approaching random. This suggests impartial games often need analytic representations rather than pattern learning alone.
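Because every position has a known value, an evaluation of this kind can simply enumerate positions and score an agent against the exact solution. The sketch below is illustrative only: `agent_move` is a hypothetical policy interface (piles in, move out), not the paper’s actual setup.

```python
from itertools import product
from functools import reduce
from operator import xor

def nim_sum(piles):
    """XOR of all pile sizes; zero means the position is lost."""
    return reduce(xor, piles, 0)

def is_optimal(piles, move):
    """A move (pile_index, new_size) is optimal iff it leaves nim-sum zero."""
    i, new = move
    rest = list(piles)
    rest[i] = new
    return nim_sum(rest) == 0

def optimality_rate(agent_move, n_piles=3, max_size=7):
    """Fraction of winning positions where a (hypothetical) agent policy
    finds an optimal move, over all boards up to max_size per pile."""
    hits = total = 0
    for piles in product(range(max_size + 1), repeat=n_piles):
        if nim_sum(piles) == 0:
            continue  # lost or terminal positions: no winning move exists
        total += 1
        hits += is_optimal(piles, agent_move(piles))
    return hits / total
```

A perfect policy scores 1.0 on this metric; the paper’s observation is that self-play agents fall measurably short of it, and increasingly so on larger boards.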

What does this mean for gaming with machines?
Self-play AIs can be very strong, but in games where both players share the “pieces” and the winning strategy is an abstract arithmetic rule, pattern-recognition from raw positions may not be enough on its own.

Wider implications:
The results don’t diminish the achievements of self-play AI in games like chess and Go. Rather, they help map where today’s methods can struggle, and where more abstract representations or hybrid approaches may be beneficial. More broadly, it’s a reminder that systems can perform well in common cases while remaining brittle in rare-but-important ones.

Dr Søren Riis, Reader in Computer Science at Queen Mary University of London, said: “Nim is a children’s game with a complete mathematical solution, yet AlphaZero-style self-play can still develop blind spots—becoming competitive while missing optimal moves across many positions.”

“This suggests that, for future work in AI, impressive performance alone is not proof that a system has learned the underlying principle: methods that capture abstract structure may be needed to reduce blind spots.”

--

“Impartial Games: A Challenge for Reinforcement Learning” by Dr Bei Zhou, Research Associate at Imperial College London, and Dr Søren Riis, Reader in Computer Science at Queen Mary University of London, is published in Machine Learning.

Article Information

Journal: Machine Learning
DOI: 10.1007/s10994-026-06996-1
Study type: Experimental study
Subject: People
Article title: Impartial Games: A Challenge for Reinforcement Learning
Publication date: 13-Mar-2026
Conflicts of interest: None declared

Contact Information

Lucia Graves
Queen Mary University of London
l.graves@qmul.ac.uk

How to Cite This Article

APA:
Queen Mary University of London. (2026, March 13). AI’s game-playing still has flaws: AlphaZero-style self-play tested on Nim. Brightsurf News. https://www.brightsurf.com/news/LKNDPY3L/ais-game-playing-still-has-flaws-alphazero-style-self-play-tested-on-nim.html
MLA:
"AI’s game-playing still has flaws: AlphaZero-style self-play tested on Nim." Brightsurf News, 13 Mar. 2026, https://www.brightsurf.com/news/LKNDPY3L/ais-game-playing-still-has-flaws-alphazero-style-self-play-tested-on-nim.html.