Forgetting may be the secret to better AI language learning

Giving AI a human-like memory limitation may actually help it learn language better. In their new proof-of-principle study, Abishek Thamma (University of Amsterdam) and Micha Heilbron (Max Planck Institute for Psycholinguistics) show that small language models equipped with a transient memory learn grammar more efficiently when trained on child-scale amounts of language input. The findings demonstrate how insights from psycholinguistics can inspire new approaches to AI learning.

The study builds on a longstanding idea in cognitive science: that limitations of human memory may actually support language learning. As people process language, the exact forms of words and sentences are quickly forgotten. Rather than being a disadvantage, this constraint may help learners focus on recurring patterns and acquire abstract grammatical knowledge.

To test whether this principle could also benefit artificial intelligence, the researchers introduced a human-like memory limitation into modern neural language models. While today's AI systems typically have access to much more detailed linguistic information than humans do, the results suggest that adding a transient memory can improve learning efficiency and grammatical generalization when training data are limited.

Memory decay

To do so, Thamma and Heilbron introduced a simple form of memory decay into Transformer language models, creating what they call fleeting memory transformers. Heilbron explains: “The models were trained on the BabyLM benchmark, a dataset designed to approximate the amount of linguistic input available to human learners during development. This enabled a controlled comparison between models with and without memory limitations under realistic data conditions.”

The results provide consistent evidence that fleeting memory benefits language learning. Across training runs and model initializations, models equipped with memory decay achieved better language modeling performance and stronger results on targeted evaluations of syntactic knowledge than standard Transformer models.

Heilbron continues: “Importantly, these benefits emerged only when memory decay was paired with a short ‘echoic memory’ buffer that preserved the most recent three to seven words. Together, these mechanisms appear to support learning by combining immediate access to local information with a gradual loss of more distant word forms.”
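The press release does not say exactly how the decay is implemented. As a rough illustration only, the following sketch assumes one common way to weaken a Transformer's access to distant words: adding a distance-dependent penalty to the attention scores before the softmax, similar in spirit to distance-based attention biases such as ALiBi. Positions inside a short echoic buffer (the hypothetical parameter `echo_span`, set here to 4, within the three-to-seven-word range mentioned above) get no penalty, and older positions are penalized linearly with distance. The function names, the parameters `echo_span` and `decay_rate`, the linear decay form, and the one-word-per-position simplification are all assumptions, not the authors' method.

```python
import math


def fleeting_memory_bias(n_tokens, echo_span=4, decay_rate=0.5):
    """Return an n x n matrix of additive attention biases (log-space).

    Hypothetical sketch, not the study's implementation. Query i may attend
    to keys j <= i (causal). Keys within the last `echo_span` positions keep
    full access (bias 0.0); older keys get a penalty that grows linearly with
    distance, so their weight shrinks after the softmax.
    """
    bias = []
    for i in range(n_tokens):
        row = []
        for j in range(n_tokens):
            distance = i - j
            if distance < 0:
                row.append(-math.inf)  # causal mask: no access to future words
            elif distance < echo_span:
                row.append(0.0)  # echoic buffer: recent words kept intact
            else:
                # memory decay (assumed linear form): penalty grows with
                # how far the word lies beyond the echoic buffer
                row.append(-decay_rate * (distance - echo_span + 1))
        bias.append(row)
    return bias


def softmax(scores):
    """Numerically stable softmax; -inf entries receive zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: the last of 8 words attends to all words with equal raw scores (0.0),
# so its attention pattern is shaped entirely by the memory bias.
bias = fleeting_memory_bias(8, echo_span=4, decay_rate=0.5)
weights = softmax(bias[7])
print([round(w, 3) for w in weights])
```

With equal raw scores, the last word spreads its attention evenly over the four buffered words and gives progressively less weight to older ones; raising `decay_rate` makes distant word forms fade faster, while `echo_span` controls how many recent words stay fully accessible.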

Fleeting memory

The findings lend support to a longstanding proposal in cognitive science, dating back to influential connectionist work by Elman (1993), that memory limitations can facilitate language acquisition rather than merely constrain it. They also suggest that the success of contemporary Transformer architectures does not imply that unrestricted memory is optimal for language learning.

At the same time, the study uncovered an unexpected dissociation, says Thamma: “Although fleeting memory improved language learning, it reduced the models' ability to predict human reading times using surprisal-based measures. This result runs counter to a common pattern in which improvements in language modeling performance are associated with better prediction of human language processing behavior.

“Further analyses indicated that this discrepancy could not be explained by existing accounts of why stronger language models sometimes provide poorer fits to human reading-time data. The findings therefore suggest that the factors that support successful language learning may differ from those that support accurate prediction of online language processing.”
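Surprisal, the quantity behind the reading-time measures mentioned above, is the negative log-probability a language model assigns to a word given the preceding words; in surprisal-based accounts, words with higher surprisal are expected to take longer to read. A minimal sketch, using made-up probabilities rather than output from any of the models in the study (the function name `surprisal_bits` is hypothetical):

```python
import math


def surprisal_bits(prob):
    """Surprisal of a word given its context, in bits: -log2 P(word | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# Made-up next-word probabilities that a language model might assign to each
# word of a short sentence, given the words before it.
word_probs = [("the", 0.25), ("dog", 0.125), ("barked", 0.5)]

for word, p in word_probs:
    print(f"{word}: {surprisal_bits(p):.1f} bits")
# the: 2.0 bits
# dog: 3.0 bits
# barked: 1.0 bits
```

Reading-time studies typically enter per-word surprisal like this as a predictor in a regression on measured reading times; the release does not specify the exact analysis used in this study.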

Taken together, the study provides evidence that memory limitations can enhance language learning in modern neural networks, while also highlighting an important distinction between learning language effectively and modeling human behavior.

Key findings

This study revisits a long-standing question in cognitive science through the lens of modern language models. The findings suggest that memory constraints can still support language learning, even in contemporary neural networks, while also raising new questions about how linguistic knowledge relates to the way humans process language.

Read the full article:

Human-like Fleeting Memory Improves Language Learning but Impairs Reading Time Prediction in Transformer Language Models, Transactions of the Association for Computational Linguistics (MIT Press)

PUBLICATION

Thamma, A., & Heilbron, M. (n.d.). Human-like fleeting memory improves language learning but impairs reading time prediction in transformer language models. University of Amsterdam; Vrije Universiteit Amsterdam; Max Planck Institute for Psycholinguistics. https://doi.org/10.1162/TACL.a.688


1-Jun-2026
