AI models have their own internal representations of knowledge or concepts that are often difficult to discern, even though they are critical to the models’ output. For instance, knowing more about a model’s representation of a concept could help explain why an AI model “hallucinates” information, or why certain prompts can trick it into responses that dodge its built-in safeguards. Daniel Beaglehole and colleagues now introduce a robust method to extract these concept representations, which works across several large-scale language, reasoning, and vision AI models. Their technique uses a feature extraction algorithm called the Recursive Feature Machine. By extracting concept representations this way, Beaglehole et al. were able to monitor the models, exposing some of their vulnerabilities to behaviors such as hallucination, and to steer them toward improved responses. Surprisingly, the researchers found, the concept representations were transferable across languages and could be combined for multi-concept steering. “Together, these results suggest that the models know more than they express in responses and that understanding internal representations could lead to fundamental performance and safety improvements,” the authors write.
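To make the ideas of “concept representations,” monitoring, and steering concrete, here is a minimal, hedged sketch of generic activation steering in PyTorch. It is an illustration only, not the authors’ Recursive Feature Machine pipeline: the toy model, the contrastively estimated `concept_vec`, and the steering strength `alpha` are all hypothetical stand-ins.

```python
# Hedged sketch: generic concept-direction monitoring and steering on a toy module.
# This is NOT the paper's Recursive Feature Machine; all names here are hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 64

# Toy stand-in for one hidden layer of a larger model.
model = nn.Sequential(
    nn.Linear(hidden_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, hidden_dim),
)

# Hypothetical concept direction: difference of mean activations on inputs that do
# vs. do not express the concept. (The paper instead extracts such representations
# with a Recursive Feature Machine, which is not reproduced here.)
pos_acts = torch.randn(32, hidden_dim) + 0.5   # placeholder "concept present" activations
neg_acts = torch.randn(32, hidden_dim)         # placeholder "concept absent" activations
concept_vec = pos_acts.mean(0) - neg_acts.mean(0)
concept_vec = concept_vec / concept_vec.norm()

# Monitoring: project a hidden state onto the concept direction to get a scalar score.
def concept_score(hidden_state: torch.Tensor) -> torch.Tensor:
    return hidden_state @ concept_vec

# Steering: nudge the intermediate activation along the concept direction via a hook.
alpha = 4.0  # steering strength (hypothetical value)

def steering_hook(module, inputs, output):
    return output + alpha * concept_vec

handle = model[0].register_forward_hook(steering_hook)
x = torch.randn(1, hidden_dim)
steered_out = model(x)      # forward pass with the concept direction added
handle.remove()
plain_out = model(x)        # same input, no steering

print("concept score (unsteered hidden state):", concept_score(model[0](x)).item())
print("output shift from steering:", (steered_out - plain_out).norm().item())
```

In this toy picture, multi-concept steering would amount to adding several such directions to the hidden state at once, and monitoring amounts to tracking the projection scores during generation.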
Science
Toward universal steering and monitoring of AI models
19-Feb-2026