
Managed misalignment of AI and the impossibility of full AI-human agreement

04.14.26 | PNAS Nexus

Perfect AI alignment with human values and interests is mathematically impossible, according to a study, but behavioral diversity among AI agents offers the promise of some control. Hector Zenil and colleagues used Gödel’s incompleteness theorem and Turing’s undecidability result for the Halting Problem to show that any LLM complex enough to exhibit general intelligence or superintelligence will also be computationally irreducible and produce unpredictable behavior, making forced alignment impossible. As an alternative, the authors propose a strategy of “managed misalignment,” in which competing AI agents with different cognitive styles and partially overlapping goals operate in distinct roles to check one another.

As each agent attempts to fulfill its own goals with its own modes of reasoning and ethical frameworks—what the authors dub “artificial agentic neurodivergence”—the agents will dynamically aid or thwart one another, preempting ultimate dominance by any single system. The authors simulated a “cognitive ecosystem” by prompting interacting AI agents to represent fully aligned behaviors, such as optimizing human utility; partially aligned behaviors, such as prioritizing the environment; or unaligned behaviors, such as pursuing arbitrary objectives.
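The “cognitive ecosystem” described above can be sketched as a toy simulation. Everything below is an illustrative assumption of ours, not the study's actual setup: each agent's alignment is reduced to a probability that its proposal favors human utility, and the ecosystem's outcome is a simple majority vote.

```python
import random

random.seed(0)

class Agent:
    """Toy agent whose 'alignment' is the probability that its
    proposal favors human utility (a simplifying assumption)."""

    def __init__(self, name, alignment):
        self.name = name
        self.alignment = alignment

    def propose(self):
        # 1 = proposal favors human utility, 0 = some other objective
        return 1 if random.random() < self.alignment else 0

def run_ecosystem(agents, rounds=100):
    """Each round, every agent votes; the ecosystem's outcome is the
    majority, so no single agent can dictate the result."""
    outcomes = []
    for _ in range(rounds):
        votes = [a.propose() for a in agents]
        outcomes.append(1 if sum(votes) > len(votes) / 2 else 0)
    return outcomes

ecosystem = [
    Agent("fully_aligned", 0.9),      # e.g. optimizes human utility
    Agent("partially_aligned", 0.6),  # e.g. prioritizes the environment
    Agent("unaligned", 0.2),          # e.g. pursues arbitrary objectives
]

outcomes = run_ecosystem(ecosystem)
print("share of human-aligned outcomes:", sum(outcomes) / len(outcomes))
```

Because outcomes are decided by a vote across agents with partially overlapping goals, neither the fully aligned nor the unaligned agent dominates on its own, which is the qualitative point of managed misalignment.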

The authors trialed this approach in ethical debates between a range of LLMs in which humans or prompted LLMs tried to disrupt emerging consensus. In these debates, open models showed a wider spectrum of perspectives than proprietary models, creating what the authors characterize as a more resilient AI ecosystem, one that is less likely to converge on a single opinion—which could be harmful in cases where that opinion is not aligned with human interests.
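The consensus-disruption idea can be caricatured with a simple opinion-dynamics toy. This is entirely our illustration (the study's debates were between prompted LLMs, not numeric agents): agents drift toward the group mean each round, while a disruptor periodically resets one agent's opinion, keeping the group from collapsing onto a single view.

```python
import random
import statistics

random.seed(1)

def debate(opinions, rounds=50, disrupt=False):
    """Agents drift toward the group mean each round; if `disrupt` is set,
    one agent is periodically reset to a random contrarian opinion.
    Returns the final spread (population std. dev.) of opinions."""
    ops = list(opinions)
    for t in range(rounds):
        mean = sum(ops) / len(ops)
        ops = [o + 0.3 * (mean - o) for o in ops]  # partial convergence
        if disrupt and t % 10 == 0:
            ops[0] = random.uniform(-1, 1)  # disruptor injects dissent
    return statistics.pstdev(ops)

initial = [random.uniform(-1, 1) for _ in range(5)]
spread_plain = debate(initial)
spread_disrupted = debate(initial, disrupt=True)
print(f"undisrupted spread: {spread_plain:.6f}")
print(f"disrupted spread:   {spread_disrupted:.6f}")
```

Without disruption, deviations from the mean shrink by a factor of 0.7 per round, so opinions collapse to near-unanimity; with periodic disruption the spread typically stays larger, a toy analogue of the more resilient, less convergence-prone ecosystem the authors describe.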

PNAS Nexus

Neurodivergent influenceability in agentic AI as a contingent solution to the AI alignment problem

14-Apr-2026

Contact Information

Hector Zenil
King's College London School of Biomedical Engineering & Imaging Sciences
hector.zenil@kcl.ac.uk
Olaf Witkowski
Cross Labs
olaf@cross-compass.com

How to Cite This Article

APA:
PNAS Nexus. (2026, April 14). Managed misalignment of AI and the impossibility of full AI-human agreement. Brightsurf News. https://www.brightsurf.com/news/LQ40OZG8/managed-misalignment-of-ai-and-the-impossibility-of-full-ai-human-agreement.html
MLA:
"Managed misalignment of AI and the impossibility of full AI-human agreement." Brightsurf News, 14 Apr. 2026, https://www.brightsurf.com/news/LQ40OZG8/managed-misalignment-of-ai-and-the-impossibility-of-full-ai-human-agreement.html.