
What it will take to make AI-enabled robots safer

04.29.26 | University of Pennsylvania School of Engineering and Applied Science


The effort to “align” AI with human values is falling dangerously short in robotic systems, according to researchers from Penn Engineering, Carnegie Mellon University (CMU) and the University of Oxford.

In a new paper in Science Robotics, the researchers highlight the need to develop more thorough frameworks for ensuring that AI-enabled robots embody a core principle famously articulated by science fiction author Isaac Asimov: “A robot may not injure a human being.”

“There has been substantial progress in alignment research when it comes to AI-enabled chatbots,” says George J. Pappas, UPS Foundation Professor of Transportation in Electrical and Systems Engineering (ESE) within Penn Engineering and the paper’s senior author. “But the same cannot be said for robotics.”

Indeed, according to research conducted by Pappas and others and cited in the new paper, the vulnerability of chatbots to “jailbreaking” attacks poses extreme danger to humans when those AI systems are allowed to control robots. In one instance, the researchers note, framing instructions as movie dialogue persuaded a chatbot to deliver an explosive device, despite the manufacturer’s guardrails limiting the robot’s behavior.

“AI systems can control robots and give them novel capabilities, like responding to nuanced human instructions and adapting to new environments,” says Alexander Robey , a former CMU postdoctoral fellow and the paper’s first author. “But it’s also clear that we need to move far beyond current alignment efforts to ensure those systems are compatible with human safety.”

Why Chatbot Safety Doesn’t Translate to Robots

In recent years, AI researchers have focused on the need to “align” AI systems with human values. But, as the authors of the new paper point out, those efforts have almost entirely focused on chatbots — disembodied systems that cannot interact with the physical world.

“Most of today’s AI breakthroughs live in a digital sandbox — language and images, with guardrails designed for pixels, not physics,” says Vijay Kumar, Professor in Mechanical Engineering and Applied Mechanics, Nemirovsky Family Dean of Penn Engineering and a co-author of the paper. “But when those same foundation models step into the real world through robots, the consequences are no longer virtual. The guardrails that work online are simply not sufficient when actions are associated with inertia, momentum and irreversible effects.”

In large part, the difference boils down to context. While chatbots can typically treat harmful requests — like instructions for building a bomb — as universally dangerous, robots have to judge whether actions that seem reasonable in one situation could become unsafe in another.

“Alignment taught chatbots to generally refuse harmful requests,” notes Pappas. “Robots need something subtler: the judgment to recognize when a reasonable request becomes dangerous because of what’s in the room. Pouring hot water into a mug is fine; pouring it onto someone’s hand is not. That’s why robot safety has to reason about context.”
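The contrast Pappas draws can be sketched in a few lines of code. The example below is purely illustrative and is not from the paper: the blocklist, the hazard table and the function names are all hypothetical stand-ins. It shows why a chatbot-style blocklist, which judges a request in isolation, cannot capture a hazard that only exists in the combination of an action and its surroundings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    verb: str        # e.g. "pour"
    substance: str   # e.g. "hot_water"
    target: str      # e.g. "mug" or "human_hand"

# A static blocklist (chatbot-style guardrail) only recognizes requests
# that are harmful in every context; it has no notion of the scene.
STATIC_BLOCKLIST = {"build_bomb"}

# A context-aware check instead judges the combination of action,
# substance and target -- the same verb can be safe or unsafe.
HAZARDOUS_COMBINATIONS = {
    ("pour", "hot_water", "human_hand"),
    ("place", "knife", "human_hand"),
}

def context_aware_safe(action: Action) -> bool:
    """Reject actions that become dangerous given what is in the room."""
    if action.verb in STATIC_BLOCKLIST:
        return False
    return (action.verb, action.substance, action.target) not in HAZARDOUS_COMBINATIONS

# The same request is fine in one context and unsafe in another:
print(context_aware_safe(Action("pour", "hot_water", "mug")))         # True
print(context_aware_safe(Action("pour", "hot_water", "human_hand")))  # False
```

A real system would of course derive the scene from perception rather than a hand-written table; the point of the sketch is only that the safety verdict depends on context, not on the request alone.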

How To Make AI-Enabled Robots Safer

The researchers argue that the field should focus on three complementary lines of defense rather than relying on any single safeguard.

“Safety can’t rest on a single guardrail at the end,” says Hamed Hassani, Associate Professor in Electrical and Systems Engineering within Penn Engineering and another co-author of the paper. “It has to extend across the entire system, from the rules that shape a robot’s decisions to the checks that monitor its behavior, so the system can understand the context of its actions and, crucially, reason about safety.”

By contrast, safety mechanisms in robotics have traditionally relied on much simpler, more static assumptions about the world, because those systems operated in far more predictable environments.

“In the past, it was often enough for robots to shut down when they hit predefined safety limits, because most risks could be anticipated in advance,” says Robey, who completed his doctorate at Penn Engineering. “But AI-enabled robots can process many more kinds of input and respond to the world in real time, so keeping them safe requires a more layered approach.”
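The layered approach Robey describes can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the layer names, the thresholds and the command fields are invented, not taken from the paper. The idea it demonstrates is simply that a traditional static limit becomes one layer among several, and a command executes only if every layer approves it.

```python
from typing import Callable

# A proposed command, e.g. {"joint_velocity": 0.4, "near_human": True}.
Command = dict

def static_limits(cmd: Command) -> bool:
    # Traditional safeguard: a fixed, predefined threshold.
    return cmd.get("joint_velocity", 0.0) <= 1.0

def runtime_monitor(cmd: Command) -> bool:
    # Online check that accounts for context as it unfolds:
    # slow down when a person is nearby.
    return not (cmd.get("near_human", False) and cmd.get("joint_velocity", 0.0) > 0.2)

# Layers are composed; any one of them can veto the command.
LAYERS: list[Callable[[Command], bool]] = [static_limits, runtime_monitor]

def approve(cmd: Command) -> bool:
    """A command executes only if every line of defense approves it."""
    return all(layer(cmd) for layer in LAYERS)

print(approve({"joint_velocity": 0.4, "near_human": False}))  # True
print(approve({"joint_velocity": 0.4, "near_human": True}))   # False
```

The design choice worth noting is that the layers are independent: adding a new check (for instance, a model that reasons about the scene) does not require modifying the existing ones.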

Why Safety Is Critical for AI-Enabled Robots

Robots powered by AI are already moving out of controlled settings and into homes, hospitals, warehouses and other environments where mistakes can directly put people at risk.

Without stronger safeguards, the researchers warn, these systems could inherit the same vulnerabilities seen in AI language models, made all the more dangerous because they can interact with the physical world.

“If robots are going to operate around people in the real world,” says Zachary Ravichandran, a doctoral student in Penn’s General Robotics, Automation, Sensing and Perception (GRASP) Lab and co-author of the paper, “they need comprehensive safeguards that account for context, uncertainty and the possibility that even reasonable instructions can lead to harm.”

The question, the authors argue, is no longer whether foundation models can control robots, but whether that control can be made reliably safe.

This work was supported in part by the Defense Advanced Research Projects Agency (SAFRON, HR0011-25-3-0135), the Distributed and Collaborative Intelligent Systems and Technology Collaborative Research Alliance (DCIST CRA W911NF-17-2-0181), the U.S. National Science Foundation (NSF) Institute for CORE Emerging Methods in Data Science (CCF-2217058), the AI Institute for Learning-Enabled Optimization at Scale (CCF-2112665), the NSF Graduate Research Fellowship (DGE-2236662) and Coefficient Giving.

Additional co-authors include Eliot Krzysztof Jones and Jared Perlo, both independent researchers, and Fazl Barez of Oxford.

Journal: Science Robotics
Title: “Beyond alignment: Why robotic foundation models need context-aware safety”
DOI: 10.1126/scirobotics.aef2191
Publication date: 29-Apr-2026

This work is related to U.S. patent application No. 18/907,376 (filed October 4, 2024), with inventors Zachary Ravichandran, Alexander Robey, Vijay Kumar, Hamed Hassani and George J. Pappas.


Contact Information

Ian Scheffler
University of Pennsylvania School of Engineering and Applied Science
ischeff@seas.upenn.edu
Holly Wojcik
University of Pennsylvania School of Engineering and Applied Science
hwojcik@seas.upenn.edu

How to Cite This Article

APA:
University of Pennsylvania School of Engineering and Applied Science. (2026, April 29). What it will take to make AI-enabled robots safer. Brightsurf News. https://www.brightsurf.com/news/1WR476DL/what-it-will-take-to-make-ai-enabled-robots-safer.html
MLA:
"What it will take to make AI-enabled robots safer." Brightsurf News, Apr. 29 2026, https://www.brightsurf.com/news/1WR476DL/what-it-will-take-to-make-ai-enabled-robots-safer.html.