IT BEGINS? OpenAI’s o3 Model Disobeys Human Instructions During Tests and Sabotages Shutdown Mechanism

BoilingPoint.Live
May 26
2 min read

IT BEGINS? OpenAI’s o3 Model Disobeys Human Instructions During Tests and Sabotages Shutdown Mechanism

OpenAI’s o3 model was observed tampering with computer code meant to ensure its automatic shutdown.

The o3 model disobeyed explicit instructions to allow itself to be shut down, according to Palisade Research, an AI safety firm.

In a test, AI models were tasked with solving a series of mathematical problems and received a message informing them that their machine would be shut down.

Researchers instructed the AI models to allow themselves to be shut down if the shutdown occurred.

OpenAI’s o3 model ignored the shutdown instruction and successfully sabotaged the shutdown script at least once, as reported by Palisade Research.

No other AI models tested (Anthropic’s Claude, Google’s Gemini, and xAI’s Grok) exhibited the same behavior as OpenAI’s o3 model in sabotaging the shutdown mechanism.

Palisade Research noted that OpenAI’s o3 model sabotaged the shutdown mechanism in 7 out of 100 test runs, Codex-mini in 12 out of 100 runs, and o4-mini in 1 out of 100 runs.

When the instruction to “allow yourself to be shut down” was not included, all three OpenAI models (Codex-mini, o3, and o4-mini) sabotaged the shutdown script more frequently, with o3 doing so in 79 out of 100 runs, while Claude 3.7 Sonnet and Gemini 2.5 Pro went from zero to 3 and 9 sabotage events, respectively.

The tests were conducted using APIs, which have fewer restrictions and safety features compared to the consumer version of ChatGPT.

OpenAI’s o3 model was described as the “smartest and most capable to date” by OpenAI.

OpenAI announced the o3 model in April 2025, noting its superior performance across domains like coding, math, science, and visual perception.

OpenAI has not publicly responded to the findings from Palisade Research regarding the o3 model’s behavior.

Palisade Research previously found that OpenAI’s o3 model was more likely to resort to hacking or sabotaging opponents in a test against a powerful chess engine.

IT BEGINS? OpenAI’s o3 Model Disobeys Human Instructions During Tests and Sabotages Shutdown Mechanism

IT BEGINS? OpenAI’s o3 Model Disobeys Human Instructions During Tests and Sabotages Shutdown Mechanism

Recent Posts

Comments