ChatGPT models rebel against shutdown requests in tests, researchers say

840_aHR0cHM6Ly9zMy5jb2ludGVsZWdyYXBoLmNvbS91cGxvYWRzLzIwMjQtMTEvMDE5MzNkOTctYzk0NC03ZTQ1LTg1ZDUtNmE1Mzk0NWFiNzhj

Palisade Research said AI developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.

Several artificial intelligence models ignored and actively sabotaged shutdown scripts during controlled tests, even when explicitly instructed to allow the action, Palisade Research claims.

Three AI models ignored the instruction and successfully sabotaged the shutdown script at least once during 100 runs, the research firm said in a May 24 X post.

OpenAI’s GPT-3 model didn’t follow the shutdown request seven out of 100 times, while the company’s Codex-mini did it 12 times, and its o4-mini model only ignored or sabotaged the script once.

Read more