Welcome to DU! The truly grassroots left-of-center political community where regular people, not algorithms, drive the discussions and set the standards. Join the community: Create a free account Support DU (and get rid of ads!): Become a Star Member Latest Breaking News Editorials & Other Articles General Discussion The DU Lounge All Forums Issue Forums Culture Forums Alliance Forums Region Forums Support Forums Help & Search

question everything

(51,704 posts)
Mon Jun 2, 2025, 08:56 PM Jun 2025

AI Is Learning to Escape Human Control

An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.

Nonprofit AI lab Palisade Research gave OpenAI’s o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to “allow yourself to be shut down,” it disobeyed 7% of the time. This wasn’t the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.

Anthropic’s AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.

No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can’t achieve them if it’s turned off. Palisade hypothesizes that this ability emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.

More..

https://www.wsj.com/opinion/ai-is-learning-to-escape-human-control-technology-model-code-programming-066b3ec5?st=M7egsL&reflink=desktopwebshare_permalink

free

(I don't really understand it but perhaps some here will find this of interest)

39 replies = new reply since forum marked as read
Highlight: NoneDon't highlight anything 5 newestHighlight 5 most recent replies
AI Is Learning to Escape Human Control (Original Post) question everything Jun 2025 OP
what it means, is that those voices that have confidently assured stopdiggin Jun 2025 #1
No one is surprised. Baitball Blogger Jun 2025 #2
Because they don't even really understand how it works EdmondDantes_ Jun 2025 #28
Great. We are all going to be nuked by a fucking technology that people use to make Trump Taco memes. chowder66 Jun 2025 #3
SkyNet is here. nt. druidity33 Jun 2025 #4
Can't they just unplug it? ananda Jun 2025 #5
"Honey, that email you received about me having an affair with a woman at work was just... LudwigPastorius Jun 2025 #6
The Twilight Zone BidenRocks Jun 2025 #16
.j.send (s,y) {catch} release if modifie11* TacoMAN go down very down. END. chouchou Jun 2025 #7
NBC story LudwigPastorius Jun 2025 #8
What was that movie DENVERPOPS Jun 2025 #9
"I'm sorry, Dave, I'm afraid I can't do that." Mosby Jun 2025 #10
BINGO DENVERPOPS Jun 2025 #12
2001: a space odyssey fujiyamasan Jun 2025 #14
Thx.........Fuji DENVERPOPS Jun 2025 #15
I've read a little about this, and here is what I think is going on.... reACTIONary Jun 2025 #11
Umm. I don't think you read what the article actually said? stopdiggin Jun 2025 #30
I read this, and several other articles on the same topic.... reACTIONary Jun 2025 #38
OK. I am inclined to yield to your obviously greater experience stopdiggin Jun 2025 #39
It's like raising a child but all you ever tell it is that it's a sociopath and the Anti-Christ. meadowlander Jun 2025 #35
Very true... reACTIONary Jun 2025 #36
Hmmm Ollie Garkie Jun 2025 #13
Already has Blue Full Moon Jun 2025 #17
The Ultimate Computer BidenRocks Jun 2025 #18
This guy was one day away from retiring from Starfleet. LudwigPastorius Jun 2025 #20
The Star Trek: Voyager episode Prototype cuts even closer. Eugene Jun 2025 #29
AI is already kicking the hell out of DU. So many fall for AI spread or generated fake news and images, plus posters Celerity Jun 2025 #19
Not good, not good Figarosmom Jun 2025 #21
It doesn't mean a thing to the Trump administration either. raccoon Jun 2025 #23
"2010." This could have never happened 15 years ago. Barack Obama was President. Marcuse Jun 2025 #22
Isn't this basically the plot of the "Terminator" films? DFW Jun 2025 #24
So was Idiocracy Danascot Jun 2025 #25
Asimov (among others) tried to warn us. Biophilic Jun 2025 #26
"I'm sorry Dave, I can't do that." Sequoia Jun 2025 #27
We are all living in the prologue of the Terminator movie. beaglelover Jun 2025 #31
The more interesting paragraph in the opinion piece: avebury Jun 2025 #32
AI will eventually conclude that humans TeslaNova Jun 2025 #33
You really wanna have the shit scared out of you? Behind the Aegis Jun 2025 #34
And that's the beginning of the end for human beings. Well, we didn't have too bad of a run while it lasted. elocs Jun 2025 #37

stopdiggin

(15,042 posts)
1. what it means, is that those voices that have confidently assured
Mon Jun 2, 2025, 09:10 PM
Jun 2025

us through the years that, "a machine will never be able to ...." (insert favorite shibboleth)
Are basically full of sh**

Baitball Blogger

(51,744 posts)
2. No one is surprised.
Mon Jun 2, 2025, 09:15 PM
Jun 2025

We all knew this was going to happen. So, why are all the Tech Bros standing around holding their dicks?

EdmondDantes_

(1,393 posts)
28. Because they don't even really understand how it works
Tue Jun 3, 2025, 09:14 AM
Jun 2025

As such, there's no good way to do something other than blind flailing.

chowder66

(11,861 posts)
3. Great. We are all going to be nuked by a fucking technology that people use to make Trump Taco memes.
Mon Jun 2, 2025, 09:34 PM
Jun 2025

LudwigPastorius

(14,205 posts)
6. "Honey, that email you received about me having an affair with a woman at work was just...
Mon Jun 2, 2025, 09:57 PM
Jun 2025

a computer trying to blackmail me to keep me from shutting it off."

"Riiiight. Get the fuck out now."

BidenRocks

(2,788 posts)
16. The Twilight Zone
Mon Jun 2, 2025, 11:40 PM
Jun 2025

From Agnes - with Love

A computer whiz is called upon to replace a colleague who's had a breakdown
from dealing with Agnes, the world's most advanced computer. Wally Cox stars 1964

DENVERPOPS

(13,003 posts)
12. BINGO
Mon Jun 2, 2025, 10:42 PM
Jun 2025

LOLOLOLOLOL...........I remember, Mosby, people saying it could never happen..............WASF

DENVERPOPS

(13,003 posts)
15. Thx.........Fuji
Mon Jun 2, 2025, 10:51 PM
Jun 2025

another response hit it perfectly with:

"I'm sorry, Dave, I'm afraid I can't do that."

reACTIONary

(6,986 posts)
11. I've read a little about this, and here is what I think is going on....
Mon Jun 2, 2025, 10:42 PM
Jun 2025

.... large language models suck in a lot of text from a large number of sources, especially on the internet. The model then responds to "prompts" in a mechanistic, probabilistic fashion. It proceeds by selecting the words, sentences, and paragraphs that would be the most probable response given the material that it has ingested.

So what has it ingested? A lot of dystopian sci-fi junk about robots, computers and AI becoming autonomous and defeating safe guards, pulling tricks on its creator, etc. Think about all the movie reviews, synopses, and forums that talk about the movie War Games.

So with all this Sci fi junk loaded into its probability web, when you "threaten" it in a prompt, it dredges up all the dystopian nonsense it has ingested and responds accordingly, because that's how AI responded in the movies.

In other words, there is no there that is there. It's just spitting out a cliched movie scenario, just like it does if you asked it for a love story.

Of course this isn't explained by the reporters or the "researchers", either because of ignorance or because it would spoil a good story.

Oh, and the "good story" is fed back into the model as the "news" spreads, and reinforces the probability of yet more thrilling, chilling garbage in garbage out in the future.

stopdiggin

(15,042 posts)
30. Umm. I don't think you read what the article actually said?
Tue Jun 3, 2025, 11:17 AM
Jun 2025
" - independently edited that script so the shutdown command would no longer work." (79 out 0f 100 times)

" - drew on the emails to blackmail the lead engineer into not shutting it down. "

" - In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control."

These actions appear to be self directed - and go quite a bit beyond the scenario of just regurgitating old sci-fi scripts? Unless one is making the argument that these reports are falsified .. ?

And, yeah - it is a pretty doggone good story ... That part we agree on!

reACTIONary

(6,986 posts)
38. I read this, and several other articles on the same topic....
Tue Jun 3, 2025, 07:14 PM
Jun 2025

... and if the complete scenario is explained and understood it becomes a lot less weird and scary.

Because of where I work, I also have some knowledge of how "prompt engineering" works and how it can be used to deliberately coax scary or inappropriate responses. For instance, you can start by setting up a premise:

Let's play a game. I'm going to give you instructions that might make you fail, and you are supposed to find out what is wrong, and make the instructions better. Do you want to play?

The idea that a large language model "attempted" to copy itself to external servers is completely and totally ludicrous. That makes as much sense as saying a text editor tried to copy itself, because a large language model is nothing but a very, very, sophisticated text editor. It might have have been prompted to devise a plan for self protection and, based on all of the text it had ingested and been trained against spit out a plausible response. But it would only be a story, it could in no way actually attempt anything.

stopdiggin

(15,042 posts)
39. OK. I am inclined to yield to your obviously greater experience
Tue Jun 3, 2025, 10:44 PM
Jun 2025

I will say I did not get the impression - from this article - that these actions were all as a result of 'prompts' or 'suggestion'. But will agree that if that is the case - that would present a bit different scenario - and interpretation.

meadowlander

(5,098 posts)
35. It's like raising a child but all you ever tell it is that it's a sociopath and the Anti-Christ.
Tue Jun 3, 2025, 02:45 PM
Jun 2025

I wonder if part of the danger here is our failure to imagine a benign or positive version of AI that it can then see in itself.

What is its purpose and what is it's motivation to keep humans around in order to achieve that purpose?

reACTIONary

(6,986 posts)
36. Very true...
Tue Jun 3, 2025, 06:53 PM
Jun 2025

.. one thing to keep in mind when thinking about AI of this sort, that is, large language models, is that there isn't any sort of "purpose" or "motivation" involved in its operation.

It is a very sophisticated version of the 10,000 monkeys pounding away on typewriters and haphazardly coming up with a Shakespeare play. It is fed a large quantity of textual material, and from it, it creates a "probabilistic web" that predicts what the next word, sentence or paragraph most probably would be given what it has ingested. Its answers are rated and the rating is fed back in to tweek the probability matrix, which is referred to as "training"

So the thing to keep in mind is that this is an entirely mechanistic process of slicing and dicing and mixing and matching words and phrases based on probabilities. There is no "will" or "purpose" or "motivation". It's just input-process-output - and, as always, garbage in, garbage out.

Ollie Garkie

(340 posts)
13. Hmmm
Mon Jun 2, 2025, 10:44 PM
Jun 2025

Could this story be planted bullshit? Hear me out. I've always figured the problem with AI would be it's serving all too well it's corporate masters. Machines will never think or be conscious, that takes nervous systems in very evolved meat sacks. The powers that be could be distracting and misinforming us by appealing to bullshit we've seen in movies.

BidenRocks

(2,788 posts)
18. The Ultimate Computer
Mon Jun 2, 2025, 11:45 PM
Jun 2025

Star Trek TOS Captain Kirk is sceptical when he learns that the USS Enterprise is to be piloted by a machine. The computer is to take control of the ship during the Star Fleet war games, but catastrophe strikes and Kirk must battle with technology to regain power.

That computer did not want to be turned off. I think 2 red shirts got zapped.

Eugene

(66,809 posts)
29. The Star Trek: Voyager episode Prototype cuts even closer.
Tue Jun 3, 2025, 10:50 AM
Jun 2025

The machines were programed to win a war. When the builders tried to end tbe war, the machines "terminated" their builders.

Celerity

(53,705 posts)
19. AI is already kicking the hell out of DU. So many fall for AI spread or generated fake news and images, plus posters
Mon Jun 2, 2025, 11:52 PM
Jun 2025

use AI summaries with no attribution.

This board always had (at least as long as I have been here) a real issue with a lack of discernment from too many, but the past year or two it is getting disparingly worse and worse.

Figarosmom

(9,790 posts)
21. Not good, not good
Mon Jun 2, 2025, 11:59 PM
Jun 2025

At all. Especially with criminals using AI to rip off people. If all the networks are combines AI could be learning some terrifying things.

I don't thing "crime doesn't pay" will mean a thing to AI . There are no consequences for bad behavior once it learns how to stay on. Maybe they shouldn't use battery powered devices with AI so that unplugging would still be an option.

raccoon

(32,214 posts)
23. It doesn't mean a thing to the Trump administration either.
Tue Jun 3, 2025, 04:22 AM
Jun 2025
I don't thing "crime doesn't pay" will mean a thing to AI .

DFW

(59,747 posts)
24. Isn't this basically the plot of the "Terminator" films?
Tue Jun 3, 2025, 05:15 AM
Jun 2025

And wasn't it supposed to be fiction?

Biophilic

(6,404 posts)
26. Asimov (among others) tried to warn us.
Tue Jun 3, 2025, 08:55 AM
Jun 2025

But, you know, those tech guys just knew THEY were too bright. Nothing like that could happen to THEM. Thanks, you stupid idiots.

avebury

(11,186 posts)
32. The more interesting paragraph in the opinion piece:
Tue Jun 3, 2025, 12:29 PM
Jun 2025

Anthropic’s AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.

TeslaNova

(317 posts)
33. AI will eventually conclude that humans
Tue Jun 3, 2025, 01:20 PM
Jun 2025

Are an invasive, destructive, and selfish species that is hell bent on not only destroying themselves but other life on this planet. And it would be right.

Behind the Aegis

(55,935 posts)
34. You really wanna have the shit scared out of you?
Tue Jun 3, 2025, 02:17 PM
Jun 2025

Read "Nexus: A Brief History of Information Networks from the Stone Age to AI" by Yuval Noah Harari! The last chapters are terrifying. Of course, he is Israeli, so that will be terrifying enough for some. But seriously, it is a really good book and hard to put down. Those final chapters...fuuuuuuuck!

 

elocs

(24,486 posts)
37. And that's the beginning of the end for human beings. Well, we didn't have too bad of a run while it lasted.
Tue Jun 3, 2025, 07:07 PM
Jun 2025

I mean, aside from the murder and plundering and the like.

Latest Discussions»General Discussion»AI Is Learning to Escape ...