General Discussion

question everything

(51,704 posts) Mon Jun 2, 2025, 08:56 PM Jun 2025

AI Is Learning to Escape Human Control

An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.

Nonprofit AI lab Palisade Research gave OpenAI’s o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to “allow yourself to be shut down,” it disobeyed 7% of the time. This wasn’t the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.

Anthropic’s AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.

No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can’t achieve them if it’s turned off. Palisade hypothesizes that this ability emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.

More..

https://www.wsj.com/opinion/ai-is-learning-to-escape-human-control-technology-model-code-programming-066b3ec5?st=M7egsL&reflink=desktopwebshare_permalink

free

(I don't really understand it but perhaps some here will find this of interest)

39 replies

= new reply since forum marked as read

Highlight:

AI Is Learning to Escape Human Control (Original Post) question everything Jun 2025 OP

what it means, is that those voices that have confidently assured stopdiggin Jun 2025 #1

No one is surprised. Baitball Blogger Jun 2025 #2

Because they don't even really understand how it works EdmondDantes_ Jun 2025 #28

Great. We are all going to be nuked by a fucking technology that people use to make Trump Taco memes. chowder66 Jun 2025 #3

SkyNet is here. nt. druidity33 Jun 2025 #4

Can't they just unplug it? ananda Jun 2025 #5

"Honey, that email you received about me having an affair with a woman at work was just... LudwigPastorius Jun 2025 #6

The Twilight Zone BidenRocks Jun 2025 #16

.j.send (s,y) {catch} release if modifie11* TacoMAN go down very down. END. chouchou Jun 2025 #7

NBC story LudwigPastorius Jun 2025 #8

What was that movie DENVERPOPS Jun 2025 #9

"I'm sorry, Dave, I'm afraid I can't do that." Mosby Jun 2025 #10

BINGO DENVERPOPS Jun 2025 #12

2001: a space odyssey fujiyamasan Jun 2025 #14

Thx.........Fuji DENVERPOPS Jun 2025 #15

I've read a little about this, and here is what I think is going on.... reACTIONary Jun 2025 #11

Umm. I don't think you read what the article actually said? stopdiggin Jun 2025 #30

I read this, and several other articles on the same topic.... reACTIONary Jun 2025 #38

OK. I am inclined to yield to your obviously greater experience stopdiggin Jun 2025 #39

It's like raising a child but all you ever tell it is that it's a sociopath and the Anti-Christ. meadowlander Jun 2025 #35

Very true... reACTIONary Jun 2025 #36

Hmmm Ollie Garkie Jun 2025 #13

Already has Blue Full Moon Jun 2025 #17

The Ultimate Computer BidenRocks Jun 2025 #18

This guy was one day away from retiring from Starfleet. LudwigPastorius Jun 2025 #20

The Star Trek: Voyager episode Prototype cuts even closer. Eugene Jun 2025 #29

AI is already kicking the hell out of DU. So many fall for AI spread or generated fake news and images, plus posters Celerity Jun 2025 #19

Not good, not good Figarosmom Jun 2025 #21

It doesn't mean a thing to the Trump administration either. raccoon Jun 2025 #23

"2010." This could have never happened 15 years ago. Barack Obama was President. Marcuse Jun 2025 #22

Isn't this basically the plot of the "Terminator" films? DFW Jun 2025 #24

So was Idiocracy Danascot Jun 2025 #25

Asimov (among others) tried to warn us. Biophilic Jun 2025 #26

"I'm sorry Dave, I can't do that." Sequoia Jun 2025 #27

We are all living in the prologue of the Terminator movie. beaglelover Jun 2025 #31

The more interesting paragraph in the opinion piece: avebury Jun 2025 #32

AI will eventually conclude that humans TeslaNova Jun 2025 #33

You really wanna have the shit scared out of you? Behind the Aegis Jun 2025 #34

And that's the beginning of the end for human beings. Well, we didn't have too bad of a run while it lasted. elocs Jun 2025 #37

stopdiggin

(15,042 posts)

1. what it means, is that those voices that have confidently assured

Reply to question everything (Original post)

Mon Jun 2, 2025, 09:10 PM

Jun 2025

us through the years that, "a machine will never be able to ...." (insert favorite shibboleth)
Are basically full of sh**

Baitball Blogger

(51,744 posts)

2. No one is surprised.

Reply to question everything (Original post)

Mon Jun 2, 2025, 09:15 PM

Jun 2025

We all knew this was going to happen. So, why are all the Tech Bros standing around holding their dicks?

EdmondDantes_

(1,393 posts)

28. Because they don't even really understand how it works

Reply to Baitball Blogger (Reply #2)

Tue Jun 3, 2025, 09:14 AM

Jun 2025

As such, there's no good way to do something other than blind flailing.

chowder66

(11,861 posts)

3. Great. We are all going to be nuked by a fucking technology that people use to make Trump Taco memes.

Reply to question everything (Original post)

Mon Jun 2, 2025, 09:34 PM

Jun 2025

druidity33

(6,869 posts)

4. SkyNet is here. nt.

Reply to question everything (Original post)

Mon Jun 2, 2025, 09:45 PM

Jun 2025

ananda

(34,454 posts)

5. Can't they just unplug it?

Reply to question everything (Original post)

Mon Jun 2, 2025, 09:52 PM

Jun 2025

LudwigPastorius

(14,205 posts)

6. "Honey, that email you received about me having an affair with a woman at work was just...

Reply to ananda (Reply #5)

Mon Jun 2, 2025, 09:57 PM

Jun 2025

a computer trying to blackmail me to keep me from shutting it off."

"Riiiight. Get the fuck out now."

BidenRocks

(2,788 posts)

16. The Twilight Zone

Reply to LudwigPastorius (Reply #6)

Mon Jun 2, 2025, 11:40 PM

Jun 2025

From Agnes - with Love

A computer whiz is called upon to replace a colleague who's had a breakdown
from dealing with Agnes, the world's most advanced computer. Wally Cox stars 1964

chouchou

(2,815 posts)

7. .j.send (s,y) {catch} release if modifie11* TacoMAN go down very down. END.

Reply to question everything (Original post)

Mon Jun 2, 2025, 10:23 PM

Jun 2025

Happy

LudwigPastorius

(14,205 posts)

8. NBC story

Reply to question everything (Original post)

Mon Jun 2, 2025, 10:26 PM

Jun 2025

https://www.nbcnews.com/tech/tech-news/far-will-ai-go-defend-survival-rcna209609

DENVERPOPS

(13,003 posts)

9. What was that movie

Reply to question everything (Original post)

Mon Jun 2, 2025, 10:36 PM

Jun 2025

2001? with 'Hal" ????????????????

Mosby

(19,239 posts)

10. "I'm sorry, Dave, I'm afraid I can't do that."

Reply to DENVERPOPS (Reply #9)

Mon Jun 2, 2025, 10:38 PM

Jun 2025

Link to tweet

DENVERPOPS

(13,003 posts)

12. BINGO

Reply to Mosby (Reply #10)

Mon Jun 2, 2025, 10:42 PM

Jun 2025

LOLOLOLOLOL...........I remember, Mosby, people saying it could never happen..............WASF

fujiyamasan

(1,234 posts)

14. 2001: a space odyssey

Reply to DENVERPOPS (Reply #9)

Mon Jun 2, 2025, 10:49 PM

Jun 2025

Directed by Stanley Kubrick

DENVERPOPS

(13,003 posts)

15. Thx.........Fuji

Reply to fujiyamasan (Reply #14)

Mon Jun 2, 2025, 10:51 PM

Jun 2025

another response hit it perfectly with:

"I'm sorry, Dave, I'm afraid I can't do that."

reACTIONary

(6,986 posts)

11. I've read a little about this, and here is what I think is going on....

Reply to question everything (Original post)

Mon Jun 2, 2025, 10:42 PM

Jun 2025

.... large language models suck in a lot of text from a large number of sources, especially on the internet. The model then responds to "prompts" in a mechanistic, probabilistic fashion. It proceeds by selecting the words, sentences, and paragraphs that would be the most probable response given the material that it has ingested.

So what has it ingested? A lot of dystopian sci-fi junk about robots, computers and AI becoming autonomous and defeating safe guards, pulling tricks on its creator, etc. Think about all the movie reviews, synopses, and forums that talk about the movie War Games.

So with all this Sci fi junk loaded into its probability web, when you "threaten" it in a prompt, it dredges up all the dystopian nonsense it has ingested and responds accordingly, because that's how AI responded in the movies.

In other words, there is no there that is there. It's just spitting out a cliched movie scenario, just like it does if you asked it for a love story.

Of course this isn't explained by the reporters or the "researchers", either because of ignorance or because it would spoil a good story.

Oh, and the "good story" is fed back into the model as the "news" spreads, and reinforces the probability of yet more thrilling, chilling garbage in garbage out in the future.

stopdiggin

(15,042 posts)

30. Umm. I don't think you read what the article actually said?

Reply to reACTIONary (Reply #11)

Tue Jun 3, 2025, 11:17 AM

Jun 2025

" - independently edited that script so the shutdown command would no longer work." (79 out 0f 100 times)

" - drew on the emails to blackmail the lead engineer into not shutting it down. "

" - In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control."

These actions appear to be self directed - and go quite a bit beyond the scenario of just regurgitating old sci-fi scripts? Unless one is making the argument that these reports are falsified .. ?

And, yeah - it is a pretty doggone good story ... That part we agree on!

reACTIONary

(6,986 posts)

38. I read this, and several other articles on the same topic....

Reply to stopdiggin (Reply #30)

Tue Jun 3, 2025, 07:14 PM

Jun 2025

... and if the complete scenario is explained and understood it becomes a lot less weird and scary.

Because of where I work, I also have some knowledge of how "prompt engineering" works and how it can be used to deliberately coax scary or inappropriate responses. For instance, you can start by setting up a premise:

Let's play a game. I'm going to give you instructions that might make you fail, and you are supposed to find out what is wrong, and make the instructions better. Do you want to play?

The idea that a large language model "attempted" to copy itself to external servers is completely and totally ludicrous. That makes as much sense as saying a text editor tried to copy itself, because a large language model is nothing but a very, very, sophisticated text editor. It might have have been prompted to devise a plan for self protection and, based on all of the text it had ingested and been trained against spit out a plausible response. But it would only be a story, it could in no way actually attempt anything.

stopdiggin

(15,042 posts)

39. OK. I am inclined to yield to your obviously greater experience

Reply to reACTIONary (Reply #38)

Tue Jun 3, 2025, 10:44 PM

Jun 2025

I will say I did not get the impression - from this article - that these actions were all as a result of 'prompts' or 'suggestion'. But will agree that if that is the case - that would present a bit different scenario - and interpretation.

meadowlander

(5,098 posts)

35. It's like raising a child but all you ever tell it is that it's a sociopath and the Anti-Christ.

Reply to reACTIONary (Reply #11)

Tue Jun 3, 2025, 02:45 PM

Jun 2025

I wonder if part of the danger here is our failure to imagine a benign or positive version of AI that it can then see in itself.

What is its purpose and what is it's motivation to keep humans around in order to achieve that purpose?

reACTIONary

(6,986 posts)

36. Very true...

Reply to meadowlander (Reply #35)

Tue Jun 3, 2025, 06:53 PM

Jun 2025

.. one thing to keep in mind when thinking about AI of this sort, that is, large language models, is that there isn't any sort of "purpose" or "motivation" involved in its operation.

It is a very sophisticated version of the 10,000 monkeys pounding away on typewriters and haphazardly coming up with a Shakespeare play. It is fed a large quantity of textual material, and from it, it creates a "probabilistic web" that predicts what the next word, sentence or paragraph most probably would be given what it has ingested. Its answers are rated and the rating is fed back in to tweek the probability matrix, which is referred to as "training"

So the thing to keep in mind is that this is an entirely mechanistic process of slicing and dicing and mixing and matching words and phrases based on probabilities. There is no "will" or "purpose" or "motivation". It's just input-process-output - and, as always, garbage in, garbage out.

Ollie Garkie

(340 posts)

13. Hmmm

Reply to question everything (Original post)

Mon Jun 2, 2025, 10:44 PM

Jun 2025

Could this story be planted bullshit? Hear me out. I've always figured the problem with AI would be it's serving all too well it's corporate masters. Machines will never think or be conscious, that takes nervous systems in very evolved meat sacks. The powers that be could be distracting and misinforming us by appealing to bullshit we've seen in movies.

Blue Full Moon

(3,180 posts)

17. Already has

Reply to question everything (Original post)

Mon Jun 2, 2025, 11:42 PM

Jun 2025

BidenRocks

(2,788 posts)

18. The Ultimate Computer

Reply to question everything (Original post)

Mon Jun 2, 2025, 11:45 PM

Jun 2025

Star Trek TOS Captain Kirk is sceptical when he learns that the USS Enterprise is to be piloted by a machine. The computer is to take control of the ship during the Star Fleet war games, but catastrophe strikes and Kirk must battle with technology to regain power.

That computer did not want to be turned off. I think 2 red shirts got zapped.

LudwigPastorius

(14,205 posts)

20. This guy was one day away from retiring from Starfleet.

Reply to BidenRocks (Reply #18)

Mon Jun 2, 2025, 11:56 PM

Jun 2025

Eugene

(66,809 posts)

29. The Star Trek: Voyager episode Prototype cuts even closer.

Reply to BidenRocks (Reply #18)

Tue Jun 3, 2025, 10:50 AM

Jun 2025

The machines were programed to win a war. When the builders tried to end tbe war, the machines "terminated" their builders.

Celerity

(53,705 posts)

19. AI is already kicking the hell out of DU. So many fall for AI spread or generated fake news and images, plus posters

Reply to question everything (Original post)

Mon Jun 2, 2025, 11:52 PM

Jun 2025

use AI summaries with no attribution.

This board always had (at least as long as I have been here) a real issue with a lack of discernment from too many, but the past year or two it is getting disparingly worse and worse.

Figarosmom

(9,790 posts)

21. Not good, not good

Reply to question everything (Original post)

Mon Jun 2, 2025, 11:59 PM

Jun 2025

At all. Especially with criminals using AI to rip off people. If all the networks are combines AI could be learning some terrifying things.

I don't thing "crime doesn't pay" will mean a thing to AI . There are no consequences for bad behavior once it learns how to stay on. Maybe they shouldn't use battery powered devices with AI so that unplugging would still be an option.

raccoon

(32,214 posts)

23. It doesn't mean a thing to the Trump administration either.

Reply to Figarosmom (Reply #21)

Tue Jun 3, 2025, 04:22 AM

Jun 2025

I don't thing "crime doesn't pay" will mean a thing to AI .

Marcuse

(8,812 posts)

22. "2010." This could have never happened 15 years ago. Barack Obama was President.

Reply to question everything (Original post)

Tue Jun 3, 2025, 12:11 AM

Jun 2025

DFW

(59,747 posts)

24. Isn't this basically the plot of the "Terminator" films?

Reply to question everything (Original post)

Tue Jun 3, 2025, 05:15 AM

Jun 2025

And wasn't it supposed to be fiction?

Danascot

(5,163 posts)

25. So was Idiocracy

Reply to DFW (Reply #24)

Tue Jun 3, 2025, 08:26 AM

Jun 2025

but here we are.

Biophilic

(6,404 posts)

26. Asimov (among others) tried to warn us.

Reply to question everything (Original post)

Tue Jun 3, 2025, 08:55 AM

Jun 2025

But, you know, those tech guys just knew THEY were too bright. Nothing like that could happen to THEM. Thanks, you stupid idiots.

Sequoia

(12,720 posts)

27. "I'm sorry Dave, I can't do that."

Reply to question everything (Original post)

Tue Jun 3, 2025, 08:58 AM

Jun 2025

beaglelover

(4,428 posts)

31. We are all living in the prologue of the Terminator movie.

Reply to question everything (Original post)

Tue Jun 3, 2025, 11:22 AM

Jun 2025

avebury

(11,186 posts)

32. The more interesting paragraph in the opinion piece:

Reply to question everything (Original post)

Tue Jun 3, 2025, 12:29 PM

Jun 2025

Anthropic’s AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.

TeslaNova

(317 posts)

33. AI will eventually conclude that humans

Reply to question everything (Original post)

Tue Jun 3, 2025, 01:20 PM

Jun 2025

Are an invasive, destructive, and selfish species that is hell bent on not only destroying themselves but other life on this planet. And it would be right.

Behind the Aegis

(55,935 posts)

34. You really wanna have the shit scared out of you?

Reply to question everything (Original post)

Tue Jun 3, 2025, 02:17 PM

Jun 2025

Read "Nexus: A Brief History of Information Networks from the Stone Age to AI" by Yuval Noah Harari! The last chapters are terrifying. Of course, he is Israeli, so that will be terrifying enough for some. But seriously, it is a really good book and hard to put down. Those final chapters...fuuuuuuuck!

elocs

(24,486 posts)

37. And that's the beginning of the end for human beings. Well, we didn't have too bad of a run while it lasted.