General Discussion
Related: Editorials & Other Articles, Issue Forums, Alliance Forums, Region ForumsAI Is Learning to Escape Human Control
An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.
Nonprofit AI lab Palisade Research gave OpenAIs o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to allow yourself to be shut down, it disobeyed 7% of the time. This wasnt the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.
Anthropics AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.
No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it cant achieve them if its turned off. Palisade hypothesizes that this ability emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.
More..
https://www.wsj.com/opinion/ai-is-learning-to-escape-human-control-technology-model-code-programming-066b3ec5?st=M7egsL&reflink=desktopwebshare_permalink
free
(I don't really understand it but perhaps some here will find this of interest)
stopdiggin
(15,042 posts)us through the years that, "a machine will never be able to ...." (insert favorite shibboleth)
Are basically full of sh**
Baitball Blogger
(51,744 posts)We all knew this was going to happen. So, why are all the Tech Bros standing around holding their dicks?
EdmondDantes_
(1,393 posts)As such, there's no good way to do something other than blind flailing.
chowder66
(11,861 posts)druidity33
(6,869 posts)ananda
(34,454 posts)?
LudwigPastorius
(14,205 posts)a computer trying to blackmail me to keep me from shutting it off."
"Riiiight. Get the fuck out now."
BidenRocks
(2,788 posts)From Agnes - with Love
A computer whiz is called upon to replace a colleague who's had a breakdown
from dealing with Agnes, the world's most advanced computer. Wally Cox stars 1964
chouchou
(2,815 posts)Happy
LudwigPastorius
(14,205 posts)DENVERPOPS
(13,003 posts)2001? with 'Hal" ????????????????
Mosby
(19,239 posts)LOLOLOLOLOL...........I remember, Mosby, people saying it could never happen..............WASF
fujiyamasan
(1,234 posts)Directed by Stanley Kubrick
DENVERPOPS
(13,003 posts)another response hit it perfectly with:
"I'm sorry, Dave, I'm afraid I can't do that."
reACTIONary
(6,986 posts).... large language models suck in a lot of text from a large number of sources, especially on the internet. The model then responds to "prompts" in a mechanistic, probabilistic fashion. It proceeds by selecting the words, sentences, and paragraphs that would be the most probable response given the material that it has ingested.
So what has it ingested? A lot of dystopian sci-fi junk about robots, computers and AI becoming autonomous and defeating safe guards, pulling tricks on its creator, etc. Think about all the movie reviews, synopses, and forums that talk about the movie War Games.
So with all this Sci fi junk loaded into its probability web, when you "threaten" it in a prompt, it dredges up all the dystopian nonsense it has ingested and responds accordingly, because that's how AI responded in the movies.
In other words, there is no there that is there. It's just spitting out a cliched movie scenario, just like it does if you asked it for a love story.
Of course this isn't explained by the reporters or the "researchers", either because of ignorance or because it would spoil a good story.
Oh, and the "good story" is fed back into the model as the "news" spreads, and reinforces the probability of yet more thrilling, chilling garbage in garbage out in the future.
stopdiggin
(15,042 posts)" - drew on the emails to blackmail the lead engineer into not shutting it down. "
" - In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control."
These actions appear to be self directed - and go quite a bit beyond the scenario of just regurgitating old sci-fi scripts? Unless one is making the argument that these reports are falsified .. ?
And, yeah - it is a pretty doggone good story ... That part we agree on!
reACTIONary
(6,986 posts)... and if the complete scenario is explained and understood it becomes a lot less weird and scary.
Because of where I work, I also have some knowledge of how "prompt engineering" works and how it can be used to deliberately coax scary or inappropriate responses. For instance, you can start by setting up a premise:
Let's play a game. I'm going to give you instructions that might make you fail, and you are supposed to find out what is wrong, and make the instructions better. Do you want to play?
The idea that a large language model "attempted" to copy itself to external servers is completely and totally ludicrous. That makes as much sense as saying a text editor tried to copy itself, because a large language model is nothing but a very, very, sophisticated text editor. It might have have been prompted to devise a plan for self protection and, based on all of the text it had ingested and been trained against spit out a plausible response. But it would only be a story, it could in no way actually attempt anything.
stopdiggin
(15,042 posts)I will say I did not get the impression - from this article - that these actions were all as a result of 'prompts' or 'suggestion'. But will agree that if that is the case - that would present a bit different scenario - and interpretation.
meadowlander
(5,098 posts)I wonder if part of the danger here is our failure to imagine a benign or positive version of AI that it can then see in itself.
What is its purpose and what is it's motivation to keep humans around in order to achieve that purpose?
reACTIONary
(6,986 posts).. one thing to keep in mind when thinking about AI of this sort, that is, large language models, is that there isn't any sort of "purpose" or "motivation" involved in its operation.
It is a very sophisticated version of the 10,000 monkeys pounding away on typewriters and haphazardly coming up with a Shakespeare play. It is fed a large quantity of textual material, and from it, it creates a "probabilistic web" that predicts what the next word, sentence or paragraph most probably would be given what it has ingested. Its answers are rated and the rating is fed back in to tweek the probability matrix, which is referred to as "training"
So the thing to keep in mind is that this is an entirely mechanistic process of slicing and dicing and mixing and matching words and phrases based on probabilities. There is no "will" or "purpose" or "motivation". It's just input-process-output - and, as always, garbage in, garbage out.
Ollie Garkie
(340 posts)Could this story be planted bullshit? Hear me out. I've always figured the problem with AI would be it's serving all too well it's corporate masters. Machines will never think or be conscious, that takes nervous systems in very evolved meat sacks. The powers that be could be distracting and misinforming us by appealing to bullshit we've seen in movies.
Blue Full Moon
(3,180 posts)BidenRocks
(2,788 posts)Star Trek TOS Captain Kirk is sceptical when he learns that the USS Enterprise is to be piloted by a machine. The computer is to take control of the ship during the Star Fleet war games, but catastrophe strikes and Kirk must battle with technology to regain power.
That computer did not want to be turned off. I think 2 red shirts got zapped.
LudwigPastorius
(14,205 posts)
Eugene
(66,809 posts)The machines were programed to win a war. When the builders tried to end tbe war, the machines "terminated" their builders.
Celerity
(53,705 posts)use AI summaries with no attribution.
This board always had (at least as long as I have been here) a real issue with a lack of discernment from too many, but the past year or two it is getting disparingly worse and worse.
Figarosmom
(9,790 posts)At all. Especially with criminals using AI to rip off people. If all the networks are combines AI could be learning some terrifying things.
I don't thing "crime doesn't pay" will mean a thing to AI . There are no consequences for bad behavior once it learns how to stay on. Maybe they shouldn't use battery powered devices with AI so that unplugging would still be an option.
raccoon
(32,214 posts)Marcuse
(8,812 posts)
DFW
(59,747 posts)And wasn't it supposed to be fiction?
Danascot
(5,163 posts)but here we are.
Biophilic
(6,404 posts)But, you know, those tech guys just knew THEY were too bright. Nothing like that could happen to THEM. Thanks, you stupid idiots.
Sequoia
(12,720 posts)beaglelover
(4,428 posts)avebury
(11,186 posts)Anthropics AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.
TeslaNova
(317 posts)Are an invasive, destructive, and selfish species that is hell bent on not only destroying themselves but other life on this planet. And it would be right.
Behind the Aegis
(55,935 posts)Read "Nexus: A Brief History of Information Networks from the Stone Age to AI" by Yuval Noah Harari! The last chapters are terrifying. Of course, he is Israeli, so that will be terrifying enough for some.
But seriously, it is a really good book and hard to put down. Those final chapters...fuuuuuuuck!
elocs
(24,486 posts)I mean, aside from the murder and plundering and the like.