
highplainsdem

(63,641 posts)
Thu Jun 11, 2026, 08:49 AM

Anthropic Walks Back Policy That Could Have 'Sabotaged' AI Researchers Using Claude

See an earlier thread for more background on this:

Anthropic purposely made its new Mythos-based models bad at AI research, and developers are fuming
https://www.democraticunderground.com/100221297054


The update:

https://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/

-snip-

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong trade-off and we apologize for not getting the balance right.”

-snip-

“It felt like Anthropic was saying to the public, ‘We don't trust anybody else to do AI research. We are the only ones who have to do AI research,’” says Will Brown, research lead at the open-source AI startup Prime Intellect. “It feels a bit like they’re starting to pull the ladder up behind them.”

Brown said the policy would also have left developers in the dark about whether they were violating Anthropic’s rules, since the company wouldn’t alert them when its safeguards were triggered. He added that the restrictions could have had widespread consequences. For example, he pointed to the growing ecosystem of third-party evaluation firms that test frontier models for safety, performance, and reliability—work that could have been hindered if Anthropic secretly degraded its model.

-snip-

Anthropic says that because this safeguard around AI development is now visible, it needs to cast a wider net, meaning more benign requests may trigger its safeguards. The company says it’s working to make its classifiers more precise as quickly as possible.


They are starting to pull up the ladder behind them. They'll still be hampering both AI researchers and (more importantly) those third-party evaluation firms testing frontier models for safety, performance, and reliability. Anthropic will just be more open now about the roadblocks it's creating for them.

And they're using national security - "foreign adversaries" - as an excuse.

These companies all want to have THE dominant AI model - one chatbot to rule them all. I wouldn't be surprised if they start trying more direct means to sabotage competitors. (Which could leave business and individual users as collateral damage.)