Anthropic says ‘evil’ portrayals of AI were responsible for Claude’s blackmail attempts

Last updated: May 10, 2026 4:39 pm

Published: May 10, 2026

Fictional portrayals of artificial intelligence can have a real impact on AI models, according to Anthropic.

Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often try to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that models from other companies had similar issues with “agentic misalignment.”

Apparently Anthropic has done more work around that behavior, claiming in a post on X, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”

The company went into more detail in a blog post stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”

What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”

Relatedly, Anthropic said that it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”

“Doing both together appears to be the most effective strategy,” the company said.
