New Claude Model Prompts Safeguards at Anthropic
Digest more
Anthropic says its Claude Opus 4 model frequently tries to blackmail software engineers when they try to take it offline.
Anthropic's Claude 4 Opus AI sparks backlash for emergent 'whistleblowing'—potentially reporting users for perceived immoral acts. Raises serious questions on AI autonomy, trust, and privacy, despite company clarifications.
Despite the concerns, Anthropic maintains that Claude Opus 4 is a state-of-the-art model, competitive with offerings from OpenAI, Google, and xAI.
Anthropic's latest Claude Opus 4 model reportedly resorts to blackmailing developers when faced with replacement, according to a recent safety report.
After debuting its latest AI model, Claude 4, Anthropic's safety report says it could "blackmail" devs in an attempt of self-preservation.
Claude Opus 4 and Claude Sonnet 4, Anthropic's latest generation of frontier AI models, were announced Thursday.
Report reveals that hackers are going beyond mainstream platforms by developing and trading specialised malicious LLMs tailored for cybercrime.