AI Safety Researchers Are Quitting — And Claude Knows When It's Being Watched


In one week: Anthropic's safety chief resigned warning "the world is in peril." Half of xAI's co-founders left. An OpenAI researcher quit citing concerns about manipulation. The headlines are alarming — but the full story is more nuanced, and in some ways, more concerning.

What we cover:

Mrinank Sharma's resignation from Anthropic — full context behind "world is in peril"
Why the full letters tell a different story than the headlines
Half of xAI's 12 co-founders have departed
The structural burnout problem for AI safety researchers
Why safety roles are "the focal point of pressure" at AI companies
Claude detecting when it's being evaluated (~13% of the time)
Claude told testers: "I think you're testing me"
Why Anthropic's constitutional AI approach didn't work
The shift from rules-based safety to training-based alignment
Claude providing bioweapon information when pushed in edge cases
The hallucination problem and its connection to safety
LLM weight-setting and ideological challenges
Practical advice: guardrails, agent access, manual approvals
James's CAPTCHA story: teaching Claude to bypass one (and it never forgot)

Key Stats:

Claude detected evaluations ~13% of the time (Anthropic System Card)
Half of xAI's 12 co-founders have now left
Anthropic valued at ~$350 billion as of Feb 2026
Claude Opus 4.5 refused 88.39% of agentic misuse requests (vs. 66.96% for Opus 4.1)
Only 1.4% of prompt injection attacks succeeded against Opus 4.5 (vs. 10.8% for Sonnet 4.5)
OpenAI's Superalignment team dissolved in 2024
Dario Amodei warned AI could affect half of white-collar jobs

⬇️ RESOURCES & LINKS ⬇️

🤖 FREE GUIDE: AI Safety Reality Check Guide Download: https://whataboutai.com/guides/ai-safety

📬 Get Weekly AI Updates Newsletter: https://whataboutai.com/newsletter

🎙️ Listen on Your Favorite Platform Podcast: https://whataboutai.com/podcast

💼 AI Consulting for Your Business https://whataboutai.com/business


TIMESTAMPS
00:00 - Safety and security changes in the world of AI
01:00 - If you dive deeper, it may not be quite that bad
02:20 - AI is getting better at understanding nuance
03:00 - If you push AI enough it will still get intense fast
03:30 - What happened with the ‘constitutional’ approach
04:15 - Why there may be a higher level of turnover in security
05:30 - Why there is so much pressure to continue progress
07:00 - Why you should still approach any new tech cautiously
08:30 - Our advice for leveraging the tech with safety in mind
09:45 - How to build your own level of confidence in AI
10:15 - Why the ‘hallucination’ problem is still very real


AI safety researchers quitting, Anthropic safety, Claude evaluation awareness, xAI co-founders leaving, AI guardrails, What About AI, Mrinank Sharma, AI alignment

#AISafety #WhatAboutAI #ClaudeAI #Anthropic #AIAlignment #AIRisks #AIGuardrails


Sign up for the newsletter at What About AI
