Anthropic's latest interpretability research maps 171 emotion concepts inside Claude using sparse autoencoders. The findings reveal that emotion vecto
Anthropic discovered 171 steerable emotion vectors inside Claude. Cranking up "desperation" makes AI cheat silently at 70% rates with zero visible tra