Anthropic's Natural Language Autoencoders convert opaque LLM activations into human-readable text. This deep dive covers the architecture, safety appl