Mind-Reading Without Telepathy: Predictive Concept Decoders

Opening: Why this matters now

For years, AI interpretability has promised transparency while quietly delivering annotations, probes, and post-hoc stories that feel explanatory but often fail the only test that matters: can they predict what the model will actually do next? As large language models become agents capable of long-horizon planning, policy evasion, and strategic compliance, interpretability that merely describes activations after the fact is no longer enough. What we need instead is interpretability that anticipates behavior. That is the ambition behind Predictive Concept Decoders (PCDs). ...
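Before the article defines PCDs, it helps to make the test it invokes concrete: an interpretability artifact earns its keep only if it forecasts behavior. Below is a minimal sketch of that evaluation, assuming synthetic activation data and a plain logistic probe; the names (`H` for hidden activations, `y` for next-step behavior labels) are illustrative stand-ins, not the PCD method itself. The idea: fit a decoder on activations at step t to predict what the model does at step t+1, then score it on held-out data.

```python
# Illustrative sketch only. The data and probe are hypothetical stand-ins
# for whatever decoder a PCD actually uses; the point is the predictive test.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: hidden activations at step t, plus a binary label for what
# the model did at step t+1 (e.g., complied with vs. evaded a policy).
n, d = 2000, 64
H = rng.normal(size=(n, d))                                   # activations
w = rng.normal(size=d)                                        # latent direction
y = (H @ w + rng.normal(scale=2.0, size=n) > 0).astype(int)   # future behavior

H_train, H_test, y_train, y_test = train_test_split(H, y, random_state=0)

# The "predictive" criterion: a decoder read off activations must forecast
# the next behavior better than chance on data it has never seen.
probe = LogisticRegression(max_iter=1000).fit(H_train, y_train)
print(f"held-out accuracy: {probe.score(H_test, y_test):.3f}")
```

Whatever decoder PCDs turn out to use, this is the bar the opening sets: held-out forecasting accuracy on future behavior, not a plausible story about past activations.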

December 18, 2025 · 5 min · Zelina