
How Sparse is Your Thought? Cracking the Inner Logic of Chain-of-Thought Prompts

Chain-of-Thought (CoT) prompting has become a go-to technique for improving multi-step reasoning in large language models (LLMs). But is it really helping models think better—or just encouraging them to bluff more convincingly? A new paper from Leiden University, “How does Chain of Thought Think?”, delivers a mechanistic deep dive into this question. By combining sparse autoencoders (SAEs) with activation patching, the authors dissect whether CoT actually changes what a model internally computes—or merely helps its outputs look better. ...

August 1, 2025 · 3 min · Zelina
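
To make the SAE-plus-activation-patching combination mentioned in the excerpt above concrete, here is a minimal, self-contained PyTorch sketch. It is not the paper's code: the toy model, the SAE dimensions, and the two stand-in "prompts" are all hypothetical, but the mechanics are the same idea at small scale — cache an activation from one run, patch it into another, and compare both the output logits and the sparse features the SAE assigns to each activation.

```python
# A minimal, illustrative sketch (not the paper's code) of the two ingredients the
# post pairs: a sparse autoencoder (SAE) over hidden activations, and activation
# patching that swaps one run's activations into another run. Every module, size,
# and "prompt" below is a hypothetical toy stand-in.

import torch
import torch.nn as nn

torch.manual_seed(0)

D_MODEL, D_SAE, D_VOCAB = 16, 64, 10  # toy dimensions (assumptions)


class ToyBlock(nn.Module):
    """Stand-in for one transformer block; we only care about its output activation."""

    def __init__(self):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(D_MODEL, D_MODEL), nn.GELU())

    def forward(self, x):
        return x + self.ff(x)


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(D_VOCAB, D_MODEL)
        self.block = ToyBlock()                 # the site we patch
        self.unembed = nn.Linear(D_MODEL, D_VOCAB)

    def forward(self, x):
        return self.unembed(self.block(self.embed(x)))


class SparseAutoencoder(nn.Module):
    """Overcomplete SAE: decomposes an activation into sparse feature activations."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(D_MODEL, D_SAE)
        self.dec = nn.Linear(D_SAE, D_MODEL)

    def forward(self, h):
        feats = torch.relu(self.enc(h))         # sparse (mostly zero after training) features
        return self.dec(feats), feats


model, sae = ToyModel(), SparseAutoencoder()

# Two toy inputs standing in for a direct prompt and a CoT-style prompt.
direct_prompt = torch.randn(1, D_VOCAB)
cot_prompt = torch.randn(1, D_VOCAB)

# 1) Cache the block activation from the CoT run.
cache = {}
handle = model.block.register_forward_hook(
    lambda mod, inp, out: cache.update(cot_act=out.detach())
)
model(cot_prompt)
handle.remove()

# 2) Activation patching: rerun the direct prompt, but overwrite the block's output
#    with the cached CoT activation, and measure how the output logits move.
handle = model.block.register_forward_hook(lambda mod, inp, out: cache["cot_act"])
patched_logits = model(direct_prompt)
handle.remove()

clean_logits = model(direct_prompt)
print("logit shift from patching:", (patched_logits - clean_logits).norm().item())

# 3) SAE view: which sparse features differ most between the two runs' activations?
with torch.no_grad():
    direct_act = model.block(model.embed(direct_prompt))
    _, direct_feats = sae(direct_act)
    _, cot_feats = sae(cache["cot_act"])
    top = torch.topk((cot_feats - direct_feats).abs().squeeze(0), k=5)
print("top diverging SAE feature indices:", top.indices.tolist())
```

PyTorch forward hooks are one common way to intervene on a running model; the SAE here is untrained, so its feature indices are placeholders for the trained, interpretable features a real analysis would inspect.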

Circuits of Understanding: A Formal Path to Transformer Interpretability

Can we prove that we understand how a transformer works? Not just describe it heuristically or highlight patterns, but actually trace its computations with the rigor of a mathematical proof? That’s the ambition behind the recent paper Mechanistic Interpretability for Transformers: A Formal Framework and Case Study on Indirect Object Identification. The authors propose the first comprehensive mathematical framework for mechanistic interpretability, and they use it to dissect how a small transformer solves the Indirect Object Identification (IOI) task. What results is not just a technical tour de force, but a conceptual upgrade for the interpretability field. ...

July 30, 2025 · 3 min · Zelina