Thoughts, Exposed: Why Chain-of-Thought Monitoring Might Be AI Safety’s Best Fragile Hope

TL;DR for operators: Chain-of-thought monitoring is not “AI explaining itself.” That would be too convenient, and convenience is not usually how safety engineering works. The paper argues something narrower and more useful: when reasoning models solve hard tasks, some of their intermediate cognition may pass through human-readable language. That creates a rare oversight opportunity. A separate monitor can inspect the reasoning trace and flag signs of reward hacking, prompt-injection obedience, sabotage, manipulation, or evaluation artefacts before the final action is trusted. ...

July 16, 2025 · 16 min · Zelina

Anchored Thinking: Mapping the Inner Compass of Reasoning LLMs

TL;DR for operators: The paper’s useful claim is not simply that some chain-of-thought sentences matter more than others. That would be true, mildly interesting, and about as operationally helpful as saying some meetings should have been emails. The sharper claim is that the sentences that steer reasoning are often not the visible calculations. They are planning moves, re-checks, uncertainty statements, and backtracking moments: the places where the model chooses a route, notices a contradiction, or decides to verify a previous result. Bogdan, Macar, Nanda, and Conmy call these pivotal sentences thought anchors.¹ ...

June 25, 2025 · 19 min · Zelina

Reasoning on a Sliding Scale: Why One Size Doesn't Fit All in CoT

TL;DR for operators: Ada-R1 is useful because it attacks the expensive part of reasoning models from the right angle: not “make every answer shorter,” but “decide which problems deserve long reasoning in the first place.”¹ The paper’s key evidence is uncomfortable for anyone buying premium reasoning capacity by default. Long Chain-of-Thought helps on harder mathematical problems, but nearly half of the analysed samples show no improvement from Long-CoT, and some perform worse. In other words, paying for the model to brood majestically over simple work is not intelligence. It is ceremony with a token meter attached. ...

May 1, 2025 · 16 min · Zelina

When Smart AI Gets It Wrong: Diagnosing the Knowing-Doing Gap in Language Model Agents

TL;DR for operators: A smart agent can still be a bad decision-maker. That is the useful, slightly annoying lesson from “LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities.”¹ The paper studies Gemma2 models acting in simple decision environments and finds that they often fail not because they cannot describe the right strategy, but because they do not reliably execute it. ...

April 23, 2025 · 17 min · Zelina

Cut the Fluff: Leaner AI Thinking

TL;DR for operators: AI reasoning is becoming an operating cost, not just a research curiosity. When a model “thinks step by step,” every intermediate token has to be generated, paid for, waited on, logged, and sometimes hidden from the user because nobody wants a customer support bot narrating its algebra like a nervous intern. ...

April 6, 2025 · 14 min · Zelina

DeepSeek-R1

An open-source reasoning model achieving state-of-the-art performance in math, code, and logic tasks.

2 min