Picture This: When AI Reasoning Leaves the Text Box

Reasoning usually arrives as text. A model explains itself in sentences, equations, bullet points, and the occasional theatrical “therefore.” We have learned to call this chain-of-thought, or CoT, because “the model wrote a long scratchpad and we hope it helped” sounded insufficiently scientific. The paper “Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text” asks a sharper question: what if the intermediate reasoning medium does not have to be text at all? [1] ...

June 9, 2026 · 17 min · Zelina

Thinking Isn’t Free: Why Chain-of-Thought Hits a Hard Wall

Reasoning budgets look harmless until they become a line item. A user asks an AI system to reconcile a long contract, inspect a transaction trail, trace dependencies in a knowledge graph, or verify whether one operational event can lead to another. The model “thinks.” The answer improves. The invoice also improves, in the less charming direction. The usual response is to ask for shorter reasoning: compress the chain of thought, use fewer tokens, impose a budget, maybe add a prompt that says “be concise,” because apparently invoices can be negotiated with adjectives. ...

February 5, 2026 · 15 min · Zelina

When Agents Stop Talking to the Wrong People

Communication sounds harmless until the wrong person gets the microphone. That is true in meetings. It is also true in multi-agent AI systems. The polite version says agents “collaborate,” “debate,” and “refine each other’s reasoning.” The less decorative version is that one agent’s output becomes another agent’s input. If the first agent is wrong, confused, strategically misleading, or simply having one of those tiny synthetic breakdowns that LLMs have with impressive confidence, the system has just created a distribution channel for bad judgment. ...

February 4, 2026 · 15 min · Zelina

Train Long, Think Short: How Curriculum Learning Makes LLMs Think Smarter, Not Longer

TL;DR for operators: The paper behind this article proposes Curriculum GRPO, a reinforcement-learning training method that starts a reasoning model with a larger token budget, then gradually shrinks that budget until the model learns to solve problems in shorter traces. [1] The important point is not “ask the model to be brief.” We have tried that. It works roughly as well as asking a committee to be concise, which is to say: occasionally, under duress. The paper instead changes the training trajectory. The model is first allowed to explore longer reasoning paths, then is forced to compress successful strategies into a tighter token budget. ...

August 13, 2025 · 13 min · Zelina