
Thinking Isn’t Free: Why Chain-of-Thought Hits a Hard Wall

Opening — Why this matters now
Inference-time reasoning has quietly become the dominant performance lever for frontier language models. When benchmarks get hard, we don’t retrain—we let models think longer. More tokens, more scratchpad, more compute. The industry narrative is simple: reasoning scales, so accuracy scales. This paper asks an uncomfortable question: how long must a model think, at minimum, as problems grow? And the answer, grounded in theory rather than vibes, is not encouraging. ...

February 5, 2026 · 3 min · Zelina

Seeing Is Thinking: When Images Do the Reasoning

Opening — Why this matters now
Large language models have learned to talk their way through reasoning. But the real world does not speak in tokens. It moves, collides, folds, and occludes. As multimodal models mature, a quiet question has become unavoidable: is language really the best internal medium for thinking about physical reality? ...

February 2, 2026 · 3 min · Zelina

When the Answer Matters More Than the Thinking

Opening — Why this matters now
Chain-of-thought (CoT) has quietly become the default crutch of modern LLM training. When models fail, we add more reasoning steps; when benchmarks stagnate, we stretch the explanations even further. The assumption is implicit and rarely questioned: better thinking inevitably leads to better answers. The paper “Rethinking Supervised Fine-Tuning: Emphasizing Key Answer Tokens for Improved LLM Accuracy” challenges that assumption with a refreshingly blunt observation: in supervised fine-tuning, the answer itself is often the shortest—and most under-optimized—part of the output. ...
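The paper's exact loss design is its own; purely as a sketch of the underlying idea (up-weighting final answer tokens relative to chain-of-thought tokens during supervised fine-tuning), a token-weighted cross-entropy could look like the following. The function name, the answer_weight knob, and the masking convention are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn.functional as F

def weighted_sft_loss(logits, labels, answer_mask, answer_weight=3.0):
    """Cross-entropy over all supervised tokens, with answer tokens up-weighted.

    logits:        (batch, seq_len, vocab) model outputs
    labels:        (batch, seq_len) target token ids, -100 for ignored positions
    answer_mask:   (batch, seq_len) 1.0 where a token belongs to the final answer,
                   0.0 where it belongs to the chain-of-thought or prompt
    answer_weight: hypothetical scalar controlling how much extra weight the
                   answer span receives relative to reasoning tokens
    """
    # Per-token cross-entropy, keeping the sequence dimension.
    per_token = F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=-100, reduction="none"
    )
    # Answer positions get `answer_weight`; reasoning positions keep weight 1.0.
    weights = 1.0 + (answer_weight - 1.0) * answer_mask
    valid = (labels != -100).float()
    return (per_token * weights * valid).sum() / (weights * valid).sum()
```

The only design choice that matters here is the relative weight: the chain-of-thought still contributes to the gradient, but the short answer span no longer gets drowned out by the much longer explanation.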

December 26, 2025 · 4 min · Zelina

Truth Machines: VeriCoT and the Next Frontier of AI Self-Verification

Why this matters now
Large language models have grown remarkably persuasive—but not necessarily reliable. They often arrive at correct answers through logically unsound reasoning, a phenomenon both amusing in games and catastrophic in legal, biomedical, or policy contexts. The research paper VeriCoT: Neuro-Symbolic Chain-of-Thought Validation via Logical Consistency Checks proposes a decisive step toward addressing that flaw: a hybrid system where symbolic logic checks the reasoning of a neural model, not just its answers. ...
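VeriCoT's own pipeline autoformalizes each reasoning step and checks it against the premises; the details belong to the paper. As a toy illustration of the core check only (not the authors' code), here is a Z3-based entailment test in which the premises and one chain-of-thought step have already been translated into propositional formulas by hand:

```python
from z3 import And, Bool, Implies, Not, Solver, unsat

# Hypothetical premises and a claimed reasoning step, hand-translated into
# propositional logic. In VeriCoT this autoformalization is done by the model.
eligible, resident, over_65 = Bool("eligible"), Bool("resident"), Bool("over_65")

premises = [resident, over_65, Implies(And(resident, over_65), eligible)]
steps = [eligible]  # the chain-of-thought's claimed intermediate conclusion

def step_is_entailed(premises, prior_steps, step):
    """A step passes if premises plus earlier steps logically entail it,
    i.e. assuming its negation is unsatisfiable."""
    s = Solver()
    s.add(*premises, *prior_steps, Not(step))
    return s.check() == unsat

for i, step in enumerate(steps):
    ok = step_is_entailed(premises, steps[:i], step)
    print(f"step {i}: {'entailed' if ok else 'not supported by premises'}")
```

The point of the neuro-symbolic split is visible even in this toy: the neural model proposes the steps, while the solver decides whether each one actually follows.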

November 7, 2025 · 4 min · Zelina

Reason, Reveal, Resist: The Persuasion Duality in Multi‑Agent AI

TL;DR
In LLM multi‑agent systems, how a model thinks matters more than how big it is. Explicit reasoning (thinking mode / CoT) creates a Persuasion Duality: sharing a model’s reasoning makes it far better at convincing others, while enabling the model’s own reasoning mode makes it far harder to convince. This shifts best practices for agent design, governance, and product UX.
Why this paper matters
Cognition—not just parameter count—now drives the social dynamics of agent swarms. For Cognaptus clients building agent workers (ops, compliance, research, trading), the result is practical: toggling reasoning changes not just accuracy, but influence. Your deployment choices can tilt a network toward consensus, stalemate, or resilient truth‑seeking. ...
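The paper's experimental harness is considerably richer; the sketch below only makes the two toggles concrete: whether an agent shares its reasoning with peers, and whether it reasons before updating its own answer. All names, prompts, and the call_llm stub are hypothetical placeholders rather than the paper's protocol.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    # Placeholder: wire up a real chat-model client here.
    return "[model response to: " + prompt[:40] + "...]"

@dataclass
class Agent:
    name: str
    share_reasoning: bool   # attach reasoning to outgoing messages (more persuasive)
    use_reasoning: bool     # think before updating own answer (harder to persuade)
    answer: str = ""

    def broadcast(self, question: str) -> str:
        if self.share_reasoning:
            return call_llm(f"{question}\nExplain your reasoning step by step, then answer.")
        return call_llm(f"{question}\nGive only your final answer.")

    def update(self, question: str, peer_messages: list[str]) -> None:
        style = ("Think step by step before deciding whether to change your answer."
                 if self.use_reasoning else "Decide immediately.")
        self.answer = call_llm(
            f"{question}\nOther agents said:\n" + "\n".join(peer_messages) +
            f"\n{style}\nFinal answer:")

# One round of a toy debate: each agent broadcasts, then updates on peers' messages.
agents = [Agent("a", share_reasoning=True, use_reasoning=False),
          Agent("b", share_reasoning=False, use_reasoning=True)]
question = "Is proposal X compliant with policy Y?"
messages = [a.broadcast(question) for a in agents]
for a in agents:
    a.update(question, [m for m, peer in zip(messages, agents) if peer is not a])
```

The duality lives entirely in those two booleans: share_reasoning shapes how persuasive an agent's outgoing messages are, while use_reasoning shapes how easily its own answer moves.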

October 2, 2025 · 5 min · Zelina

Keys to the Kingdom: How LLMs Can Audit Crypto Logic Before It Breaks

We’ve gotten good at spotting API misuse in crypto code (think “don’t use ECB,” “don’t hardcode IVs”). But many production failures don’t come from the obvious API call—they’re born in the logic that surrounds it: the parameter checks, corner-case math, and brittle “optimizations.” That’s where CryptoScope steps in: an LLM-powered framework that reads crypto code like a human auditor, guided by a domain corpus and structured prompts, to uncover logic-level vulnerabilities without executing the code. ...
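CryptoScope's corpus construction and prompt templates are its own; purely to illustrate the general shape of a corpus-guided audit prompt, a minimal sketch might look like this, with the retrieval heuristic, corpus entries, and function names all being hypothetical:

```python
# Rough shape of a retrieval-grounded audit prompt: pull a few relevant entries
# from a crypto-vulnerability knowledge corpus, then ask the model to reason
# about logic-level issues. Not CryptoScope's actual template.

def retrieve(corpus: list[dict], code: str, k: int = 3) -> list[dict]:
    """Naive keyword-overlap retrieval standing in for a real retriever."""
    def score(entry):
        return sum(kw in code for kw in entry["keywords"])
    return sorted(corpus, key=score, reverse=True)[:k]

def build_audit_prompt(code: str, corpus: list[dict]) -> str:
    hints = "\n".join(f"- {e['title']}: {e['guidance']}" for e in retrieve(corpus, code))
    return (
        "You are a cryptography code auditor.\n"
        "Known logic-level pitfalls to consider:\n"
        f"{hints}\n\n"
        "Audit the following code for logic-level flaws (parameter validation, "
        "corner-case math, brittle optimizations), not just API misuse. "
        "Report each finding with its location, the flaw, and a suggested fix.\n\n"
        "--- code under review ---\n"
        f"{code}"
    )

corpus = [
    {"title": "Nonce/IV reuse", "keywords": ["nonce", "iv"],
     "guidance": "Nonces and IVs must be unique per encryption under the same key."},
    {"title": "Timing-unsafe comparison", "keywords": ["==", "compare"],
     "guidance": "Compare MACs and tags with a constant-time comparison."},
]
print(build_audit_prompt("tag == expected_tag  # verify MAC", corpus))
```

The structure mirrors the article's point: the prompt steers the model past the obvious API checklist and toward the surrounding logic where production failures tend to originate.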

August 18, 2025 · 4 min · Zelina

From Black Box to Glass Box: DeepVIS Makes Data Visualization Explain Itself

When business leaders ask for a “quick chart,” they rarely expect to become detectives in the aftermath—trying to work out why the AI picked that chart type, grouped the data that way, or left out important categories. Yet that’s exactly the frustration with most Natural Language to Visualization (NL2VIS) tools today: they generate results like a magician pulling a rabbit from a hat, with no insight into how the trick was done. ...

August 9, 2025 · 3 min · Zelina

Reasoning with Both Eyes Open: Why Multimodal Chain-of-Thought Still Trips Up LLMs

If today’s AI models can ace bar exams, explain astrophysics, and generate functional code from a napkin sketch, why do they still fail at seemingly simple questions that require looking and thinking? A new benchmark called MCORE (Multimodal Chain-of-Reasoning Evaluation) answers that question with a resounding verdict: because reasoning across modalities is hard—and we’re not as far along as we thought.
Beyond Pattern Matching: What MCORE Tests
The majority of multimodal evaluations today rely on either: ...

August 6, 2025 · 3 min · Zelina

Thinking Without Talking: How SynAdapt Lets LLMs Reason in Silence

When large language models (LLMs) reason step-by-step using Chain-of-Thought (CoT) prompting, they think out loud. That verbosity improves accuracy—but it’s also a luxury many applications can’t afford. From real-time voice assistants to robotics, excessive token generation slows everything down. The result is a fundamental bottleneck: performance versus efficiency. The paper SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought offers a clever solution. Rather than generating verbose natural language steps, SynAdapt trains LLMs to reason silently, using internal vectors called synthetic continuous CoT (CCoT). And for harder problems—where silence isn’t enough—it smartly reroutes the model back into verbal reasoning mode. This hybrid, adaptive strategy achieves the best of both worlds. ...
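The actual CCoT construction and training objective are described in the paper; the sketch below only illustrates the adaptive routing idea, where a small difficulty head decides between silent latent "thought" vectors and a fallback to verbal chain-of-thought. Module names, shapes, and the threshold are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AdaptiveReasoningRouter(nn.Module):
    """Illustrative routing logic only: a classifier reads a summary of the
    prompt's hidden state and decides whether silent (continuous) reasoning is
    enough or the model should fall back to verbal chain-of-thought."""

    def __init__(self, hidden_dim: int, n_latent_steps: int = 8):
        super().__init__()
        self.difficulty_head = nn.Linear(hidden_dim, 1)
        # Learned latent "thought" vectors standing in for synthetic continuous CoT.
        self.latent_steps = nn.Parameter(torch.randn(n_latent_steps, hidden_dim))

    def forward(self, prompt_hidden: torch.Tensor, threshold: float = 0.5):
        # prompt_hidden: (batch, hidden_dim) summary of the encoded prompt
        p_hard = torch.sigmoid(self.difficulty_head(prompt_hidden)).squeeze(-1)
        use_verbal = p_hard > threshold  # hard questions -> generate verbal CoT tokens
        # Easy questions -> prepend latent thought vectors instead of CoT text.
        latent = self.latent_steps.expand(prompt_hidden.size(0), -1, -1)
        return use_verbal, latent
```

The efficiency win comes from the easy branch: latent vectors are consumed in a single forward pass, while the hard branch pays the usual token-by-token cost of thinking out loud.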

August 4, 2025 · 4 min · Zelina

How Sparse is Your Thought? Cracking the Inner Logic of Chain-of-Thought Prompts

Chain-of-Thought (CoT) prompting has become a go-to technique for improving multi-step reasoning in large language models (LLMs). But is it really helping models think better—or just encouraging them to bluff more convincingly? A new paper from Leiden University, “How does Chain of Thought Think?”, delivers a mechanistic deep dive into this question. By combining sparse autoencoders (SAEs) with activation patching, the authors dissect whether CoT actually changes what a model internally computes—or merely helps its outputs look better. ...
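The paper's SAE features and patching targets are specific to its setup; as a generic illustration of activation patching (swapping one layer's activation from a "with-CoT" run into a "without-CoT" run via a forward hook), a minimal PyTorch sketch could look like this, assuming the two runs produce same-shaped activations at that layer and that the layer returns a single tensor:

```python
import torch

def patch_activation(model, layer, clean_inputs, patched_inputs):
    """Generic activation patching: run the model on `patched_inputs`, but
    replace the chosen layer's output with the activation cached from a run
    on `clean_inputs`. Model and layer names are placeholders, not the paper's
    setup; both runs must yield same-shaped activations at `layer`."""
    cache = {}

    def save_hook(module, inputs, output):
        cache["act"] = output.detach()

    def patch_hook(module, inputs, output):
        return cache["act"]  # returning a tensor replaces the layer's output

    handle = layer.register_forward_hook(save_hook)
    with torch.no_grad():
        model(**clean_inputs)                  # cache the "clean" activation
    handle.remove()

    handle = layer.register_forward_hook(patch_hook)
    with torch.no_grad():
        patched_out = model(**patched_inputs)  # same layer now emits the cached act
    handle.remove()
    return patched_out
```

Comparing the patched output against the unpatched one is what lets an analysis like this attribute behavior to specific internal computations rather than to surface wording.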

August 1, 2025 · 3 min · Zelina