Chain-of-Thought

Reason, Reveal, Resist: The Persuasion Duality in Multi‑Agent AI

TL;DR: In LLM multi‑agent systems, how a model thinks matters more than how big it is. Explicit reasoning (thinking mode / CoT) creates a Persuasion Duality: sharing a model’s reasoning makes it far better at convincing others, while enabling the model’s own reasoning mode makes it far harder to convince. This shifts best practices for agent design, governance, and product UX.

Why this paper matters: Cognition, not just parameter count, now drives the social dynamics of agent swarms. For Cognaptus clients building agent workers (ops, compliance, research, trading), the result is practical: toggling reasoning changes not just accuracy, but influence. Your deployment choices can tilt a network toward consensus, stalemate, or resilient truth‑seeking. ...

Keys to the Kingdom: How LLMs Can Audit Crypto Logic Before It Breaks

We’ve gotten good at spotting API misuse in crypto code (think “don’t use ECB,” “don’t hardcode IVs”). But many production failures don’t come from the obvious API call—they’re born in the logic that surrounds it: the parameter checks, corner-case math, and brittle “optimizations.” That’s where CryptoScope steps in: an LLM-powered framework that reads crypto code like a human auditor, guided by a domain corpus and structured prompts, to uncover logic-level vulnerabilities without executing the code. ...

From Black Box to Glass Box: DeepVIS Makes Data Visualization Explain Itself

When business leaders ask for a “quick chart,” they rarely expect to become detectives in the aftermath—trying to work out why the AI picked that chart type, grouped the data that way, or left out important categories. Yet that’s exactly the frustration with most Natural Language to Visualization (NL2VIS) tools today: they generate results like a magician pulling a rabbit from a hat, with no insight into how the trick was done. ...

Reasoning with Both Eyes Open: Why Multimodal Chain-of-Thought Still Trips Up LLMs

If today’s AI models can ace bar exams, explain astrophysics, and generate functional code from a napkin sketch, why do they still fail at seemingly simple questions that require looking and thinking? A new benchmark called MCORE (Multimodal Chain-of-Reasoning Evaluation) gives a resounding answer: reasoning across modalities is hard, and we’re not as far along as we thought.

Beyond Pattern Matching: What MCORE Tests

The majority of multimodal evaluations today rely on either: ...

Thinking Without Talking: How SynAdapt Lets LLMs Reason in Silence

When large language models (LLMs) reason step-by-step using Chain-of-Thought (CoT) prompting, they think out loud. That verbosity improves accuracy, but it’s also a luxury many applications can’t afford. From real-time voice assistants to robotics, excessive token generation slows everything down. The result is a fundamental trade-off: performance versus efficiency. The paper “SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought” offers a clever solution. Rather than generating verbose natural language steps, SynAdapt trains LLMs to reason silently, using internal vectors called synthetic continuous CoT (CCoT). For harder problems, where silence isn’t enough, it reroutes the model back into verbal reasoning mode. This hybrid, adaptive strategy aims for the best of both worlds. ...

How Sparse is Your Thought? Cracking the Inner Logic of Chain-of-Thought Prompts

Chain-of-Thought (CoT) prompting has become a go-to technique for improving multi-step reasoning in large language models (LLMs). But is it really helping models think better—or just encouraging them to bluff more convincingly? A new paper from Leiden University, “How does Chain of Thought Think?”, delivers a mechanistic deep dive into this question. By combining sparse autoencoders (SAEs) with activation patching, the authors dissect whether CoT actually changes what a model internally computes—or merely helps its outputs look better. ...

Thoughts, Exposed: Why Chain-of-Thought Monitoring Might Be AI Safety’s Best Fragile Hope

Imagine debugging a black box. Now imagine that black box occasionally narrates its thoughts aloud. That’s the opportunity—and the fragility—presented by Chain-of-Thought (CoT) monitoring, a newly emergent safety paradigm for large language models (LLMs). In their recent landmark paper, Korbak et al. argue that reasoning traces generated by LLMs—especially those trained for explicit multi-step planning—offer a fleeting yet powerful handle on model alignment. But this visibility, they warn, is contingent, brittle, and already under threat. ...

Backtrack to the Future: How ASTRO Teaches LLMs to Think Like Search Algorithms

A persistent mystery in the recent surge of reasoning-augmented LLMs (like OpenAI’s o1 or DeepSeek-R1) is whether these models learn to reason through post hoc reinforcement fine-tuning or were already good at it to begin with. ASTRO offers a rare counter-example: a method that imbues non-reasoner LLMs (like vanilla Llama 3) with structured reasoning behavior from scratch. Rather than rely on emergent capabilities or distillation from models that already search well, ASTRO teaches LLMs to think like search algorithms themselves, using a hybrid approach combining Monte Carlo Tree Search (MCTS), procedure cloning, chain-of-thought generation, and reinforcement learning with verifiable rewards. ...

Anchored Thinking: Mapping the Inner Compass of Reasoning LLMs

In the world of large language models (LLMs), answers often emerge from an intricate internal dialogue. But what if we could locate the few sentences within that stream of thoughts that disproportionately steer the outcome, like anchors stabilizing a drifting ship? That’s exactly what Paul Bogdan, Uzay Macar, Neel Nanda, and Arthur Conmy aim to do in their new work, “Thought Anchors: Which LLM Reasoning Steps Matter?” This study presents an ambitious trifecta of methods to trace the true influencers of LLM reasoning. ...

Reasoning on a Sliding Scale: Why One Size Doesn't Fit All in CoT

The Chain-of-Thought (CoT) paradigm has become a cornerstone in improving the reasoning capabilities of large language models (LLMs). But as CoT matures, one question looms larger: Does every problem really need an elaborate chain? In this article, we dive into a new method called AdaR1, which rethinks the CoT strategy by asking not only how to reason—but how much. ...