LLM Reasoning

RAudit: When Models Think Too Much and Still Get It Wrong

The model is not always confused. Sometimes it has already done the work, reached the right answer, and then politely walks away from it because the user sounded confident. That is the quietly irritating problem behind RAudit, a paper that studies how large language models behave when their reasoning is audited without giving the auditor the correct answer. The paper is not just another “LLMs can be sycophantic” warning. We have enough of those. At this point, saying models flatter users is like saying spreadsheets contain hidden errors. True, useful, and somehow still not enough to change deployment practice. ...

Think-with-Me: When LLMs Learn to Stop Thinking

A model can be wrong because it did not think enough. That part is easy to understand. The more annoying failure is when the model already had the answer, kept going, second-guessed itself into a ditch, and then presented the ditch with confidence. This is the special comedy of large reasoning models: sometimes the expensive part is not the intelligence, but the hesitation after the intelligence has already done its job. ...

Batch of Thought, Not Chain of Thought: Why LLMs Reason Better Together

Fraud review is not a solo sport. A risk analyst looking at one suspicious seller can notice a strange product description, a vague company name, or a price range that feels wrong. But the real signal often appears only when several sellers are placed side by side. One shop looks unusual. Ten shops with the same naming pattern, same product mismatch, and same pricing behavior start to look less like noise and more like a system. ...

Adversaries, Slices, and the Art of Teaching LLMs to Think

A math tutor does not wait until the end of a two-page solution, circle the final answer, and say “wrong.” At least, not a good one. The useful tutor interrupts earlier. This line follows. That parity condition does not. This factorization is legal, but the conclusion you drew from it is not. The feedback is local, not theatrical. It tells the student where the reasoning began to rot, before the final answer becomes merely the visible corpse. ...

Thinking in Branches: Why LLM Reasoning Needs an Algorithmic Theory

A manager asks an AI system for a risk assessment. It gives a plausible answer. The manager asks again with a slightly different prompt. Another plausible answer appears, with different reasoning. Ask five more times and the system scatters clues across the attempts like a consultant who has read the documents but refuses to assemble the memo in one draft. ...

Stuck on Repeat: Why LLMs Reinforce Their Own Bad Ideas

Meetings have a familiar failure mode. Someone states an early opinion, then spends the next thirty minutes “thinking through the issue” in a way that somehow makes the original opinion look increasingly inevitable. Evidence enters the room. Counterarguments are acknowledged. The conclusion remains suspiciously loyal to the opening bid. Apparently, large language models have been attending the same meetings. ...

From Building Blocks to Breakthroughs: Why RL Finally Teaches Models to Think

Training an AI model is often sold like a kitchen renovation: add more data, add reinforcement learning, install the shiny reasoning countertop, and suddenly the whole thing looks expensive enough to be intelligent. This paper is useful because it ruins that brochure. The authors of Atomic Skills are the Prerequisite: When Reinforcement Learning Synthesizes Compositional Reasoning, and When It Only Amplifies ask a deceptively simple question: does reinforcement learning create new reasoning ability, or does it only increase the probability of behaviors the model could already produce? Their answer is not the clean slogan either camp wants. RL can synthesize new compositional reasoning, but only when the model has already learned the right underlying atomic skills. Without that foundation, RL mostly polishes whatever behavior already exists. Sometimes that is reasoning. Sometimes it is just a better-trained shortcut wearing a lab coat. ...

Mind the Gaps: Why LLMs Reason Like Brilliant Amnesiacs

A model can write a flawless explanation, check its own work, announce a correction, and then make the same mistake three paragraphs later. This is the familiar enterprise horror show: the AI appears to reason, but its reasoning has no working memory of its own commitments. It is articulate, capable, and sometimes genuinely useful. It is also, in the wrong setting, a brilliant amnesiac. ...

Plan>Then>Profit: Reinforcement Learning That Teaches LLMs to Outline Before They Think

Planning is usually the part of work everybody claims to value and nobody wants to inspect. The deck has a roadmap. The project has a strategy. The model has a chain of thought. Splendid. Now, does the plan actually make the execution better, or is it just theatre with bullet points? That is the useful question behind Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning, which introduces PTA-GRPO, a reinforcement-learning method that trains language models to generate an explicit analytic plan before detailed reasoning and then rewards the quality of that plan, not merely the final answer. ...

Parallel Minds, Shorter Time: ParaThinker’s Native Thought Width

A familiar enterprise AI failure looks less like stupidity and more like stubbornness. Ask a model to solve a hard problem, and it may begin confidently in the wrong direction. Then it keeps going. It adds details. It self-reflects. It spends tokens. It may even apologise to itself internally, which is apparently what we call progress now. But the core path does not change. The model is not merely short on compute. It is trapped inside its own first guess. ...