Test-Time Compute

Think Before You Click: Test-Time AI Is the New Control Surface

TL;DR for operators: AI control is moving downstream. The old operational story was simple enough to fit on a procurement slide: train a better model, deploy it, monitor aggregate metrics, repeat until morale improves. That story is now inadequate. Increasingly, the important decision is not only what the model learned during training, but what the system does after this exact input arrives. ...

QED-Nano: Small Models, Big Proof Energy

Cost is usually where AI miracles become accounting problems. A frontier model can look brilliant when it is allowed to spend enormous inference compute, rely on undisclosed training data, and hide the machinery behind a clean demo. Very convenient. Also very hard to reproduce. For businesses, that matters because a capability that cannot be inspected, budgeted, or adapted is not really a capability. It is a vendor promise with a nice interface. ...

Diffusion Decoding Gets a Personality: When Diversity Stops Being Accidental

Choices are cheap until they all look the same. That is the awkward little problem behind many “generate multiple answers” interfaces. A model produces five suggestions, ten drafts, or thirty candidate solutions; the UI proudly displays variety; and then a human notices that most options are the same answer wearing different shoes. Good shoes, perhaps. Still the same answer. ...

Thinking in Branches: Why LLM Reasoning Needs an Algorithmic Theory

A manager asks an AI system for a risk assessment. It gives a plausible answer. The manager asks again with a slightly different prompt. Another plausible answer appears, with different reasoning. Ask five more times and the system scatters clues across the attempts like a consultant who has read the documents but refuses to assemble the memo in one draft. ...

Recurrent Revival: How Retrofitted Depth Turns LLMs Into Deeper Thinkers

Compute is the bill that arrives after every AI strategy meeting. Everyone wants stronger reasoning. Fewer hallucinations. Better mathematical reliability. More robust planning. The usual menu is familiar: train a bigger model, sample more answers, generate longer chain-of-thought, bolt on a verifier, or pray to the GPU procurement gods. Elegant, in the way an invoice can be elegant. ...

Parallel Minds, Shorter Time: ParaThinker’s Native Thought Width

A familiar enterprise AI failure looks less like stupidity and more like stubbornness. Ask a model to solve a hard problem, and it may begin confidently in the wrong direction. Then it keeps going. It adds details. It self-reflects. It spends tokens. It may even apologise to itself internally, which is apparently what we call progress now. But the core path does not change. The model is not merely short on compute. It is trapped inside its own first guess. ...

Plan, Don't Spam: The Goldilocks Rule for Test-Time Compute

A busy agent is not necessarily a thinking agent. Anyone who has watched an LLM agent narrate every tiny move knows the feeling. It reviews the goal. It drafts a plan. It revises the plan. It reconsiders the revision. Then, with exquisite deliberation, it clicks the wrong button. The transcript looks intelligent; the behaviour looks like a consultant trapped in a revolving door. ...

Enhancing Privately Deployed AI Models: A Sampling-Based Search Approach

TL;DR for operators: Private AI pilots usually fail in a familiar place: the model gives one confident answer, everyone pretends the confidence means something, and then a human quietly redoes the work. Sampling-based search offers a more disciplined alternative. Instead of asking a privately deployed model for one answer, the system asks for many candidate answers, verifies them, compares the strongest contenders, and returns the answer with the best support. The target paper, Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification, studies this pattern at meaningful scale and shows that a minimalist version can materially improve reasoning performance without retraining the base model.1 ...
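
The sample, verify, compare, return loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate`, `verify`, and `compare` are hypothetical callables standing in for calls to a privately deployed model, and the parameter names (`num_samples`, `num_verifications`, `top_k`) are invented for this example.

```python
from typing import Callable, List, Optional, Tuple


def sampling_based_search(
    prompt: str,
    generate: Callable[[str], str],            # hypothetical: one model sample per call
    verify: Callable[[str, str], float],       # hypothetical: score in [0, 1] for (prompt, candidate)
    num_samples: int = 8,
    num_verifications: int = 3,
    top_k: int = 2,
    compare: Optional[Callable[[str, str, str], str]] = None,  # hypothetical head-to-head judge
) -> str:
    """Return the candidate answer with the best verification support."""
    if min(num_samples, num_verifications, top_k) < 1:
        raise ValueError("num_samples, num_verifications and top_k must be >= 1")

    # 1. Sample many candidate answers instead of trusting a single one.
    candidates = [generate(prompt) for _ in range(num_samples)]

    # 2. Drop exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(candidates))

    # 3. Verify each candidate several times and average the scores.
    scored: List[Tuple[float, str]] = []
    for candidate in unique:
        scores = [verify(prompt, candidate) for _ in range(num_verifications)]
        scored.append((sum(scores) / len(scores), candidate))

    # 4. Keep the strongest contenders (the sort is stable, so ties keep sample order).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    contenders = [candidate for _, candidate in scored[:top_k]]

    # 5. Optionally compare the contenders head to head; otherwise take the top score.
    winner = contenders[0]
    if compare is not None:
        for challenger in contenders[1:]:
            winner = compare(prompt, winner, challenger)
    return winner
```

In practice `generate` would sample the model at a non-zero temperature and `verify` would be a second model call that checks the candidate. Both cost inference compute, so `num_samples` and `num_verifications` are the budget knobs an operator would tune.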