MCTS | Cognaptus

Rules of Attraction: How LLMs Learn to Judge Better Than We Do

Rubrics are supposed to make judgment boring. That is their charm. A good rubric tells a teacher why one essay deserves a 5 instead of a 3, tells a compliance reviewer why one response is acceptable and another is risky, and tells an internal QA team why a generated summary is useful rather than merely confident. In business, boring judgment is valuable. It scales. It can be audited. It survives employee turnover. It does not wake up one morning and decide that “clarity” now means “vibes with a semicolon.” ...

Branching Out of the Box: Tree‑OPO Turns MCTS Traces into Better RL for Reasoning

A search tree is expensive to build. Once you have paid for it, using only the final answers is a little like buying an aircraft engine and admiring the packaging. That is the useful instinct behind Tree-OPO, a paper that asks whether Monte Carlo Tree Search traces from a stronger teacher model can be reused not merely as demonstrations, but as a structured curriculum for training a smaller reasoning policy.1 The idea is not to run MCTS at inference time and call that progress. Nor is it to imitate a teacher’s logits until the student develops the personality of a photocopier. The paper’s more interesting move is subtler: take the partial reasoning states produced by search, let the student complete from those prefixes, and compute advantages in a way that respects where each prefix sits in the tree. ...

Search Party in a Notebook: JUPITER Turns Data Analysis into a Tree Game

A notebook is not just a file. In most companies, it is where the analyst tried three joins, fixed the date column, discovered the leakage, reran the model, cursed quietly, and eventually produced the chart that made it into Monday’s meeting. Then the notebook was archived, copied, half-forgotten, and treated as residue. ...

From Trees to Truths: Making MCTS Talk with Logic-Backed LLMs

TL;DR for operators: If your optimisation system can choose the route, assign the vehicle, or schedule the job but cannot explain why, the obvious temptation is to bolt on a chatbot and call the matter solved. That is also how one gets fluent nonsense with a user interface. The paper behind this article proposes a better pattern: let the LLM translate a user’s question into formal variables and logic, evaluate those variables against the actual Monte Carlo Tree Search tree, retrieve domain knowledge only when the question calls for it, and then generate the final natural-language explanation.1 The LLM is still useful, but it is no longer allowed to improvise the evidence. A small mercy, really. ...