Cover image

Branching Out of the Box: Tree‑OPO Turns MCTS Traces into Better RL for Reasoning

The punchline Tree‑OPO takes something many labs already produce—MCTS rollouts from a stronger teacher—and treats them not just as answers but as a curriculum of prefixes. It then optimizes a student with GRPO-like updates, but with staged, tree-aware advantages instead of a flat group mean. The result in math reasoning (GSM8K) is a modest but consistent bump over standard GRPO while keeping memory/complexity low. Why this matters for practitioners: you can get more out of your expensive searches (or teacher traces) without training a value model or lugging around teacher logits during student training. ...

September 17, 2025 · 5 min · Zelina
Cover image

Search Party in a Notebook: JUPITER Turns Data Analysis into a Tree Game

TL;DR Why this paper matters: It shows that how you search matters more than how big your model is for multi‑step, tool‑using analytics. With a notebook‑grounded dataset (NbQA) and value‑guided search, mid‑size open models rival GPT‑4o–based agents on a leading data‑analysis benchmark. What’s new: (1) NbQA, a large corpus of real Jupyter tasks with executable multi‑step solutions; (2) JUPITER, a planner that treats analysis as a tree search over “thought → code → output” steps, guided by a learned value model. Why you should care (operator’s view): This blueprint turns flaky “Code Interpreter”-style sessions into repeatable playbooks—fewer dead ends, more auditable steps, and better generalization without paying for the biggest model. The core idea: analytics as a search tree Most LLM data‑analysis failures come from branching mistakes: choosing the wrong intermediate step, compounding errors, and wasting tool calls. JUPITER reframes the whole exercise as search over notebook states. Each node is a concrete state—accumulated thoughts, code, and execution outputs. The system expands only a few promising branches and prunes the rest using a value model trained on successful and failed trajectories. ...

September 17, 2025 · 5 min · Zelina
Cover image

From Trees to Truths: Making MCTS Talk with Logic-Backed LLMs

In the quest to make AI more trustworthy, few challenges loom larger than explaining sequential decision-making algorithms like Monte Carlo Tree Search (MCTS). Despite its success in domains from transit scheduling to game playing, MCTS remains a black box to most practitioners, generating decisions from expansive trees of sampled possibilities without accessible rationale. A new framework proposes to change that by fusing LLMs with formal logic to bring transparency and dialogue to this crucial planning tool1. ...

May 4, 2025 · 6 min