When Papers Learn to Draw: AutoFigure and the End of Ugly Science Diagrams

AutoFigure shows why publication-ready scientific diagrams need reasoning-first visual pipelines, not prettier text-to-image prompts.

February 4, 2026 · 15 min · Zelina

When Your Agent Starts Copying Itself: Breaking Conversational Inertia

A mechanism-first reading of conversational inertia: why long context can make agents imitate their own mistakes, and why strategic forgetting may beat bigger memory.

February 4, 2026 · 17 min · Zelina

Click Like a Human: Why Avenir-Web Is a Quiet Breakthrough in Web Agents

Avenir-Web shows why reliable web agents need procedural experience, hybrid grounding, explicit progress tracking, and compressed memory—not just bigger multimodal models.

February 3, 2026 · 16 min · Zelina

Click with Confidence: Teaching GUI Agents When *Not* to Click

SafeGround shows how uncertainty calibration can turn GUI agents from reckless clickers into risk-budgeted automation systems.

February 3, 2026 · 17 min · Zelina

Coaching the Swarm: Why Multi‑Agent RL Finally Scales

A mechanism-first reading of MAPPA, a process-reward method for turning multi-agent LLM workflows from prompted collaboration into trainable systems.

February 3, 2026 · 17 min · Zelina

DRIFT-BENCH: When Agents Stop Asking and Start Breaking

A business-focused reading of DRIFT-BENCH, showing why agent reliability depends less on asking more questions and more on knowing when clarification helps, when it harms, and when execution must stop.

February 3, 2026 · 17 min · Zelina

Identity Crisis: How a Trivial Trick Teaches LLMs to Think Backwards

A mechanism-first reading of why identity-bridge data can weaken the reversal curse in autoregressive LLMs—and why the useful trick is more delicate than it first looks.

February 3, 2026 · 18 min · Zelina

No More Bit-Length Anxiety: Policy Iteration Goes Strongly Polynomial

A mechanism-first reading of why robust policy iteration for $L_\infty$ robust MDPs is not merely convergent, but strongly polynomial under fixed discount.

February 3, 2026 · 16 min · Zelina

RAudit: When Models Think Too Much and Still Get It Wrong

RAudit shows why longer reasoning, stronger judges, and harsher critique can reveal LLM failures—but can also amplify them.

February 3, 2026 · 17 min · Zelina

Seeing Is Not Reasoning: Why Mental Imagery Still Breaks Multimodal AI

A mechanism-first reading of MentisOculi, and why explicit visual thoughts still fail to become reliable reasoning evidence for multimodal AI.

February 3, 2026 · 18 min · Zelina