When Papers Learn to Draw: AutoFigure and the End of Ugly Science Diagrams
AutoFigure shows why publication-ready scientific diagrams need reasoning-first visual pipelines, not prettier text-to-image prompts.
A mechanism-first reading of conversational inertia: why long context can make agents imitate their own mistakes, and why strategic forgetting may beat bigger memory.
Avenir-Web shows why reliable web agents need procedural experience, hybrid grounding, explicit progress tracking, and compressed memory—not just bigger multimodal models.
SafeGround shows how uncertainty calibration can turn GUI agents from reckless clickers into risk-budgeted automation systems.
A mechanism-first reading of MAPPA, a process-reward method for turning multiagent LLM workflows from prompted collaboration into trainable systems.
A business-focused reading of DRIFT-BENCH, showing why agent reliability depends less on asking more questions and more on knowing when clarification helps, when it harms, and when execution must stop.
A mechanism-first reading of why identity-bridge data can weaken the reversal curse in autoregressive LLMs—and why the useful trick is more delicate than it first looks.
A mechanism-first reading of why robust policy iteration for $L_\infty$ robust MDPs is not merely convergent, but strongly polynomial under fixed discount.
RAudit shows why longer reasoning, stronger judges, and harsher critique can reveal LLM failures—but can also amplify them.
A mechanism-first reading of MentisOculi, and why explicit visual thoughts still fail to become reliable reasoning evidence for multimodal AI.