RxnBench: Reading Chemistry Like a Human (Turns Out That’s Hard)

Opening: Why this matters now

Multimodal Large Language Models (MLLMs) have become impressively fluent readers of the world. They can caption images, parse charts, and answer questions about documents that would once have required a human analyst and a strong coffee. Naturally, chemistry was next. But chemistry does not speak in sentences. It speaks in arrows, wedges, dashed bonds, cryptic tables, and reaction schemes buried three pages away from their explanations. If we want autonomous “AI chemists,” the real test is not trivia or SMILES strings; it is whether models can read actual chemical papers. ...

December 31, 2025 · 4 min · Zelina

SAGA, Not Sci‑Fi: When LLMs Start Doing Science

Opening: Why this matters now

For years, we have asked large language models to explain science. The paper behind SAGA asks a more uncomfortable question: what happens when we ask them to do science instead? Scientific discovery has always been bottlenecked not by ideas, but by coordination between hypothesis generation, experiment design, evaluation, and iteration. SAGA reframes this entire loop as an agentic system problem. Not a chatbot. Not a single model. A laboratory of cooperating AI agents. ...

December 29, 2025 · 3 min · Zelina