
Learning to Discover at Test Time: When Search Learns Back

Opening — Why this matters now
For years, scaling AI meant one thing: train bigger models, then freeze them. At inference time, we search harder, sample wider, and hope brute force compensates for epistemic limits. This paper challenges that orthodoxy. It argues—quietly but decisively—that search alone is no longer enough. If discovery problems are truly out-of-distribution, then the model must be allowed to learn at test time. ...

January 24, 2026 · 3 min · Zelina

SAGA, Not Sci‑Fi: When LLMs Start Doing Science

Opening — Why this matters now
For years, we have asked large language models to explain science. The paper behind SAGA asks a more uncomfortable question: what happens when we ask them to do science instead? Scientific discovery has always been bottlenecked not by ideas, but by coordination — between hypothesis generation, experiment design, evaluation, and iteration. SAGA reframes this entire loop as an agentic system problem. Not a chatbot. Not a single model. A laboratory of cooperating AI agents. ...

December 29, 2025 · 3 min · Zelina

From Benchmarks to Beakers: Stress‑Testing LLMs as Scientific Co‑Scientists

Opening — Why this matters now
Large Language Models have already aced exams, written code, and argued philosophy with unsettling confidence. The obvious next step was inevitable: can they do science? Not assist, not summarize—but reason, explore, and discover. The paper behind this article asks that question without romance. It evaluates LLMs not as chatbots, but as proto‑scientists, and then measures how far the illusion actually holds. ...

December 18, 2025 · 3 min · Zelina

The Benchmark Awakens: AstaBench and the New Standard for Agentic Science

The latest release from the Allen Institute for AI, AstaBench, represents a turning point in how the AI research community evaluates large language model (LLM) agents. For years, benchmarks like MMLU or ARC have tested narrow reasoning and recall. But AstaBench brings something new—it treats the agent not as a static model, but as a scientific collaborator with memory, cost, and strategy. ...

October 31, 2025 · 4 min · Zelina

The Missing Link: How AI Maps Hidden Properties in Materials Science

The search for new superconductors, energy materials, and exotic compounds often begins not in a lab—but in a database. Yet despite decades of digitization, scientific knowledge remains fragmented across millions of papers, scattered ontologies, and uncharted connections. A new study from Los Alamos National Laboratory proposes an AI-driven framework that doesn’t just analyze documents—it predicts the next breakthrough.
From Papers to Properties: A Three-Tiered Approach
At the heart of this method is a clever ensemble pipeline that combines interpretability with predictive power. The authors start by mapping over 46,000 papers on transition-metal dichalcogenides (TMDs)—a key class of 2D materials—into a matrix of latent topics and material mentions. Then they apply a hierarchical modeling approach: ...

July 13, 2025 · 3 min · Zelina

Passing Humanity's Last Exam: X-Master and the Emergence of Scientific AI Agents

Is it possible to train a language model to become a capable scientist? That provocative question lies at the heart of a new milestone in AI research. In SciMaster: Towards General-Purpose Scientific AI Agents, a team from Shanghai Jiao Tong University introduces X-Master, a tool-augmented open-source agent that has just achieved the highest score ever recorded on Humanity’s Last Exam (HLE)—surpassing even OpenAI and Google. But what makes this feat more than just a leaderboard update is how X-Master got there. Instead of training a larger model or fine-tuning on more data, the researchers innovated on agentic architecture and inference-time workflows. The result? An extensible framework that emulates the exploratory behavior of human scientists, not just their answers. ...

July 8, 2025 · 4 min · Zelina