Research Automation

FIRE-BENCH: Playing Back the Tape of Scientific Discovery

Opening: Why this matters now

Agentic AI has entered its confident phase. Papers, demos, and product pitches increasingly imply that large language model (LLM)–powered agents can already “do research”: formulate hypotheses, run experiments, and even write papers end to end. The uncomfortable question is not whether they look busy, but whether they actually rediscover truth. ...

Infinite Tasks, Finite Minds: Why Agents Keep Forgetting—and How InfiAgent Cheats Time

Opening: Why this matters now

Everyone wants an autonomous agent that can just keep going. Write a literature review. Audit 80 papers. Run an open-ended research project for days. In theory, large language models (LLMs) are perfect for this. In practice, they quietly collapse under their own memory. The problem isn’t model intelligence. It’s state. ...

Forecasting a Smarter Planet: How EarthLink Reimagines Climate Science with Self-Evolving AI Agents

Climate science, once defined by hand-tuned code and static diagnostics, is entering a new phase of automation and adaptability. At the forefront is EarthLink, a self-evolving multi-agent AI platform built specifically to support Earth system science. But this isn’t another LLM wrapper for answering climate questions. EarthLink is something deeper: a scientific collaborator that plans experiments, writes code, debugs itself, interprets results, and learns with each use.

From Toolkits to Thinking Partners

Traditional tools like ESMValTool or ILAMB have standardized climate model evaluation, but they remain brittle and rigid. They require domain-specific programming expertise and offer little flexibility beyond predefined tasks. In contrast, EarthLink introduces a new paradigm: ...