FIRE-BENCH: Playing Back the Tape of Scientific Discovery
Opening: Why this matters now
Agentic AI has entered its confident phase. Papers, demos, and product pitches increasingly imply that large language model (LLM)-powered agents can already “do research”: formulate hypotheses, run experiments, and even write papers end to end. The uncomfortable question is not whether they look busy, but whether they actually rediscover truth. ...