APTBench

The Missing Metric: Measuring Agentic Potential Before It’s Too Late

In the modern AI landscape, models are not just talkers; they are becoming doers. They code, browse, research, and act within complex environments. Yet while we have become adept at measuring what models know, we still lack a clear way to measure what they can become. APTBench, proposed by Tencent Youtu Lab and Shanghai Jiao Tong University, fills that gap: it is the first benchmark designed to quantify a model’s agentic potential during pre-training, before costly fine-tuning or instruction stages even begin. ...