
When More Becomes Smarter: The Unreasonable Effectiveness of Scaling Agents
From repetition to reasoning
When early computer-use agents (CUAs) appeared, they promised to automate tedious digital workflows such as clicking through files, formatting reports, or organizing spreadsheets. Yet anyone who has tried them knows the frustration: sometimes they succeed spectacularly, and sometimes they click the wrong button and crash everything. Reliability, not intelligence, has been the missing link. A recent paper from Simular Research, “The Unreasonable Effectiveness of Scaling Agents for Computer Use,” shows that scaling these agents is not just about adding compute; it is about how we scale. Their method, Behavior Best-of-N (bBoN), turns the brute-force idea of “run many agents and hope one works” into a structured, interpretable method that performs at near-human level. ...
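To make the contrast concrete, here is a toy Python sketch of the general best-of-N pattern. It is not Simular's implementation. Several agents attempt the same task, each attempt is condensed into a short summary, and a judge picks the attempt whose summary looks best instead of trusting whichever run happens to come first. Every name here (`Rollout`, `toy_agent`, `toy_judge`, the report-formatting task, and the scoring rubric) is a hypothetical stand-in. In a real system the agents and the judge would be far more capable, and the judge might compare candidates against each other rather than score them one at a time, as this sketch does.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical task: the steps a successful attempt must contain.
REQUIRED = ["open report.docx", "apply heading style", "save"]


@dataclass
class Rollout:
    """One agent attempt: the actions it took and a compact summary of them."""
    agent_id: int
    actions: List[str]
    summary: str


def toy_agent(agent_id: int) -> List[str]:
    """Hypothetical agent: even-numbered attempts click the wrong button and crash."""
    if agent_id % 2 == 0:
        return REQUIRED[:1] + ["click wrong button", "crash"]
    return list(REQUIRED)


def run_agents(n: int, run_one: Callable[[int], List[str]]) -> List[Rollout]:
    """Run n independent attempts and condense each trajectory into one line."""
    rollouts = []
    for i in range(n):
        actions = run_one(i)
        # Hypothetical summarizer: join the actions into a readable one-line trace.
        summary = " -> ".join(actions)
        rollouts.append(Rollout(i, actions, summary))
    return rollouts


def toy_judge(summary: str) -> int:
    """Toy rubric: +1 per required step mentioned, -10 if the summary shows a crash."""
    score = sum(step in summary for step in REQUIRED)
    if "crash" in summary:
        score -= 10
    return score


def best_of_n(rollouts: List[Rollout], score: Callable[[str], int]) -> Rollout:
    """Pick the rollout whose summary scores highest (ties go to the earliest)."""
    return max(rollouts, key=lambda r: score(r.summary))


rollouts = run_agents(4, toy_agent)
naive = rollouts[0]                      # "hope one works": take the first run
chosen = best_of_n(rollouts, toy_judge)  # structured: let the judge decide
print(f"naive pick: agent {naive.agent_id}, last action {naive.actions[-1]!r}")
print(f"judged pick: agent {chosen.agent_id}, summary {chosen.summary!r}")
```

Scoring compact summaries instead of raw action logs is what keeps selection inspectable in this sketch: you can read exactly why the judge preferred one attempt over another.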