Ground and Pound: How Iterative Reasoning Quietly Redefines GUI Grounding
Opening: Why this matters now

Computer-use agents are finally leaving the demo stage. The problem? They still click the wrong thing. In professional software (CAD suites, IDEs, industrial dashboards), a single mis-grounded element can detonate an entire workflow. And as enterprises move toward AI-assisted operations, grounding mistakes become expensive, embarrassing, or dangerous.

The paper introduces Chain-of-Ground (CoG), a deceptively simple idea: stop trusting a multimodal large language model’s (MLLM’s) first guess, and make it think twice, literally. It’s a training-free, multi-step reasoning loop that forces the model to revise its own predictions, which improves accuracy and makes its decisions easier to interpret. In an era saturated with ever-larger models, CoG makes a subversive claim: iterating beats inflating. ...