GUI Automation

GUI-Eyes: When Agents Learn Where to Look

Opening — Why this matters now
GUI agents are getting smarter in all the wrong ways. Model sizes grow. Benchmarks inch upward. Training datasets balloon into the tens of millions of annotated clicks. Yet in real interfaces—dense IDEs, CAD tools, enterprise dashboards—agents still miss the obvious. Not because they cannot reason, but because they don’t know where to look. ...

Click, Fail, Learn: Why BEPA Might Be the First GUI Agent That Actually Improves

Opening — Why this matters now
Autonomous agents are very good at talking about tasks. They are far less competent at actually doing them—especially when “doing” involves clicking the right icon, interpreting a cluttered interface, or recovering gracefully from failure. GUI agents, in particular, suffer from a chronic problem: once they fail, they either repeat the same mistake or forget everything they once did right. ...

MobileDreamer: When GUI Agents Stop Guessing and Start Imagining

Opening — Why this matters now
GUI agents are everywhere in demos and nowhere in production. They click, scroll, and type impressively—right up until the task requires foresight. The moment an interface branches, refreshes, or hides its intent behind two more screens, today’s agents revert to trial-and-error behavior. The core problem isn’t vision. It’s imagination. ...

Echoes, Not Amnesia: Teaching GUI Agents to Remember What Worked

Opening — Why this matters now
GUI agents are finally competent enough to click buttons without embarrassing themselves. And yet, they suffer from a strangely human flaw: they forget everything they just learned. Each task is treated as a clean slate. Every mistake is patiently re-made. Every success is quietly discarded. In a world obsessed with scaling models, this paper asks a simpler, sharper question: what if agents could remember? ...

From GUI Novice to Digital Native: How SEAgent Teaches Itself Software Autonomously

If you’ve ever tried to automate your own software workflows using AI, you’ll know the hard part isn’t reasoning — it’s clicking the right button in a sea of ambiguous icons, drop-downs, and obscure UIs. For agents tasked with navigating GUIs like humans do, the real challenge isn’t logic — it’s context. Enter SEAgent: a self-evolving computer-use agent that doesn’t just learn to operate software — it teaches itself how to learn, using nothing but screenshots, feedback from its own past mistakes, and a clever curriculum. ...