MobileDreamer: When GUI Agents Stop Guessing and Start Imagining
A phone screen is not difficult because it is visually beautiful. It is difficult because it keeps changing. Tap the wrong button, and a form disappears. Scroll too far, and the useful item vanishes below the fold. Open the wrong menu, and the agent spends the next three steps politely recovering from its own confidence. Anyone who has watched a GUI agent operate a mobile app has seen the pattern: it often looks competent right until the interface asks for a small amount of foresight. ...