Rules, RPA, ML, LLMs, and Agents: The Decision Ladder

A practical decision ladder for choosing between rules, RPA, traditional machine learning, LLM workflows, and agent-like systems.

April 23, 2026 · 7 min · Michelle

GUI-Eyes: When Agents Learn Where to Look

Screenshots look simple until they are not. A human opening a dense professional application does not inspect every pixel with equal seriousness. We glance, zoom in mentally, ignore decorative clutter, search for the likely region, then focus. In other words, we do not merely “see” the interface. We decide where to look. ...

January 17, 2026 · 15 min · Zelina

MobileDreamer: When GUI Agents Stop Guessing and Start Imagining

A phone screen is not difficult because it is visually beautiful. It is difficult because it keeps changing. Tap the wrong button, and a form disappears. Scroll too far, and the useful item vanishes below the fold. Open the wrong menu, and the agent spends the next three steps politely recovering from its own confidence. Anyone who has watched a GUI agent operate a mobile app has seen the pattern: it often looks competent right until the interface asks for a small amount of foresight. ...

January 8, 2026 · 14 min · Zelina

Ground and Pound: How Iterative Reasoning Quietly Redefines GUI Grounding

Clicks Are Cheap. Wrong Clicks Are Not. Click. That is the unit where many AI agent demos stop being impressive and start becoming expensive. A planning model can write a beautiful instruction sequence: open the settings panel, choose the correct tab, find the export button, confirm the dialog. Lovely. Then the visual grounding model clicks the button two pixels away from the actual target, or chooses the visually similar icon beside it, or mistakes a disabled control for an active one. Suddenly the “agentic workflow” is not a workflow. It is a small robot poking the wrong part of a screen with great confidence. Very modern. Very avoidable, perhaps. ...

December 2, 2025 · 17 min · Zelina

Snapshot, Then Solve: InfraMind’s Playbook for Mission‑Critical GUI Automation

Why this paper matters (for operators, not just researchers): Industrial control stacks (think data center DCIM, grids, water, rail) are hostile terrain for “general” GUI agents: custom widgets, nested hierarchies, air‑gapped deployment, and actions that can actually break things. InfraMind proposes a pragmatic agentic recipe that acknowledges these constraints and designs for them. The result is a system that learns an interface before it tries to use it, then executes with auditability and guardrails. ...

October 1, 2025 · 5 min · Zelina