GUI-Eyes: When Agents Learn Where to Look

Opening: Why this matters now
GUI agents are getting smarter in all the wrong ways. Model sizes grow. Benchmarks inch upward. Training datasets balloon into the tens of millions of annotated clicks. Yet in real interfaces (dense IDEs, CAD tools, enterprise dashboards), agents still miss the obvious. Not because they cannot reason, but because they don’t know where to look. ...

January 17, 2026 · 4 min · Zelina

When Interfaces Guess Back: Implicit Intent Is the New GUI Bottleneck

Opening: Why this matters now
GUI agents are getting faster, more multimodal, and increasingly competent at clicking the right buttons. Yet in real life, users don’t talk to software like prompt engineers. They omit details, rely on habit, and expect the system to remember. The uncomfortable truth is this: most modern GUI agents are optimized for obedience, not understanding. ...

January 15, 2026 · 4 min · Zelina

Click, Fail, Learn: Why BEPA Might Be the First GUI Agent That Actually Improves

Opening: Why this matters now
Autonomous agents are very good at talking about tasks. They are far less competent at actually doing them, especially when “doing” involves clicking the right icon, interpreting a cluttered interface, or recovering gracefully from failure. GUI agents, in particular, suffer from a chronic problem: once they fail, they either repeat the same mistake or forget everything they once did right. ...

January 12, 2026 · 3 min · Zelina

TowerMind: When Language Models Learn That Towers Have Consequences

Opening: Why this matters now
Large Language Models have become fluent planners. Ask them to outline a strategy, decompose a task, or explain why something should work, and they rarely hesitate. Yet place them inside an environment where actions cost resources, mistakes compound, and time does not politely pause, and that fluency often collapses. ...

January 12, 2026 · 4 min · Zelina

NPCs With Short-Term Memory Loss: Benchmarking Agents That Actually Live in the World

Opening: Why this matters now
Agentic AI has entered its Minecraft phase again. Not because blocks are trendy, but because open-world games remain one of the few places where planning, memory, execution, and failure collide in real time. Yet most agent benchmarks still cheat. They rely on synthetic prompts, privileged world access, or oracle-style evaluation that quietly assumes the agent already knows where everything is. The result: impressive demos, fragile agents, and metrics that flatter models more than they inform builders. ...

January 10, 2026 · 4 min · Zelina

From Tokens to Topology: Teaching LLMs to Think in Simulink

Opening: Why this matters now
Large Language Models have become dangerously good at writing text, yet conspicuously bad at respecting reality. Nowhere is this mismatch more obvious than in model-based engineering. Simulink, a cornerstone of safety-critical industries from automotive to aerospace, is not a playground for eloquence. It is a rigid, graphical, constraint-heavy environment where hallucinations are not amusing quirks but certification failures. ...

January 9, 2026 · 4 min · Zelina

MobileDreamer: When GUI Agents Stop Guessing and Start Imagining

Opening: Why this matters now
GUI agents are everywhere in demos and nowhere in production. They click, scroll, and type impressively, right up until the task requires foresight. The moment an interface branches, refreshes, or hides its intent behind two more screens, today’s agents revert to trial-and-error behavior. The core problem isn’t vision. It’s imagination. ...

January 8, 2026 · 4 min · Zelina

When Your House Talks Back: Teaching Buildings to Think About Energy

Opening: Why this matters now
Buildings quietly consume around a third of the world’s energy. Most of that consumption is governed not by grand strategy but by human habit: when people cook, charge vehicles, cool rooms, or forget to turn things off. For decades, Building Energy Management Systems (BEMS) promised optimization. In practice, they delivered dashboards: dense, technical, and mostly ignored. ...

January 1, 2026 · 4 min · Zelina

Many Arms, Fewer Bugs: Why Coding Agents Need to Stop Working Alone

Opening: Why this matters now
For all the breathless demos, AI coding agents still collapse embarrassingly often when faced with real software engineering: large repositories, ambiguous issues, long horizons, and no hand-holding. Benchmarks like SWE-bench-Live have made this painfully explicit. Models that look heroic on curated tasks suddenly forget how to navigate a codebase without spiraling into context soup. ...

December 31, 2025 · 4 min · Zelina

The Web, Reimagined as a World Model

Opening: Why this matters now
Language agents are no longer satisfied with short conversations and disposable prompts. They want places: environments where actions have consequences, memory persists, and the world does not politely forget everything after the next API call. Unfortunately, today’s tooling offers an awkward choice: either rigid web applications backed by databases, or fully generative world models that hallucinate their own physics and promptly lose the plot. ...

December 30, 2025 · 4 min · Zelina