
From YouTube to Execution: How GUIDE Teaches AI Agents to Actually Use Software

Opening — Why this matters now

Everyone is excited about AI agents that can “use a computer.” Few are impressed once they actually try. The failure mode is strangely consistent: the agent understands what you want, but fails somewhere embarrassingly practical—clicking the wrong menu, missing a button, or wandering into a dead-end workflow. This is not a capability problem. It’s a familiarity problem. ...

March 30, 2026 · 5 min · Zelina

From Memory to Machinery: Why AI Agents Are Learning to Write Themselves

Opening — Why this matters now

There is a quiet but decisive shift happening in the world of AI agents. For the past two years, we’ve been told that agents “learn” by remembering — storing prompts, reflections, and reasoning traces. A polite fiction. Memory, in this context, is little more than annotated hindsight. But real systems don’t scale on hindsight. They scale on reusable execution. ...

March 19, 2026 · 4 min · Zelina

OpenSeeker: Breaking the Search Monopoly (One Dataset at a Time)

Opening — Why this matters now

Search is no longer a feature. It’s a capability moat. Over the past year, “deep research agents” quietly evolved from novelty demos into decision-making infrastructure. Models are no longer judged by how well they answer, but by how well they search, verify, and synthesize across the web. And yet, despite all the noise about model architectures, one inconvenient truth remains: the best-performing search agents are still controlled by a handful of companies—not because of better models, but because of better data pipelines. ...

March 17, 2026 · 5 min · Zelina

AIRS-Bench: When AI Starts Doing the Science, Not Just Talking About It

Opening — Why this matters now

For years, AI progress has been narrated through a familiar ritual: introduce a new benchmark, top it with a new model, declare victory, repeat. But as large language models graduate from single-shot answers to multi-step agentic workflows, that ritual is starting to crack. If AI systems are now expected to design experiments, debug failures, iterate on ideas, and judge their own results, then accuracy on static datasets is no longer the right yardstick. ...

February 9, 2026 · 3 min · Zelina

DeltaEvolve: When Evolution Learns Its Own Momentum

Opening — Why this matters now

LLM-driven discovery systems have crossed an uncomfortable threshold. They no longer fail because models cannot generate ideas, but because they cannot remember the right things. AlphaEvolve, FunSearch, and their successors proved that iterative code evolution works. What they also revealed is a structural bottleneck: context windows are finite, expensive, and poorly used. ...

February 5, 2026 · 4 min · Zelina

Search-R2: When Retrieval Learns to Admit It Was Wrong

Opening — Why this matters now

Search-integrated LLMs were supposed to be the antidote to hallucination. Give the model tools, give it the web, let it reason step by step—problem solved. Except it wasn’t. What we actually built were agents that search confidently, reason eloquently, and fail quietly. One bad query early on, one misleading paragraph retrieved at the wrong moment, and the whole reasoning chain collapses—yet reinforcement learning still rewards it if the final answer happens to be right. ...

February 4, 2026 · 4 min · Zelina

Coaching the Swarm: Why Multi‑Agent RL Finally Scales

Opening — Why this matters now

Multi‑agent systems are having a moment. Everywhere you look—AutoGen‑style workflows, agentic data pipelines, research copilots—LLMs are being wired together and told to collaborate. Yet most of these systems share an uncomfortable secret: they don’t actually learn together. They coordinate at inference time, but their weights remain frozen, and their mistakes are repeatedly rediscovered. ...

February 3, 2026 · 4 min · Zelina

FadeMem: When AI Learns to Forget on Purpose

Opening — Why this matters now

The race to build smarter AI agents has mostly followed one instinct: remember more. Bigger context windows. Larger vector stores. Ever-growing retrieval pipelines. Yet as agents move from demos to long-running systems—handling days or weeks of interaction—this instinct is starting to crack. More memory does not automatically mean better reasoning. In practice, it often means clutter, contradictions, and degraded performance. Humans solved this problem long ago, not by remembering everything, but by forgetting strategically. ...

February 1, 2026 · 4 min · Zelina

MemCtrl: Teaching Small Models What *Not* to Remember

Opening — Why this matters now

Embodied AI is hitting a very human bottleneck: memory. Not storage capacity, not retrieval speed—but judgment. Modern multimodal large language models (MLLMs) can see, reason, and act, yet when deployed as embodied agents they tend to remember too much, too indiscriminately. Every frame, every reflection, every redundant angle piles into context until the agent drowns in its own experience. ...

January 31, 2026 · 4 min · Zelina

Sequential Beats Parallel: When Deep Research Agents Learn to Reflect

Opening — Why this matters now

The last year has been crowded with so-called deep research agents. Everyone parallelizes. Everyone fans out queries. Everyone promises doctoral-level synthesis at web speed. And yet, the leaderboard keeps telling an inconvenient story: throwing more parallel agents at a problem does not reliably buy depth. The paper “Deep Researcher with Sequential Plan Reflection and Candidates Crossover” enters this debate with a pointed thesis: research is not a map-reduce problem. If you want insight, you need memory, reflection, and the ability to change your mind mid-flight. ...

January 31, 2026 · 4 min · Zelina