
MemCtrl: Teaching Small Models What *Not* to Remember

Opening — Why this matters now
Embodied AI is hitting a very human bottleneck: memory. Not storage capacity, not retrieval speed—but judgment. Modern multimodal large language models (MLLMs) can see, reason, and act, yet when deployed as embodied agents they tend to remember too much, too indiscriminately. Every frame, every reflection, every redundant angle piles into context until the agent drowns in its own experience. ...
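
The teaser's framing suggests a selective-write policy: store only what adds information. Below is a minimal generic sketch of that idea, assuming observations arrive as embedding vectors; the `MemoryGate` class, cosine-similarity test, and 0.9 threshold are illustrative assumptions, not MemCtrl's actual mechanism.

```python
# Hypothetical illustration (not MemCtrl's method): write an observation to
# memory only if it is sufficiently different from what is already stored.
import numpy as np

class MemoryGate:
    def __init__(self, sim_threshold: float = 0.9):
        self.sim_threshold = sim_threshold  # above this, treat as redundant
        self.memory: list[np.ndarray] = []

    def consider(self, obs_embedding: np.ndarray) -> bool:
        """Return True if the observation was stored, False if skipped."""
        v = obs_embedding / (np.linalg.norm(obs_embedding) + 1e-8)
        for m in self.memory:
            if float(v @ m) > self.sim_threshold:  # near-duplicate frame
                return False
        self.memory.append(v)
        return True

# Usage: redundant camera frames get dropped instead of flooding the context.
gate = MemoryGate()
frame_a = np.random.rand(512)
print(gate.consider(frame_a))          # True: stored
print(gate.consider(frame_a * 1.001))  # False: nearly identical, skipped
```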

January 31, 2026 · 4 min · Zelina

When Rewards Learn to Think: Teaching Agents *How* They’re Wrong

Opening — Why this matters now
Agentic AI is having a credibility problem. Not because agents can’t browse, code, or call tools—but because we still train them like they’re taking a final exam with no partial credit. Most agentic reinforcement learning (RL) systems reward outcomes, not process. Either the agent finishes the task correctly, or it doesn’t. For short problems, that’s tolerable. For long-horizon, tool-heavy reasoning tasks, it’s catastrophic. A single late-stage mistake erases an otherwise competent trajectory. ...
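
To make "no partial credit" concrete, here is a toy contrast between outcome-only and step-level reward, assuming intermediate steps can be checked individually; the boolean step checks and the 0.5 mixing weight are assumptions for illustration, not the paper's reward design.

```python
# Toy contrast (not the paper's reward design): outcome-only vs. step-level credit.
def outcome_reward(final_answer_correct: bool) -> float:
    # All-or-nothing: one late mistake erases the whole trajectory.
    return 1.0 if final_answer_correct else 0.0

def process_reward(steps: list[bool], final_answer_correct: bool) -> float:
    # Partial credit: competent intermediate steps still earn signal.
    step_credit = sum(steps) / len(steps) if steps else 0.0
    return 0.5 * step_credit + 0.5 * (1.0 if final_answer_correct else 0.0)

# Nine good tool calls, one late failure that breaks the final answer:
trajectory = [True] * 9 + [False]
print(outcome_reward(final_answer_correct=False))              # 0.0
print(process_reward(trajectory, final_answer_correct=False))  # 0.45
```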

January 30, 2026 · 4 min · Zelina

Learning to Discover at Test Time: When Search Learns Back

Opening — Why this matters now
For years, scaling AI meant one thing: train bigger models, then freeze them. At inference time, we search harder, sample wider, and hope brute force compensates for epistemic limits. This paper challenges that orthodoxy. It argues—quietly but decisively—that search alone is no longer enough. If discovery problems are truly out-of-distribution, then the model must be allowed to learn at test time. ...

January 24, 2026 · 3 min · Zelina

When LLMs Get a Laptop: Why Sandboxes Might Be the Real AGI Benchmark

Opening — Why this matters now
LLMs have learned to speak fluently. They can reason passably. Some can even plan. Yet most of them remain trapped in an oddly artificial condition: they think, but they cannot act. The latest wave of agent frameworks tries to fix this with tools, APIs, and carefully curated workflows. But a quieter idea is emerging underneath the hype—one that looks less like prompt engineering and more like infrastructure. ...

January 24, 2026 · 4 min · Zelina

Skeletons in the Proof Closet: When Lean Provers Need Hints, Not More Compute

Opening — Why this matters now
Neural theorem proving has entered its industrial phase. With reinforcement learning pipelines, synthetic data factories, and search budgets that would make a chess engine blush, models like DeepSeek-Prover-V1.5 are widely assumed to have internalized everything there is to know about formal proof structure. This paper politely disagrees. Under tight inference budgets—no massive tree search, no thousand-sample hail-Mary—the author shows that simple, almost embarrassingly old-fashioned structural hints still deliver large gains. Not new models. Not more data. Just better scaffolding. ...
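
As a rough picture of what a structural hint can look like under a tight budget, consider handing the prover the induction skeleton and leaving only the leaf goals open. The theorem and tactic outline below are an illustrative Lean 4 sketch, not an example taken from the paper.

```lean
-- Hypothetical "skeleton hint" (not from the paper): the outline is supplied,
-- and each `sorry` marks a leaf goal left for the prover to close.
theorem my_add_comm (a b : Nat) : a + b = b + a := by
  induction b with
  | zero => sorry        -- base case: a + 0 = 0 + a
  | succ k ih => sorry   -- inductive step: rewrite with ih, then succ lemmas
```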

January 23, 2026 · 4 min · Zelina

Your Agent Remembers—But Can It Forget?

Opening — Why this matters now
As reinforcement learning (RL) systems inch closer to real-world deployment—robotics, autonomous navigation, decision automation—a quiet assumption keeps slipping through the cracks: that remembering is enough. Store the past, replay it when needed, act accordingly. Clean. Efficient. Wrong. The paper *Memory Retention Is Not Enough to Master Memory Tasks in Reinforcement Learning* dismantles this assumption with surgical precision. Its core claim is blunt: agents that merely retain information fail catastrophically once the world changes. Intelligence, it turns out, depends less on what you remember than on what you are able to forget. ...

January 22, 2026 · 4 min · Zelina

GUI-Eyes: When Agents Learn Where to Look

Opening — Why this matters now
GUI agents are getting smarter in all the wrong ways. Model sizes grow. Benchmarks inch upward. Training datasets balloon into the tens of millions of annotated clicks. Yet in real interfaces—dense IDEs, CAD tools, enterprise dashboards—agents still miss the obvious. Not because they cannot reason, but because they don’t know where to look. ...

January 17, 2026 · 4 min · Zelina

MatchTIR: Stop Paying Every Token the Same Salary

Opening — Why this matters now
Tool-using agents are no longer a novelty. They are quietly becoming the default interface between LLMs and the real world: APIs, databases, search engines, execution environments. Yet most reinforcement learning pipelines still behave as if every step in a trajectory deserves the same bonus. That assumption was tolerable when tasks were short. It collapses when agents think, call tools, fail, retry, and recover over ten or more turns. ...

January 17, 2026 · 4 min · Zelina

Seeing Is Thinking: When Multimodal Reasoning Stops Talking and Starts Drawing

Opening — Why this matters now
Multimodal AI has spent the last two years narrating its thoughts like a philosophy student with a whiteboard it refuses to use. Images go in, text comes out, and the actual visual reasoning—zooming, marking, tracing, predicting—happens offstage, if at all. Omni-R1 arrives with a blunt correction: reasoning that depends on vision should generate vision. ...

January 15, 2026 · 4 min · Zelina

When Agents Learn Without Learning: Test-Time Reinforcement Comes of Age

Opening — Why this matters now
Multi-agent LLM systems are having a moment. From collaborative coding bots to diagnostic committees and AI tutors, orchestration is increasingly the default answer to hard reasoning problems. But there’s an inconvenient truth hiding behind the demos: training multi-agent systems with reinforcement learning is expensive, unstable, and often counterproductive. ...

January 15, 2026 · 4 min · Zelina