
When Agents Learn Without Learning: Test-Time Reinforcement Comes of Age

Opening — Why this matters now Multi-agent LLM systems are having a moment. From collaborative coding bots to diagnostic committees and AI tutors, orchestration is increasingly the default answer to hard reasoning problems. But there’s an inconvenient truth hiding behind the demos: training multi-agent systems with reinforcement learning is expensive, unstable, and often counterproductive. ...

January 15, 2026 · 4 min · Zelina

STACKPLANNER: When Agents Learn to Forget

Opening — Why this matters now Multi-agent systems built on large language models are having a moment. From research copilots to autonomous report generators, the promise is seductive: split a complex task into pieces, let specialized agents work in parallel, and coordinate everything with a central planner. In practice, however, these systems tend to collapse under their own cognitive weight. ...

January 12, 2026 · 4 min · Zelina

When Debate Stops Being a Vote: DynaDebate and the Engineering of Reasoning Diversity

Opening — Why this matters now Multi-agent debate was supposed to be the antidote to brittle single-model reasoning. Add more agents, let them argue, and truth would somehow emerge from friction. In practice, what often emerges is something closer to a polite echo chamber. Despite the growing popularity of Multi-Agent Debate (MAD) frameworks, many systems quietly degenerate into majority voting over nearly identical reasoning paths. When all agents make the same mistake—just phrased slightly differently—debate becomes theater. The paper DynaDebate: Breaking Homogeneity in Multi-Agent Debate with Dynamic Path Generation tackles this problem head-on, and, refreshingly, does so by treating reasoning as an engineered process rather than a conversational one. ...

January 12, 2026 · 4 min · Zelina

ResMAS: When Multi‑Agent Systems Stop Falling Apart

Opening — Why this matters now Multi-agent systems (MAS) built on large language models have developed a bad habit: they work brilliantly—right up until the moment one agent goes off-script. A single failure, miscommunication, or noisy response can quietly poison the entire collaboration. In production environments, this isn’t a hypothetical risk; it’s the default operating condition. ...

January 11, 2026 · 4 min · Zelina

Agents Gone Rogue: Why Multi-Agent AI Quietly Falls Apart

Opening — Why this matters now Multi-agent AI systems are having their moment. From enterprise automation pipelines to financial analysis desks, architectures built on agent collaboration promise scale, specialization, and autonomy. They work beautifully—at first. Then something subtle happens. Six months in, accuracy slips. Agents talk more, decide less. Human interventions spike. No code changed. No model was retrained. Yet performance quietly erodes. This paper names that phenomenon with unsettling clarity: agent drift. ...

January 8, 2026 · 4 min · Zelina

Many Arms, Fewer Bugs: Why Coding Agents Need to Stop Working Alone

Opening — Why this matters now For all the breathless demos, AI coding agents still collapse embarrassingly often when faced with real software engineering: large repositories, ambiguous issues, long horizons, and no hand-holding. Benchmarks like SWE-bench-Live have made this painfully explicit. Models that look heroic on curated tasks suddenly forget how to navigate a codebase without spiraling into context soup. ...

December 31, 2025 · 4 min · Zelina

When Reflection Needs a Committee: Why LLMs Think Better in Groups

Opening — Why this matters now LLMs have learned how to explain themselves. What they still struggle with is learning from those explanations. Reflexion was supposed to close that gap: let the model fail, reflect in natural language, try again — no gradients, no retraining, just verbal reinforcement. Elegant. Cheap. And, as this paper demonstrates, fundamentally limited. ...

December 28, 2025 · 3 min · Zelina

FinAgent: When AI Starts Shopping for Your Groceries (and Your Health)

Opening — Why this matters now Inflation doesn’t negotiate, food prices don’t stay put, and household budgets—especially middle‑income ones—are asked to perform daily miracles. Most digital tools respond politely after the damage is done: expense trackers explain where money went, diet apps scold what you ate. What they rarely do is coordinate. This paper proposes FinAgent, an agentic AI system that does something radical by modern standards: it plans ahead, adapts continuously, and treats nutrition and money as the same optimization problem. ...

December 25, 2025 · 4 min · Zelina

Don’t Tell the Robot What You Know

Opening — Why this matters now Large Language Models are very good at knowing. They are considerably worse at helping. As AI systems move from chat interfaces into robots, copilots, and assistive agents, collaboration becomes unavoidable. And collaboration exposes a deeply human cognitive failure that LLMs inherit wholesale: the curse of knowledge. When one agent knows more than another, it tends to communicate as if that knowledge were shared. ...

December 20, 2025 · 4 min · Zelina

Artism, or How AI Learned to Critique Itself

Opening — Why this matters now AI didn’t kill originality. It industrialized its absence. Contemporary art has been circling the same anxiety for decades: the sense that everything has already been done, named, theorized, archived. AI merely removed the remaining friction. What once took years of study and recombination now takes seconds of probabilistic interpolation. The result is not a new crisis, but a visible one. ...

December 18, 2025 · 4 min · Zelina