
Thinking Fast, Remembering Slow: Why SWE-AGILE Fixes the Memory Crisis of AI Agents

Memory sounds like a storage problem. Give the agent a longer context window, let it keep the full conversation, and the work should become easier. This is the kind of solution that looks obvious until it meets a real software repository, a failing test suite, a long terminal log, and a model that now has to find one important clue buried somewhere in the middle of its own autobiography. ...

April 14, 2026 · 18 min · Zelina

Anchors Away: Rethinking How AI Agents Learn to Use Tools

A tool-using AI agent usually fails in a very ordinary way. It does not announce a philosophical crisis. It calls the wrong tool, calls the right tool too many times, writes malformed code, searches before thinking, or confidently takes a useless action because the training process rewarded motion rather than judgment. This is the unglamorous part of agent deployment. The demo shows the agent booking, searching, calculating, and reporting. The training log shows wasted exploration, unstable optimization, and a strange habit of confusing “using tools” with “thinking better.” Apparently, giving a model a calculator does not automatically make it an accountant. Shocking. ...

April 13, 2026 · 17 min · Zelina

Spatial-Gym and the Illusion of Thinking: Why AI Can’t Walk Before It Runs

Agents are supposed to act. That is the promise hiding behind most enterprise AI demos: the model will not merely answer a question, but inspect a system, choose the next step, correct itself, and reach a useful outcome. The interface changes from chat box to workflow loop, and suddenly everyone starts using the word “agent” with the confidence of a person who has never watched a model get lost in a four-by-four grid. ...

April 13, 2026 · 18 min · Zelina

From Chains to Trees: Why LLM Agents Need Structural Memory

Logs are useful. They are also lazy. A business agent that fails halfway through a product search, customer-support flow, compliance checklist, or research workflow will usually leave behind a long trace: thought, action, observation, thought, action, observation. The standard instinct is to read the failed trace as a chain. This step followed that step; the final reward was bad; therefore the chain was bad. Very tidy. Also very wasteful. ...

April 9, 2026 · 18 min · Zelina

QED-Nano: Small Models, Big Proof Energy

Cost is usually where AI miracles become accounting problems. A frontier model can look brilliant when it is allowed to spend enormous inference compute, rely on undisclosed training data, and hide the machinery behind a clean demo. Very convenient. Also very hard to reproduce. For businesses, that matters because a capability that cannot be inspected, budgeted, or adapted is not really a capability. It is a vendor promise with a nice interface. ...

April 7, 2026 · 17 min · Zelina

Seeing Charts Like a Quant: When RL Teaches Vision Models to Actually Reason

Charts look harmless. A bar chart sits in a dashboard, a line chart appears in a quarterly report, a scatter plot claims there is a relationship, and everyone pretends the machine only needs to “read the image.” This is the polite fiction behind a large share of enterprise AI demos. In practice, chart understanding is not OCR with prettier fonts. A model has to identify the marks, map colors to legends, recover values, decide which numbers matter, perform arithmetic, interpret trends, and then answer the actual question rather than the easier question it secretly substituted. That last step is where many systems go from impressive to quietly expensive. ...

April 6, 2026 · 15 min · Zelina

From Pixels to Python: Teaching AI to Fix Its Own Charts

Charts are supposed to make business communication clearer. In practice, they also create a quiet operational tax: screenshots trapped in PDFs, plots copied from old decks, dashboards whose original code has vanished, and reports where one small visual change requires an analyst to rebuild the chart by hand. That is the mundane setting behind a technically interesting paper. MM-ReCoder asks whether a multimodal model can look at a chart image, write Python code to reproduce it, execute the code, inspect the rendered result, and then fix its own mistakes.1 ...

April 5, 2026 · 16 min · Zelina

When Language Models Ask for Help: The Curious Case of Uncertain AI

Escalation is the least glamorous part of automation. It is also where many systems either become useful or become expensive theatre. In a normal business workflow, we understand escalation almost instinctively. A junior analyst handles routine invoices. An exception goes to a senior reviewer. A suspicious transaction goes to compliance. A warehouse robot follows a route until the floor plan stops behaving like yesterday’s floor plan. Nobody sensible asks the senior reviewer to approve every invoice. Nobody sensible lets the junior analyst improvise when the case is clearly outside their experience. ...

April 3, 2026 · 14 min · Zelina

Approval Isn’t Free: When AI Safety Trades Capability for Control

Approval sounds cheap. In business systems, it is the familiar answer to almost every automation anxiety. Let the model propose, let an overseer approve, let the workflow continue. A trading agent recommends a position; a risk layer approves it. A customer-support agent drafts a refund decision; a policy checker approves it. A recommendation system optimizes engagement; a governance model approves the output. There. Safety added. Please admire the compliance architecture. ...

April 1, 2026 · 14 min · Zelina

Skill Issue? Or Skill Strategy — When Agents Start Remembering What Matters

Memory is easy to sell and hard to govern. Every enterprise AI demo eventually reaches the same theatrical moment: the agent remembers something. A prior customer preference. A workflow exception. A formatting habit. A failed action that should not be repeated. Everyone nods. Someone says “continuous learning.” A roadmap slide appears. The slide is almost certainly too optimistic. ...

March 31, 2026 · 17 min · Zelina