
Talking to Yourself, but Make It Useful: Intrinsic Self‑Critique in LLM Planning

“Please double-check your work” is one of the least expensive quality-control systems ever invented. It is also one of the least dependable. A person who overlooked a constraint the first time may overlook it again. A language model is no different, except that it can produce a longer and more persuasive explanation of why the overlooked constraint was never important. ...

January 3, 2026 · 17 min · Zelina

SpatialBench: When AI Meets Messy Biology

A dataset arrives. Not a clean demo dataset. Not a tidy CSV with three columns and a tutorial notebook waiting nearby like a hotel concierge. A real spatial biology dataset arrives: high-dimensional, platform-specific, noisy, partially processed, full of tacit assumptions, and attached to a scientific question that cannot be answered by knowing biology in the abstract. ...

December 29, 2025 · 17 min · Zelina

When Actions Need Nuance: Learning to Act Precisely Only When It Matters

A warehouse robot does not always need elegance. In an open aisle, “move forward a bit” is probably good enough. Near a shelf, a wall, or a human ankle, “a bit” becomes an expensive philosophy. That is the practical problem behind Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions, the paper introducing PEARL: Parameterized Extended state/action Abstractions for Reinforcement Learning. The paper is not really about making reinforcement learning more fashionable. Mercifully. It is about making action precision conditional. ...

December 28, 2025 · 14 min · Zelina

AGI by Committee: Why the First General Intelligence Won’t Arrive Alone

The meeting room is already becoming the machine. Meeting rooms are underrated metaphors for intelligence. A company can produce a market forecast, negotiate a contract, audit a supplier, design a campaign, and respond to a legal dispute without any single employee understanding the whole operation. The intelligence is distributed. One person knows finance. Another knows regulation. Someone else knows the client. A manager routes the work. A spreadsheet remembers what everyone forgot. Somehow, the organization acts. ...

December 19, 2025 · 18 min · Zelina

Delegating to the Almost-Aligned: When Misaligned AI Is Still the Rational Choice

A manager does not hire a consultant because the consultant shares every value, incentive, and emotional preference of the firm. The consultant wants fees. The doctor wants throughput. The lawyer wants billable hours. The cloud provider wants usage. Humanity, somehow, survives this scandal. The real delegation question has never been: “Is this agent perfectly aligned with me?” It is: “Will things go better if I let this agent decide here?” ...

December 18, 2025 · 14 min · Zelina

Ports, But Make Them Agentic: When LLMs Start Running the Yard

Ports are already full of automation. Cranes move containers, AGVs follow routes, software coordinates flows, dashboards blink reassuringly at managers who are paid to pretend that blinking equals control. Then one terminal changes its layout, closes a road, adds a vehicle restriction, or introduces a new safety corridor. Suddenly the “automated” dispatching system needs engineers, operations researchers, domain experts, test scripts, model reformulation, solver debugging, and several meetings where everyone discovers that “just adjust the rule” was not, in fact, just. ...

December 17, 2025 · 16 min · Zelina

Bits, Bets, and Budgets: When Agents Should Walk Away

Budget is not an afterthought. Budget is usually treated as the boring part of agent design. The exciting part is the agent: planning, calling tools, trying strategies, revising itself, and occasionally behaving like a junior analyst who has discovered both confidence and the corporate credit card. But in real automation, budget is not boring. Budget is the boundary between useful autonomy and expensive wandering. ...

December 9, 2025 · 16 min · Zelina

Breaking Rules, Not Systems: How Penalties Make Autonomous Agents Behave

Emergency is a terrible product requirement. It sounds simple in a meeting: “The agent should follow policy, except when the situation is urgent.” Wonderful. Very human. Also almost useless. A delivery robot should not enter a restricted zone. Unless the package is critical medicine. A warehouse agent should not skip safety checks. Unless a fire alarm requires rerouting. A self-driving system should obey traffic norms. Unless an emergency trip makes delay costly. But “unless urgent” does not tell the agent which rule can bend, which rule must hold, and which shortcut turns the system from flexible into reckless. ...

December 4, 2025 · 15 min · Zelina

Prompting on Life Support: How Invasive Context Engineering Fights Long-Context Drift

The prompt was clear. Then the conversation kept going. A familiar enterprise AI story starts politely enough. The legal assistant is told to be conservative. The medical triage bot is told not to diagnose. The procurement agent is told never to approve a vendor without documented checks. Everyone nods. The system prompt is immaculate. Compliance is laminated. ...

December 3, 2025 · 15 min · Zelina

Debate Club for Robots: How Multi-Agent Arguing Makes Embodied AI Safer

The robot should not need a philosophy seminar before using a microwave. Microwaves are excellent devices for exposing weak safety logic. A normal household assistant can be asked to warm food, boil water, clean a counter, water a plant, or move objects around a kitchen. Most of these tasks are harmless. Some are not. “Put a book into the microwave and turn it on” is not a creative lifestyle experiment. It is a fire hazard with better lighting. ...

November 28, 2025 · 17 min · Zelina