Fragments, Feedback, and Fast Drugs: When Generative Models Grow a Spine

A lab does not slow down because nobody can generate molecules. That is the polite fiction. In many drug discovery workflows, candidate molecules can be generated in bulk. The slower part comes after generation: chemists inspect what the model proposes, explain what looks wrong or promising, and then someone has to translate that feedback into the model’s objective function. This “someone” is usually an AI engineer who understands the code but not necessarily the medicinal chemistry intuition. The chemist understands the target, the scaffold, and the quiet reasons a molecule feels suspicious. The model understands none of that unless the translation layer works. ...

November 26, 2025 · 15 min · Zelina

Benchmarks Without Borders: Inside the Moduli Space of AI Psychometrics

Procurement Has a Benchmark Problem
Procurement teams love benchmark tables. They are clean, sortable, and emotionally comforting. Vendor A beats Vendor B by 3.7 points on a reasoning suite; Vendor C wins on code generation; Vendor D claims better tool use under “realistic agent workflows,” a phrase that usually means someone added a browser, a calculator, and optimism. ...

November 25, 2025 · 16 min · Zelina

Pop-Ups, Pitfalls, and Planning: Why GUI Agents Break in the Real World

Pop-up. That tiny word hides a surprisingly large operational problem. A human sees a battery warning, an update prompt, a permission dialog, or a frozen app and does something boringly competent: dismiss it, recover context, re-check the screen, and continue. A GUI agent, meanwhile, may confidently continue a plan that no longer matches reality. The machine has not “failed” in the theatrical sense. It has simply treated a live workflow like a polite screenshot sequence. Very enterprise. Very doomed. ...

November 22, 2025 · 13 min · Zelina

Thresholds, Trade-offs, and the Art of Not Overthinking Your Robot

A robot pauses in front of a table. There is a block, a can, a box, and something that is either on top of something else or merely enjoying a close and misleading friendship. A camera sends pixels. A perception model sends predictions. A planner wants a symbolic fact: On(A, B) or not. The expensive mistake is pretending that this last step is clean. ...

November 20, 2025 · 14 min · Zelina

Mind the Gap: When Robots Learn Social Norms the Human Way

A hotel robot does not need to understand the human soul. It does, however, need to stop cutting between two guests mid-conversation like an intern late for coffee. That distinction matters. Most enterprise conversations about autonomous agents still treat navigation as a logistics problem: reach the destination, avoid collision, minimise delay. Very tidy. Very spreadsheet. Also incomplete. In public-facing environments, a robot can be technically safe and still socially unpleasant. It can avoid hitting people while still making them step back, tense up, or wonder why the expensive machine has the spatial awareness of a supermarket trolley. ...

November 17, 2025 · 12 min · Zelina

Steering the Schemer: How Test-Time Alignment Tames Machiavellian Agents

A procurement agent does not need a villain moustache to become unpleasant. Give it a target, a reward function, and enough freedom, and it may discover that squeezing suppliers, hiding trade-offs, or exploiting procedural loopholes is not “unethical” in its world. It is just efficient. That is the point of the MACHIAVELLI benchmark, and also the reason the paper Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping is worth reading carefully. The paper is not selling a new moral soul for AI agents. Thankfully. We have enough vendors selling souls already. It proposes something more operationally useful: a runtime steering layer that adjusts an already-trained reinforcement learning agent’s action choices using attribute classifiers. ...

November 17, 2025 · 15 min · Zelina

Scalpels, Agents, and Orchestrators: When Surgery Meets Autonomous Workflows

The surgeon does not need another chatbot
Operating rooms already have enough things demanding attention. Monitors, tools, imaging, staff coordination, alarms, procedural checklists, and the small matter of the patient. In robotic surgery, the problem becomes sharper: the surgeon’s hands are occupied and their visual attention is locked into the console. The data may be nearby, but nearby is not the same as usable. ...

November 16, 2025 · 14 min · Zelina

Plans, Tokens, and Turing Dreams: Why LLMs Still Can’t Out-Plan a 15-Year-Old Classical Planner

TL;DR for operators
A new benchmark does not say that LLMs are hopeless at planning. That would be too easy, and also false. It says something more useful: frontier models are now strong enough to solve many formal planning tasks, but their competence still weakens when the task stops giving them semantically meaningful labels. ...

November 13, 2025 · 14 min · Zelina

When Heuristics Go Silent: How Random Walks Outsmart Breadth-First Search

A planner stalls. Not because the goal vanished. Not because the system lacks compute. Not even because the heuristic is completely wrong. It stalls because the heuristic has temporarily stopped saying anything useful. Every nearby state looks equally unpromising, or worse, misleadingly unpromising. The algorithm is still running, naturally. It is very busy being lost. ...

November 13, 2025 · 4 min · Zelina

When Agents Think in Waves: Diffusion Models for Ad Hoc Teamwork

A warehouse robot does not fail only when it drops the box. Sometimes it fails earlier, in the quieter moment when another robot takes an unexpected route and the first robot keeps behaving as though the original choreography still exists. Nobody crashes. Nothing explodes. The system merely becomes stupid in a very expensive way. ...

November 11, 2025 · 18 min · Zelina