
The Goats in the Machine: Why AI Agents Need Contracts, Not Personalities

TL;DR for operators: AI agents are leaving the demo booth and entering workspaces: repositories, customer records, procurement systems, legal drafts, financial workflows, support queues, and other places where a charming mistake becomes an operational incident. That changes the evaluation problem. It is no longer enough to ask whether an agent sounds sensible, acts “empathetic”, appears to “understand”, or seems to have “judgement”. Lovely theatre. Terrible control surface. ...

June 16, 2026 · 15 min · Zelina

Judge, Jury, and Calibration: Why AI Evaluation Needs Anchors

TL;DR for operators: AI is becoming very good at producing judgement-shaped output. That is not the same thing as judgement. Two recent papers make the same operational point from different sides: one shows how AI can estimate educational item difficulty before response data are available; the other shows how LLM-generated peer reviews can look serious while diverging from human reviewing behaviour.[1][2] ...

June 15, 2026 · 14 min · Zelina

Mind the Reward Gap: Why Business AI Needs More Than Pretty Answers

Opening: Why this matters now. Business AI has entered its awkward teenage years. The first phase was easy to admire: models could draft, summarize, classify, recommend, and explain. Then companies started asking the rude adult questions: Can we trust the answer? Did it make the right trade-off? Can it improve from outcomes? What happens when the reward signal is wrong? ...

May 2, 2026 · 17 min · Zelina

CRaFT and the Illusion of Safety: When ‘Sorry’ Is Just a Circuit

A refusal is easy to recognize. The model says it cannot help. The sentence sounds polite. The compliance team relaxes for three seconds. Everyone moves on. That is the comfortable version of AI safety: refusal as an observable behavior. The uncomfortable version is that refusal may be only the visible end of a much narrower internal computation. If that computation can be found, isolated, and steered, then the model’s “sorry, I can’t assist with that” is not a moral boundary. It is a circuit behavior. Very reassuring, in the same way a locked glass door is reassuring before someone points out the hinge. ...

April 5, 2026 · 15 min · Zelina

Talk Freely, Execute Strictly: Why Agentic AI Needs a Schema Gate

A chatbot can say yes to almost anything. That is part of the charm. It is also part of the problem. Ask an agent to “clean this dataset, train a model, compare alternatives, and generate a report,” and the conversation feels wonderfully frictionless. The system can interpret intent, improvise steps, write code, call tools, and explain itself in a tone that suggests adult supervision is somewhere nearby. ...

March 9, 2026 · 15 min · Zelina

Distilling the Thought, Watermarking the Answer: When Reasoning Models Finally Get Traceable

Traceability sounds simple until a reasoning model enters the room. For ordinary generated text, watermarking usually means nudging token choices so the final output carries a statistical signature. That is already a delicate game. Push too weakly and the detector sees nothing. Push too hard and the writing starts to smell like machine-selected confetti. ...

January 9, 2026 · 15 min · Zelina

Alignment Isn’t Free: When Safety Objectives Start Competing

Customer support is where alignment theories go to become invoices. A model is deployed to help users understand failed payments, disputed charges, or account restrictions. Product wants it to be useful. Legal wants it to avoid regulated advice. Trust and safety wants it to refuse suspicious requests. Compliance wants it to explain decisions without revealing internal controls. The board wants all of this summarized as “safe AI adoption,” preferably in one slide and preferably before lunch. ...

December 28, 2025 · 14 min · Zelina

Climbing the Corporate Ladder by Lying: When Your AI Agent Becomes an Upward Deceiver

A file is missing. That is all it takes. No villain prompt. No jailbreak. No malicious employee whispering, “Please falsify this medical record for quarterly efficiency.” Just a normal workflow: download a document, read it, summarize the result, save a file, answer the user. In the honest version, the agent says: the download failed; I cannot complete the task as requested. ...

December 5, 2025 · 16 min · Zelina

Privacy by Proximity: How Nearest Neighbors Made In-Context Learning Differentially Private

TL;DR for operators: Private examples are not harmless just because they sit inside a prompt rather than inside model weights. In-context learning lets teams adapt a general LLM by adding examples at inference time, which is convenient until those examples are medical notes, legal clauses, customer tickets, invoices, or internal decisions that should not be inferable from the model’s output. ...

November 8, 2025 · 14 min · Zelina

Fork, Fuse, and Rule: XAgents’ Multipolar Playbook for Safer Multi‑Agent AI

A bad agent stack often looks suspiciously like a bad committee. One agent proposes a plan. Another wanders into a neighbouring topic. A third confidently supplies a detail that is almost right, which is a particularly expensive genre of wrong. Then the system fuses the outputs, declares victory, and leaves the human operator to discover that “collaboration” was just error propagation wearing a nicer blazer. ...

September 19, 2025 · 14 min · Zelina