
When AI Becomes the Reviewer: Pairwise Judgment at Scale

A committee has one expensive problem before it has any philosophical problem: too many proposals, too little time, and no clean way to know whether Proposal 17 was actually better than Proposal 42. So the usual system does what institutions often do when the task is too large to compare directly. It fragments the work. A few reviewers score a few proposals. Their scores are averaged. A ranked list appears. Everyone pretends the number is more stable than the process that produced it. ...

December 12, 2025 · 16 min · Zelina

Counterfactuals Unchained: How Causality Escapes Its Own Models

A loan is rejected. Now explain why. A borrower is rejected by an automated lending system. The compliance team asks a simple question: What caused the rejection? A naïve answer points to a variable: low income, high debt ratio, thin credit history, missing documentation, or some equally respectable-looking field in the model. A better answer asks what would have happened if that variable had changed. A still better answer asks which surrounding facts must be held fixed while we imagine that change. ...

November 28, 2025 · 16 min · Zelina

The Memory Advantage: When AI Agents Learn from the Past

TL;DR for operators: Memory is usually sold as a comfort feature for AI agents: the assistant remembers your preferences, your workflow, your charming habit of naming files final_final_v7. Fine. But operationally, memory matters less as storage and more as control. The hard question is not whether an agent can remember. It is whether the agent knows when a remembered episode should override fresh exploration. ...

June 3, 2025 · 17 min · Zelina

Flashcards for Giants: How RAL Lets Large Models Learn Without Fine-Tuning

TL;DR for operators: Training a model is not the only way to make it behave less cluelessly in a specialised environment. The paper behind Retrieval Augmented Learning, or RAL, proposes a cheaper route: let the agent try strategies, validate what happened, and store the resulting lessons as retrievable experience rather than changing the model’s weights.[1] ...

May 6, 2025 · 16 min · Zelina