
Split Before You Scale: Why Useful AI Starts by Sorting the Mess

TL;DR for operators: AI systems fail less dramatically when they stop treating every messy signal as the same kind of mess. The three papers in this cluster look unrelated at first: one generates graphs, one studies exploration in restless bandits, and one improves reinforcement-learning generalisation from formal task specifications. Under the surface, they make a shared operational point: before scaling an AI system, separate the structure that must be preserved, the uncertainty that should guide action, and the supervision signal stable enough to train on. ...

June 15, 2026 · 16 min · Zelina

Share the Trunk, Spare the Averaging: Federated Actor-Critic Gets Personal

A fleet looks unified on a dashboard. It is rarely unified in the world. The warehouse robots share a navigation objective, but one floor has glossy tiles, another has uneven concrete, and a third has humans who treat marked lanes as casual decoration. The delivery drones may use the same controller family, but wind, payload, battery ageing, and local regulation quietly rewrite the operating problem. Industrial arms may repeat the same task, until a supplier swaps a component and the “same” movement is no longer quite the same. ...

June 14, 2026 · 14 min · Zelina

Memory Foam: When AI Stops Storing Everything and Starts Learning From It

Enterprise AI has developed a small obsession with memory. The promise is tidy: give the model more context, attach a vector database, retrieve relevant fragments, and suddenly the system becomes a persistent assistant rather than a forgetful autocomplete machine wearing a blazer. The problem is that storage is not memory. Retrieval is not understanding. And a larger context window is not the same thing as knowing what matters. ...

June 13, 2026 · 17 min · Zelina

Rewarding Behavior: Why Enterprise AI Needs More Than Bigger Models

Enterprise AI teams have developed a familiar reflex. When the model behaves unreliably, they try a better prompt. When that fails, they try a larger model. When that becomes expensive, they invent a workflow diagram with many arrows and call it an operating model. Very dignified. Very scalable, in the same way that adding more sticky notes to a broken process is scalable. ...

June 10, 2026 · 17 min · Zelina

The Policy Has to Work Somewhere: RL for Scale, Trust, and Other Inconveniences

Deployment is where elegant AI systems go to meet bandwidth caps, slow devices, noisy user preferences, and privacy policies written by committees with very strong coffee. That is the useful lens for reading Guangchen Lan’s dissertation, Reinforcement Learning for Scalable and Trustworthy Intelligent Systems.[1] It is tempting to describe the work as a collection of four reinforcement-learning methods: one for synchronous federated RL, one for asynchronous federated RL, one for preference optimization, and one for contextual privacy. Technically, that is true. Editorially, it is the least interesting way to read it. ...

June 8, 2026 · 21 min · Zelina

Think Meter, Not Think Bigger: The New Control Layer for AI Reasoning

Most companies do not actually want an AI system that “thinks longer.” They want one that knows when extra thinking is worth the bill. That distinction is becoming more important. Reasoning models are moving from demo-stage math puzzles into document review, financial research, compliance analysis, customer support escalation, and agentic workflows. In these settings, reasoning has three costs: latency, compute, and misplaced confidence. A model that spends 30 seconds producing an elegant wrong answer has not reasoned. It has performed expensive theatre. Very fluent theatre, admittedly. ...

June 2, 2026 · 14 min · Zelina

High Entropy, Low Drama: The Internal Fingerprint of LLM Reasoning

Scores are comforting. They fit neatly into leaderboards, procurement decks, and internal model-comparison spreadsheets. One model gets 71.5, another gets 72.9, and someone in the meeting says, “So the second one reasons better.” Maybe. Or maybe the model merely passed a particular checkpoint more often. That is useful, but it is not the same as knowing whether the model has learned a controllable reasoning process. A thermometer tells you the patient is hot; it does not explain the infection. Benchmarks are the thermometer. The paper Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models tries to look for something closer to the infection mechanism, or, less dramatically, the internal process signature behind “slow thinking” in large reasoning models.[1] ...

June 1, 2026 · 15 min · Zelina

High Entropy, Low Drama: The Internal Fingerprint of LLM Reasoning

Debugging a reasoning model usually starts at the wrong end. A model gives a wrong mathematical answer, so we inspect the final output. Then we inspect the chain-of-thought. Then we compare benchmark scores, sample more answers, compute pass rates, and hope the model’s visible reasoning trace tells us what happened inside. This is convenient. It is also a little like diagnosing a factory by reading only the shipping label. ...

May 31, 2026 · 15 min · Zelina

Experience Is Not Memory: Why Learning Agents Need a Better Feedback Loop

A support ticket goes wrong. A workflow agent chooses the wrong tool. A finance assistant misses a procedural step. The usual response is familiar: add the failure to memory, rewrite a prompt, perhaps ask the agent to “reflect” before trying again. This is useful, in the same way that putting a sticky note on a broken machine is useful. It may prevent the same mistake next time. It does not prove the machine has learned how to improve. ...

May 29, 2026 · 18 min · Zelina

The Confidence Trick: When Long AI Reasoning Arrives Too Early

A model gives you a long answer. It lists assumptions. It walks through steps. It sounds patient, organized, and slightly overqualified for the task. In a business setting, that style is comforting. A compliance analyst sees a neat explanation. A finance team sees a transparent calculation. A product manager sees “reasoning.” Everyone relaxes a little. ...

May 29, 2026 · 19 min · Zelina