Stop or Strip? Teaching Disassembly When to Quit

A battery pack arrives at an end-of-life processing facility. The easy story says the operator should recover as much value as possible while doing the sustainable thing. The harder story starts five minutes later, when someone has to decide whether to stop, reuse the pack, remove the cover, strip the thermal shield, extract a module, test it, recycle it, or finally admit defeat and dispose of what remains. ...

December 20, 2025 · 15 min · Zelina

Adversaries, Slices, and the Art of Teaching LLMs to Think

A math tutor does not wait until the end of a two-page solution, circle the final answer, and say “wrong.” At least, not a good one. The useful tutor interrupts earlier. This line follows. That parity condition does not. This factorization is legal, but the conclusion you drew from it is not. The feedback is local, not theatrical. It tells the student where the reasoning began to rot, before the final answer becomes merely the visible corpse. ...

December 19, 2025 · 22 min · Zelina

Stepwise Think-Critique: Teaching LLMs to Doubt Themselves (Productively)

The useful part of doubt is timing. Doubt is not useful after the invoice is paid, the client report is sent, or the model has already produced a confident wrong answer with twelve decorative paragraphs of reasoning. At that point, “let us verify” becomes less like quality control and more like archaeology. ...

December 18, 2025 · 16 min · Zelina

Picking Less to Know More: When RAG Stops Ranking and Starts Thinking

Search is not judgment. Search is easy to admire because it produces something visible. A ranked list. A bigger context window. A satisfying pile of passages that says, “Look, we retrieved evidence.” Very comforting. Also not the same as knowing what evidence is actually needed. That distinction is the core of Context-Picker: Dynamic Context Selection Using Multi-stage Reinforcement Learning.[1] The paper studies a familiar RAG problem: if a system retrieves too little, it misses the answer; if it retrieves too much, it drags in distractors, repeats, weakly related fragments, and the usual long-context swamp where useful evidence politely disappears in the middle. ...

December 17, 2025 · 14 min · Zelina

When Reasoning Needs Receipts: Graphs Over Guesswork in Medical AI

Diagnosis is not a magic word. In medicine, the answer matters, but the path to the answer matters almost as much. A model that says the correct disease name after skipping the decisive evidence is not “reasoning efficiently.” It is guessing with bedside manner. That is the problem addressed by MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph.[1] The paper’s core claim is not simply that a medical LLM can score higher on benchmarks. That would be useful, but not especially surprising. The more interesting move is architectural: the authors try to make clinical reasoning trainable by turning it into a graph of required evidence, then rewarding the model for following that graph. ...

December 16, 2025 · 15 min · Zelina

When Rewards Learn Back: Evolution, but With Gradients

Rewards are where many agent projects go to become expensive folklore. A team wants an AI agent to complete long workflows: search, reason, call tools, check constraints, recover from mistakes, and produce a useful answer. The model can talk. The tools work. The benchmark demo is acceptable. Then reinforcement learning enters the room, and someone has to decide what “good” means at every step. ...

December 16, 2025 · 17 min · Zelina

When Tokens Become Actions: A Policy Gradient Built for Transformers

Tool calls are not tokens. Neither are paragraphs, reasoning blocks, spreadsheet edits, web searches, code executions, or the awkward little detours an agent takes before finally answering the user. Yet much of reinforcement learning for language models still behaves as if it must choose between two unsatisfying extremes. At one end, every token is treated as a tiny action. At the other, the whole answer is treated as one indivisible action. The first view is mathematically tidy and operationally noisy. The second is practical for verifiable tasks, but it compresses an entire reasoning process into one final score, which is a bit like reviewing an employee only by checking whether the office building is still standing. ...

December 14, 2025 · 14 min · Zelina

RL Grows a Third Dimension: Why Text-to-3D Finally Needs Reasoning

A chair is not a picture of a chair. That sounds obvious until a text-to-3D system forgets the backrest from one angle, gives the chair three legs from another, paints the seat correctly, and somehow convinces a weak evaluator that the job is mostly done. In 2D generation, a model can often survive by producing a plausible view. In 3D generation, every view is a witness. Geometry, texture, object parts, and spatial relationships all have to agree. Annoying, yes. Also the entire point. ...

December 13, 2025 · 16 min · Zelina

Agents Without Time: When Reinforcement Learning Meets Higher-Order Causality

Handoffs Are Where Fixed Time Sneaks Into Agent Design. Handoffs look harmless. One agent collects evidence, another checks it, a third decides, and a fourth sends the answer to a customer, robot, trader, or dashboard. The workflow diagram has arrows. The arrows have a direction. Someone decided which component acts first. Usually that decision is treated as engineering housekeeping. In Matt Wilson’s paper, it becomes the point of the story.[1] ...

December 12, 2025 · 14 min · Zelina

Fault, Interrupted: How RIFT Reinvents Reliability for the LLM Hardware Era

A chip does not need to fail everywhere to fail badly. A modern AI accelerator is not fragile in the poetic sense. It is not a porcelain teacup trembling on the edge of a desk. It is much more annoying than that. It can run billions of parameters at high throughput, survive ordinary engineering noise, and still contain a few small fault locations where one carefully placed disturbance can turn a capable model into expensive decorative silicon. The problem is not that every bit matters equally. The problem is that a few bits may matter absurdly more than the rest. ...

December 11, 2025 · 17 min · Zelina