
When Benchmarks Rot: Why Static ‘Gold Labels’ Are a Clinical Liability

Opening — Why this matters now

Clinical AI has entered an uncomfortable phase of maturity. Models are no longer failing loudly; they are failing quietly. They produce fluent answers, pass public benchmarks, and even outperform physicians on narrowly defined tasks, until you look closely at what those benchmarks actually measure. The paper at hand dissects one such case: MedCalc-Bench, the de facto evaluation standard for automated medical risk-score computation. The uncomfortable conclusion is simple: when benchmarks are treated as static truth, they slowly drift away from clinical reality, and when those same labels are reused as reinforcement-learning rewards, that drift actively teaches models the wrong thing. ...

December 23, 2025 · 4 min · Zelina

Darwin, But Make It Neural: When Networks Learn to Mutate Themselves

Opening — Why this matters now

Modern AI has become very good at climbing hills, provided the hill stays put and remains differentiable. But as soon as the terrain shifts, gradients stumble. Controllers break. Policies freeze. Retraining becomes ritualistic rather than intelligent. This paper asks a quietly radical question: what if adaptation itself lived inside the network? Not as a scheduler, not as a meta-optimizer bolted on top, but as part of the neural machinery that is inherited, mutated, and selected. ...

December 21, 2025 · 3 min · Zelina

When Rewards Learn to See: Teaching Humanoids What the Ground Looks Like

Opening — Why this matters now

Humanoid robots can now run, jump, and occasionally impress investors. What they still struggle with is something more mundane: noticing the stairs before falling down them. For years, reinforcement learning (RL) has delivered impressive locomotion demos, mostly on flat floors. The uncomfortable truth is that many of these robots are, functionally speaking, blind. They walk well only because the ground behaves politely. Once the terrain becomes uneven, discontinuous, or adversarial, performance collapses. ...

December 21, 2025 · 4 min · Zelina

Adversaries, Slices, and the Art of Teaching LLMs to Think

Opening — Why this matters now

Large language models can already talk their way through Olympiad math, but they still stumble in embarrassingly human ways: a missed parity condition, a silent algebra slip, or a confident leap over an unproven claim. The industry’s usual fix (reward the final answer and hope the reasoning improves) has reached diminishing returns. Accuracy nudges upward, but reliability remains brittle. ...

December 19, 2025 · 4 min · Zelina

Stepwise Think-Critique: Teaching LLMs to Doubt Themselves (Productively)

Opening — Why this matters now

Large language models have learned how to think out loud. What they still struggle with is recognizing that their thinking has gone wrong while it is still happening. In high-stakes domains like mathematics, finance, or policy automation, delayed error detection is not a feature; it is a liability. Most modern reasoning pipelines still follow an awkward split: first generate reasoning, then verify it, often with a separate model. Humans do not work this way. We reason and judge simultaneously. This paper asks a simple but uncomfortable question: what if LLMs were trained to do the same? ...

December 18, 2025 · 4 min · Zelina

Picking Less to Know More: When RAG Stops Ranking and Starts Thinking

Opening — Why this matters now

Retrieval-augmented generation (RAG) has a dirty secret: it keeps retrieving more context while quietly getting no smarter. As context windows balloon to 100K tokens and beyond, RAG systems dutifully shovel in passages (Top-5, Top-10, Top-100), hoping recall will eventually rescue accuracy. It doesn’t. Accuracy plateaus. Costs rise. Attention diffuses. The model gets lost in its own evidence pile. ...

December 17, 2025 · 4 min · Zelina

When Rewards Learn Back: Evolution, but With Gradients

Opening — Why this matters now

Reinforcement learning has always had an uncomfortable secret: most of the intelligence is smuggled in through the reward function. We talk about agents learning from experience, but in practice someone (usually a tired engineer) decides what “good behavior” means numerically. As tasks grow longer-horizon, more compositional, and more vulnerable to specification errors, this arrangement stops scaling. ...

December 16, 2025 · 4 min · Zelina

When Tokens Become Actions: A Policy Gradient Built for Transformers

Opening — Why this matters now

Reinforcement learning has always assumed that actions are atomic. Large language models politely disagree. In modern LLM training, an “action” is rarely a single move. It is a sequence of tokens, often structured, sometimes tool-augmented, occasionally self-reflective. Yet most policy-gradient methods still pretend that Transformers behave like generic RL agents. The result is a growing mismatch between theory and practice, especially visible in agentic reasoning, tool use, and long-horizon tasks. ...

December 14, 2025 · 4 min · Zelina

RL Grows a Third Dimension: Why Text-to-3D Finally Needs Reasoning

Opening — Why this matters now

Text-to-3D generation has quietly hit a ceiling. Diffusion-based pipelines are expensive, autoregressive models are brittle, and despite impressive demos, most systems collapse the moment a prompt requires reasoning rather than recall. Meanwhile, reinforcement learning (RL) has already reshaped language models and is actively restructuring 2D image generation. The obvious question, long avoided, was whether RL could do the same for 3D. ...

December 13, 2025 · 4 min · Zelina

Agents Without Time: When Reinforcement Learning Meets Higher-Order Causality

Opening — Why this matters now

Reinforcement learning has spent the last decade obsessing over better policies, better value functions, and better credit assignment. Physics, meanwhile, has been busy questioning whether time itself needs to behave nicely. This paper sits uncomfortably, and productively, between the two. At a moment when agentic AI systems are being deployed in distributed, partially observable, and poorly synchronized environments, the assumption of a fixed causal order is starting to look less like a law of nature and more like a convenience. Wilson’s work asks a precise and unsettling question: what if decision-making agents and causal structure are the same mathematical object viewed from different sides? ...

December 12, 2025 · 3 min · Zelina