
Safety First, Reward Second — But Not Last

Opening — Why this matters now

Reinforcement learning has spent the last decade mastering games, simulations, and neatly bounded optimization problems. Reality, inconveniently, is none of those things. In robotics, autonomous vehicles, industrial automation, and any domain where mistakes have real-world consequences, “almost safe” is simply unsafe. Yet most “safe RL” methods quietly rely on a compromise: allow some violations, average them out, and hope the system behaves. This paper refuses that bargain. It treats safety as a hard constraint, not a tunable preference—and then asks an uncomfortable question: can we still learn anything useful? ...

January 4, 2026 · 4 min · Zelina

Deployed, Retrained, Repeated: When LLMs Learn From Being Used

Opening — Why this matters now

The AI industry likes to pretend that training happens in neat, well-funded labs and deployment is merely the victory lap. Reality, as usual, is less tidy. Large language models are increasingly learning after release—absorbing their own successful outputs through user curation, web sharing, and subsequent fine-tuning. This paper puts a sharp analytical frame around that uncomfortable truth: deployment itself is becoming a training regime. ...

January 1, 2026 · 4 min · Zelina

Let It Flow: ROME and the Economics of Agentic Craft

Opening — Why this matters now

2025 quietly settled an uncomfortable truth in AI: agents are not products, they are supply chains. Anyone can demo a tool-using model. Very few can make it survive contact with real environments, long-horizon tasks, and users who refuse to behave like benchmarks. The paper “Let It Flow: Agentic Crafting on Rock and Roll” arrives at exactly this inflection point. Instead of promising yet another agent, it asks a more grown-up question: what kind of ecosystem is required to reliably produce agents at scale? ...

January 1, 2026 · 3 min · Zelina

When Maps Start Thinking: Teaching Agents to Plan in Time and Space

Opening — Why this matters now

AI can already write poetry, debug code, and argue philosophy. Yet ask most large language models to plan a realistic trip—respecting time, geography, traffic, weather, and human constraints—and they quietly fall apart. Real-world planning is messy, asynchronous, and unforgiving. Unlike math problems, you cannot hallucinate a charging station that does not exist. ...

January 1, 2026 · 3 min · Zelina

Replay the Losses, Win the Game: When Failed Instructions Become Your Best Training Data

Opening — Why this matters now

Reinforcement learning for large language models has a dirty secret: most of the time, nothing happens. When tasks demand perfect instruction adherence—formatting, style, length, logical constraints—the model either nails everything or gets a zero. Binary rewards feel principled, but in practice they starve learning. Aggregated rewards try to help, but they blur causality: different mistakes, same score, same gradient. The result is slow, noisy, and often misdirected optimization. ...

December 30, 2025 · 4 min · Zelina

When Actions Need Nuance: Learning to Act Precisely Only When It Matters

Opening — Why this matters now

Reinforcement learning has become impressively competent at two extremes: discrete games with neat action menus, and continuous control tasks where everything is a vector. Reality, inconveniently, lives in between. Most real systems demand choices and calibration—turn left and decide how much, brake and decide how hard. These are parameterized actions, and they quietly break many of today’s best RL algorithms. ...

December 28, 2025 · 4 min · Zelina

When Safety Stops Being a Turn-Based Game

Opening — Why this matters now

LLM safety has quietly become an arms race with terrible reflexes. We discover a jailbreak. We patch it. A new jailbreak appears, usually crafted by another LLM that learned from the last patch. The cycle repeats, with each round producing models that are slightly safer and noticeably more brittle. Utility leaks away, refusal rates climb, and nobody is convinced the system would survive a genuinely adaptive adversary. ...

December 28, 2025 · 4 min · Zelina

When Policies Read Each Other: Teaching Agents to Cooperate by Reading the Code

Opening — Why this matters now

Multi-agent systems are finally leaving the toy world. Autonomous traders negotiate with other bots. Supply-chain agents coordinate across firms. AI copilots increasingly share environments with other AI copilots. And yet, most multi-agent reinforcement learning (MARL) systems are still stuck with a primitive handicap: agents cannot meaningfully understand what other agents are doing. ...

December 26, 2025 · 4 min · Zelina

When One Clip Isn’t Enough: Teaching LLMs to Watch Long Videos Like Adults

Opening — Why this matters now

Large language models have learned to see. Unfortunately, they still have the attention span of a distracted intern when the video runs longer than a minute. As multimodal LLMs expand their context windows and promise “end-to-end” video understanding, a hard reality remains: long videos are not just longer inputs—they are fundamentally different reasoning problems. Information is sparse, temporally distant, multimodal, and often only meaningful when grounded precisely in time and space. Compress everything up front, and you lose the evidence. Don’t compress, and you blow the context budget. ...

December 24, 2025 · 4 min · Zelina

Policy Gradients Grow Up: Teaching RL to Think in Domains

Opening — Why this matters now

Reinforcement learning keeps winning benchmarks, but keeps losing the same argument: it doesn’t generalize. Train it here, deploy it there, and watch confidence evaporate. Meanwhile, classical planning—decidedly uncool but stubbornly correct—has been quietly producing policies that provably work across arbitrarily large problem instances. This paper asks the uncomfortable question the RL community often dodges: can modern policy-gradient methods actually learn general policies, not just big ones? ...

December 23, 2025 · 4 min · Zelina