
Browsing Without the Bloat: Teaching Agents to Think Before They Scroll

Opening — Why this matters now
Large Language Models have learned to think. Then we asked them to act. Now we want them to browse — and suddenly everything breaks. Deep research agents are running head‑first into a practical wall: the modern web is not made of tidy pages and polite APIs. It is dynamic, stateful, bloated, and aggressively redundant. Give an agent a real browser and it drowns in tokens. Don’t give it one, and it misses the most valuable information entirely. ...

December 31, 2025 · 4 min · Zelina

When Models Look Back: Memory, Leakage, and the Quiet Failure Modes of LLM Training

Opening — Why this matters now
Large language models are getting better at many things—reasoning, coding, multi‑modal perception. But one capability remains quietly uncomfortable: remembering things they were never meant to remember. The paper underlying this article dissects memorization not as a moral failure or an anecdotal embarrassment, but as a structural property of modern LLM training. The uncomfortable conclusion is simple: memorization is not an edge case. It is a predictable outcome of how we scale data, objectives, and optimization. ...

December 30, 2025 · 3 min · Zelina

When Tokens Become Actions: A Policy Gradient Built for Transformers

Opening — Why this matters now
Reinforcement learning has always assumed that actions are atomic. Large language models politely disagree. In modern LLM training, an “action” is rarely a single move. It is a sequence of tokens, often structured, sometimes tool‑augmented, occasionally self‑reflective. Yet most policy‑gradient methods still pretend that Transformers behave like generic RL agents. The result is a growing mismatch between theory and practice—especially visible in agentic reasoning, tool use, and long‑horizon tasks. ...
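For context on that mismatch, here is the generic sequence-level policy gradient that most LLM post-training pipelines start from, written in standard notation (x is the prompt, y the sampled response, R a scalar reward). This is textbook REINFORCE, not the estimator the paper proposes; the tension the post points to is visible in the factorization, where the "action" log-probability is really a sum of per-token terms.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
      \big[\, R(x, y)\, \nabla_\theta \log \pi_\theta(y \mid x) \,\big],
\qquad
\log \pi_\theta(y \mid x) = \sum_{t=1}^{|y|} \log \pi_\theta\!\left(y_t \mid x,\, y_{<t}\right).
```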

December 14, 2025 · 4 min · Zelina

Memory, Bias, and the Mind of Machines: How Agentic LLMs Mislearn

Opening — Why this matters now
AI models are no longer passive text engines. They remember, reason, and improvise — sometimes poorly. As large language models (LLMs) gain memory and autonomy, we face a paradox: they become more useful because they act more like humans, and more dangerous for the same reason. This tension lies at the heart of a new paper, “When Memory Leads Us Astray: A Study of Bias and Mislearning in Agentic LLMs” (arXiv:2511.08585). ...

November 12, 2025 · 3 min · Zelina

Thinking in Circles: How Self-Questioning LLMs Learn Without Labels

What if an LLM could learn not by reading more, but by thinking harder? That’s the radical premise behind Self-Questioning Language Models (SQLM), a framework that transforms large language models from passive learners into active generators of their own training data. No curated datasets. No labeled answers. Just a prompt — and a model that gets smarter by challenging itself.
From Self-Play in Robotics to Reasoning in Language
The inspiration for SQLM comes from asymmetric self-play, a technique used in robotics where one agent proposes tasks and another learns to solve them. Here, that paradigm is adapted to LLMs: ...
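As a reading aid, the sketch below shows what an asymmetric self-play loop could look like when a single LLM plays both roles. All names here (the LLM protocol, majority_vote_reward, self_play_round) are illustrative assumptions for this post, not SQLM's actual training code, and the self-consistency reward is just one plausible stand-in for the missing labels.

```python
from collections import Counter
from typing import List, Protocol


class LLM(Protocol):
    """Minimal interface assumed for this sketch; not SQLM's actual API."""
    def generate(self, prompt: str) -> str: ...
    def update(self, problem: str, attempts: List[str], reward: float) -> None: ...


def majority_vote_reward(attempts: List[str]) -> float:
    """Label-free reward stand-in: fraction of attempts whose final line
    agrees with the most common final line across attempts."""
    finals = [a.strip().splitlines()[-1] if a.strip() else "" for a in attempts]
    most_common_count = Counter(finals).most_common(1)[0][1]
    return most_common_count / len(finals)


def self_play_round(model: LLM, topic_prompt: str, n_attempts: int = 4) -> float:
    # Proposer role: the model invents a problem from a bare topic prompt.
    problem = model.generate(f"Pose a challenging problem about: {topic_prompt}")

    # Solver role: the same model attempts its own problem several times.
    attempts = [model.generate(f"Solve step by step: {problem}") for _ in range(n_attempts)]

    # No curated answers: agreement among the model's own attempts acts as the reward.
    reward = majority_vote_reward(attempts)

    # Both roles learn from the same self-generated signal.
    model.update(problem=problem, attempts=attempts, reward=reward)
    return reward
```

In this sketch the proposer and solver are the same model, so a single update shapes both which problems get posed and how they get solved; that sharing is a design choice of the sketch, not something the excerpt above settles.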

August 6, 2025 · 3 min · Zelina

Fine-Tuning Isn’t Just Supervised: Why SFT Is Really RL in Disguise

In the arms race to align large language models (LLMs), supervised fine-tuning (SFT) and reinforcement learning (RL) are often painted as competing paradigms. SFT is praised for its stability and simplicity; RL is heralded for its theoretical soundness and alignment fidelity. But what if this dichotomy is an illusion? A recent preprint from Chongli Qin and Jost Tobias Springenberg makes a bold and elegant claim: SFT on curated data is not merely supervised learning—it is actually optimizing a lower bound on the RL objective. ...
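To make the "lower bound" claim concrete, here is one standard derivation of that flavor, in generic notation rather than the paper's (prompts omitted, rewards r(y) > 0, and q the distribution of the curated SFT data, assumed to cover the policy's support); treat it as a sketch of the idea, not the preprint's own argument.

```latex
\log J(\theta)
  = \log \mathbb{E}_{y \sim q}\!\left[ \frac{\pi_\theta(y)}{q(y)}\, r(y) \right]
  \;\ge\;
  \underbrace{\mathbb{E}_{y \sim q}\big[ \log \pi_\theta(y) \big]}_{\text{SFT log-likelihood on curated data}}
  \;+\; \mathbb{E}_{y \sim q}\!\left[ \log \frac{r(y)}{q(y)} \right]
  \qquad \text{(Jensen's inequality).}
```

Only the first term depends on θ, and it is exactly the SFT objective on the curated data, which is the sense in which fine-tuning on good samples climbs a lower bound of the (log) RL objective.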

July 18, 2025 · 4 min · Zelina

Train of Thought: How Long-Haul RL Unlocks LLM Reasoning Diversity

In the race to make Large Language Models (LLMs) reason like humans—or better—most researchers obsess over one thing: prompting. Chain-of-thought prompts, few-shot demos, scratchpads, tools. But a new study from NVIDIA suggests something even more fundamental: it’s not just how you prompt them—it’s how long you train them. Their paper, Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training, explores how stretching reinforcement learning (RL) over time unlocks broader, more stable, and more versatile reasoning in LLMs. This isn’t just about incremental gains—it’s about escaping reasoning ruts. ...

July 18, 2025 · 3 min · Zelina

The Sink That Remembers: Solving LLM Memorization Without Forgetting Everything Else

When large language models (LLMs) memorize repeated content during training—be it a phone number, a copyrighted paragraph, or a user’s personal story—the implications go beyond benign repetition. They touch the very core of AI safety, privacy, and trust. And yet, removing this memorized content after training has proven to be a devil’s bargain: anything you subtract tends to weaken the model’s overall capabilities. In their recent ICML 2025 paper, Ghosal et al. propose an elegant reframing of this problem. Rather than performing painful post-hoc surgery on a trained model, they suggest we prepare the model from the outset to isolate memorization into removable compartments—which they call Memorization Sinks (MemSinks). ...
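To make "removable compartments" less abstract, here is a toy sketch of the general idea as described above: hidden units split into shared units and per-document sink units, gated by a document identifier during training and zeroed at deployment. The gating scheme, class name, and dimensions are assumptions for illustration only, not the architecture from Ghosal et al.

```python
import torch
import torch.nn as nn


class SinkMLP(nn.Module):
    """Toy illustration of a 'removable compartment': shared hidden units are
    kept at deployment, per-document sink units are gated on during training
    and dropped afterwards. Illustrative only; not the MemSinks architecture."""

    def __init__(self, d_model: int, n_shared: int, n_sink: int, n_docs: int):
        super().__init__()
        self.up = nn.Linear(d_model, n_shared + n_sink)
        self.down = nn.Linear(n_shared + n_sink, d_model)
        self.n_shared = n_shared
        # One fixed binary gate pattern per training document, over sink units only.
        self.register_buffer("doc_gates", (torch.rand(n_docs, n_sink) < 0.5).float())

    def forward(self, x: torch.Tensor, doc_id: int | None = None) -> torch.Tensor:
        h = torch.relu(self.up(x))
        shared, sink = h[..., : self.n_shared], h[..., self.n_shared :]
        if doc_id is None:
            # Deployment: the sink compartment is removed entirely.
            sink = torch.zeros_like(sink)
        else:
            # Training: only this document's sink units are active, so
            # sequence-specific signal is routed into a removable compartment.
            sink = sink * self.doc_gates[doc_id]
        return self.down(torch.cat([shared, sink], dim=-1))
```

Calling the layer without a doc_id mimics deployment: whatever sequence-specific signal was routed into the sink units is discarded, while the shared units are left untouched.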

July 15, 2025 · 4 min · Zelina