When Graphs Stop Guessing: Teaching Models to Rewrite Their Own Meaning

GES shows how graph models can improve not by becoming larger, but by using LLMs to rewrite node descriptions around task-relevant structural evidence.

December 26, 2025 · 16 min · Zelina

When Guardrails Learn from the Shadows

A semi-supervised safety-classification paper shows why unlabeled AI interaction data becomes useful only when the training process preserves harmful intent, not just surface wording.

December 26, 2025 · 16 min · Zelina

When Models Learn to Forget: Why Memorization Isn’t the Same as Intelligence

A practical reading of MemSinks and what it teaches AI builders about memorization, generalization, and why forgetting must be designed before deployment.

December 26, 2025 · 15 min · Zelina

When Policies Read Each Other: Teaching Agents to Cooperate by Reading the Code

A mechanism-first reading of how programmatic policies let LLM agents condition on each other’s source code, and why the business value is inspectable coordination rather than magic cooperation.

December 26, 2025 · 19 min · Zelina

When the Answer Matters More Than the Thinking

A mechanism-first reading of SFTKey-Tag, a two-stage fine-tuning method that separates answer correctness from reasoning-format training.

December 26, 2025 · 2 min · Zelina

FinAgent: When AI Starts Shopping for Your Groceries (and Your Health)

FinAgent shows how agentic AI can turn grocery planning into a price-aware loop across household budgets, nutrition targets, health constraints, and food substitutions.

December 25, 2025 · 14 min · Zelina

Personas, Panels, and the Illusion of Free A/B Tests

A practical reading of when LLM persona panels can replace field experiments for method benchmarking—and when they merely create cheaper noise.

December 25, 2025 · 16 min · Zelina

Reading the Room? Apparently Not: When LLMs Miss Intent

A case-first reading of a paper showing why LLM safety fails when models respond to surface wording while missing the user's likely intent.

December 25, 2025 · 16 min · Zelina

RoboSafe: When Robots Need a Conscience (That Actually Runs)

A mechanism-first reading of RoboSafe, a runtime safety guardrail that turns embodied-agent safety from vague refusals into executable checks over context and time.

December 25, 2025 · 18 min · Zelina

Traffic, but Make It Agentic: When Simulators Learn to Think

A mechanism-first reading of TrafficSimAgent, showing why agentic traffic simulation is less about chatting with SUMO and more about turning simulation workflows into controllable, memory-aware optimization systems.

December 25, 2025 · 18 min · Zelina