Learning the Rules by Breaking Them: Exception-Aware Constraint Mining for Care Scheduling

Historical schedules contain both operating rules and emergency compromises; this paper shows how to extract the former without institutionalizing the latter.

January 1, 2026 · 15 min · Zelina

Let It Flow: ROME and the Economics of Agentic Craft

ROME shows that competitive agent performance depends less on possessing the largest model than on operating a disciplined learning loop around execution, verification, training, and control.

January 1, 2026 · 19 min · Zelina

When Maps Start Thinking: Teaching Agents to Plan in Time and Space

STAgent shows how a stable tool sandbox, aggressive log curation, and model-relative training can turn operational data into a specialized planning agent.

January 1, 2026 · 16 min · Zelina

When Your House Talks Back: Teaching Buildings to Think About Energy

A smart-building benchmark shows why LLM agents are already useful for grounded device operations—and why financial reasoning still belongs behind deterministic controls.

January 1, 2026 · 15 min · Zelina

Browsing Without the Bloat: Teaching Agents to Think Before They Scroll

NestBrowse shows that better browser agents may depend less on larger models or longer contexts than on controlling which information reaches the reasoning loop.

December 31, 2025 · 17 min · Zelina

Many Arms, Fewer Bugs: Why Coding Agents Need to Stop Working Alone

BOAD shows that coding-agent performance depends less on assembling more agents than on discovering a small team, assigning individual credit, and controlling what each agent needs to remember.

December 31, 2025 · 19 min · Zelina

RxnBench: Reading Chemistry Like a Human (Turns Out That’s Hard)

RxnBench reveals why multimodal models that excel on isolated reaction schemes still struggle to read complete chemistry papers reliably.

December 31, 2025 · 15 min · Zelina

The Invariance Trap: Why Matching Distributions Can Break Your Model

Why symmetric domain alignment can erase useful information—and how directional simulation offers a safer objective for transfer learning.

December 31, 2025 · 16 min · Zelina

When Models Forget on Purpose: Why Data Selection Matters More Than Data Volume

A business-focused reading of dynamic data weighting in LLM training, and why selective forgetting may matter more than simply feeding models more tokens.

December 31, 2025 · 17 min · Zelina

When the Paper Talks Back: Lost in Translation, Rejected by Design

A multilingual prompt-injection experiment shows why documents must be treated as active attack surfaces—and why apparent resistance in one language may still conceal unstable decisions.

December 31, 2025 · 13 min · Zelina