Cognaptus Insights

Learning the Rules by Breaking Them: Exception-Aware Constraint Mining for Care Scheduling

Historical schedules contain both operating rules and emergency compromises; this paper shows how to extract the former without institutionalizing the latter.

Let It Flow: ROME and the Economics of Agentic Craft

ROME shows that competitive agent performance depends less on possessing the largest model than on operating a disciplined learning loop around execution, verification, training, and control.

When Maps Start Thinking: Teaching Agents to Plan in Time and Space

STAgent shows how a stable tool sandbox, aggressive log curation, and model-relative training can turn operational data into a specialized planning agent.

When Your House Talks Back: Teaching Buildings to Think About Energy

A smart-building benchmark shows why LLM agents are already useful for grounded device operations, and why financial reasoning still belongs behind deterministic controls.

Browsing Without the Bloat: Teaching Agents to Think Before They Scroll

NestBrowse shows that better browser agents may depend less on larger models or longer contexts than on controlling which information reaches the reasoning loop.

Many Arms, Fewer Bugs: Why Coding Agents Need to Stop Working Alone

BOAD shows that coding-agent performance depends less on assembling more agents than on discovering a small team, assigning individual credit, and controlling what each agent needs to remember.

RxnBench: Reading Chemistry Like a Human (Turns Out That’s Hard)

RxnBench reveals why multimodal models that excel on isolated reaction schemes still struggle to read complete chemistry papers reliably.

The Invariance Trap: Why Matching Distributions Can Break Your Model

Why symmetric domain alignment can erase useful information, and how directional simulation offers a safer objective for transfer learning.

When Models Forget on Purpose: Why Data Selection Matters More Than Data Volume

A business-focused reading of dynamic data weighting in LLM training, and why selective forgetting may matter more than simply feeding models more tokens.

When the Paper Talks Back: Lost in Translation, Rejected by Design

A multilingual prompt-injection experiment shows why documents must be treated as active attack surfaces, and why apparent resistance in one language may still conceal unstable decisions.