Learning the Rules by Breaking Them: Exception-Aware Constraint Mining for Care Scheduling
Historical schedules contain both operating rules and emergency compromises; this paper shows how to extract the former without institutionalizing the latter.
ROME shows that competitive agent performance depends less on possessing the largest model than on operating a disciplined learning loop around execution, verification, training, and control.
STAgent shows how a stable tool sandbox, aggressive log curation, and model-relative training can turn operational data into a specialized planning agent.
A smart-building benchmark shows why LLM agents are already useful for grounded device operations—and why financial reasoning still belongs behind deterministic controls.
NestBrowse shows that better browser agents may depend less on larger models or longer contexts than on controlling which information reaches the reasoning loop.
BOAD shows that coding-agent performance depends less on assembling more agents than on discovering a small team, assigning individual credit, and controlling what each agent needs to remember.
RxnBench reveals why multimodal models that excel on isolated reaction schemes still struggle to read complete chemistry papers reliably.
Why symmetric domain alignment can erase useful information—and how directional simulation offers a safer objective for transfer learning.
A business-focused reading of dynamic data weighting in LLM training, and why selective forgetting may matter more than simply feeding models more tokens.
A multilingual prompt-injection experiment shows why documents must be treated as active attack surfaces—and why apparent resistance in one language may still conceal unstable decisions.