Skill Issue, Literally: Repairing Agent Instructions Without an Answer Key

SkillAudit shows how agent skills can be improved without hidden tests by comparing with-skill and without-skill executions, but only when correctness leaves an observable trace.

July 3, 2026 · 21 min · Zelina

The Drift Alarm Is Not the Strategy

A practical reading of learner-based concept drift detection: when SPC, windowing, and ensemble methods help, when they disappoint, and why real streams refuse to behave like benchmark streams.

July 3, 2026 · 16 min · Zelina

The Robot Needs a Shift Supervisor

A systematic study of hierarchical VLA agents shows that robot reliability depends less on simply adding hierarchy than on how planning, control, memory, observation, and handoff are orchestrated.

July 3, 2026 · 24 min · Zelina

The Molecule Was Right. The Reasoning Was Not.

ChemCoTBench-V2 shows why chemical AI evaluation has to inspect intermediate molecular and reaction states, not just final answers.

July 2, 2026 · 17 min · Zelina

The Room Remembers, the Model Forgets

LongSpace shows why long-video AI needs spatial memory, not just larger context windows.

July 2, 2026 · 17 min · Zelina

The Tool Response Is Not Your Boss

AgentRedBench shows that enterprise AI-agent risk is less about naughty chat prompts and more about untrusted SaaS content quietly steering authorized tool actions.

July 1, 2026 · 19 min · Zelina

Do Not Mix the Wires Before They Sing

A mechanism-first reading of why EEG-to-music reconstruction improves when models preserve electrode-level structure before alignment and generation.

June 29, 2026 · 17 min · Zelina

Measure Twice, Generate, Then Look Again

IterCAD shows why reliable CAD automation depends less on one-shot generation and more on closed-loop execution, visual feedback, and survivor-bias-free evaluation.

June 29, 2026 · 21 min · Zelina

No CIG, Still Checking: When Medical Guidelines Become Executable

A mechanism-first reading of an LLM-orchestrated stroke-care conformance pipeline, and what it teaches operators about turning unstructured policy into auditable process checks.

June 29, 2026 · 24 min · Zelina

No Structure, No Glory: Why AI Cognition Has to Be Shown, Not Named

Two recent papers show why serious claims about AI cognition require evidence of internal organization, not just fluent behavior or attractive labels.

June 29, 2026 · 18 min · Zelina