
When Models Guess the Verb by Looking at the Drawer

Drawer. That is the easy part. A model sees a drawer, and it knows that drawers are often opened. Then it watches a video where someone is closing the drawer and predicts opening anyway. This is not the kind of error that makes a demo look silly for five seconds and then disappears into the benchmark appendix. It is the kind of error that reveals what the system is really using as evidence. The model is not necessarily watching the motion. It may be recognizing the object, remembering the most common verb attached to that object during training, and calling that “video understanding.” Very efficient. Also wrong. ...

January 24, 2026 · 17 min · Zelina

The Diligent but Brittle Student Inside Every LLM

TL;DR for operators: LearnerAgent puts LLM-based “students” through a simulated year of high-school English learning: weekly lessons, exercises, monthly exams, memory retrieval, self-reflection, confidence updates, and peer debate.[1] The point is not to cosplay a classroom because AI research apparently needed more homework. The point is to observe learning as a process, not merely as a final benchmark score. ...

August 8, 2025 · 15 min · Zelina