Cover image

HAROOD: When Benchmarks Grow Up and Models Stop Cheating

Opening — Why this matters now Human Activity Recognition (HAR) has quietly become one of those applied ML fields where headline accuracy keeps improving, while real-world reliability stubbornly refuses to follow. Models trained on pristine datasets collapse the moment the sensor moves two centimeters, the user changes, or time simply passes. The industry response has been predictable: larger models, heavier architectures, and now—inevitably—LLMs. The paper behind HAROOD argues that this reflex is misplaced. The real problem is not model capacity. It is evaluation discipline. ...

December 12, 2025 · 3 min · Zelina