HAROOD: When Benchmarks Grow Up and Models Stop Cheating
A wearable model can look brilliant in the lab and embarrass itself on Monday morning. The user changes. The watch slides down the wrist. A sensor is strapped to the chest instead of carried in the pocket. The same person walks differently after fatigue, injury, aging, or simply because life has the terrible habit of not matching the training set. Human Activity Recognition, or HAR, has always lived with this problem. It turns signal streams from accelerometers, gyroscopes, EMG and ECG sensors, and other wearable or ambient devices into labels such as walking, running, sitting, cycling, or stress state. It is useful precisely because it moves into the real world. That is also where benchmark accuracy goes to die. ...