
Mind the Flux: Why Average Accuracy Fails Where the Towers Aren’t

TL;DR for operators: Models are often sold as if accuracy were a passport: one clean number, stamped at the border, cleared for deployment. FLUXtrapolation is a useful reminder that the border is usually where the problem begins. The paper introduces a benchmark for predicting hourly ecosystem fluxes (carbon, water, and energy exchanges between ecosystems and the atmosphere) when direct measurements exist only at sparse flux-tower sites.[1] The mechanism is simple and unpleasant: train models where towers exist, then test them in progressively less comfortable situations where the future, the geography, or the temperature regime has shifted. ...

June 16, 2026 · 17 min · Zelina
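The FLUXtrapolation teaser describes three kinds of shift: the future, the geography, and the temperature regime. Below is a minimal sketch of how such extrapolation splits could be built and why a pooled error can hide a failing regime. It is not the benchmark's actual protocol. The `Record` fields, the cutoff year, the held-out sites, and the temperature threshold are all hypothetical illustration choices.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    site: str            # hypothetical flux-tower site identifier
    year: int            # year of the hourly observation
    mean_temp_c: float   # hypothetical site-level mean annual temperature (deg C)
    flux: float          # observed flux value (units depend on the target)


def temporal_split(records, cutoff_year):
    """Train on years before the cutoff, test on the cutoff year and later."""
    train = [r for r in records if r.year < cutoff_year]
    test = [r for r in records if r.year >= cutoff_year]
    return train, test


def spatial_split(records, held_out_sites):
    """Hold out whole sites so the test set contains only unseen towers."""
    held = set(held_out_sites)
    train = [r for r in records if r.site not in held]
    test = [r for r in records if r.site in held]
    return train, test


def temperature_split(records, threshold_c):
    """Train on cooler sites, test on warmer ones (a shifted climate regime)."""
    train = [r for r in records if r.mean_temp_c < threshold_c]
    test = [r for r in records if r.mean_temp_c >= threshold_c]
    return train, test


def mae_by_group(errors_by_group):
    """Mean absolute error per group, plus the pooled average over all errors."""
    report = {g: mean(abs(e) for e in errs) for g, errs in errors_by_group.items()}
    report["pooled"] = mean(abs(e) for errs in errors_by_group.values() for e in errs)
    return report
```

For example, `mae_by_group({"familiar": [0.1] * 9, "warm_unseen": [2.0]})` reports a pooled error of roughly 0.29, while the one unseen warm regime sits at 2.0. Reporting per-regime error next to the average is the design choice that keeps the failure visible.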

HAROOD: When Benchmarks Grow Up and Models Stop Cheating

A wearable model can look brilliant in the lab and embarrass itself on Monday morning. The user changes. The watch slides down the wrist. A sensor is mounted on the chest instead of in the pocket. The same person walks differently after fatigue, injury, aging, or simply because life has the terrible habit of not matching the training set. Human Activity Recognition, or HAR, has always lived with this problem. It turns sensor streams from accelerometers, gyroscopes, EMG, ECG, and other wearable or ambient devices into labels such as walking, running, sitting, cycling, or stress state. It is useful precisely because it moves into the real world. That is also where benchmark accuracy goes to die. ...

December 12, 2025 · 20 min · Zelina
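The HAROOD teaser names two shifts: a different user and a different sensor placement. Below is a minimal sketch of why a random window split lets a HAR model "cheat" and how subject-held-out and placement-held-out splits remove that shortcut. It is an illustration under assumptions, not HAROOD's own split definitions. The `Window` fields, subject IDs, and placement names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    subject: str    # hypothetical wearer identifier
    position: str   # hypothetical sensor placement, e.g. "wrist" or "chest"
    label: str      # activity label, e.g. "walking"


def subject_overlap(train, test):
    """Subjects present in both splits; a non-empty set means identity leakage."""
    return {w.subject for w in train} & {w.subject for w in test}


def random_window_split(windows, test_fraction, seed=0):
    """Naive split: shuffles windows, so one person's data can land on both sides."""
    shuffled = list(windows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def leave_one_subject_out(windows):
    """Yield (subject, train, test) folds, holding out each subject exactly once."""
    for s in sorted({w.subject for w in windows}):
        train = [w for w in windows if w.subject != s]
        test = [w for w in windows if w.subject == s]
        yield s, train, test


def placement_shift_split(windows, train_positions):
    """Train on some sensor placements, test on placements never seen in training."""
    seen = set(train_positions)
    train = [w for w in windows if w.position in seen]
    test = [w for w in windows if w.position not in seen]
    return train, test
```

With three subjects of eight windows each, a 50/50 random split cannot separate people cleanly, so `subject_overlap` is always non-empty. Every leave-one-subject-out fold keeps it empty by construction.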