
TL;DR for operators

A useful scaling law does not merely say “bigger is better.” That is not a law; that is a purchasing department with a GPU account. The paper behind this article asks whether the composition of pretraining data can shift the compute-optimal balance between model size and downstream dataset size in jet classification.¹ In this setting, the answer is yes. Training from scratch on JetClass yields a nearly balanced scaling rule: as compute grows, the optimal model size and dataset size grow at roughly similar rates. After pretraining on a JetClass-II corpus augmented with Beyond Standard Model (BSM) resonance decays, however, the compute-optimal rule shifts sharply toward downstream data. More of the next compute budget should go to processing more examples, not to inflating the model. ...
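To make “balanced” versus “shifted toward data” concrete, here is a minimal sketch of how compute-optimal exponents split a growing budget. It assumes power laws N_opt ∝ C^a and D_opt ∝ C^b with a + b = 1, which follows if training compute is approximated as C ≈ k·N·D for some constant k. The exponent values are hypothetical, chosen only to illustrate the qualitative shift; they are not the paper's fitted values, and the helper name `growth_factors` is illustrative.

```python
"""Illustrative sketch: how compute-optimal scaling exponents split a budget.

Assumes N_opt ∝ C**a and D_opt ∝ C**b with a + b = 1, which holds when
training compute is approximated as C ≈ k * N * D for a constant k.
The exponent values below are HYPOTHETICAL, not the paper's fitted values.
"""


def growth_factors(compute_multiplier: float, a: float) -> tuple[float, float]:
    """Return (model-size factor, data factor) when compute grows by `compute_multiplier`.

    `a` is the model-size exponent; the data exponent is b = 1 - a, so the
    product of the two factors always equals `compute_multiplier`.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("exponent a must lie in [0, 1]")
    b = 1.0 - a
    return compute_multiplier ** a, compute_multiplier ** b


# Hypothetical exponents, chosen only to illustrate the qualitative shift.
SCENARIOS = {
    "from scratch (balanced, a = 0.5)": 0.5,
    "after BSM-augmented pretraining (data-heavy, a = 0.2)": 0.2,
}

if __name__ == "__main__":
    for label, a in SCENARIOS.items():
        n_factor, d_factor = growth_factors(10.0, a)
        print(f"{label}: model x{n_factor:.2f}, data x{d_factor:.2f}")
```

With a 10× compute budget, the balanced case grows both model and data by about 3.16×, while the hypothetical data-heavy case grows the model by only about 1.58× and the data by about 6.31×. The total compute is the same; only the allocation changes.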