Data Weighting

TL;DR for operators

Training data is not a warehouse inventory problem. It is closer to nutrition. What helps a model early in pretraining may not be what helps it later, and a sample’s value can depend on the other samples sitting in the same batch. Obvious, perhaps. Operationalised? Less often.

The paper behind this article, LLM Data Selection and Utilization via Dynamic Bi-level Optimization, proposes a Data Weighting Model, or DWM, that does not merely decide which data enters training. It assigns weights to samples within each batch, freezes those weights while the language model trains for a stage, then updates the weighting model using validation performance through a bi-level optimisation loop.[1] ...
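To make the stage-wise loop concrete, here is a minimal sketch in plain Python. It is not the paper's DWM. The "language model" is a toy linear regressor. The weighting model is a hypothetical one-logit-per-source scorer whose scores are softmax-normalised within each batch, so a sample's weight depends on its batch-mates. The outer update uses a one-step lookahead hypergradient, a common truncated approximation to the bi-level gradient that the paper may or may not use. The data, the names (`make_sample`, `batch_weights`, `TRUE`, `OTHER`) and the hyperparameters are all invented for illustration.

```python
import math
import random

random.seed(0)
D = 5  # feature dimension of the toy problem

# Hypothetical toy setup, not the paper's data or model: source 0 follows the
# target rule, source 1 is "off-domain" and follows the opposite rule.
TRUE = [1.0, -2.0, 0.5, 1.5, -1.0]
OTHER = [-t for t in TRUE]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_sample(source):
    x = [random.gauss(0, 1) for _ in range(D)]
    rule = TRUE if source == 0 else OTHER
    return x, dot(x, rule), source

train = [make_sample(i % 2) for i in range(2000)]  # half clean, half off-domain
val = [make_sample(0) for _ in range(200)]         # clean validation set

def batch_weights(phi, batch):
    """Hypothetical weighting model: one logit per source, softmax over the batch."""
    scores = [phi[src] for _, _, src in batch]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def grad(theta, x, y):
    """Gradient of 0.5 * (x . theta - y)^2 with respect to theta."""
    r = dot(x, theta) - y
    return [r * xi for xi in x]

def val_loss(theta):
    return sum(0.5 * (dot(x, theta) - y) ** 2 for x, y, _ in val) / len(val)

def val_grad(theta):
    g = [0.0] * D
    for x, y, _ in val:
        for j, gj in enumerate(grad(theta, x, y)):
            g[j] += gj / len(val)
    return g

def weighted_step(theta, ws, grads, lr):
    """One SGD step on the weighted batch loss (weights sum to 1)."""
    return [t - lr * sum(w * g[j] for w, g in zip(ws, grads))
            for j, t in enumerate(theta)]

theta = [0.0] * D  # stand-in for the language model's parameters
phi = [0.0, 0.0]   # weighting-model parameters (one logit per source)
LR_INNER, LR_OUTER, BATCH = 0.05, 5.0, 64

for stage in range(20):
    # Inner level: weighting model frozen while the model trains for one stage.
    for _ in range(50):
        batch = random.sample(train, BATCH)
        ws = batch_weights(phi, batch)
        grads = [grad(theta, x, y) for x, y, _ in batch]
        theta = weighted_step(theta, ws, grads, LR_INNER)

    # Outer level: model frozen; update the weighting model against validation
    # loss via a one-step lookahead hypergradient (an approximation).
    for _ in range(10):
        batch = random.sample(train, BATCH)
        ws = batch_weights(phi, batch)
        grads = [grad(theta, x, y) for x, y, _ in batch]
        v = val_grad(weighted_step(theta, ws, grads, LR_INNER))
        a = [-LR_INNER * dot(g, v) for g in grads]  # dL_val / dw_i
        mean_a = sum(w * ai for w, ai in zip(ws, a))
        for (_, _, src), w, ai in zip(batch, ws, a):
            # Chain rule through the batch softmax: dL_val / ds_i = w_i * (a_i - mean_a)
            phi[src] -= LR_OUTER * w * (ai - mean_a)

print("clean:off-domain weight ratio:", math.exp(phi[0] - phi[1]))
print("validation loss:", val_loss(theta))
```

Because the two toy sources pull the parameters in opposite directions, uniform weighting would leave the model near zero. The outer loop should instead learn to downweight the off-domain source, which shows up as a large clean-to-off-domain weight ratio and a validation loss well below the all-zeros baseline. The split is the point of the sketch: the inner loop never touches `phi`, and the outer loop never updates `theta`; it only simulates one step from it.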