Systems Optimization

Freeze Now, Learn Faster: When Parameter Freezing Meets Pipeline Reality

Freeze. That sounds like the least exciting verb in machine learning. We prefer more heroic verbs: scale, align, reason, distill, orchestrate, agentify. Freeze sounds like something a GPU does right before the invoice becomes spiritually educational.

But in large-model training, freezing can be a serious efficiency tool. The idea is simple: if some parameters do not need to be updated at every step, skip their backward computation and save time.

The trap is also simple: saving computation is not the same as saving wall-clock time. In pipeline-parallel training, a GPU can compute less and still finish the batch no earlier, because another dependency is blocking the schedule. Congratulations, the model learned less and the training job did not get meaningfully faster. A tiny miracle of systems inefficiency. ...
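To make the gap between saved computation and saved wall-clock time concrete, here is a minimal toy sketch. It is not taken from any real training framework: it assumes a GPipe-style schedule (all forward passes, then all backward passes), ignores communication cost, and uses invented time units. The function name gpipe_makespan and the numbers (4 stages, 4 microbatches, forward cost 1, backward cost 2) are hypothetical, chosen only for illustration. "Freezing" is modeled crudely as halving a stage's backward cost.

```python
def gpipe_makespan(fwd, bwd, num_microbatches):
    """Finish time of one batch under a toy GPipe-style schedule.

    fwd[s] / bwd[s] are per-microbatch forward / backward costs of stage s.
    Assumptions (illustrative only): zero communication cost, and each stage
    runs all of its forward passes before any of its backward passes.
    """
    num_stages = len(fwd)
    stage_free = [0.0] * num_stages  # time at which each stage becomes idle

    # Forward phase: microbatch m on stage s waits for stage s-1 and for
    # the stage itself to be idle.
    fwd_done = [[0.0] * num_microbatches for _ in range(num_stages)]
    for m in range(num_microbatches):
        for s in range(num_stages):
            ready = fwd_done[s - 1][m] if s > 0 else 0.0
            start = max(ready, stage_free[s])
            fwd_done[s][m] = start + fwd[s]
            stage_free[s] = fwd_done[s][m]

    # Backward phase: gradients flow from the last stage to the first, so
    # stage s waits for stage s+1 to finish the same microbatch.
    bwd_done = [[0.0] * num_microbatches for _ in range(num_stages)]
    for m in range(num_microbatches):
        for s in reversed(range(num_stages)):
            ready = bwd_done[s + 1][m] if s < num_stages - 1 else fwd_done[s][m]
            start = max(ready, stage_free[s])
            bwd_done[s][m] = start + bwd[s]
            stage_free[s] = bwd_done[s][m]

    return max(stage_free)


# Hypothetical setup: 4 stages, 4 microbatches.
FWD = [1.0, 1.0, 1.0, 1.0]
baseline = gpipe_makespan(FWD, [2.0, 2.0, 2.0, 2.0], 4)

# "Freeze" only stage 0: its backward cost per microbatch drops from 2 to 1,
# which removes 4 units of compute in total.
freeze_one_stage = gpipe_makespan(FWD, [1.0, 2.0, 2.0, 2.0], 4)

# Freeze evenly across every stage: 16 units of compute removed.
freeze_all_stages = gpipe_makespan(FWD, [1.0, 1.0, 1.0, 1.0], 4)

print(baseline, freeze_one_stage, freeze_all_stages)  # 21.0 20.0 14.0
```

In this toy setup, freezing on stage 0 alone removes 4 units of computation but shortens the batch by only 1 unit (21 to 20), because for most of the schedule stage 0 is waiting on gradients from later stages rather than computing. Spreading the same kind of saving across all stages shortens the batch from 21 to 14. Real schedules, communication, and memory limits would change the numbers; the sketch only shows why per-GPU savings do not translate directly into end-to-end time.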