Cover image

Freeze Now, Learn Faster: When Parameter Freezing Meets Pipeline Reality

Opening — Why this matters now Training large language models has quietly shifted from an optimization problem into a scheduling problem. As model sizes balloon and GPU clusters grow deeper rather than wider, pipeline parallelism has become unavoidable. Yet most efficiency tricks—parameter freezing included—still behave as if time does not exist. This paper introduces TimelyFreeze, a system-level rethink of parameter freezing that aligns what we freeze with when computation actually happens. Instead of blindly freezing layers based on gradient statistics or heuristics, TimelyFreeze asks a more practical question: which parameters are on the critical path right now? ...

February 8, 2026 · 3 min · Zelina