Memory Optimization

TL;DR for operators

The useful lesson from “Unifying Data, Memory, and Compute Efficiency in LLM Training: A Survey” is not that one efficiency method is about to save everyone’s GPU bill. That would be charming, in the same way procurement decks are charming. The paper’s real contribution is to show why LLM efficiency has become a coupled operating problem: what data you train on changes the compute you spend; how you fit training into memory changes the optimization path; and when you stop, refresh, or reallocate compute depends on both.1 ...

LoRA Was Supposed to Fit on the Edge. The Activations Disagreed.

The One-Weird-Trick Era of LLM Efficiency Is Over