LoRA Was Supposed to Fit on the Edge. The Activations Disagreed.

TL;DR for operators: LoRA does not magically make LLM fine-tuning fit on phones, laptops, or small edge boxes. It reduces the number of trainable parameters. The paper’s useful contribution is showing that this is only the opening move. The real memory bill arrives from activations, checkpoint boundaries, vocabulary-sized output computations, and tokens that are processed even though they do not contribute to the loss. Apparently the memory allocator did not attend the product strategy meeting. ...

June 21, 2026 · 19 min · Zelina

The One-Weird-Trick Era of LLM Efficiency Is Over

TL;DR for operators: The useful lesson from "Unifying Data, Memory, and Compute Efficiency in LLM Training: A Survey" is not that one efficiency method is about to save everyone’s GPU bill. That would be charming, in the same way procurement decks are charming. The paper’s real contribution is to show why LLM efficiency has become a coupled operating problem: what data you train on changes the compute you spend; how you fit training into memory changes the optimization path; and when you stop, refresh, or reallocate compute depends on both.1 ...

June 21, 2026 · 18 min · Zelina