
Tokens, Watts, and Waste: The Hidden Energy Bill of LLM Inference

Opening — Why this matters now

Large language models are now a routine part of software development. They autocomplete functions, explain repositories, and quietly sit inside CI pipelines. The productivity gains are real. The energy bill is less visible. As inference increasingly dominates the lifecycle cost of LLMs, the environmental question is no longer about how models are trained, but how often—and how inefficiently—they are used. This paper asks an unfashionable but necessary question: where exactly does inference energy go? The answer turns out to be uncomfortable. ...

February 8, 2026 · 3 min · Zelina