The Meek Shall Compute It
TL;DR for operators

The usual AI strategy story is simple: whoever spends the most on compute owns the future. The paper behind this article makes a more awkward claim: under current language-model scaling assumptions, massive compute advantage may be a temporary lead, not a permanent moat.[1]

The mechanism is not magic. It is diminishing returns. Chinchilla-like scaling laws imply that each additional unit of training compute buys a smaller reduction in loss.

Meanwhile, hardware improvement and algorithmic progress are shared forces. They do not only help the largest labs. They also make yesterday’s “small” budget more capable.

The result is a curve where frontier models pull ahead, peak in relative advantage, and then become less distinguishable from cheaper models. ...
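To make the diminishing-returns argument concrete, here is a minimal sketch rather than the paper's own model. It assumes the Chinchilla parametric loss L(N, D) = E + A/N^alpha + B/D^beta with the constants reported by Hoffmann et al. (2022), the common approximation C ≈ 6ND for training FLOPs, and a compute-optimal split between parameters and tokens. The starting budget, the 1000x budget ratio, and the 3x-per-year shared growth in effective compute are hypothetical numbers chosen only for illustration.

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the fit reported by Hoffmann et al. (2022); assumed here, not
# taken from the article.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def optimal_loss(flops: float) -> float:
    """Loss at the compute-optimal split of `flops`, assuming C ~= 6 * N * D."""
    budget = flops / 6.0  # the product N * D allowed by this compute budget
    # Closed-form minimiser of A/N**alpha + B/D**beta subject to N * D = budget.
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_params = g * budget ** (BETA / (ALPHA + BETA))
    n_tokens = budget / n_params
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def loss_gap(small_flops: float, ratio: float, growth: float, years: int) -> float:
    """Loss of the small actor minus loss of the large actor after `years`.

    Both actors benefit from the same yearly `growth` in effective compute
    (hardware plus algorithms); the large actor always has `ratio` times more.
    """
    shared = growth**years
    return optimal_loss(small_flops * shared) - optimal_loss(small_flops * ratio * shared)


if __name__ == "__main__":
    # Hypothetical scenario: small actor starts at 1e21 FLOPs, large actor has
    # 1000x more, and effective compute triples each year for everyone.
    for year in range(0, 11, 2):
        gap = loss_gap(1e21, ratio=1000.0, growth=3.0, years=year)
        print(f"year {year:2d}: loss gap = {gap:.4f}")
```

Under these assumptions the budget ratio never changes, yet the loss gap keeps narrowing, because both actors move along the flattening part of the same curve. The sketch shows only that narrowing phase; it does not model how the lead first opens up, and it says nothing about how loss differences translate into downstream capability.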