The Meek Shall Compute It
For the past five years, discussions about AI progress have centered on a simple formula: more data + more compute = better models. This scaling paradigm has produced marvels like GPT-4 and Gemini—but also entrenched a new aristocracy of compute-rich players. Is this inequality here to stay?
According to a provocative new paper from MIT CSAIL, the answer may be: not for long. The authors argue that, due to the law of diminishing returns, the performance gap between state-of-the-art (SOTA) models and smaller, cheaper “meek” models will shrink over time. If true, this reframes the future of AI as one not of centralized supremacy, but of widespread, affordable competence.
Diminishing Returns: When Bigger Isn’t Better
The paper models training loss using Chinchilla-style scaling laws, where:
L(C) = A * C^(-alpha) + L0
Here, C is compute; as C increases, the improvement in loss diminishes. Even with exponentially growing investment (e.g., 3.6x per year), the gains taper off.
A $1,000 model trained today may trail a SOTA system. But give it a few years of hardware progress (1.4x/year) and algorithmic progress (2.8x/year), and suddenly it’s punching far above its weight. Eventually, the expensive model’s edge becomes marginal. Figure 1 in the paper captures this beautifully: a steep initial gap that flattens over time.
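To see the flattening concretely, here is a toy sketch of the scaling law above. The constants A, alpha, and L0 and the dollar budgets are made-up illustrative values, not the paper’s fitted numbers; the yearly multipliers are the ones cited in this article.

```python
# Toy illustration of L(C) = A * C^(-alpha) + L0 with made-up constants.
A, alpha, L0 = 400.0, 0.35, 1.7

def loss(effective_compute):
    return A * effective_compute ** -alpha + L0

hw, algo, invest = 1.4, 2.8, 3.6   # yearly multipliers cited above
meek, sota = 1e3, 1e9              # hypothetical dollar budgets

gaps = []
for year in range(0, 9, 2):
    cheap = (hw * algo) ** year    # each dollar buys this much more effective compute
    gap = loss(meek * cheap) - loss(sota * invest ** year * cheap)
    gaps.append(gap)
    print(f"year {year}: loss gap = {gap:.3f}")
```

The gap is large at year 0 and shrinks steadily, even though the frontier keeps growing its spending 3.6x per year: that is diminishing returns at work.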
Inference: The Great Equalizer
Inference matters more than training for most users. And here, the convergence is even faster. Inference costs are dropping rapidly (up to 9x/year), driven by distillation, sparsity, speculative sampling, and efficient transformers.
As a result, users with a $0.5–$1 per million tokens budget can now run models that are “effectively trained” with compute comparable to near-SOTA systems. Figure 4 shows that the inference-based loss gap shrinks faster than the training-based one.
The punchline? You don’t need to train a frontier model to benefit from one. You just need to ride the inference cost curve.
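A back-of-the-envelope way to “ride the curve”: with inference prices falling up to 9x per year (the figure cited above), a fixed per-token budget covers rapidly more capability. The starting price below is a hypothetical placeholder, not a real quote.

```python
# Fixed budget vs. falling inference prices (9x/year, as cited above).
# The $5/M-token starting price is a hypothetical placeholder.
start_price = 5.0   # $/M tokens for a near-SOTA model today (assumed)
budget = 1.0        # $/M tokens the user is willing to pay

ratios = []
for year in range(5):
    price = start_price / 9 ** year
    ratios.append(budget / price)
    print(f"year {year}: ${price:.4f}/M tokens -> budget covers {budget / price:.1f}x the price")
```

Within a few years, a budget that could not afford the near-SOTA model at all covers it thousands of times over.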
Does Loss Still Capture What Matters?
Skeptics might ask: but does lower loss mean better capabilities? The authors show that benchmark scores like MMLU correlate tightly with loss (via a sigmoid fit). Moreover, using information theory and sequential hypothesis testing, they show that as the loss gap shrinks, it takes exponentially more tokens to reliably distinguish a meek model from a SOTA one. Eventually, they’re nearly indistinguishable.
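The distinguishability claim has a standard sequential-testing flavor. As a rough sketch (an SPRT-style order-of-magnitude bound, not the paper’s exact derivation): if the per-token loss gap is Δ nats, telling the two models apart at error rate ε takes on the order of ln(1/ε)/Δ tokens, which blows up as Δ shrinks to zero.

```python
import math

def tokens_to_distinguish(gap_nats, error_rate=0.01):
    # SPRT-style estimate: tokens needed scale like ln(1/eps) / per-token gap.
    return math.log(1 / error_rate) / gap_nats

for gap in (0.1, 0.01, 0.001):
    print(f"gap {gap} nats -> ~{tokens_to_distinguish(gap):,.0f} tokens")
```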
When Small Gaps Matter
Of course, not all capabilities scale smoothly. In adversarial or multi-step reasoning settings, small differences in loss can balloon into strategic advantages. In a game-playing scenario, 1% better prediction may yield a 20% higher win rate. So while meek models will match general performance, edge cases may still favor giants—for now.
Strategic Implications
This convergence challenges assumptions across AI governance, commercialization, and policy:
| Area | Prevailing View | Implication of Meek Model Convergence |
|---|---|---|
| AI Governance | Regulate frontier models (via FLOPs thresholds) | Ineffective long-term; need to regulate algorithms, data, and inference channels |
| Innovation | Only Big Tech can build useful models | No longer true; individuals can access high-performing models |
| Market Power | Money and compute = durable moat | That moat erodes as diminishing returns kick in |
| AI Safety | Concentrated capability = easier monitoring | Window is narrow; later ubiquity may pose wider risk |
Beyond Scaling: What Could Break the Trend?
This convergence assumes static training objectives (e.g., next-token prediction on static corpora). But if models shift toward:
- RLHF or competitive fine-tuning,
- synthetic or self-curated data,
- open-ended environments or multi-agent games,
…then the scaling rules may reset.
As the paper notes, “It is no longer a question of how well AIs are learning but what they are learning.”
The Future Is Meek, but Not Weak
This is not a story about small models beating giants. It’s about the flattening of marginal advantage at the frontier. In a world governed by Chinchilla laws, Moore’s law, and fast algorithmic progress, what was once elite becomes accessible.
Yes, a few elite players may still define new capabilities. But once discovered, these capabilities will quickly cascade down to cheaper, more efficient implementations. That’s what the history of software teaches us. And that’s why the meek models may not inherit everything, but they might just inherit the earth.
Cognaptus: Automate the Present, Incubate the Future