Scaling Laws Without Power Laws: Why Bigger Models Still Win
Opening: Why this matters now

The scaling law debate was supposed to be settled. Bigger models, more data, more compute; loss falls predictably. Then came the uncomfortable question: what exactly is being scaled? If power laws in natural language data are the root cause, then scaling laws might be an artifact of language itself, not of learning. This paper dismantles that comfort. ...
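To pin down what "loss falls predictably" means here, a minimal sketch of the canonical power-law form from the scaling-law literature (Kaplan et al., 2020) is useful; the symbols N, N_c, and \alpha_N below are the standard ones from that literature, not notation taken from this paper:

L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}

where L is test loss, N is parameter count, and N_c and \alpha_N are empirically fitted constants (Kaplan et al. report \alpha_N on the order of 0.076 for parameter scaling). Restated in this notation, the worry is whether an exponent like \alpha_N reflects anything about the learner, or merely inherits the Zipf-like power-law statistics of natural language text.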