Scaling Laws Without Power Laws: Why Bigger Models Still Win
Opening: Why this matters now

The scaling law debate was supposed to be settled. Bigger models, more data, more compute; loss falls predictably. Then came the uncomfortable question: what exactly is being scaled? If power laws in natural language data are the root cause, then scaling laws might be an artifact of language itself, not of learning. This paper dismantles that comfort. ...
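To pin down what "loss falls predictably" means here, a minimal sketch of the canonical power-law form from the scaling-law literature (Kaplan et al., 2020) is useful; the symbols N, N_c, and \alpha_N below are the standard ones from that literature, not notation taken from this paper:

L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}

where L is test loss, N is parameter count, and N_c and \alpha_N are empirically fitted constants (Kaplan et al. report \alpha_N on the order of 0.076 for parameter scaling). Restated in this notation, the worry is whether an exponent like \alpha_N reflects anything about the learner, or merely inherits the Zipf-like power-law statistics of natural language text.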