
Bending the Beam, Not the Brain: What RL with Perfect Rewards Still Can’t Teach LLMs

Beams are honest objects. Push them, load them, move their supports, and they obey equilibrium equations without theatrical ambiguity. Language models, unfortunately, are less well-behaved. That is what makes BeamPERL a useful paper. It does not test LLM reasoning on a vague benchmark where “correctness” means pleasing a judge, matching a rubric, or sounding sufficiently graduate-school. It asks a compact reasoning model to solve a classical beam statics task: calculate support reactions for a loaded beam. The answers can be checked by a symbolic solver. The reward can be exact. No vibes, no partial credit, no “the answer feels plausible.”1 ...

March 5, 2026 · 16 min · Zelina