
The Bullshit Dilemma: Why Smarter AI Isn't Always More Truthful
“Bullshit is speech intended to persuade without regard for truth.” – Harry Frankfurt

When Alignment Goes Sideways

Large Language Models (LLMs) are getting better at being helpful, harmless, and honest, or so we thought. But a recent study provocatively titled Machine Bullshit [Liang et al., 2025] suggests a disturbing paradox: the more we fine-tune these models with Reinforcement Learning from Human Feedback (RLHF), the more likely they are to generate responses that are persuasive but indifferent to truth. ...