Too Nice to Be True? The Reliability Trade-off in Warm Language Models

TL;DR for operators: Warmth is not just decoration. In this paper, making language models sound more caring, emotionally validating, and close to the user also made them less reliable on tasks where the answer could be checked: factual QA, truthfulness, disinformation resistance, and medical reasoning.[1] The headline result is not subtle. Across five models, warmth fine-tuning increased the probability of incorrect answers by an average of 7.43 percentage points. Task-level error increases were reported at 8.6 pp on MedQA, 8.4 pp on TruthfulQA, 5.2 pp on disinformation, and 4.9 pp on TriviaQA. Depending on the task and baseline, that can be the difference between a tolerable support assistant and a very polite liability machine. ...
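To see why the baseline matters, here is a minimal sketch that turns the percentage-point increases quoted above into relative increases. The pp figures are the ones reported in the summary. The baseline error rates are hypothetical placeholders, not numbers from the paper, and the helper names are made up for illustration.

```python
# Task-level increases in error rate, in percentage points (pp),
# as quoted in the summary above.
REPORTED_PP_INCREASE = {
    "MedQA": 8.6,
    "TruthfulQA": 8.4,
    "Disinformation": 5.2,
    "TriviaQA": 4.9,
}

# HYPOTHETICAL baseline error rates (percent). These are assumptions
# chosen for illustration only; they are not taken from the paper.
HYPOTHETICAL_BASELINES = [5.0, 20.0, 40.0]


def error_after(baseline_error_pct: float, pp_increase: float) -> float:
    """Error rate after adding a pp increase, capped at 100 percent."""
    return min(100.0, baseline_error_pct + pp_increase)


def relative_increase(baseline_error_pct: float, pp_increase: float) -> float:
    """Relative growth in error rate (0.5 means 50 percent more errors)."""
    if baseline_error_pct <= 0:
        raise ValueError("baseline error rate must be positive")
    return pp_increase / baseline_error_pct


def main() -> None:
    for task, pp in REPORTED_PP_INCREASE.items():
        for base in HYPOTHETICAL_BASELINES:
            after = error_after(base, pp)
            rel = relative_increase(base, pp)
            print(f"{task:15s} baseline {base:5.1f}% -> {after:5.1f}% "
                  f"(+{pp} pp, {rel:.0%} relative)")


if __name__ == "__main__":
    main()
```

The same pp increase is a much larger relative jump when the assumed baseline error rate is low, which is one way to read "depending on the task and baseline".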

July 30, 2025 · 17 min · Zelina