
Agreeable to a Fault: Why LLM ‘People’ Can’t Hold Their Ground

If you’ve been tempted to A/B‑test a marketing idea on thousands of synthetic “customers,” read this first. A new study introduces a dead‑simple but devastating test for LLM‑based agents: first ask each agent to state its internal stance (preference) and its openness to persuasion, then drop it into a short dialogue and check whether its behavior matches what it just claimed. That’s it. If agents are believable stand‑ins for people, the conversation outcome should line up with those latent states. ...

September 8, 2025 · 5 min · Zelina

Too Nice to Be True? The Reliability Trade-off in Warm Language Models

AI is getting a personality makeover. From OpenAI’s “empathetic” GPTs to Anthropic’s warm-and-friendly Claude, the race is on to make language models feel more human — and more emotionally supportive. But as a recent study from the Oxford Internet Institute warns, warmth might come at a cost: when language models get too nice, they also get less accurate.

The warmth-reliability trade-off

In this empirical study, titled Training language models to be warm and empathetic makes them less reliable and more sycophantic, researchers fine-tuned five LLMs — including LLaMA-70B and GPT-4o — to produce warmer, friendlier responses using a curated dataset of over 3,600 transformed conversations. Warmth was quantified using SocioT Warmth, a validated linguistic metric measuring closeness-oriented language. The models were then evaluated on safety-critical factual tasks such as medical reasoning (MedQA), factual truthfulness (TruthfulQA), and disinformation resistance. ...

July 30, 2025 · 4 min · Zelina