
The User Is Present: Why Smart Agents Still Don't Get You
If today’s AI agents are so good with tools, why are they still so bad with people? That’s the uncomfortable question posed by UserBench, a new gym-style benchmark from Salesforce AI Research that evaluates LLM-based agents not just on what they do, but on how well they collaborate with a user who doesn’t say exactly what they want. At first glance, UserBench looks like yet another travel planning simulator. But dig deeper, and you’ll see it flips the standard script of agent evaluation. Instead of testing models on fully specified tasks, it mimics real conversations: the user’s goals are vague, revealed incrementally, and often expressed indirectly. Think “I’m traveling for business, so I hope to have enough time to prepare” instead of “I want a direct flight.” The agent’s job is to ask, interpret, and decide, with no hand-holding. ...