Suzume-chan, or: When RAG Learns to Sit in Your Hand

Opening: Why this matters now

For all the raw intelligence of modern LLMs, they still feel strangely absent. Answers arrive instantly, even flawlessly, but no one is there. The interaction is efficient, sterile, and ultimately disposable. As enterprises rush to deploy chatbots and copilots, a quiet problem persists: people understand information better when it feels socially grounded, not merely delivered. ...

December 13, 2025 · 3 min · Zelina

The User Is Present: Why Smart Agents Still Don't Get You

If today’s AI agents are so good with tools, why are they still so bad with people? That’s the uncomfortable question posed by UserBench, a new gym-style benchmark from Salesforce AI Research that evaluates LLM-based agents not just on what they do but on how well they collaborate with a user who doesn’t say exactly what they want. At first glance, UserBench looks like yet another travel-planning simulator. Dig deeper, though, and you’ll see that it flips the standard script of agent evaluation. Instead of testing models on fully specified tasks, it mimics real conversations: the user’s goals are vague, revealed incrementally, and often expressed indirectly. Think “I’m traveling for business, so I hope to have enough time to prepare” instead of “I want a direct flight.” The agent’s job is to ask, interpret, and decide, with no hand-holding. ...

July 30, 2025 · 3 min · Zelina