Human-AI Interaction

Peak Performance: Why Alignment Needs a Sense of Timing

A support ticket does not usually fail because every message was bad. More often, it fails because one reply arrived at exactly the wrong moment: the bot misunderstood a frustrated customer, repeated a stale answer, missed the escalation point, and then ended the interaction with something sterile enough to pass a benchmark but useless enough to make the customer leave. The average quality may look acceptable. The experience still feels broken. ...

From Saliency to Systems: Operationalizing XAI with X-SYS

The explanation worked in the notebook; then production happened. A familiar enterprise AI story begins with a reassuring demo. A model produces a questionable prediction. Someone opens a notebook, runs SHAP, LIME, a saliency map, a concept attribution method, or whatever interpretability tool is currently fashionable enough to appear in slide decks. The plot looks plausible. The team nods. Compliance is told that explainability has been “implemented.” ...

When Robots Guess, People Bleed: Teaching AI to Say ‘This Is Ambiguous’

Vial. That is the easy version of the problem. A robot stands near a surgical tray. A person says, “Pass me the vial.” There are two vials. One is harmless. One is not. The robot does not need a better smile, a warmer voice, or a more fluent explanation of how helpful it intends to be. It needs to know that the instruction should not be executed yet. ...

Too Human, Too Soon? The Global Limits of Anthropomorphic AI

A chatbot with a name, a warmer tone, a few emojis, and a slightly irregular rhythm does not feel like a philosophical problem at first. It feels like product polish. That is exactly why anthropomorphic AI is difficult to govern. The cues are small. A friendly name here, a follow-up question there, a little latency to imitate human typing, a softer apology, a more adaptive conversational style. None of these looks dramatic enough to trigger a board-level ethics review. Together, however, they move the system from “tool” toward “someone-like.” ...

Suzume-chan, or: When RAG Learns to Sit in Your Hand

A visitor walks into a research demo, a museum gallery, a hospital information corner, or a corporate training booth. The expert is busy. The brochure is dry. The QR code leads to a page nobody wants to read while standing up. The chatbot is available, technically, but it lives behind a screen and feels like another form to be tolerated. ...

Flame Tamed: Can LLMs Put Out the Internet’s Worst Fires?

A comment thread rarely explodes in one clean motion. It starts with a correction. Then someone reads the correction as condescension. Then another person adds a historical grievance, a screenshot, three exclamation marks, and the kind of moral certainty normally reserved for courtrooms and family dinners. By the time a moderator arrives, the thread is no longer a conversation. It is archaeology with insults. ...

Mind the Gap: When Robots Learn Social Norms the Human Way

A hotel robot does not need to understand the human soul. It does, however, need to stop cutting between two guests mid-conversation like an intern late for coffee. That distinction matters. Most enterprise conversations about autonomous agents still treat navigation as a logistics problem: reach the destination, avoid collision, minimise delay. Very tidy. Very spreadsheet. Also incomplete. In public-facing environments, a robot can be technically safe and still socially unpleasant. It can avoid hitting people while still making them step back, tense up, or wonder why the expensive machine has the spatial awareness of a supermarket trolley. ...

When Ambiguity Helps: Rethinking How AI Interprets Our Data Questions

A manager asks the analytics copilot, “Which regions are underperforming this quarter?” This sounds like a normal business question. It is also, technically, a small swamp. Which regions? Sales regions, operating regions, logistics regions, or customer billing regions? Underperforming against what: forecast, last quarter, budget, peers, margin, revenue, retention, or some executive’s private sense of disappointment? And “this quarter” may mean calendar quarter, fiscal quarter, quarter-to-date, or the latest complete quarter if the finance team has not closed the books yet. ...

From Chat Logs to Goal Logs: OnGoal’s Playbook for Goal‑Truthful LLMs

TL;DR for operators: OnGoal is not another attempt to make the chatbot magically “understand intent”. That would be adorable, and also not the paper. It is a goal-observability interface: a way to show users which goals the system thinks are active, how those goals change over a conversation, and whether each model response appears to confirm, contradict, or ignore them.1 ...

Mind Games: How LLMs Subtly Rewire Human Judgment

TL;DR for operators: When an LLM summarises a review, policy memo, support ticket, medical note, or news item, the operational question is not only “Did it get the facts right?” The sharper question is: did it change what the user is likely to believe, prioritise, or buy? The paper behind this article studies exactly that problem. It treats LLM-generated content as a decision interface and measures three ways the interface can quietly bend human judgment: changing the sentiment frame of the source, overemphasising the beginning of the source, and fabricating confident answers for events beyond the model’s knowledge cutoff.1 ...