Human-Computer Interaction

Hands-On Intelligence: Why Immersive AI Needs Both Eyes and Fingers

Immersive AI has a convenient myth: put a stronger multimodal model inside a headset, let it see what the user sees, and the future of work politely appears. Very cinematic. Slightly incomplete. The real problem is less glamorous and more operational. Extended-reality work is not just a visual scene. It is a long-running loop of perception, memory, reasoning, instruction, correction, confirmation, and physical effort. The model must understand what is happening over time. The human must still steer the system without becoming a tired thumb attached to a battery pack. ...

When Interfaces Guess Back: Implicit Intent Is the New GUI Bottleneck

The problem starts with a very ordinary sentence: “Order my usual lunch.” For a human assistant, this sentence is not empty. It carries history. It points to an app, a restaurant, a branch, a meal, maybe a delivery address, maybe a payment method. For a conventional GUI agent, it is a trap wearing casual clothes. ...

Cutting Through the Noise: How Programmatic Pruning Turns Web Agents into Real Operators

Clicking the right button should not be an intelligence test. For humans, a webpage is usually manageable. We scan the visible screen, ignore the footer, dismiss the newsletter trap, and find the search box without treating every hidden <div> as a philosophical object. Web agents are less lucky. They see a modern page as a swollen mixture of visible text, invisible attributes, nested containers, event handlers, accessibility metadata, layout debris, cookie banners, product cards, promotional links, and enough frontend residue to make “just use the DOM” sound like a mild punishment. ...

The Memory Illusion: Why AI Still Forgets Who It Is

A customer support bot does not need a soul. Pleasantly, most airlines have not yet advertised one. But it does need to remember what role it is playing. If it gives policy advice, that advice must remain anchored to the policy. If it apologises for an error, the correction should bind future answers. If the company has told users the assistant is a support agent, the assistant cannot conveniently become a speculative travel blogger, a therapist, a lawyer, or a magic refund machine, depending on which prompt arrives next. ...

From Scroll to Structure: Rethinking Academic Reading with TreeReader

TL;DR for operators: TreeReader is not interesting because it uses an LLM to summarise papers. That part is now table stakes, which is a polite way of saying everyone has already built the demo. It is interesting because it treats a paper as a hierarchy rather than a scroll. Sections, subsections, paragraphs, figures, and tables become nodes in an interactive tree. Each node gets a concise LLM-generated summary, and the user can expand downward when detail is needed or move upward when context matters. Crucially, summaries are linked back to source text, so the system does not ask the reader to trust the model’s charming little hallucination engine on vibes alone.1 ...
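
For readers who think in data structures, the hierarchy described above can be sketched in a few lines of Python. This is a minimal illustration, not TreeReader's actual implementation: the `Node` class, its `expand`, `context`, and `source` methods, and the toy paper text are all hypothetical names invented here, and the summaries are hard-coded where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One unit of a paper: a section, subsection, paragraph, figure, or table.

    Hypothetical structure for illustration only.
    """
    kind: str
    title: str
    summary: str                  # would be LLM-generated; hard-coded in this sketch
    source_span: Tuple[int, int]  # (start, end) character offsets into the paper text
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def expand(self) -> List[str]:
        """Move downward: summaries of the direct children."""
        return [c.summary for c in self.children]

    def context(self) -> List[str]:
        """Move upward: titles of all ancestors, outermost first."""
        titles = []
        node = self.parent
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return list(reversed(titles))

    def source(self, text: str) -> str:
        """Return the original text this node's summary is linked to."""
        start, end = self.source_span
        return text[start:end]


# Toy paper text and tree (assumed data, not from any real paper).
PAPER = "Intro text. Method text about trees."
method_start = PAPER.index("Method")

root = Node("paper", "Example paper", "A toy paper.", (0, len(PAPER)))
method = root.add(
    Node("section", "Method", "Describes the tree.", (method_start, len(PAPER)))
)
```

The point of the `source_span` field is the last sentence of the teaser: every summary carries a pointer back to the text it came from, so a reader can call `method.source(PAPER)` and check the summary against the original rather than take it on trust.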