When 'Check the AC' Becomes the Hard Part

TL;DR for operators: Smart-home assistants do not fail only when users are vague. They fail when users become efficient. The PEC-Home paper studies a familiar pattern: after repeated interaction, people stop saying the whole thing. “Please turn on the air conditioner in the bedroom and set it to 26 degrees at 10 PM” eventually becomes “check the AC” or “handle that thing.” Humans manage this because shared context, identity, place, and prior routines do the missing work. Current LLM assistants are much less charming under that burden. ...

June 25, 2026 · 19 min · Zelina

Trace Evidence: The AI Learned Something. Can You Inspect What?

TL;DR for operators: AI systems are increasingly learning from traces: documents, chats, code reviews, human rationales, fine-grained labels, unlabeled examples, user profiles, browsing context, and interaction history. That is useful. It is also how quiet operational risk walks through the front door wearing a badge that says “personalization.” Three recent papers form a useful logic chain. One paper shows how human traces can be turned into explicit, portable, correctable skill artifacts. A second shows how task-specific labels, synthetic reasoning, and reinforcement learning can optimize a model for a difficult moderation task. A third shows why consumer-facing health LLMs remain hard to evaluate independently once personalization, browser interfaces, multi-turn interaction, and silent model updates enter the picture. ...

June 24, 2026 · 14 min · Zelina

Rewarding Behavior: Why Enterprise AI Needs More Than Bigger Models

Enterprise AI teams have developed a familiar reflex. When the model behaves unreliably, they try a better prompt. When that fails, they try a larger model. When that becomes expensive, they invent a workflow diagram with many arrows and call it an operating model. Very dignified. Very scalable, in the same way that adding more sticky notes to a broken process is scalable. ...

June 10, 2026 · 17 min · Zelina

Time to Prefer: Why Binary RLHF Feedback Leaves Reward Models Guessing

Thumbs-up feedback looks efficient. It is clean, cheap, easy to store, and friendly to dashboards. One output wins, another output loses, and the reward model learns what humans supposedly want. A tidy little morality market, with all the nuance of a vending machine. ...

June 5, 2026 · 17 min · Zelina

When Your AI Knows Too Little: The Hidden Bottleneck in Personal Agents

Lunch is a simple word. In an AI assistant demo, “order me lunch” looks like the kind of request that should be easy by now. Open the food app. Pick something. Pay. Done. The button-clicking part is no longer the miracle. The problem is everything the user did not say. Do they avoid peanuts? Do they usually order from Tuantuan or Chilemei? Is “light lunch” about calories, price, time, or avoiding the food coma before a meeting? Should the assistant ask first, or does asking defeat the whole point of assistance? And if the user says no, does the assistant actually stop, or does it “helpfully” continue doing the wrong thing with the confidence of a junior consultant holding a fresh slide deck? ...

April 10, 2026 · 15 min · Zelina

Memory That Actually Remembers: Why MemMachine Signals a Shift in AI Agent Architecture

Memory sounds simple until a business actually needs it. A sales agent should remember what the client objected to last month. A customer-support agent should remember that a refund exception was already approved. A research assistant should remember which dataset was rejected, not vaguely summarize it into “user prefers cleaner data.” A healthcare or financial assistant should not turn a precise historical statement into a soft personality trait because the memory layer wanted to look elegant. Cute demos tolerate this. Production systems do not. ...

April 7, 2026 · 18 min · Zelina

The File System Strikes Back: Why AI Agents Still Can’t Understand Your Life

Files are where AI agent demos go to become adults. In a product video, the agent opens a few clean documents, remembers your preferences, drafts an answer, books the meeting, and looks quietly inevitable. In an actual computer, the same agent faces a folder called final_final_v3, a receipt saved as an image, a calendar invite with the wrong title, a video that contains the decisive evidence at second 8, and three people who all appear in the same user’s digital life. Suddenly the assistant that “knows you” looks less like a colleague and more like an intern who has discovered search for the first time. ...

April 2, 2026 · 17 min · Zelina

Drive My Way: When Autonomous Cars Start Having Personalities

Car settings are usually pretending to know you. Sport mode assumes you are impatient. Eco mode assumes you have discovered moral superiority through fuel efficiency. Comfort mode assumes everyone in the vehicle prefers to be gently transported like a bowl of soup. These modes are not useless. They are just blunt. They adjust a handful of parameters and call the result personalization, which is a bit like calling a restaurant “personalized” because it offers small, medium, and large. ...

March 28, 2026 · 20 min · Zelina

When Interfaces Guess Back: Implicit Intent Is the New GUI Bottleneck

The problem starts with a very ordinary sentence: “Order my usual lunch.” For a human assistant, this sentence is not empty. It carries history. It points to an app, a restaurant, a branch, a meal, maybe a delivery address, maybe a payment method. For a conventional GUI agent, it is a trap wearing casual clothes. ...

January 15, 2026 · 15 min · Zelina

EverMemOS: When Memory Stops Being a Junk Drawer

Memory sounds simple until the assistant has to remember two incompatible things at once. A customer loves craft beer. The same customer is temporarily taking antibiotics. A flat memory system retrieves “likes IPA” and recommends a variety pack, because apparently “memory” means grabbing the loudest sticky note from a drawer and pretending it is wisdom. A more useful assistant retrieves the preference, the medical constraint, the timing, and the relation among them. It recommends a mocktail and quietly avoids turning personalization into negligence. ...

January 6, 2026 · 17 min · Zelina