Safe Hands, Unsafe Audit: Why Robot Success Does Not Prove Robot Safety

A robot finishes the task. It picks, places, inserts, wipes, stacks, or assembles. The demo video looks clean. The benchmark reports success. Everyone exhales. This is exactly where the safety argument should begin, not end. The awkward truth about embodied AI is that a robot can complete a task while accumulating risk along the way. It may interpret the instruction too narrowly, skip an implicit prerequisite, recover from a mistake in a physically unstable way, apply too much force, or pass through a near miss that the final success metric politely declines to remember. The task is done. The audit trail is missing. Convenient, in the same way a black box with wheels is convenient. ...

June 7, 2026 · 18 min · Zelina

Lost in the Grid: Why AI Agents Still Can’t Spot the Impostor

Everyone wants autonomous AI agents now. Not assistants. Not copilots. Agents: systems that watch a situation, decide what matters, take action, coordinate with others, and notice when someone in the room is quietly working against the plan. A normal business version sounds less theatrical than a social-deduction game, but the structure is familiar. A workflow has goals. People and software components have partial information. Some signals are useful. Some are noise. Some actors may be careless, misaligned, or malicious. The agent is expected to keep moving, complete the job, and not be fooled by plausible behavior. ...

April 22, 2026 · 16 min · Zelina

Eyes Wide Compute: Why Physical AI Needs Better Senses, Not Bigger Models

Camera first. Model second. That is not how most AI roadmaps are written. The usual enterprise recipe is tidier: pick a bigger model, add a cloud endpoint, compress something if the bill becomes embarrassing, then declare the system “edge-ready.” This works tolerably well when the input is a clean document, a database row, or an already-captured image. It works less well when the input is a moving camera in a dark warehouse, a microphone beside a noisy motor, a tactile pad on a robot gripper, or smart glasses trying to understand the world before the battery starts writing its resignation letter. ...

April 16, 2026 · 18 min · Zelina

Seeing Is Not Solving: Why AI Still Gets Stuck in 3D Worlds

Wall. That is not the grand philosophical frontier AI companies usually place in their product decks. The frontier is supposed to be reasoning, planning, tool use, autonomy, maybe a tasteful diagram with arrows and a glowing robot hand. But in a visually rich 3D world, a surprisingly large part of “autonomy” still reduces to something less glamorous: can the agent notice that it is stuck against a wall, step back, change angle, and continue? ...

April 12, 2026 · 18 min · Zelina

Walking the Line: When Robots Learn to Step Like Humans (Without the Drama)

Walking looks easy until you ask a robot to do it. For humans, stepping over a box or climbing a stair is usually not an executive decision. The body sees the surface, estimates where the foot should land, keeps rhythm, adjusts weight, and moves on. No committee meeting. No multi-stage training pipeline. No adversarial discriminator whispering, “that gait is not sufficiently human-like.” ...

March 22, 2026 · 18 min · Zelina

When Memory Lies and Rules Save It: Rethinking LLM Agents in Closed Worlds

Memory is usually sold as the adult upgrade for LLM agents. Give the agent a past. Give it a vector database. Give it episodes, reflections, mistakes, summaries, and a long enough context window to remember every tiny embarrassment. Surely it will become more reliable. The RPMS paper is useful because it interrupts that comforting story with a less fashionable point: memory can make an agent worse when the world has hard action rules. ...

March 19, 2026 · 18 min · Zelina

Learning From the Punches: How AI Agents Turn Mistakes into Skills

Mistakes are cheap until an agent repeats them. A human worker who keeps failing at the same task usually leaves traces: a blocked aisle, a missing tool, a wrong form field, an error message, a process exception. A competent manager does not simply tell the worker to “try again with more confidence.” The useful move is more boring and more valuable: identify the pattern, write the repair rule, and make sure the next attempt starts from the point of failure rather than from the beginning. ...

March 16, 2026 · 18 min · Zelina

When Models Get Lost in Space: Why MLLMs Still Fail Geometry

Geometry looks clean. A cube has edges. A projection has rules. A missing view should follow from the views already shown. This is not the messy world of occluded street scenes, motion blur, shadows, or a warehouse camera pointed at the wrong shelf. It is the kind of visual reasoning many students learn before they are trusted with anything more dangerous than a compass, a ruler, and mild boredom. ...

February 14, 2026 · 15 min · Zelina

Stable World Models, Unstable Benchmarks: Why Infrastructure Is the Real Bottleneck

A robot does not fail politely. It does not say, “I was trained on a slightly different shade of blue.” It just misses the object, pushes the wrong way, or confidently follows a plan that only works in the tidy little universe where the benchmark was born. That is the uncomfortable lesson behind stable-worldmodel-v1, a paper that is less about inventing a new world model and more about asking whether world-model research has been measuring the right thing in the first place. ...

February 10, 2026 · 14 min · Zelina

Benchmarks Lie, Rooms Don’t: Why Embodied AI Fails the Moment It Enters Your House

The room is not impressed by your leaderboard. A robot that performs well on a public benchmark has not necessarily learned how to operate in your house. It may recognize a chair in a dataset. It may answer a visual question about a tidy image. It may even produce a confident paragraph explaining where the coffee mug should be. Then it enters a real room (with mirrors, partial views, cluttered corners, awkward sightlines, and objects that are not positioned for benchmark convenience) and suddenly the “general intelligence” starts behaving like a tourist holding the map upside down. ...

February 7, 2026 · 17 min · Zelina