MemCtrl: Teaching Small Models What *Not* to Remember

Opening — Why this matters now. Embodied AI is hitting a very human bottleneck: memory. Not storage capacity, not retrieval speed—but judgment. Modern multimodal large language models (MLLMs) can see, reason, and act, yet when deployed as embodied agents they tend to remember too much, too indiscriminately. Every frame, every reflection, every redundant angle piles into context until the agent drowns in its own experience. ...

January 31, 2026 · 4 min · Zelina

From Talking to Living: Why AI Needs Human Simulation Computation

Opening — Why this matters now. Large language models have become remarkably fluent. They explain, summarize, reason, and occasionally even surprise us. But fluency is not the same as adaptability. As AI systems are pushed out of chat windows and into open, messy, real-world environments, a quiet limitation is becoming impossible to ignore: language alone does not teach an agent how to live. ...

January 21, 2026 · 4 min · Zelina

When Diffusion Learns How to Open Drawers

Opening — Why this matters now. Embodied AI has a dirty secret: most simulated worlds look plausible until a robot actually tries to use them. Chairs block drawers, doors open into walls, and walkable space exists only in theory. As robotics shifts from toy benchmarks to household-scale deployment, this gap between visual realism and functional realism has become the real bottleneck. ...

January 14, 2026 · 3 min · Zelina

When Robots Guess, People Bleed: Teaching AI to Say ‘This Is Ambiguous’

Opening — Why this matters now. Embodied AI has become very good at doing things. What it remains surprisingly bad at is asking a far more basic question: “Should I be doing anything at all?” In safety‑critical environments—surgical robotics, industrial automation, AR‑assisted operations—this blind spot is not academic. A robot that confidently executes an ambiguous instruction is not intelligent; it is dangerous. The paper behind Ambi3D and AmbiVer confronts this neglected layer head‑on: before grounding, planning, or acting, an agent must determine whether an instruction is objectively unambiguous in the given 3D scene. ...

January 12, 2026 · 4 min · Zelina

NPCs With Short-Term Memory Loss: Benchmarking Agents That Actually Live in the World

Opening — Why this matters now. Agentic AI has entered its Minecraft phase again. Not because blocks are trendy, but because open-world games remain one of the few places where planning, memory, execution, and failure collide in real time. Yet most agent benchmarks still cheat. They rely on synthetic prompts, privileged world access, or oracle-style evaluation that quietly assumes the agent already knows where everything is. The result: impressive demos, fragile agents, and metrics that flatter models more than they inform builders. ...

January 10, 2026 · 4 min · Zelina

Think First, Grasp Later: Why Robots Need Reasoning Benchmarks

Opening — Why this matters now. Robotics has reached an awkward adolescence. Vision–Language–Action (VLA) models can now describe the world eloquently, name objects with near-human fluency, and even explain why a task should be done a certain way—right before dropping the object, missing the grasp, or confidently picking up the wrong thing. This is not a data problem. It’s a diagnostic one. ...

January 3, 2026 · 5 min · Zelina

Don’t Forget How to Feel: Teaching Motion Models Empathy Without Amnesia

Opening — Why this matters now. Embodied AI has learned how to move. It has learned how to listen. It has even learned how to respond. But when it comes to learning how to feel, most systems quietly panic the moment the world changes. Robots trained to walk sadly forget how to do so once they start running. Avatars that learned exaggerated emotion on stage lose subtlety in sports. This isn’t a bug—it’s the inevitable outcome of static datasets colliding with a dynamic world. ...

December 23, 2025 · 4 min · Zelina

Don’t Tell the Robot What You Know

Opening — Why this matters now. Large Language Models are very good at knowing. They are considerably worse at helping. As AI systems move from chat interfaces into robots, copilots, and assistive agents, collaboration becomes unavoidable. And collaboration exposes a deeply human cognitive failure that LLMs inherit wholesale: the curse of knowledge. When one agent knows more than another, it tends to communicate as if that knowledge were shared. ...

December 20, 2025 · 4 min · Zelina

CitySeeker: Lost in Translation, Found in the City

Opening — Why this matters now. Urban navigation looks deceptively solved. We have GPS, street-view imagery, and multimodal models that can describe a scene better than most humans. And yet, when vision-language models (VLMs) are asked to actually navigate a city — not just caption it — performance collapses in subtle, embarrassing ways. The gap is no longer about perception quality. It is about cognition: remembering where you have been, knowing when you are wrong, and understanding implicit human intent. This is the exact gap CitySeeker is designed to expose. ...

December 19, 2025 · 3 min · Zelina

SceneMaker: When 3D Scene Generation Stops Guessing

Opening — Why this matters now. Single-image 3D scene generation has quietly become one of the most overloaded promises in computer vision. We ask a model to hallucinate geometry, infer occluded objects, reason about spatial relationships, and place everything in a coherent 3D world — all from a single RGB frame. When it fails, we call it a data problem. When it half-works, we call it progress. ...

December 13, 2025 · 4 min · Zelina