
The Art of Interrupting AI: When Knowing Isn’t Talking

Opening — Why this matters now: The current generation of AI models can see, hear, and respond. In theory, they should also be able to participate. In practice, they often behave like that one person in a meeting who either interrupts too early or never speaks at all. This gap is no longer academic. As omni-modal models move into real-time assistants, customer service agents, and even trading copilots, the question is shifting from “Can the model understand?” to something more uncomfortable: ...

March 18, 2026 · 4 min · Zelina

Mind the Gap: Why AI Still Struggles to Build Common Ground

Opening — Why this matters now: The current generation of AI systems can summarize books, write code, and even simulate conversations that feel uncannily human. Yet place these same systems inside a real collaborative task, and the illusion quickly breaks. Human collaboration depends on something subtle but powerful: common ground—the evolving set of shared beliefs and mutually recognized facts that allow teams to coordinate action. In workplaces, negotiations, and engineering teams, this shared understanding forms the invisible infrastructure of decision-making. ...

March 6, 2026 · 6 min · Zelina

When Agents Start Thinking Twice: Teaching Multimodal AI to Doubt Itself

Opening — Why this matters now: Multimodal models are getting better at seeing, but not necessarily at understanding. They describe images fluently, answer visual questions confidently—and yet still contradict themselves when asked to reason across perception and language. The gap isn’t capability. It’s coherence. The paper behind this article targets a subtle but costly problem in modern AI systems: models that generate answers they cannot later justify—or even agree with. In real-world deployments, that gap shows up as unreliable assistants, brittle agents, and automation that looks smart until it’s asked why. ...

February 9, 2026 · 3 min · Zelina

When Images Pretend to Be Interfaces: Stress-Testing Generative Models as GUI Environments

Opening — Why this matters now: Image generation models are no longer confined to art prompts and marketing visuals. They are increasingly positioned as interactive environments—stand-ins for real software interfaces where autonomous agents can be trained, tested, and scaled. In theory, if a model can reliably generate the next GUI screen after a user action, we gain a cheap, flexible simulator for everything from mobile apps to desktop workflows. ...

February 9, 2026 · 4 min · Zelina

Click Like a Human: Why Avenir-Web Is a Quiet Breakthrough in Web Agents

Opening — Why this matters now: For years, autonomous web agents have promised to automate the internet: booking flights, scraping dashboards, configuring enterprise tools, or simply clicking buttons so humans don’t have to. And yet, anyone who has actually tried to deploy one knows the truth—these agents fail in embarrassingly human ways. They get lost. They click the wrong thing. They forget what they were doing halfway through. ...

February 3, 2026 · 5 min · Zelina

Seeing Is Not Reasoning: Why Mental Imagery Still Breaks Multimodal AI

Opening — Why this matters now: Multimodal AI is having its cinematic moment. Video generation, image rollouts, and interleaved vision–language reasoning are being marketed as steps toward models that can think visually. The implicit promise is seductive: if models can generate images while reasoning, perhaps they can finally reason with them. This paper delivers a colder verdict. When tested under controlled conditions, today’s strongest multimodal models fail at something deceptively basic: maintaining and manipulating internal visual representations over time. In short, they can see—but they cannot mentally imagine in any robust, task-reliable way. ...

February 3, 2026 · 4 min · Zelina

Thinking in Panels: Why Comics Might Beat Video for Multimodal Reasoning

Opening — Why this matters now: Multimodal reasoning has quietly hit an efficiency wall. We taught models to think step by step with text, then asked them to imagine with images, and finally to reason with videos. Each step added expressive power—and cost. Images freeze time. Videos drown signal in redundancy. Somewhere between the two, reasoning gets expensive fast. ...

February 3, 2026 · 3 min · Zelina

Seeing Is Thinking: When Images Do the Reasoning

Opening — Why this matters now: Large language models have learned to talk their way through reasoning. But the real world does not speak in tokens. It moves, collides, folds, and occludes. As multimodal models mature, a quiet question has become unavoidable: is language really the best internal medium for thinking about physical reality? ...

February 2, 2026 · 3 min · Zelina

When LLMs Invent Languages: Efficiency, Secrecy, and the Limits of Natural Speech

Opening — Why this matters now: Large language models are supposed to speak our language. Yet as they become more capable, something uncomfortable emerges: when pushed to cooperate efficiently, models often abandon natural language altogether. This paper shows that modern vision–language models (VLMs) can spontaneously invent task-specific communication protocols—compressed, opaque, and sometimes deliberately unreadable to outsiders—without any fine-tuning. Just prompts. ...

January 31, 2026 · 3 min · Zelina

Seeing Is Misleading: When Climate Images Need Receipts

Opening — Why this matters now: Climate misinformation has matured. It no longer argues; it shows. A melting glacier with the wrong caption. A wildfire image from another decade. A meme that looks scientific enough to feel authoritative. In an era where images travel faster than footnotes, public understanding of climate science is increasingly shaped by visuals that lie by omission, context shift, or outright fabrication. ...

January 23, 2026 · 3 min · Zelina