
When LLMs Invent Languages: Efficiency, Secrecy, and the Limits of Natural Speech

Opening — Why this matters now Large language models are supposed to speak our language. Yet as they become more capable, something uncomfortable emerges: when pushed to cooperate efficiently, models often abandon natural language altogether. This paper shows that modern vision–language models (VLMs) can spontaneously invent task-specific communication protocols—compressed, opaque, and sometimes deliberately unreadable to outsiders—without any fine-tuning. Just prompts. ...

January 31, 2026 · 3 min · Zelina

Seeing Is Misleading: When Climate Images Need Receipts

Opening — Why this matters now Climate misinformation has matured. It no longer argues; it shows. A melting glacier with the wrong caption. A wildfire image from another decade. A meme that looks scientific enough to feel authoritative. In an era where images travel faster than footnotes, public understanding of climate science is increasingly shaped by visuals that lie by omission, context shift, or outright fabrication. ...

January 23, 2026 · 3 min · Zelina

MobileDreamer: When GUI Agents Stop Guessing and Start Imagining

Opening — Why this matters now GUI agents are everywhere in demos and nowhere in production. They click, scroll, and type impressively—right up until the task requires foresight. The moment an interface branches, refreshes, or hides its intent behind two more screens, today’s agents revert to trial-and-error behavior. The core problem isn’t vision. It’s imagination. ...

January 8, 2026 · 4 min · Zelina

Crossing the Line: Teaching Pedestrian Models to Reason, Not Memorize

Opening — Why this matters now Pedestrian fatalities are rising, mid-block crossings dominate risk exposure, and yet most models tasked with predicting pedestrian behavior remain stubbornly local. They perform well—until they don’t. Move them to a new street, a wider arterial, or a different land-use mix, and accuracy quietly collapses. This is not a data problem. It’s a reasoning problem. ...

January 5, 2026 · 4 min · Zelina

Echoes, Not Amnesia: Teaching GUI Agents to Remember What Worked

Opening — Why this matters now GUI agents are finally competent enough to click buttons without embarrassing themselves. And yet, they suffer from a strangely human flaw: they forget everything they just learned. Each task is treated as a clean slate. Every mistake is patiently re‑made. Every success is quietly discarded. In a world obsessed with scaling models, this paper asks a simpler, sharper question: what if agents could remember? ...

December 23, 2025 · 3 min · Zelina

Seeing Isn’t Knowing: Why Vision-Language Models Still Miss the Details

Opening — Why this matters now Vision-language models (VLMs) have become unreasonably confident. Ask them to explain a chart, reason over a meme, or narrate an image, and they respond with eloquence that borders on arrogance. Yet beneath this fluency lies an uncomfortable truth: many of these models still struggle with seeing the right thing. ...

December 14, 2025 · 4 min · Zelina

Tunnel Vision, Literally: When Cropping Makes Multimodal Models Blind

Opening — Why this matters now Multimodal Large Language Models (MLLMs) can reason, explain, and even philosophize about images—until they’re asked to notice something small. A number on a label. A word in a table. The relational context that turns a painted line into a parking space instead of a traffic lane. The industry’s default fix has been straightforward: crop harder, zoom further, add resolution. Yet performance stubbornly plateaus. This paper makes an uncomfortable but important claim: the problem is not missing pixels. It’s missing structure. ...

December 14, 2025 · 3 min · Zelina

ImplicitRDP: When Robots Stop Guessing and Start Feeling

Opening — Why this matters now Robotic manipulation has always had a split personality. Vision plans elegantly in slow motion; force reacts brutally in real time. Most learning systems pretend this tension doesn’t exist — or worse, paper over it with handcrafted hierarchies. The result is robots that see the world clearly but still fumble the moment contact happens. ...

December 13, 2025 · 4 min · Zelina

Seeing Green: When AI Learns to Detect Corporate Illusions

Oil and gas companies have long mastered the art of framing—selectively showing the parts of reality they want us to see. A commercial fades in: wind turbines turning under a soft sunrise, a child running across a field, the logo of an oil major shimmering on the horizon. No lies are spoken, but meaning is shaped. The message? We care. The reality? Often less so. ...

October 31, 2025 · 4 min · Zelina

Beyond Words: Teaching AI to See and Fix Charts with ChartM3

When you tell an AI, “make the third bar blue,” what does it actually see? If it’s a typical large language model (LLM), it doesn’t really see anything. It parses your instruction, guesses what “third bar” means, and fumbles to write chart code—often missing the mark. ChartM³ (Multimodal, Multi-level, Multi-perspective) changes the game. It challenges AIs to not only read and write code but also visually comprehend what a user points at. With 1,000 human-curated chart editing tasks and 24,000 training examples, this new benchmark sets a higher bar—one that demands both verbal and visual fluency. ...

July 30, 2025 · 4 min · Zelina