
Seeing Is Not Thinking: Teaching Multimodal Models Where to Look

Opening — Why this matters now
Multimodal models can answer visual questions with alarming confidence. They can also be catastrophically wrong while sounding perfectly reasonable. The uncomfortable truth is that many vision–language models succeed without actually seeing what matters. They talk first. They look later—if at all. The paper behind LaViT puts a name to this failure mode: the Perception Gap. It is the gap between saying the right thing and looking at the right evidence. And once you see it quantified, it becomes hard to ignore. ...

January 18, 2026 · 4 min · Zelina

Latent Brilliance: Turning LLMs into Creativity Engines

What if we stopped asking language models to “be creative”—and instead let them explore creativity the way humans brainstorm: by remixing ideas, nudging boundaries, and iterating through meaningful variations? That’s exactly what Large Language Models as Innovators proposes: a novel framework that leverages the latent embedding space of ideas—not prompts—to drive controlled, domain-agnostic creativity. Rather than relying on handcrafted rules or complex prompting tricks, the authors show how LLMs can generate original and relevant ideas by interpolating between known concepts, evaluating results, and refining outputs over time. ...
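The excerpt describes a loop of interpolating between idea embeddings, evaluating the result, and refining it. As a rough illustration only, not the paper's implementation, the sketch below blends two idea embeddings and keeps the blend that sits farthest from both originals. The embed, interpolate, and novelty helpers and the example idea strings are hypothetical stand-ins; a real pipeline would use an actual sentence-embedding model and an LLM to verbalize and refine the blended concept.

```python
import hashlib

import numpy as np


def embed(text: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in embedder: hashes text into a reproducible
    pseudo-random unit vector. A real system would call a sentence-embedding
    model here instead."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    vec = np.random.default_rng(seed).normal(size=dim)
    return vec / np.linalg.norm(vec)


def interpolate(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Linear blend of two idea embeddings, re-normalized to the unit sphere."""
    vec = (1.0 - alpha) * a + alpha * b
    return vec / np.linalg.norm(vec)


def novelty(candidate: np.ndarray, known: list[np.ndarray]) -> float:
    """Toy 'evaluate' step: cosine distance to the nearest known idea."""
    return min(1.0 - float(candidate @ k) for k in known)


# Two known ideas (made-up examples) and a sweep over blend ratios.
known = [embed("edible drone packaging"), embed("self-healing concrete")]
best = max(
    (interpolate(known[0], known[1], a) for a in np.linspace(0.1, 0.9, 9)),
    key=lambda v: novelty(v, known),
)
# In the loop the post describes, `best` would seed an LLM prompt
# ("describe the concept nearest this blend"); the generated idea would then
# be scored for originality and relevance and folded back in for refinement.
print(novelty(best, known))
```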

July 21, 2025 · 3 min · Zelina