
The Art of Interrupting AI: When Knowing Isn’t Talking

Opening — Why this matters now: The current generation of AI models can see, hear, and respond. In theory, they should also be able to participate. In practice, they often behave like that one person in a meeting who either interrupts too early or never speaks at all. This gap is no longer academic. As omni-modal models move into real-time assistants, customer service agents, and even trading copilots, the question is shifting from “Can the model understand?” to something more uncomfortable: ...

March 18, 2026 · 4 min · Zelina

Mind the Gap: Why AI Still Struggles to Build Common Ground

Opening — Why this matters now: The current generation of AI systems can summarize books, write code, and even simulate conversations that feel uncannily human. Yet place these same systems inside a real collaborative task, and the illusion quickly breaks. Human collaboration depends on something subtle but powerful: common ground—the evolving set of shared beliefs and mutually recognized facts that allow teams to coordinate action. In workplaces, negotiations, and engineering teams, this shared understanding forms the invisible infrastructure of decision-making. ...

March 6, 2026 · 6 min · Zelina

When Agents Start Thinking Twice: Teaching Multimodal AI to Doubt Itself

Opening — Why this matters now: Multimodal models are getting better at seeing, but not necessarily at understanding. They describe images fluently, answer visual questions confidently—and yet still contradict themselves when asked to reason across perception and language. The gap isn’t capability. It’s coherence. The paper behind this article targets a subtle but costly problem in modern AI systems: models that generate answers they cannot later justify—or even agree with. In real-world deployments, that gap shows up as unreliable assistants, brittle agents, and automation that looks smart until it’s asked why. ...

February 9, 2026 · 3 min · Zelina

When Images Pretend to Be Interfaces: Stress-Testing Generative Models as GUI Environments

Opening — Why this matters now: Image generation models are no longer confined to art prompts and marketing visuals. They are increasingly positioned as interactive environments—stand-ins for real software interfaces where autonomous agents can be trained, tested, and scaled. In theory, if a model can reliably generate the next GUI screen after a user action, we gain a cheap, flexible simulator for everything from mobile apps to desktop workflows. ...

February 9, 2026 · 4 min · Zelina

Click Like a Human: Why Avenir-Web Is a Quiet Breakthrough in Web Agents

Opening — Why this matters now: For years, autonomous web agents have promised to automate the internet: booking flights, scraping dashboards, configuring enterprise tools, or simply clicking buttons so humans don’t have to. And yet, anyone who has actually tried to deploy one knows the truth—these agents fail in embarrassingly human ways. They get lost. They click the wrong thing. They forget what they were doing halfway through. ...

February 3, 2026 · 5 min · Zelina

Seeing Is Not Reasoning: Why Mental Imagery Still Breaks Multimodal AI

Opening — Why this matters now: Multimodal AI is having its cinematic moment. Video generation, image rollouts, and interleaved vision–language reasoning are being marketed as steps toward models that can think visually. The implicit promise is seductive: if models can generate images while reasoning, perhaps they can finally reason with them. This paper delivers a colder verdict. When tested under controlled conditions, today’s strongest multimodal models fail at something deceptively basic: maintaining and manipulating internal visual representations over time. In short, they can see—but they cannot mentally imagine in any robust, task-reliable way. ...

February 3, 2026 · 4 min · Zelina

Thinking in Panels: Why Comics Might Beat Video for Multimodal Reasoning

Opening — Why this matters now: Multimodal reasoning has quietly hit an efficiency wall. We taught models to think step by step with text, then asked them to imagine with images, and finally to reason with videos. Each step added expressive power—and cost. Images freeze time. Videos drown signal in redundancy. Somewhere between the two, reasoning gets expensive fast. ...

February 3, 2026 · 3 min · Zelina

Seeing Is Thinking: When Images Do the Reasoning

Opening — Why this matters now: Large language models have learned to talk their way through reasoning. But the real world does not speak in tokens. It moves, collides, folds, and occludes. As multimodal models mature, a quiet question has become unavoidable: is language really the best internal medium for thinking about physical reality? ...

February 2, 2026 · 3 min · Zelina

When LLMs Invent Languages: Efficiency, Secrecy, and the Limits of Natural Speech

Opening — Why this matters now: Large language models are supposed to speak our language. Yet as they become more capable, something uncomfortable emerges: when pushed to cooperate efficiently, models often abandon natural language altogether. This paper shows that modern vision–language models (VLMs) can spontaneously invent task-specific communication protocols—compressed, opaque, and sometimes deliberately unreadable to outsiders—without any fine-tuning. Just prompts. ...

January 31, 2026 · 3 min · Zelina

Seeing Is Misleading: When Climate Images Need Receipts

Opening — Why this matters now: Climate misinformation has matured. It no longer argues; it shows. A melting glacier with the wrong caption. A wildfire image from another decade. A meme that looks scientific enough to feel authoritative. In an era where images travel faster than footnotes, public understanding of climate science is increasingly shaped by visuals that lie by omission, context shift, or outright fabrication. ...

January 23, 2026 · 3 min · Zelina