When One Patch Rules Them All: Teaching MLLMs to See What Isn’t There
Opening — Why this matters now

Multimodal large language models (MLLMs) are no longer research curiosities. They caption images, reason over diagrams, guide robots, and increasingly sit inside commercial products that users implicitly trust. That trust rests on a fragile assumption: that these models see the world in a reasonably stable way. The paper behind this article quietly dismantles that assumption. It shows that a single, reusable visual perturbation—not tailored to any specific image—can reliably coerce closed-source systems like GPT‑4o or Gemini‑2.0 into producing attacker‑chosen outputs. Not once. Not occasionally. But consistently, across arbitrary, previously unseen images. ...
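To make "a single, reusable visual perturbation" concrete, here is a minimal sketch, in Python with NumPy, of what image-agnostic reuse looks like mechanically: one pre-computed patch is pasted unchanged onto any image before it reaches the target model. The patch values, size, and placement below are placeholders, not the paper's actual attack or optimization procedure.

```python
import numpy as np

PATCH_SIZE = 64  # assumed square patch, in pixels (placeholder value)
rng = np.random.default_rng(0)

# Stand-in for a pre-computed universal patch; a real attack would optimize
# these values, which is outside the scope of this sketch.
universal_patch = rng.uniform(0.0, 1.0, size=(PATCH_SIZE, PATCH_SIZE, 3))

def apply_universal_patch(image: np.ndarray, top: int = 0, left: int = 0) -> np.ndarray:
    """Overwrite a fixed region of `image` (H, W, 3 floats in [0, 1]) with the patch.

    The same `universal_patch` is reused for every image, unmodified; that
    image-agnostic reuse is what makes such a perturbation "universal".
    """
    patched = image.copy()
    patched[top:top + PATCH_SIZE, left:left + PATCH_SIZE, :] = universal_patch
    return patched

# Usage: the one patch is pasted onto a previously unseen image before it is
# sent to the model under attack.
unseen_image = rng.uniform(0.0, 1.0, size=(224, 224, 3))
adversarial_input = apply_universal_patch(unseen_image)
```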