Trace Evidence: When Vision-Language Models Fail Before They Fail
Opening — Why This Matters Now

In an era where multimodal AI systems claim to reason, we still evaluate them like glorified calculators—checking whether the final answer matches the answer key. It's convenient, comforting, and catastrophically misleading. A vision–language model (VLM) can arrive at a correct conclusion for all the wrong reasons, or worse, construct a beautifully fluent chain-of-thought that collapses under the slightest inspection. ...