From Seeing to Doing: Why Agentic AI Still Trips Over Reality
Tools do not make an agent; they make the failure more interesting Camera. Browser. Crop tool. Search engine. Python sandbox. That sounds like the beginning of an intelligent workflow. Give a multimodal model these tools, and it should move from merely seeing the world to actually doing something with it: zoom into the blurry sign, search the extracted clue, cross-check the result, and produce the answer. ...