Pretty Text, Ugly Logic: When Image Models Learn to Write but Not to Reason
A slide looks finished. The headline is sharp, the equations are aligned, the answer box is confident, and the design has the mild corporate glow of something that has already been approved by three people who did not read it. That is exactly the problem. For years, text-to-image models failed in a wonderfully obvious way: they could not spell. A poster would say “Qaurterly Reveneu,” the mockup button would contain mystical glyphs, and everyone understood the output was decorative, not operational. Recent models have changed that. They can now place readable text inside images, produce document-like pages, and generate slide-like visual artifacts. The failure mode has become less funny and more expensive: the text may be readable, but the reasoning may be wrong. ...