OCR | Cognaptus

OCR and the City: Why Document AI Still Needs Eyes

A document lands in an intake queue. It might be an invoice, a memo, a form, a résumé, or one of those corporate artifacts whose layout says more than the words do. Someone wants the system to classify it instantly, because every downstream workflow—routing, extraction, compliance, archiving—depends on that first label. The fashionable answer is: send it to a large language model. Extract the text, paste it into a prompt, ask for one label, and let the machine be clever. This is attractive because it feels general. It is also how many automation projects quietly turn a visual problem into a text problem, then act surprised when the system starts calling file folders “proposals” because the word proposal appeared somewhere on the page. ...

Pretty Text, Ugly Logic: When Image Models Learn to Write but Not to Reason

A slide looks finished. The headline is sharp, the equations are aligned, the answer box is confident, and the design has the mild corporate glow of something that has already been approved by three people who did not read it. That is exactly the problem. For years, text-to-image models failed in a wonderfully obvious way: they could not spell. A poster would say “Qaurterly Reveneu,” the mockup button would contain mystical glyphs, and everyone understood the output was decorative, not operational. Recent models have changed that. They can now place readable text inside images, produce document-like pages, and generate slide-like visual artifacts. The failure mode has become less funny and more expensive: the text may be readable, but the reasoning may be wrong. ...


When the Right Answer Is No Answer: Teaching AI to Refuse Messy Math

A scanned exam paper is not a polite input. It arrives bent, shadowed, annotated, folded, half-covered by a student’s handwriting, and occasionally photographed at an angle chosen by someone apparently in active conflict with geometry. For a human teacher, this is annoying. For a document AI system, it is more than annoying. It creates a dangerous fork in the road: extract what is visible, or admit that the question cannot be recovered. ...

Same Content, Different Worlds: Why Multimodal LLMs Still Disagree With Themselves

Screenshot. That is where many business workflows quietly change the problem. A support agent receives a screenshot of a customer bill instead of the billing table as text. A contract review tool receives a scanned clause instead of the clause extracted from the PDF. A procurement assistant receives a rendered purchase order, not the original form fields. Everyone involved assumes the content is the same. The model can read it. The OCR looks correct. The answer should be the same. ...

Eyeconomy: Fine-Tuned Vision Models for OCR in Emerging Markets

TL;DR for operators: Paper invoices are not a nostalgia problem. They are a working-capital, tax-compliance, and operations problem wearing a thermal-printer costume. The operational case for fine-tuned vision models is not that they can “read documents” in the abstract. Plenty of systems can read clean documents under polite lighting. The case is that emerging-market business paperwork is local, messy, multilingual, photographed at bad angles, and shaped by tax rules that global OCR products do not treat as first-class citizens. ...

DeepSeek-V3

A mixture-of-experts large language model by DeepSeek AI, designed for text tasks such as reasoning, coding, and generation.