The Phantom Menace in Your Knowledge Base
TL;DR for operators The paper’s core warning is simple: a RAG system may not be reading the same document your employee just approved. A PDF, HTML page, or DOCX file can look clean to a human reviewer while carrying hidden text, altered Unicode, poisoned fonts, or layout tricks that a document loader still extracts. ...