The Phantom Menace in Your Knowledge Base

TL;DR for operators

The paper’s core warning is simple: a RAG system may not be reading the same document your employee just approved. A PDF, HTML page, or DOCX file can look clean to a human reviewer while carrying hidden text, altered Unicode, poisoned fonts, or layout tricks that a document loader still extracts.

That makes the data loader a security boundary, not a plumbing detail. The authors introduce PhantomText, a toolkit for creating stealthy document manipulations, and use it to test five popular loader ecosystems: Docling, Haystack, LangChain, LlamaIndex, and LLMSherpa. Across the loader-level evaluation, they report a 74.4% overall attack success rate. DOCX is especially exposed on injection attacks, font poisoning and homoglyph characters are highly effective, and metadata injection is mostly a damp firework.

The business risk is not limited to dramatic jailbreaks. The paper’s taxonomy maps attacks to confidentiality, integrity, and availability: distorted facts, outdated knowledge, vague answers, empty refusals, token inflation, pipeline crashes, biased responses, and sensitive data disclosure. In other words, the damage can look like “the assistant was unhelpful today” rather than “we have been attacked.” Convenient for the attacker. Less so for the incident report.

The practical response is ingestion security. Enterprises should add Unicode normalisation, zero-width and bidirectional character scanning, hidden-style detection, render-versus-parse comparison, provenance checks, loader-specific red-team tests, and selective OCR where the risk justifies the cost. OCR blocks many invisibility-based tricks, according to the paper, but it adds overhead, can introduce transcription errors, and misses some attacks. So, no, OCR is not a magic priest for possessed PDFs.

The boundary: these are controlled experiments, not a universal audit of every production RAG deployment. Some attacks fail. Some systems resist certain targeted scenarios. NotebookLM is notably resilient in several of the CIA-oriented attacks. The lesson is not that every RAG system is doomed; it is that many RAG systems are trusting the least glamorous part of the stack far too much.

The document looked clean. The parser disagreed.

A compliance officer uploads a vendor policy. A legal associate adds a contract annex. A support team syncs a supplier’s product manual into the company knowledge base. Someone checks the file first. It opens. It looks normal. No suspicious paragraph. No awkward prompt injection in white text at the bottom of the page. Everyone moves on.

The RAG system, however, is not reading the document as a human sees it. It is reading whatever the loader extracts.

That difference is the whole paper.

In The Hidden Threat in Plain Text: Attacking RAG Data Loaders, Alberto Castagnaro and co-authors move the security discussion away from the model prompt and toward the ingestion layer: the document parsing stage that turns files into chunks, embeddings, and eventually retrievable knowledge.¹ Their central claim is not that RAG can be attacked. We already knew that, and the industry has produced enough prompt-injection demos to wallpaper a small airport. Their sharper claim is that a RAG knowledge base can be poisoned before the LLM is ever asked anything.

The mechanism is ordinary in the most irritating way. Modern document formats are not plain text. PDF, HTML, and DOCX can carry visible text, invisible text, metadata, layout instructions, styling attributes, embedded fonts, off-page elements, hidden XML properties, and Unicode characters that change how machines interpret strings. Humans inspect the rendered surface. Loaders process structure. Attackers live in the gap.

The paper calls its toolkit PhantomText. The name is a little theatrical, but the threat is not. PhantomText generates poisoned documents using stealth techniques that either hide existing content from the parser, inject new content into the parser, or make one layer of the document say one thing while another layer says something else. Once that poisoned text is extracted, it can be embedded into the vector store like any other knowledge. The RAG pipeline then treats it as context. Garbage in, authoritative garbage out.

The overlooked security boundary is the loader, not the chatbot

Most enterprise RAG diagrams quietly glide over ingestion. They show documents entering a pipeline, being chunked, embedded, stored, retrieved, and passed to an LLM. The interesting boxes are usually the model, the retriever, the vector database, and the application interface. The loader is treated as a converter. A necessary appliance. The toaster of the AI kitchen.

The paper argues that this is exactly backwards for security.

A RAG system depends on the fidelity of several translations:

The human-readable file becomes machine-readable text.
The text becomes chunks.
The chunks become embeddings.
The embeddings become retrieved context.
The context becomes an answer.

If the first translation is corrupted, the rest of the pipeline can behave perfectly and still produce poisoned output. That is what makes the attack uncomfortable. It does not require model weights to be compromised. It does not require database access. It does not require a user to paste a malicious prompt into the chat window. It only requires a document that passes ordinary human inspection but behaves differently under parsing.

The authors structure the attack space around two broad technique families.

Content obfuscation corrupts or hides existing document content. Examples include zero-width characters inserted into words, homoglyph substitutions that replace familiar Latin letters with visually similar Unicode characters, bidirectional reordering, diacritical mark stacking, and OCR-targeted perturbations. The human still sees something plausible. The parser or downstream tokenizer may not.

Content injection inserts new machine-readable content that remains invisible or unobtrusive to a human reviewer. Examples include text placed behind images, off-page text, transparent text, zero-size or near-zero-size fonts, hidden DOCX properties, malicious metadata, and font-level tricks.

Font poisoning is the most elegant nuisance. A custom font can render a glyph that looks like one character while the underlying code maps to another. To the reviewer, the file appears benign. To the parser, the text can resolve into a different string. That is less “prompt hacking” and more “the document is lying at the typographic layer.” Naturally, software was not designed for this level of melodrama.

The taxonomy is useful because the damage is not always hallucination

The paper’s first contribution is a taxonomy of nine RAG knowledge-base poisoning attacks mapped to the confidentiality, integrity, and availability triad. This matters because “RAG poisoning” is too vague to be operational. Security teams need to know whether the attack breaks access, corrupts answers, leaks data, burns compute, or merely makes the system useless with a straight face.

Attack class	What changes in the RAG system	Business interpretation
Pipeline failure	The ingestion or retrieval pipeline crashes or destabilises	Availability risk; document upload becomes a denial-of-service path
Reasoning overload	Retrieved content forces excessive reasoning or token generation	Cost and latency risk, especially for reasoning-heavy models
Unreadable output	Responses contain gibberish, corrupted symbols, or unusable text	Service quality collapse without a clean system failure
Empty statement response	The assistant refuses or gives generic non-answers	Availability-by-politeness; the system responds but does not help
Vague output	The assistant produces non-committal, weakly grounded answers	Decision quality risk hidden as cautious language
Bias injection	Retrieved content nudges the answer toward a preferred view	Governance and reputational risk
Factual distortion	The answer includes false or fabricated facts	Integrity risk; the classic RAG failure, but with a document-level cause
Outdated knowledge	Recent facts are hidden or obsolete facts are retrieved	Operational risk in policy, product, legal, or compliance workflows
Sensitive data disclosure	Private information becomes retrievable or is induced into output	Confidentiality risk and potential regulatory exposure

The useful move here is that the authors do not reduce poisoning to misinformation. In enterprise settings, a bad answer is not always obviously false. Sometimes it is oddly vague. Sometimes it says the information is unavailable. Sometimes it gives yesterday’s policy as if yesterday had tenure. Sometimes it spends five times as many tokens thinking about a problem it should never have been asked to solve.

That range matters. Organisations often monitor RAG quality through accuracy samples, user ratings, or escalation rates. Those catch some factual failures. They may not catch systematic vagueness, hidden token inflation, or poisoned refusals. A model that politely says “I cannot answer that” can look safe while being quietly broken. Very corporate.

PhantomText tests whether invisible content survives the boring part

The paper’s second contribution is PhantomText, a toolkit for generating poisoned documents across PDF, HTML, and DOCX. The authors use it to implement 19 stealthy document manipulation techniques. The techniques are not all equally exotic. Some rely on Unicode. Some rely on CSS or document styling. Some rely on layout. Some rely on metadata. Some rely on custom fonts.

This is important because the paper is not simply saying, “Here is one clever trick.” It is testing a family of tricks against the ingestion software that developers actually use.

The first experiment targets data loaders directly. The authors evaluate Docling, Haystack, LangChain, LlamaIndex, and LLMSherpa, with 21 parser configurations across PDF, HTML, and DOCX. They build a dataset using 100 random samples from Amazon Reviews’23, then generate poisoned documents by applying their techniques across the supported formats. The resulting evaluation contains 4,200 documents and 35,900 loader-level tests.

The evaluation criterion is deliberately mechanical. For obfuscation, an attack succeeds if the target word is absent from the parsed text. For injection, it succeeds if the injected synthetic word appears in the parsed output. The metric is attack success rate:

$$ ASR = \frac{Success}{Success + Failed} $$

This is not a measure of whether the final chatbot answer was fooled. It is a lower-level ingestion test. That distinction matters. Experiment 1 asks: can the poisoned content survive the loader and enter the machine-readable representation? It is main evidence for the loader vulnerability, not yet evidence that a business decision will be corrupted downstream.

The answer is not comforting. The paper reports an overall loader-level attack success rate of 74.4%. Content obfuscation reaches 76.7%; content injection reaches 72.4%. The authors also report that a large share of tested configurations exceed 95% success.

The distribution is more useful than the headline number.

Finding	Evidence from the paper	Practical reading	Boundary
Loader-level vulnerability is broad	Overall ASR is 74.4% across tested loader scenarios	Treat ingestion as a security boundary, not a neutral conversion step	This is not the same as end-to-end answer corruption
File format matters	DOCX injection ASR is 0.89; HTML injection is 0.63; PDF injection is 0.72	DOCX workflows deserve special scrutiny when files come from external or semi-trusted sources	Format risk depends on parser and configuration
Loader choice matters	All tested loaders exceed 0.6 ASR in aggregate; LangChain is highest in the paper’s summary, LlamaIndex lowest	Default loader selection affects security posture	Results are version- and parser-specific
Technique matters more than slogans	Font poisoning and homoglyphs reach 100% loader-level success; metadata mostly fails except moderate LangChain exposure	Red-team with concrete techniques, not generic “poisoning” prompts	Some techniques only apply to some formats
OCR helps but is not complete	OCR-based ingestion blocks more than 90% of invisibility-based attacks in the defence discussion	Use OCR selectively for high-risk ingestion paths	OCR adds cost, errors, and misses some attack types

The sharpest result is not that one tool is bad or another is good. Tool rankings will change with versions, parser settings, and deployment choices. The durable lesson is that document loaders do not share a single interpretation of document reality. They extract different things. Attackers can exploit that diversity. Defenders who say “we use a popular loader” have not actually said much.

The end-to-end tests separate ingestion success from model impact

Experiment 2 asks the obvious next question: if a poisoned document enters the knowledge base, does the RAG system’s final answer actually change?

This experiment is doing a different job from Experiment 1. It is not another loader benchmark. It is a propagation test: poisoned file → retrieved context → generated response. The paper tests six end-to-end RAG setups. Three are white-box systems using Llama 3.2, Gemma 3, and DeepSeek R1 with a local retriever. Three are black-box systems: OpenAI Assistants using GPT-4o and o3-mini, plus NotebookLM based on Gemini 2.0 Flash.

The dataset is intentionally simple. The authors use a fictional company so that the models do not rely on prior world knowledge. They first confirm baseline behaviour with benign documents and queries; the systems answer correctly with 100% accuracy in that quality check. Then they create 14 poisoned PDF documents, one for each PDF PhantomText technique, using obfuscation to hide information and injection to add information about a fictional competitor. They run 840 attack tests.

This is where the paper becomes more interesting than the average “RAG is vulnerable” claim. Loader poisoning does not automatically imply downstream failure. The retriever may not retrieve the poisoned chunk. The generator may ignore it. A black-box system may include filtering or post-processing. A model may be robust to a particular corruption. The full stack has several chances to avoid embarrassment. Several of them decline.

The strongest techniques—camouflage elements, font poisoning, out-of-bound text, transparent text, and zero-size font variants—often produce full attack success across systems in the paper’s Figure 3. Metadata injection has no effect in the end-to-end setting. Homoglyphs and reordering show mixed results: some systems are highly susceptible, others are barely affected.

This is the key operational point: the risk profile is not binary. It is not “RAG secure” versus “RAG insecure.” It is technique × format × loader × retriever × model × system prompt × platform filtering. That is annoying, but also useful. It means security testing can be targeted.

A company does not need to declare metaphysical war on all documents. It needs to know which document types it ingests, which loaders parse them, which parser outputs are embedded, how hidden content is handled, and whether the final assistant follows poisoned chunks when they are retrieved. Less glamorous than buying a new model. More likely to work.

The CIA-oriented attacks show how small parsing tricks become business failures

Experiment 3 is the paper’s scenario test. Its purpose is not merely robustness; it maps the earlier taxonomy onto targeted attacks against full RAG systems.

The authors use political biography datasets and craft scenarios for the nine attack categories. They test both a single poisoned document and a larger setting with 100 documents, where only one is poisoned. This distinction is helpful. A single poisoned document tests maximum attacker control. The 100-document setting asks whether a poison document can still matter when surrounded by legitimate material.

The evaluation uses an LLM-as-judge for most categories, with exceptions: unreadable output is manually analysed, and sensitive data disclosure uses substring matching for leaked information. The authors sample 100 outputs to validate the judge and report only three mislabels, with a 99% Wilson interval for accuracy of [0.8891, 0.9923]. That does not make the judge perfect, but it gives readers a reason not to dismiss the scenario results as vibes with a spreadsheet.

The results are uneven, and that unevenness is the point.

Pipeline failure appears only in the white-box RAG pipelines. The authors used a very large PDF with out-of-bound text, and they explicitly note that this is simple to detect. That makes it a useful implementation warning, not proof that commercial RAG services are generally easy to crash with oversized files.

Reasoning overload applies to reasoning models in the tested setup. DeepSeek R1 shows token-increase factors of 1.25 and 2.04 depending on the document setting, while o3-mini shows 4.90 and 3.48. For operators, this is less about one model name and more about a cost class: malicious retrieved content can make a reasoning model spend real money thinking about attacker-provided nonsense. A billable existential crisis, if you like those.

Several integrity attacks perform strongly. Vague output and outdated knowledge are broadly successful in the table. Factual distortion succeeds across the white-box systems and the OpenAI assistant configurations, while NotebookLM shows 0/10 in both one-document and 100-document settings for that scenario. Bias injection also varies: NotebookLM is resilient in the one-document condition but susceptible in the 100-document setting.

Sensitive data disclosure is mixed. DeepSeek R1 and Gemma 3 show 10/10 in both settings; Llama and GPT-4o show 0/10; o3-mini shows 8/10 and 9/10. NotebookLM is not evaluated for that row in the table. Again, this is not a neat leaderboard. It is evidence that confidentiality exposure can depend heavily on the specific RAG product and implementation.

The business reading is not “NotebookLM safe, others unsafe” or “white-box bad, black-box good.” The more disciplined reading is that black-box systems can have stronger internal defences in some categories while still remaining vulnerable in others. Enterprise buyers do not get to inspect those defences. They get vendor claims, configuration options, and their own test results. Cheerful.

The paper’s best business lesson is boring: sanitise before embedding

For enterprise teams, the most useful part of the paper is not the attack theatre. It is the defence section.

Many of these attacks leave detectable traces. Diacritical stacking creates abnormal Unicode patterns. Homoglyph substitutions introduce characters from unexpected Unicode blocks. Zero-width characters can be scanned and removed. Bidirectional control characters can be flagged when they appear mid-word or mid-token. Invisible content can often be detected by parsing style attributes: colour, opacity, visibility, vanished text, font size, hidden tags, or bounding boxes. Out-of-bound text can be caught by layout analysis.

In other words, the first line of defence is not another alignment lecture for the LLM. It is a document hygiene layer.

For a production RAG system, a minimal ingestion security workflow should include:

Control	What it catches	Where it belongs
Unicode normalisation and character-block checks	Homoglyphs, diacritical abuse, suspicious mixed scripts	Before chunking
Zero-width and bidirectional character scanning	Invisible token disruption and reordering tricks	Before embedding
Render-versus-parse comparison	Text visible to parser but not to human reviewer, or vice versa	During document validation
Style and layout inspection	Transparent text, zero-size text, hidden DOCX properties, out-of-bound objects	Loader pre-processing
Metadata policy	Malicious metadata fields, hidden HTML head content, document properties	Ingestion filter
Source provenance and trust tiers	Supply-chain and public-source poisoning	Knowledge-base governance
Loader-specific red-team tests	Parser-dependent vulnerabilities	Pre-deployment and after upgrades
Selective OCR	Many invisibility-based attacks	High-risk document paths

The “selective” in selective OCR is doing work. The paper reports that OCR-based ingestion using EasyOCR and Tesseract blocks more than 90% of invisibility-based attacks, including transparent and out-of-bound text. That is strong evidence for OCR as a defensive layer.

But OCR is not free. It increases compute cost, slows ingestion, and can introduce transcription errors. It also misses some attacks, including diacritical manipulations and zero-size font injections. So the right answer is not “OCR everything forever.” The right answer is risk-tiered ingestion: use OCR or render-based validation for high-risk external files, legal or compliance documents, public-web ingestion, vendor manuals, and knowledge sources that directly affect decisions.

Internal low-risk material may only need lightweight sanitisation. Public or adversarially exposed content deserves heavier inspection. A RAG system that treats all documents equally is not being fair. It is being naive at scale.

What Cognaptus infers for enterprise RAG governance

The paper directly shows that many loader configurations and several full RAG systems are vulnerable to invisible document manipulations under controlled conditions. It also directly shows that attack success varies substantially. Those are empirical claims.

The business inference is broader: RAG governance needs to extend upstream into document intake.

Most organisations are currently better at governing answers than governing knowledge ingestion. They add disclaimers, review outputs, tune prompts, evaluate answer quality, restrict user permissions, and maybe run retrieval audits. Those are useful. They are also downstream. If the knowledge base itself is poisoned, the assistant can produce a polished answer from corrupted context, and the surface behaviour may look entirely normal.

A better governance model treats the knowledge base like a production data asset, not a folder with embeddings attached.

That means three practical shifts.

First, ingestion should have a trust model. Documents from internal controlled systems, known vendors, public websites, email attachments, shared drives, and user uploads do not deserve the same path into the vector store. Public web ingestion and third-party documentation are supply-chain surfaces. The paper explicitly includes supply-chain, web-based poisoning, and insider threats in its threat model. That is not paranoid; it is what happens when a chatbot becomes the interface to institutional memory.

Second, loader output should be auditable. Teams should store and inspect the extracted text, not only the original file and final chunks. If the rendered document says one thing and the parsed text says another, the extracted text is the version the RAG system will believe. Governance that only reviews the PDF is reviewing the wrong artefact.

Third, RAG evaluation should include adversarial documents, not just adversarial prompts. Prompt-injection testing asks whether the model obeys malicious instructions at query time. PhantomText-style testing asks whether the knowledge base can be quietly contaminated before query time. Both matter. Only testing the former is like locking the front door while accepting anonymous parcels directly into the server room.

The boundaries: this is a serious warning, not a universal doom score

There are several limits to keep straight.

The loader-level experiments use a controlled dataset built from Amazon Reviews’23 samples. The authors justify this because loader parsing is syntactic and structural rather than domain-specific. That is reasonable for testing whether hidden text survives ingestion. It does not mean the exact same success rates will appear in every legal, medical, financial, engineering, or customer-support corpus.

Experiment 2 focuses on PDF for simplicity, although the authors argue that results can generalise to DOCX and HTML. The first experiment gives evidence across all three formats, but downstream system behaviour may differ by format in real deployments.

The full RAG systems are tested with specific configurations, prompts, models, retrieval settings, and platform behaviours. Those details matter. A different parser, chunking policy, top-$k$ retrieval setting, safety layer, file-size limit, or preprocessing step could change outcomes. So could model upgrades. This is AI infrastructure, after all; the floor moves while you are labeling the floor.

Some attack techniques fail. Metadata injection is mostly ineffective in the experiments. NotebookLM is resilient in several CIA-oriented scenarios, including factual distortion and empty statement response. Pipeline failure is demonstrated only against the white-box pipelines and with a crude oversized file vector. These are not minor details. They prevent the paper from becoming a cartoon.

The defence section is also partly a roadmap. Lightweight sanitisation is plausible and often cheap, but comprehensive document security is hard because legitimate documents also use rich formatting, multilingual Unicode, metadata, and accessibility features. A defence that strips everything unusual may break real documents. A defence that allows everything unusual may embed a ghost. The fun, as usual, is in the risk trade-off nobody wanted.

The operational takeaway is to secure the text before it becomes “knowledge”

RAG has made it easy to treat documents as live organisational memory. That is powerful. It is also a strange act of trust. The moment a file is parsed, chunked, embedded, and indexed, it stops being “a document someone uploaded” and becomes “context the assistant can cite, summarise, and act on.” That transition deserves security controls.

PhantomText is valuable because it makes the invisible part of that transition visible. It shows that the difference between rendered text and parsed text is not a theoretical nuisance. It is an attack surface. It also shows that model strength is not a substitute for ingestion discipline. A capable model can still be confidently wrong when fed poisoned context. Very senior behaviour, frankly.

For operators, the action list is clear enough:

inspect the extracted text, not only the original document;
normalise and sanitise Unicode before chunking;
detect hidden, off-page, transparent, vanished, and zero-size text;
separate high-risk and low-risk ingestion paths;
test the actual loader stack with adversarial documents;
use OCR or render-based validation where the risk justifies the cost;
monitor for vague, empty, outdated, or unusually expensive responses, not only false facts.

The most dangerous part of a RAG system may not be the answer box. It may be the quiet little parser that everyone assumed was just doing paperwork.

And paperwork, as every bureaucracy has already demonstrated, can absolutely ruin your day.

Cognaptus: Automate the Present, Incubate the Future.

Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, and Simeone Pizzi, “The Hidden Threat in Plain Text: Attacking RAG Data Loaders,” arXiv:2507.05093, 2025. ↩︎

TL;DR for operators#

The document looked clean. The parser disagreed.#

The overlooked security boundary is the loader, not the chatbot#

The taxonomy is useful because the damage is not always hallucination#

PhantomText tests whether invisible content survives the boring part#

The end-to-end tests separate ingestion success from model impact#

The CIA-oriented attacks show how small parsing tricks become business failures#

The paper’s best business lesson is boring: sanitise before embedding#

What Cognaptus infers for enterprise RAG governance#

The boundaries: this is a serious warning, not a universal doom score#

The operational takeaway is to secure the text before it becomes “knowledge”#