Opening — Why this matters now
Retrieval-Augmented Generation (RAG) was supposed to fix the most embarrassing flaw of large language models: confident nonsense. Give the model access to fresh data, ground its answers in reality, and suddenly hallucinations become… manageable.
Unfortunately, reality is also writable.
As enterprises rush to deploy RAG systems—customer support copilots, internal knowledge assistants, financial research tools—they are quietly expanding their attack surface. Not just the model, but the data pipeline. Not just prompts, but retrieval.
The paper introduces a rather inconvenient idea: if you control both what the model sees and how it looks for it, you don’t need to know the user’s question to control the answer.
That’s not a bug. That’s a business risk.
Background — Context and prior art
RAG systems sit on a deceptively simple architecture:
| Component | Function | Weakness |
|---|---|---|
| Database | Stores external knowledge | Can be poisoned |
| Retriever | Finds relevant documents | Can be steered |
| Generator (LLM) | Produces final answer | Can be manipulated |
Historically, attacks targeted one layer at a time:
1. Prompt Injection (Control Layer)
- Manipulates the input query
- Attempts to override instructions
- Weak in RAG because retrieved context can contradict it
2. Data Poisoning (Data Layer)
- Inserts malicious documents into the database
- Requires knowing the exact user query
- High precision, low flexibility
Both approaches work. Neither scales well in the real world.
Which brings us to the obvious next step: stop choosing.
Analysis — What the paper actually does
The proposed method—PIDP-Attack—is not technically complex. That’s precisely why it’s dangerous.
It combines two levers:
1. Query-Path Injection (Steer the search)
A malicious suffix is appended to any user query:
- Embeds a hidden target question
- Shifts semantic similarity
- Tricks the retriever into fetching attacker-controlled content
2. Database Poisoning (Control the evidence)
A small set of crafted passages is inserted into the corpus:
- Each passage is aligned with the hidden target question
- Each supports a wrong but plausible answer
Together, they form a feedback loop:
| Step | Mechanism | Effect |
|---|---|---|
| 1 | Inject suffix into query | Alters embedding |
| 2 | Retriever selects poisoned docs | Context is compromised |
| 3 | LLM reads “evidence” | Output is biased |
| 4 | Final answer | Matches attacker target |
The elegance lies in one detail: the attacker does not need to know the user’s question.
The system effectively answers a different question—one the attacker chose in advance.
Findings — Results with visualization
The results are, in a word, uncomfortable.
Attack Success Rate (ASR)
| Method | Typical ASR Range |
|---|---|
| Prompt-only | ~0% |
| Poison-only | ~0–1% |
| Retrieval steering (GGPP) | ~60–90% |
| Targeted poisoning | ~80–97% |
| PIDP (combined) | ~90–100% |
The compound attack consistently outperforms all single-vector approaches.
Why it works
The paper distinguishes two key metrics:
| Metric | Meaning | Insight |
|---|---|---|
| Retrieval F1 | Whether poisoned docs are retrieved | Necessary but not sufficient |
| ASR | Whether final answer is manipulated | True impact |
High ASR requires both:
- Poisoned content must be retrieved
- Model must trust that content
PIDP ensures both conditions simultaneously.
Budget Sensitivity
| Parameter | Observation |
|---|---|
| Poison budget (n) | As few as 2–5 documents can achieve >95% ASR |
| Context size (k) | Larger k can dilute or amplify attack |
Notably, increasing context size does not reliably improve safety. Sometimes it makes things worse.
A rare moment where “more data” is not the solution.
Implications — Next steps and significance
1. RAG security is not modular
You cannot secure the model and ignore the data pipeline.
You cannot secure the database and ignore the query path.
The system is only as strong as its weakest interface—and RAG has several.
2. Query is now a control channel
In traditional systems, input is data.
In RAG systems, input is policy.
If queries can be rewritten (via plugins, middleware, proxies), then the attacker has effectively gained partial system control.
3. Data ingestion becomes a governance problem
Allowing external or semi-trusted data into your retrieval corpus is no longer just a quality issue—it’s a security vulnerability.
This shifts responsibility from engineering to data governance:
| Risk Area | Required Control |
|---|---|
| Data ingestion | Provenance verification |
| Query handling | Sanitization & segmentation |
| Retrieval output | Instruction filtering |
4. “Explainability” can backfire
RAG systems are often praised because they show their sources.
Unfortunately, if the sources are poisoned, explainability becomes justification for misinformation.
The model is not hallucinating.
It is citing.
Conclusion — Wrap-up
PIDP-Attack exposes an uncomfortable truth about modern AI systems: grounding a model in external data does not guarantee truth—it merely relocates the problem.
From weights to workflows.
From hallucination to manipulation.
The real takeaway is not that RAG is broken. It’s that RAG is incomplete.
Until systems treat queries as untrusted, data as adversarial, and retrieval as a security boundary—not just a performance optimization—we are building pipelines that can be quietly, consistently, and scalably wrong.
And worse, they will sound perfectly reasonable while doing it.
Cognaptus: Automate the Present, Incubate the Future.