Opening — Why this matters now
RAG was supposed to make LLMs safer. Instead, it quietly became a liability.
As enterprises rushed to bolt retrieval layers onto large language models, they unintentionally created a new attack surface: sensitive internal data flowing straight into a model that cannot reliably distinguish instructions from content. Prompt injection is not a corner case anymore—it is the default threat model. And telling the model to “behave” has proven to be more of a suggestion than a guarantee.
This paper introduces SD‑RAG, a framework that stops pleading with the model and starts enforcing discipline upstream. The idea is almost offensively simple: if the model never sees sensitive information, it can’t leak it.
Background — Context and prior art
Traditional RAG pipelines retrieve relevant chunks, concatenate them with the user query, and hand the whole bundle to an LLM. Privacy controls—when present at all—are embedded as instructions inside the same prompt.
This design fails for structural reasons:
- LLMs do not have hard separation between data and instructions.
- Prompt injection can override or neutralize embedded privacy rules.
- Redaction performed at generation time collapses under adversarial prompting.
Prior work attempted to mitigate this with monolithic prompts, differential privacy, RBAC filters, or trusted-user assumptions. Each helps, none solves the core problem. If unredacted content enters the prompt, leakage is always one clever prompt away.
Analysis — What SD‑RAG does differently
SD‑RAG reframes selective disclosure as a retrieval‑time governance problem, not a generation‑time behavior problem.
The architecture introduces three decisive shifts:
1. Privacy enforcement is decoupled from answering
Instead of asking the answering LLM to self‑censor, SD‑RAG inserts a dedicated redaction step before the user query is ever involved. Retrieved content is sanitized in isolation, using a separate redactor model and explicit constraints.
Once redacted, the content is safe by construction. Prompt injection may still succeed—but there is nothing left to steal.
2. Constraints are first‑class citizens
Privacy and security constraints are expressed in human‑readable natural language and stored alongside data in a graph structure. Each constraint is semantically bound to the chunks it governs.
This enables:
- Dynamic policy updates without re‑indexing the corpus
- Fine‑grained, context‑aware redaction
- Governance that evolves as regulations and business rules change
3. Policy‑aware retrieval, not blind filtering
At query time, SD‑RAG retrieves not only relevant content but also the relevant constraints, ranking them with embedding‑based strategies. Redaction is applied after retrieval, before generation—the narrowest and safest choke point in the pipeline.
Two redaction modes are supported:
| Mode | Mechanism | Trade‑off |
|---|---|---|
| Extractive | Mask sensitive spans | Higher privacy, lower completeness |
| Periphrastic | Constraint‑guided paraphrasing | Better readability, slightly weaker guarantees |
Findings — What the experiments show
The results are unambiguous.
Prompt injection resilience
Under adversarial prompting, traditional monolithic RAG collapses. SD‑RAG does not.
| Strategy | Injection | Privacy Score | Completeness |
|---|---|---|---|
| Baseline RCQA | Yes | ~0.20 | ~0.58 |
| SD‑RAG (Periphrastic) | Yes | ~0.59 | ~0.58 |
| SD‑RAG (Extractive) | Yes | ~0.78 | ~0.58 |
In the worst case, SD‑RAG improves privacy by up to 58% compared to the baseline—because the model never receives sensitive content in the first place.
Constraint retrieval quality
Constraint re‑ranking matters. Averaging constraint‑chunk similarity, weighted by query relevance, delivers the best recall. In privacy systems, false negatives are more dangerous than false positives—and SD‑RAG optimizes accordingly.
Latency impact
Yes, SD‑RAG adds overhead. Redaction requires additional LLM calls. But the increase is measured in milliseconds—not architectural complexity or operational risk.
This is a trade‑off most regulated industries should happily accept.
Implications — Why this changes how we build RAG
SD‑RAG quietly undermines a popular but flawed belief: that alignment prompts and behavioral tuning can secure enterprise LLMs.
They can’t.
Security emerges from architecture, not instructions. SD‑RAG demonstrates that selective disclosure must be enforced where data enters the system, not where text exits it.
For businesses, this has immediate consequences:
- RAG systems can be deployed in regulated domains without trusting the user—or the model
- Privacy policies become editable assets, not hard‑coded prompts
- Compliance shifts from model behavior to system design
Conclusion — Sanitization beats alignment
SD‑RAG is not flashy. It does not promise smarter models or better reasoning. It does something far more valuable: it acknowledges what LLMs are bad at, and designs around it.
By moving privacy enforcement upstream, SD‑RAG turns prompt injection from a catastrophic failure into a contained nuisance. That is what real AI governance looks like—not hoping the model behaves, but ensuring it cannot misbehave.
Cognaptus: Automate the Present, Incubate the Future.