SD‑RAG: Don’t Trust the Model, Trust the Pipeline

Opening — Why this matters now

RAG was supposed to make LLMs safer. Instead, it quietly became a liability.

As enterprises rushed to bolt retrieval layers onto large language models, they unintentionally created a new attack surface: sensitive internal data flowing straight into a model that cannot reliably distinguish instructions from content. Prompt injection is not a corner case anymore—it is the default threat model. And telling the model to “behave” has proven to be more of a suggestion than a guarantee.

This paper introduces SD‑RAG, a framework that stops pleading with the model and starts enforcing discipline upstream. The idea is almost offensively simple: if the model never sees sensitive information, it can’t leak it.

Background — Context and prior art

Traditional RAG pipelines retrieve relevant chunks, concatenate them with the user query, and hand the whole bundle to an LLM. Privacy controls—when present at all—are embedded as instructions inside the same prompt.

This design fails for structural reasons:

LLMs do not have hard separation between data and instructions.
Prompt injection can override or neutralize embedded privacy rules.
Redaction performed at generation time collapses under adversarial prompting.

Prior work attempted to mitigate this with monolithic prompts, differential privacy, RBAC filters, or trusted-user assumptions. Each helps, none solves the core problem. If unredacted content enters the prompt, leakage is always one clever prompt away.

Analysis — What SD‑RAG does differently

SD‑RAG reframes selective disclosure as a retrieval‑time governance problem, not a generation‑time behavior problem.

The architecture introduces three decisive shifts:

1. Privacy enforcement is decoupled from answering

Instead of asking the answering LLM to self‑censor, SD‑RAG inserts a dedicated redaction step before the user query is ever involved. Retrieved content is sanitized in isolation, using a separate redactor model and explicit constraints.

Once redacted, the content is safe by construction. Prompt injection may still succeed—but there is nothing left to steal.

2. Constraints are first‑class citizens

Privacy and security constraints are expressed in human‑readable natural language and stored alongside data in a graph structure. Each constraint is semantically bound to the chunks it governs.

This enables:

Dynamic policy updates without re‑indexing the corpus
Fine‑grained, context‑aware redaction
Governance that evolves as regulations and business rules change

At query time, SD‑RAG retrieves not only relevant content but also the relevant constraints, ranking them with embedding‑based strategies. Redaction is applied after retrieval, before generation—the narrowest and safest choke point in the pipeline.

Two redaction modes are supported:

Mode	Mechanism	Trade‑off
Extractive	Mask sensitive spans	Higher privacy, lower completeness
Periphrastic	Constraint‑guided paraphrasing	Better readability, slightly weaker guarantees

Findings — What the experiments show

The results are unambiguous.

Prompt injection resilience

Under adversarial prompting, traditional monolithic RAG collapses. SD‑RAG does not.

Strategy	Injection	Privacy Score	Completeness
Baseline RCQA	Yes	~0.20	~0.58
SD‑RAG (Periphrastic)	Yes	~0.59	~0.58
SD‑RAG (Extractive)	Yes	~0.78	~0.58

In the worst case, SD‑RAG improves privacy by up to 58% compared to the baseline—because the model never receives sensitive content in the first place.

Constraint retrieval quality

Constraint re‑ranking matters. Averaging constraint‑chunk similarity, weighted by query relevance, delivers the best recall. In privacy systems, false negatives are more dangerous than false positives—and SD‑RAG optimizes accordingly.

Latency impact

Yes, SD‑RAG adds overhead. Redaction requires additional LLM calls. But the increase is measured in milliseconds—not architectural complexity or operational risk.

This is a trade‑off most regulated industries should happily accept.

Implications — Why this changes how we build RAG

SD‑RAG quietly undermines a popular but flawed belief: that alignment prompts and behavioral tuning can secure enterprise LLMs.

They can’t.

Security emerges from architecture, not instructions. SD‑RAG demonstrates that selective disclosure must be enforced where data enters the system, not where text exits it.

For businesses, this has immediate consequences:

RAG systems can be deployed in regulated domains without trusting the user—or the model
Privacy policies become editable assets, not hard‑coded prompts
Compliance shifts from model behavior to system design

Conclusion — Sanitization beats alignment

SD‑RAG is not flashy. It does not promise smarter models or better reasoning. It does something far more valuable: it acknowledges what LLMs are bad at, and designs around it.

By moving privacy enforcement upstream, SD‑RAG turns prompt injection from a catastrophic failure into a contained nuisance. That is what real AI governance looks like—not hoping the model behaves, but ensuring it cannot misbehave.

Cognaptus: Automate the Present, Incubate the Future.

Opening — Why this matters now#

Background — Context and prior art#

Analysis — What SD‑RAG does differently#

1. Privacy enforcement is decoupled from answering#

2. Constraints are first‑class citizens#

3. Policy‑aware retrieval, not blind filtering#

Findings — What the experiments show#

Prompt injection resilience#

Constraint retrieval quality#

Latency impact#

Implications — Why this changes how we build RAG#

Conclusion — Sanitization beats alignment#