Opening — Why this matters now
Regulation is having a moment — not the glamorous kind, but the unavoidable kind.
As AI systems move from experimentation to deployment, organizations are discovering an inconvenient truth: models don’t just need to perform — they need to comply. Financial services, healthcare, and now AI governance itself are all governed by dense, evolving regulatory frameworks that were never designed for machines to interpret.
The bottleneck is obvious. Translating legal text into operational rules remains manual, expensive, and painfully slow.
Enter De Jure — a system that attempts to automate this translation end to end. Not by making LLMs “understand law” in some grand philosophical sense, but by forcing them through a disciplined loop of decomposition, evaluation, and repair.
It’s less courtroom drama, more assembly line. And that’s precisely the point.
Background — Context and prior art
Historically, regulatory rule extraction has relied on:
| Approach | Limitation |
|---|---|
| Manual annotation by legal experts | Expensive, slow, non-scalable |
| Rule-based NLP pipelines | Brittle to domain variation |
| Supervised ML models | Require labeled datasets (rare and costly) |
| Prompt-based LLM extraction | Inconsistent, hallucination-prone |
LLMs improved the situation, but introduced a new problem: variability.
A single prompt might produce reasonable outputs — or subtly incorrect ones. In compliance, “subtle” errors are still catastrophic.
The missing ingredient wasn’t capability. It was process control.
Analysis — What De Jure actually does
De Jure is best understood not as a model, but as a pipeline architecture built around iterative refinement.
It operates in four stages:
1. Document Normalization
Raw regulatory text is converted into structured Markdown.
This step sounds trivial. It isn’t.
Legal documents rely heavily on hierarchy — sections, clauses, definitions. Preserving this structure is critical because downstream extraction depends on contextual relationships.
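The paper doesn't publish its normalization code, but the idea fits in a few lines. The section and clause patterns below, and the `normalize` helper itself, are illustrative assumptions rather than De Jure's actual implementation:

```python
import re

def normalize(raw_text: str) -> str:
    """Convert numbered legal sections/clauses into Markdown headings,
    so the document hierarchy survives into downstream extraction.
    (Illustrative patterns; real statutes need far more cases.)"""
    lines = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # "Section 4. Title" -> H2, "4.1 Title" -> H3, "(a) ..." -> list item
        if m := re.match(r"^Section\s+(\d+)\.?\s*(.*)", line):
            lines.append(f"## Section {m.group(1)} {m.group(2)}".rstrip())
        elif m := re.match(r"^(\d+\.\d+)\s+(.*)", line):
            lines.append(f"### {m.group(1)} {m.group(2)}")
        elif m := re.match(r"^\(([a-z])\)\s+(.*)", line):
            lines.append(f"- ({m.group(1)}) {m.group(2)}")
        else:
            lines.append(line)
    return "\n".join(lines)
```

The payoff is that a clause like “(a) applies to banks” now sits *inside* its parent section in the Markdown tree, instead of floating as an orphaned sentence.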
2. Semantic Decomposition
The LLM breaks the document into structured rule units, including:
- Obligations
- Conditions
- Definitions
- Metadata
Think of this as turning paragraphs into executable fragments.
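The exact schema isn't specified in the source, but a hypothetical Python rendering of one such rule unit might look like this (all field names and the sample values are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class RuleUnit:
    """One executable fragment extracted from a clause (illustrative schema)."""
    rule_id: str
    obligation: str                  # what must be done
    conditions: list[str] = field(default_factory=list)       # when it applies
    definitions: dict[str, str] = field(default_factory=dict)  # terms it relies on
    metadata: dict[str, str] = field(default_factory=dict)     # source section, domain

unit = RuleUnit(
    rule_id="fin-4.1-a",
    obligation="Report suspicious transactions to the regulator",
    conditions=["transaction exceeds $10,000",
                "entity is a covered institution"],
    definitions={"covered institution": "a deposit-taking entity subject to the act"},
    metadata={"section": "4.1(a)", "domain": "finance"},
)
```

The point of the shape: every obligation carries its own conditions, definitions, and provenance, so no fragment depends on text that lives somewhere else in the document.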
3. LLM-as-a-Judge Evaluation
Here’s where things get interesting.
Instead of trusting the initial output, De Jure evaluates it across 19 distinct criteria, covering:
- Structural correctness
- Semantic fidelity
- Definition completeness
- Logical consistency
This is not a binary pass/fail. It’s a multi-dimensional scoring system.
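The full list of 19 criteria isn't reproduced here, so the sketch below scores only the four categories named above, assuming each comes back from the judge model as a value in [0, 1] and that anything under a threshold gets routed to repair:

```python
# Illustrative subset of the paper's 19 criteria; each is assumed to be
# scored 0.0-1.0 by a separate judge-model call.
CRITERIA = ["structural_correctness", "semantic_fidelity",
            "definition_completeness", "logical_consistency"]

def evaluate(scores: dict[str, float], threshold: float = 0.8):
    """Return (passes, failing_criteria) from per-criterion judge scores."""
    failing = [c for c in CRITERIA if scores.get(c, 0.0) < threshold]
    return (not failing, failing)

ok, failing = evaluate({"structural_correctness": 0.95,
                        "semantic_fidelity": 0.60,
                        "definition_completeness": 0.90,
                        "logical_consistency": 0.85})
# semantic_fidelity falls below threshold, so this unit is sent to repair
```

Because the output names *which* dimensions failed, the repair step can be targeted rather than a blind regeneration.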
4. Iterative Repair Loop
Low-scoring components are not discarded — they are repaired.
Crucially, repairs happen upstream first:
- Fix definitions
- Then metadata
- Then rule units
This ordering ensures that foundational context is corrected before higher-level reasoning is attempted again.
The process repeats within a bounded iteration budget (typically ≤ 3 cycles).
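Put together, the upstream-first loop can be sketched as follows. Here `evaluate` and `repair` stand in for the LLM judge and repair calls, and the control flow is an assumption about the architecture, not the paper's code:

```python
REPAIR_ORDER = ["definitions", "metadata", "rule_units"]  # upstream first

def repair_loop(components, evaluate, repair, max_iters=3):
    """Re-evaluate and repair components in upstream-first order,
    stopping early once everything passes (hypothetical API)."""
    for _ in range(max_iters):          # bounded iteration budget
        failing = {name for name, comp in components.items()
                   if not evaluate(name, comp)}
        if not failing:
            break                       # converged before the budget ran out
        for name in REPAIR_ORDER:       # fix definitions before rule units
            if name in failing:
                components[name] = repair(name, components[name])
    return components
```

The bounded budget is what makes the loop deployable: cost is capped per document, and the monotonic-improvement finding below suggests little is lost by stopping at three passes.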
Pipeline Summary
| Stage | Function | Risk Mitigated |
|---|---|---|
| Normalization | Preserve structure | Context loss |
| Decomposition | Extract rules | Oversimplification |
| Evaluation | Score outputs | Silent errors |
| Repair | Iterative correction | Error propagation |
Findings — What actually improves
The results are less flashy than you might expect — and more important because of it.
1. Monotonic Quality Improvement
Quality does not oscillate from pass to pass; each iteration improves on the last until gains flatten.
| Iteration | Quality Trend |
|---|---|
| 0 (initial) | Baseline, variable |
| 1 | Significant improvement |
| 2 | Near-peak performance |
| 3 | Marginal gains, stabilization |
This predictability matters more than raw accuracy.
2. Domain Generalization
De Jure performs consistently across:
- Finance regulations
- Healthcare policies
- AI governance frameworks
No domain-specific tuning required.
Which is rare. And suspiciously useful.
3. Model-Agnostic Performance
Both open-source and proprietary models benefit from the pipeline.
Translation: the system architecture matters more than the model choice.
4. Downstream Impact (RAG Compliance QA)
When used in retrieval-augmented generation (RAG):
| Input Type | Outcome |
|---|---|
| Raw documents | Inconsistent answers |
| De Jure structured rules | More accurate, grounded responses |
In other words, better inputs → more reliable outputs.
Predictable, but often ignored in practice.
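One plausible way to feed structured rules into a RAG index, instead of raw text windows, is to flatten each rule unit into a self-contained passage. The `to_retrieval_doc` helper and its field names are hypothetical:

```python
def to_retrieval_doc(unit: dict) -> str:
    """Flatten one structured rule unit into a self-contained retrieval
    passage: obligation, trigger conditions, and source citation travel
    together, so retrieved chunks arrive pre-grounded."""
    return (f"[{unit['metadata']['section']}] Obligation: {unit['obligation']}. "
            f"Applies when: {'; '.join(unit['conditions'])}.")

doc = to_retrieval_doc({
    "obligation": "Report suspicious transactions to the regulator",
    "conditions": ["transaction exceeds $10,000"],
    "metadata": {"section": "4.1(a)"},
})
```

An arbitrary 512-token window can split an obligation from its conditions; a flattened rule unit, by construction, cannot.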
Implications — What this means for business
1. Compliance Becomes a Data Problem
De Jure reframes regulatory interpretation as a data engineering challenge, not a legal one.
That shift is profound.
It means:
- Compliance pipelines can be automated
- Updates can be version-controlled
- Rules can be queried programmatically
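Once rules live as structured records, “queried programmatically” becomes literal. A minimal sketch, assuming each record carries the domain metadata produced at extraction time (the `query_rules` API and sample data are invented):

```python
def query_rules(rules, *, domain=None, keyword=None):
    """Filter extracted rule records like any other dataset (illustrative)."""
    hits = rules
    if domain:
        hits = [r for r in hits if r["metadata"].get("domain") == domain]
    if keyword:
        hits = [r for r in hits if keyword.lower() in r["obligation"].lower()]
    return hits

rules = [
    {"obligation": "Report breaches within 72 hours",
     "metadata": {"domain": "healthcare"}},
    {"obligation": "Retain transaction records for 5 years",
     "metadata": {"domain": "finance"}},
]
finance_hits = query_rules(rules, domain="finance", keyword="retain")
# only the records-retention rule matches
```

And because these are plain records, diffing two versions of a regulation becomes an ordinary version-control operation rather than a legal review.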
Law, meet infrastructure.
2. LLM Reliability Is a System Design Issue
The paper quietly makes a strong claim:
You don’t fix LLM unreliability with better prompts — you fix it with feedback loops.
This has broader implications for any enterprise AI system.
3. Iteration Beats Perfection
Instead of expecting perfect outputs in one pass, De Jure assumes failure — and designs around it.
This is closer to how real systems work:
- Generate
- Evaluate
- Repair
- Repeat
Not glamorous. Extremely effective.
4. Governance of AI Will Require Machines Reading Law
As AI systems themselves become regulated, the ability to machine-interpret regulation becomes foundational.
De Jure is an early blueprint for that capability.
Not a complete solution — but a directionally correct one.
Conclusion — Quiet systems win
De Jure doesn’t try to impress.
It doesn’t claim legal reasoning breakthroughs or philosophical understanding of statutes.
Instead, it does something more useful:
It builds a system that makes LLM outputs less wrong over time.
And in compliance, that’s the difference between a demo and a deployment.
Cognaptus: Automate the Present, Incubate the Future.