Opening — Why this matters now

Regulation is having a moment — not the glamorous kind, but the unavoidable kind.

As AI systems move from experimentation to deployment, organizations are discovering an inconvenient truth: models don’t just need to perform — they need to comply. Financial services, healthcare, and now AI governance itself are all governed by dense, evolving regulatory frameworks that were never designed for machines to interpret.

The bottleneck is obvious. Translating legal text into operational rules remains manual, expensive, and painfully slow.

Enter De Jure — a system that attempts to automate this translation process entirely. Not by making LLMs “understand law” in some grand philosophical sense, but by forcing them through a disciplined loop of decomposition, evaluation, and repair.

It’s less courtroom drama, more assembly line. And that’s precisely the point.

Background — Context and prior art

Historically, regulatory rule extraction has relied on:

| Approach | Limitation |
|---|---|
| Manual annotation by legal experts | Expensive, slow, non-scalable |
| Rule-based NLP pipelines | Brittle to domain variation |
| Supervised ML models | Require labeled datasets (rare and costly) |
| Prompt-based LLM extraction | Inconsistent, hallucination-prone |

LLMs improved the situation, but introduced a new problem: variability.

A single prompt might produce reasonable outputs — or subtly incorrect ones. In compliance, “subtle” errors are still catastrophic.

The missing ingredient wasn’t capability. It was process control.

Analysis — What De Jure actually does

De Jure is best understood not as a model, but as a pipeline architecture built around iterative refinement.

It operates in four stages:

1. Document Normalization

Raw regulatory text is converted into structured Markdown.

This step sounds trivial. It isn’t.

Legal documents rely heavily on hierarchy — sections, clauses, definitions. Preserving this structure is critical because downstream extraction depends on contextual relationships.
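A minimal sketch of what this stage does, assuming a simple regex-driven mapping (the actual De Jure normalizer is more sophisticated; `normalize`, the section/clause patterns, and the sample text are all illustrative):

```python
import re

def normalize(raw: str) -> str:
    """Convert flat regulatory text into heading-structured Markdown.

    Illustrative sketch only: it maps 'Section N' lines and '(a)'-style
    clauses to heading levels so the document hierarchy survives into
    downstream extraction.
    """
    out = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if re.match(r"^Section\s+\d+", line):
            out.append(f"## {line}")    # top-level section
        elif re.match(r"^\([a-z]\)", line):
            out.append(f"### {line}")   # lettered clause
        else:
            out.append(line)            # body text, left as-is
    return "\n".join(out)

doc = """Section 4. Data Retention
(a) Records shall be kept for five years.
(b) Records must be encrypted at rest."""
print(normalize(doc))
```

The point is that hierarchy becomes explicit: a downstream extractor can now tell that clause (a) belongs to Section 4 without re-inferring it from layout.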

2. Semantic Decomposition

The LLM breaks the document into structured rule units, including:

  • Obligations
  • Conditions
  • Definitions
  • Metadata

Think of this as turning paragraphs into executable fragments.
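One way to picture a rule unit as a data structure (field names here are illustrative, not the paper's exact schema):

```python
from dataclasses import dataclass, field

@dataclass
class RuleUnit:
    """One extracted rule fragment. Hypothetical schema for illustration."""
    obligation: str                  # what must be done
    conditions: list[str]            # when the obligation applies
    definitions: dict[str, str]      # terms the rule depends on
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. source section

unit = RuleUnit(
    obligation="Retain transaction records",
    conditions=["transaction value exceeds EUR 10,000"],
    definitions={"transaction record": "an entry in the firm's ledger"},
    metadata={"section": "4(a)"},
)
print(unit.obligation)
```

Once a paragraph is in this form, it can be validated, scored, and queried like any other structured data.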

3. LLM-as-a-Judge Evaluation

Here’s where things get interesting.

Instead of trusting the initial output, De Jure evaluates it across 19 distinct criteria, covering:

  • Structural correctness
  • Semantic fidelity
  • Definition completeness
  • Logical consistency

This is not a binary pass/fail. It’s a multi-dimensional scoring system.
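A sketch of how multi-criteria judging differs from pass/fail, with the model call stubbed out and the 19 criteria abbreviated to four (both `llm_judge` and the threshold are assumptions, not the paper's implementation):

```python
CRITERIA = ["structural_correctness", "semantic_fidelity",
            "definition_completeness", "logical_consistency"]

def llm_judge(component: str) -> dict[str, float]:
    # Hypothetical stub: a real system would prompt an LLM per criterion.
    scores = 0.4 if component == "definitions" else 0.9
    return {c: scores for c in CRITERIA}

def needs_repair(scores: dict[str, float], threshold: float = 0.7) -> bool:
    # One low dimension flags the component; averaging would hide it.
    return min(scores.values()) < threshold

for comp in ["definitions", "metadata", "rule_units"]:
    print(comp, needs_repair(llm_judge(comp)))
```

Taking the minimum rather than the mean is the key design choice: a component that is structurally perfect but semantically wrong still gets flagged.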

4. Iterative Repair Loop

Low-scoring components are not discarded — they are repaired.

Crucially, repairs happen upstream first:

  1. Fix definitions
  2. Then metadata
  3. Then rule units

This ordering ensures that foundational context is corrected before higher-level reasoning is attempted again.

The process repeats within a bounded iteration budget (typically ≤ 3 cycles).
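The loop above can be sketched as follows, with toy `score` and `repair` callables standing in for the LLM judge and repair prompts (the ordering and budget follow the description; everything else is illustrative):

```python
ORDER = ["definitions", "metadata", "rule_units"]  # upstream first

def run_repair_loop(score, repair, budget=3, threshold=0.7):
    """Judge, then repair failing components in upstream-first order,
    repeating until everything passes or the iteration budget runs out."""
    for _ in range(budget):
        failing = [name for name in ORDER if score(name) < threshold]
        if not failing:
            break
        for name in failing:   # fix foundational context before rules
            repair(name)

# Toy state: quality scores stand in for real judged components.
quality = {"definitions": 0.5, "metadata": 0.6, "rule_units": 0.9}
run_repair_loop(score=lambda n: quality[n],
                repair=lambda n: quality.update({n: min(1.0, quality[n] + 0.3)}))
print(quality)
```

The bounded budget is what makes the pipeline operationally safe: it cannot loop forever on an unrepairable document, and cost per document has a hard ceiling.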


Pipeline Summary

| Stage | Function | Risk Mitigated |
|---|---|---|
| Normalization | Preserve structure | Context loss |
| Decomposition | Extract rules | Oversimplification |
| Evaluation | Score outputs | Silent errors |
| Repair | Iterative correction | Error propagation |

Findings — What actually improves

The results are less flashy than you might expect — and more important because of it.

1. Monotonic Quality Improvement

Each iteration consistently improves extraction quality.

| Iteration | Quality Trend |
|---|---|
| 0 (initial) | Baseline, variable |
| 1 | Significant improvement |
| 2 | Near-peak performance |
| 3 | Marginal gains, stabilization |

This predictability matters more than raw accuracy.

2. Domain Generalization

De Jure performs consistently across:

  • Finance regulations
  • Healthcare policies
  • AI governance frameworks

No domain-specific tuning required.

Which is rare. And suspiciously useful.

3. Model-Agnostic Performance

Both open-source and proprietary models benefit from the pipeline.

Translation: the system architecture matters more than the model choice.

4. Downstream Impact (RAG Compliance QA)

When used in retrieval-augmented generation (RAG):

| Input Type | Outcome |
|---|---|
| Raw documents | Inconsistent answers |
| De Jure structured rules | More accurate, grounded responses |

In other words, better inputs → more reliable outputs.

Predictable, but often ignored in practice.

Implications — What this means for business

1. Compliance Becomes a Data Problem

De Jure reframes regulatory interpretation as a data engineering challenge, not a legal one.

That shift is profound.

It means:

  • Compliance pipelines can be automated
  • Updates can be version-controlled
  • Rules can be queried programmatically

Law, meet infrastructure.
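Concretely, once rules are structured records, a compliance question becomes an ordinary query (schema, IDs, and rule contents below are invented for illustration):

```python
# Hypothetical rule store: each entry is one extracted, versionable rule.
rules = [
    {"id": "FIN-4a", "domain": "finance", "obligation": "retain records",
     "effective": "2024-01-01"},
    {"id": "HLT-2b", "domain": "healthcare", "obligation": "encrypt PHI",
     "effective": "2023-06-15"},
]

def rules_for(domain: str) -> list[dict]:
    """Return every rule applying to a given domain."""
    return [r for r in rules if r["domain"] == domain]

print([r["id"] for r in rules_for("finance")])
```

The same store can be diffed between regulatory versions or fed to a retrieval layer, which is exactly where the RAG results above come from.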

2. LLM Reliability Is a System Design Issue

The paper quietly makes a strong claim:

You don’t fix LLM unreliability with better prompts — you fix it with feedback loops.

This has broader implications for any enterprise AI system.

3. Iteration Beats Perfection

Instead of expecting perfect outputs in one pass, De Jure assumes failure — and designs around it.

This is closer to how real systems work:

  • Generate
  • Evaluate
  • Repair
  • Repeat

Not glamorous. Extremely effective.

4. Governance of AI Will Require Machines Reading Law

As AI systems themselves become regulated, the ability to machine-interpret regulation becomes foundational.

De Jure is an early blueprint for that capability.

Not a complete solution — but a directionally correct one.

Conclusion — Quiet systems win

De Jure doesn’t try to impress.

It doesn’t claim legal reasoning breakthroughs or philosophical understanding of statutes.

Instead, it does something more useful:

It builds a system that makes LLM outputs less wrong over time.

And in compliance, that’s the difference between a demo and a deployment.

Cognaptus: Automate the Present, Incubate the Future.