Opening — Why this matters now
Regulation is having a moment — not the glamorous kind, but the unavoidable kind.
As AI systems move from experimentation to deployment, organizations are discovering an inconvenient truth: models don’t just need to perform — they need to comply. Financial services, healthcare, and now AI governance itself are all governed by dense, evolving regulatory frameworks that were never designed for machines to interpret.
The bottleneck is obvious. Translating legal text into operational rules remains manual, expensive, and painfully slow.
Enter De Jure — a system that attempts to automate this translation end to end. Not by making LLMs “understand law” in some grand philosophical sense, but by forcing them through a disciplined loop of decomposition, evaluation, and repair.
It’s less courtroom drama, more assembly line. And that’s precisely the point.
Background — Context and prior art
Historically, regulatory rule extraction has relied on:
| Approach | Limitation |
|---|---|
| Manual annotation by legal experts | Expensive, slow, non-scalable |
| Rule-based NLP pipelines | Brittle to domain variation |
| Supervised ML models | Require labeled datasets (rare and costly) |
| Prompt-based LLM extraction | Inconsistent, hallucination-prone |
LLMs improved the situation, but introduced a new problem: variability.
A single prompt might produce reasonable outputs — or subtly incorrect ones. In compliance, “subtle” errors are still catastrophic.
The missing ingredient wasn’t capability. It was process control.
Analysis — What De Jure actually does
De Jure is best understood not as a model, but as a pipeline architecture built around iterative refinement.
It operates in four stages:
1. Document Normalization
Raw regulatory text is converted into structured Markdown.
This step sounds trivial. It isn’t.
Legal documents rely heavily on hierarchy — sections, clauses, definitions. Preserving this structure is critical because downstream extraction depends on contextual relationships.
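The paper doesn't publish its normalization code, but the idea fits in a few lines. The section and clause patterns below, and the `normalize` helper itself, are illustrative assumptions rather than De Jure's actual implementation:

```python
import re

def normalize(raw_text: str) -> str:
    """Convert numbered legal sections/clauses into Markdown headings,
    so the document hierarchy survives into downstream extraction.
    (Illustrative patterns; real statutes need far more cases.)"""
    lines = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # "Section 4. Title" -> H2, "4.1 Title" -> H3, "(a) ..." -> list item
        if m := re.match(r"^Section\s+(\d+)\.?\s*(.*)", line):
            lines.append(f"## Section {m.group(1)} {m.group(2)}".rstrip())
        elif m := re.match(r"^(\d+\.\d+)\s+(.*)", line):
            lines.append(f"### {m.group(1)} {m.group(2)}")
        elif m := re.match(r"^\(([a-z])\)\s+(.*)", line):
            lines.append(f"- ({m.group(1)}) {m.group(2)}")
        else:
            lines.append(line)
    return "\n".join(lines)
```

The payoff is that a clause like “(a) applies to banks” now sits *inside* its parent section in the Markdown tree, instead of floating as an orphaned sentence.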
2. Semantic Decomposition
The LLM breaks the document into structured rule units, including:
- Obligations
- Conditions
- Definitions
- Metadata
Think of this as turning paragraphs into executable fragments.
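The exact schema isn't specified in the source, but a hypothetical Python rendering of one such rule unit might look like this (all field names and the sample values are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class RuleUnit:
    """One executable fragment extracted from a clause (illustrative schema)."""
    rule_id: str
    obligation: str                  # what must be done
    conditions: list[str] = field(default_factory=list)       # when it applies
    definitions: dict[str, str] = field(default_factory=dict)  # terms it relies on
    metadata: dict[str, str] = field(default_factory=dict)     # source section, domain

unit = RuleUnit(
    rule_id="fin-4.1-a",
    obligation="Report suspicious transactions to the regulator",
    conditions=["transaction exceeds $10,000",
                "entity is a covered institution"],
    definitions={"covered institution": "a deposit-taking entity subject to the act"},
    metadata={"section": "4.1(a)", "domain": "finance"},
)
```

The point of the shape: every obligation carries its own conditions, definitions, and provenance, so no fragment depends on text that lives somewhere else in the document.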
3. LLM-as-a-Judge Evaluation
Here’s where things get interesting.
Instead of trusting the initial output, De Jure evaluates it across 19 distinct criteria, covering:
- Structural correctness
- Semantic fidelity
- Definition completeness
- Logical consistency
This is not a binary pass/fail. It’s a multi-dimensional scoring system.
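The full list of 19 criteria isn't reproduced here, so the sketch below scores only the four categories named above, assuming each comes back from the judge model as a value in [0, 1] and that anything under a threshold gets routed to repair:

```python
# Illustrative subset of the paper's 19 criteria; each is assumed to be
# scored 0.0-1.0 by a separate judge-model call.
CRITERIA = ["structural_correctness", "semantic_fidelity",
            "definition_completeness", "logical_consistency"]

def evaluate(scores: dict[str, float], threshold: float = 0.8):
    """Return (passes, failing_criteria) from per-criterion judge scores."""
    failing = [c for c in CRITERIA if scores.get(c, 0.0) < threshold]
    return (not failing, failing)

ok, failing = evaluate({"structural_correctness": 0.95,
                        "semantic_fidelity": 0.60,
                        "definition_completeness": 0.90,
                        "logical_consistency": 0.85})
# semantic_fidelity falls below threshold, so this unit is sent to repair
```

Because the output names *which* dimensions failed, the repair step can be targeted rather than a blind regeneration.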
4. Iterative Repair Loop
Low-scoring components are not discarded — they are repaired.
Crucially, repairs happen upstream first:
- Fix definitions
- Then metadata
- Then rule units
This ordering ensures that foundational context is corrected before higher-level reasoning is attempted again.
The process repeats within a bounded iteration budget (typically ≤ 3 cycles).
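Put together, the upstream-first loop can be sketched as follows. Here `evaluate` and `repair` stand in for the LLM judge and repair calls, and the control flow is an assumption about the architecture, not the paper's code:

```python
REPAIR_ORDER = ["definitions", "metadata", "rule_units"]  # upstream first

def repair_loop(components, evaluate, repair, max_iters=3):
    """Re-evaluate and repair components in upstream-first order,
    stopping early once everything passes (hypothetical API)."""
    for _ in range(max_iters):          # bounded iteration budget
        failing = {name for name, comp in components.items()
                   if not evaluate(name, comp)}
        if not failing:
            break                       # converged before the budget ran out
        for name in REPAIR_ORDER:       # fix definitions before rule units
            if name in failing:
                components[name] = repair(name, components[name])
    return components
```

The bounded budget is what makes the loop deployable: cost is capped per document, and the monotonic-improvement finding below suggests little is lost by stopping at three passes.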
Pipeline Summary
| Stage | Function | Risk Mitigated |
|---|---|---|
| Normalization | Preserve structure | Context loss |
| Decomposition | Extract rules | Oversimplification |
| Evaluation | Score outputs | Silent errors |
| Repair | Iterative correction | Error propagation |
Findings — What actually improves
The results are less flashy than you might expect — and more important because of it.
1. Monotonic Quality Improvement
Quality does not oscillate from pass to pass; each iteration improves on the last until gains flatten.
| Iteration | Quality Trend |
|---|---|
| 0 (initial) | Baseline, variable |
| 1 | Significant improvement |
| 2 | Near-peak performance |
| 3 | Marginal gains, stabilization |
This predictability matters more than raw accuracy.
2. Domain Generalization
De Jure performs consistently across:
- Finance regulations
- Healthcare policies
- AI governance frameworks
No domain-specific tuning required.
Which is rare. And suspiciously useful.
3. Model-Agnostic Performance
Both open-source and proprietary models benefit from the pipeline.
Translation: the system architecture matters more than the model choice.
4. Downstream Impact (RAG Compliance QA)
When used in retrieval-augmented generation (RAG):
| Input Type | Outcome |
|---|---|
| Raw documents | Inconsistent answers |
| De Jure structured rules | More accurate, grounded responses |
In other words, better inputs → more reliable outputs.
Predictable, but often ignored in practice.
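One plausible way to feed structured rules into a RAG index, instead of raw text windows, is to flatten each rule unit into a self-contained passage. The `to_retrieval_doc` helper and its field names are hypothetical:

```python
def to_retrieval_doc(unit: dict) -> str:
    """Flatten one structured rule unit into a self-contained retrieval
    passage: obligation, trigger conditions, and source citation travel
    together, so retrieved chunks arrive pre-grounded."""
    return (f"[{unit['metadata']['section']}] Obligation: {unit['obligation']}. "
            f"Applies when: {'; '.join(unit['conditions'])}.")

doc = to_retrieval_doc({
    "obligation": "Report suspicious transactions to the regulator",
    "conditions": ["transaction exceeds $10,000"],
    "metadata": {"section": "4.1(a)"},
})
```

An arbitrary 512-token window can split an obligation from its conditions; a flattened rule unit, by construction, cannot.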
Implications — What this means for business
1. Compliance Becomes a Data Problem
De Jure reframes regulatory interpretation as a data engineering challenge, not a legal one.
That shift is profound.
It means:
- Compliance pipelines can be automated
- Updates can be version-controlled
- Rules can be queried programmatically
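Once rules live as structured records, “queried programmatically” becomes literal. A minimal sketch, assuming each record carries the domain metadata produced at extraction time (the `query_rules` API and sample data are invented):

```python
def query_rules(rules, *, domain=None, keyword=None):
    """Filter extracted rule records like any other dataset (illustrative)."""
    hits = rules
    if domain:
        hits = [r for r in hits if r["metadata"].get("domain") == domain]
    if keyword:
        hits = [r for r in hits if keyword.lower() in r["obligation"].lower()]
    return hits

rules = [
    {"obligation": "Report breaches within 72 hours",
     "metadata": {"domain": "healthcare"}},
    {"obligation": "Retain transaction records for 5 years",
     "metadata": {"domain": "finance"}},
]
finance_hits = query_rules(rules, domain="finance", keyword="retain")
# only the records-retention rule matches
```

And because these are plain records, diffing two versions of a regulation becomes an ordinary version-control operation rather than a legal review.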
Law, meet infrastructure.
2. LLM Reliability Is a System Design Issue
The paper quietly makes a strong claim:
You don’t fix LLM unreliability with better prompts — you fix it with feedback loops.
This has broader implications for any enterprise AI system.
3. Iteration Beats Perfection
Instead of expecting perfect outputs in one pass, De Jure assumes failure — and designs around it.
This is closer to how real systems work:
- Generate
- Evaluate
- Repair
- Repeat
Not glamorous. Extremely effective.
4. Governance of AI Will Require Machines Reading Law
As AI systems themselves become regulated, the ability to machine-interpret regulation becomes foundational.
De Jure is an early blueprint for that capability.
Not a complete solution — but a directionally correct one.
Conclusion — Quiet systems win
De Jure doesn’t try to impress.
It doesn’t claim legal reasoning breakthroughs or philosophical understanding of statutes.
Instead, it does something more useful:
It builds a system that makes LLM outputs less wrong over time.
And in compliance, that’s the difference between a demo and a deployment.
Cognaptus: Automate the Present, Incubate the Future.