RAGulating Compliance: When Triplets Trump Chunks

TL;DR for operators

Compliance teams do not mainly need a chatbot that sounds more confident. They already have enough people sounding confident in meetings. They need answers that can be traced back to the rule text, checked against related provisions, and updated when the regulatory corpus changes.

The paper behind this article proposes a multi-agent system that turns regulatory documents into subject–predicate–object triplets, embeds those triplets alongside their source sections, retrieves triplets for question answering, and shows users the relevant subgraph behind the answer.¹ That matters because regulatory work is not just “find me a paragraph.” It is “show me the applicable rule, the linked requirement, the exception, the deadline, and the neighbouring clause that will embarrass us later.”

The evidence is useful but not as clean as the abstract’s enthusiasm suggests. Triplets do not improve every retrieval metric. At the two lower similarity thresholds (0.50 and 0.60), the triplet system scores lower on section overlap than retrieval without triplets. At the stricter 0.75 threshold, it scores higher. Answer accuracy barely moves, from 4.71 to 4.73 on a 1–5 scale. The stronger signal is navigation: triplets increase average degree and reduce average shortest path, meaning the system can move across related regulatory sections more easily.

So the business lesson is narrow and valuable: triplet-RAG is less about making the final answer prettier and more about making the retrieval substrate inspectable. In regulated domains, that is not a cosmetic upgrade. It is the difference between “the model said so” and “here is the chain of linked regulatory facts, with sources, now please stop sweating.”

The evidence says “better at strict retrieval,” not “magic graph dust”

The paper’s most important table should be read slowly. It compares the system with and without triplets across section overlap, answer accuracy, and graph navigation.

Evaluation area	Without triplets	With triplets	What this actually means
Section overlap at 0.50 threshold	0.0812	0.0745	Triplets do worse at a loose threshold. Not a universal retrieval win.
Section overlap at 0.60 threshold	0.2700	0.2143	Again, worse with triplets. Structure can narrow retrieval too aggressively.
Section overlap at 0.75 threshold	0.1684	0.2888	Triplets help when the bar for similarity is stricter. This is the real retrieval signal.
Average answer accuracy	4.71	4.73	The answer-quality lift is marginal. Do not build a board deck around 0.02.
Average degree	1.2939	1.6080	The triplet graph is more interconnected. Useful for follow-up exploration.
Average shortest path	2.0167	1.3300	Related sections are closer in the triplet network. Useful for audit and investigation workflows.
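
The same rows expressed as relative change make the asymmetry explicit. These percentages are computed here from the table’s figures and are not reported by the paper: a large swing at the strict threshold, a negligible one in answer accuracy.

$$
\begin{aligned}
\Delta &= \frac{\text{with triplets} - \text{without triplets}}{\text{without triplets}} \\[4pt]
\Delta_{\text{overlap}@0.50} &= \frac{0.0745 - 0.0812}{0.0812} \approx -8.3\% \\
\Delta_{\text{overlap}@0.60} &= \frac{0.2143 - 0.2700}{0.2700} \approx -20.6\% \\
\Delta_{\text{overlap}@0.75} &= \frac{0.2888 - 0.1684}{0.1684} \approx +71.5\% \\
\Delta_{\text{accuracy}} &= \frac{4.73 - 4.71}{4.71} \approx +0.4\% \\
\Delta_{\text{degree}} &= \frac{1.6080 - 1.2939}{1.2939} \approx +24.3\% \\
\Delta_{\text{path}} &= \frac{1.3300 - 2.0167}{2.0167} \approx -34.1\%
\end{aligned}
$$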

The tempting version of the story is: “Knowledge graphs improve RAG.” Naturally. Add a graph, sprinkle in agents, call it explainable, and watch the LinkedIn carousel assemble itself.

The more useful version is: triplets seem to help when retrieval needs to be precise rather than merely broad. At a lower threshold, chunk retrieval may catch fuzzy contextual overlap better. At a stricter threshold, the triplet representation appears to recover more tightly aligned sections. That is exactly where regulatory QA becomes interesting, because compliance users often care less about “near enough” than about “this exact rule and its direct relatives.”

The paper’s answer-accuracy result also needs discipline. A move from 4.71 to 4.73 on a five-point scale is not nothing, but it is not the headline. Both systems already score high. The bigger operational gain is that triplets create a navigable evidence structure around the answer. That is harder to summarise in a benchmark table, which is why it is probably the part worth paying attention to.

What the system actually builds

The proposed architecture has three linked layers.

First, the system ingests regulatory text and extracts subject–predicate–object triplets. A rule like “FDA requires submission within 15 days” becomes something closer to a factual atom:

$$ \text{FDA} \rightarrow \text{requires} \rightarrow \text{submission within 15 days} $$

That atom is less expressive than the full legal paragraph, but more structured than a text chunk. The crucial design choice is that each triplet remains linked to its originating section. The graph does not replace the regulatory text. It indexes it.
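
A minimal sketch of that design choice, assuming a simple record type: the structured fact and a pointer back to its source section travel together. The `Triplet` class, its field names, and the `SEC-1` identifier are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """A subject-predicate-object fact that stays linked to its originating section."""

    subject: str
    predicate: str
    obj: str
    source_section: str  # hypothetical section identifier
    source_text: str  # original wording, kept so the graph indexes the text rather than replacing it

    def as_text(self) -> str:
        # Flattened form of the fact, e.g. as input to an embedding model.
        return f"{self.subject} {self.predicate} {self.obj}"


fact = Triplet(
    subject="FDA",
    predicate="requires",
    obj="submission within 15 days",
    source_section="SEC-1",  # placeholder, not a real eCFR citation
    source_text="FDA requires submission within 15 days.",
)
print(fact.as_text())  # FDA requires submission within 15 days
print(fact.source_section)  # SEC-1
```

Keeping `source_text` on the record is what lets a reviewer jump from the atom back to the paragraph it came from.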

Second, the system embeds triplets and stores them with their source sections and metadata in an enriched vector database. The paper describes an embedding model based on transformer architectures, trained on text extracted from the Electronic Code of Federal Regulations. At query time, the system retrieves semantically similar triplets and then pulls the linked text sections as evidence for generation.
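
The retrieval step can be sketched in a few lines, with one loud assumption: a bag-of-words cosine similarity stands in for the paper’s transformer embeddings, so the scores below are not comparable to the paper’s thresholds. `STORE`, `embed`, and `retrieve` are hypothetical names. The point is the shape of the lookup: the query is matched against triplets, but what comes back is the linked section and its text.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a transformer embedding model: a lowercase bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical enriched store: each entry pairs a triplet with its source section and text.
STORE = [
    {"triplet": "FDA requires submission within 15 days", "section": "SEC-1", "text": "Placeholder text of SEC-1."},
    {"triplet": "applicant may appeal within 15 days", "section": "SEC-2", "text": "Placeholder text of SEC-2."},
    {"triplet": "manufacturer maintains device records", "section": "SEC-3", "text": "Placeholder text of SEC-3."},
]


def retrieve(query: str, threshold: float) -> list[tuple[float, str, str]]:
    """Match the query against triplets, then return the linked sections that clear the threshold."""
    q = embed(query)
    hits = []
    for entry in STORE:
        score = cosine(q, embed(entry["triplet"]))
        if score >= threshold:
            hits.append((round(score, 3), entry["section"], entry["text"]))
    return sorted(hits, reverse=True)


query = "what does FDA require within 15 days"
print([h[1] for h in retrieve(query, 0.40)])  # ['SEC-1', 'SEC-2']
print([h[1] for h in retrieve(query, 0.50)])  # ['SEC-1']
print(retrieve(query, 0.75))  # []
```

Raising the threshold shrinks the evidence set, which is the lever behind the threshold-dependent results discussed above.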

Third, a multi-agent pipeline coordinates the work. The paper separates ingestion, extraction, normalisation, deduplication, indexing, retrieval, story-building, and answer generation into specialised agents. In plain business language: the pipeline separates data preparation from retrieval and retrieval from final response construction. That separation matters because compliance systems fail in stages. A bad answer may come from weak extraction, sloppy entity matching, missing retrieval, or overconfident generation. A modular pipeline at least gives you somewhere to look when the machine starts making legally expensive poetry.

The “ontology-free” move is practical, but not free

The paper describes the knowledge graph as ontology-free, or schema-light. That means the system does not begin with a rigid, predefined regulatory ontology. Instead, structure emerges from extracted relationships in the source documents.

This is a sensible choice for regulatory material. Rules change. Agencies revise guidance. Terms appear with inconsistent surface forms. Building a complete ontology up front can become a ceremony of false precision: everyone agrees on the schema, then the real documents arrive and immediately misbehave.

But ontology-free does not mean governance-free. The paper itself notes vocabulary fragmentation as a challenge. If one section says “quality system regulation,” another says “QSR,” and a third uses a near-synonym embedded in a procedural clause, the graph needs canonicalisation and entity resolution. Otherwise, it becomes less a knowledge graph and more a very expensive thesaurus having an identity crisis.
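
A minimal sketch of canonicalisation, assuming a hand-maintained alias table (the `ALIASES` contents and helper names are hypothetical). One design choice matters for compliance: duplicate facts are merged, but every source section is kept, so deduplication never erases provenance.

```python
# Hypothetical alias table; in practice it would be curated and reviewed by domain experts.
ALIASES = {
    "qsr": "quality system regulation",
    "quality system reg.": "quality system regulation",
}


def canonical(entity: str) -> str:
    # Lowercase, collapse whitespace, then map known aliases to one canonical name.
    key = " ".join(entity.lower().split())
    return ALIASES.get(key, key)


def normalise(triplets):
    """Canonicalise entities and merge duplicate facts, keeping every source section."""
    merged: dict[tuple[str, str, str], list[str]] = {}
    for subject, predicate, obj, section in triplets:
        key = (canonical(subject), " ".join(predicate.lower().split()), canonical(obj))
        merged.setdefault(key, []).append(section)
    return merged


raw = [
    ("QSR", "applies to", "device manufacturers", "SEC-1"),
    ("Quality System Regulation", "applies to", "Device manufacturers", "SEC-2"),
    ("QSR", "requires", "complaint files", "SEC-3"),
]
for fact, sections in normalise(raw).items():
    print(fact, sections)
# ('quality system regulation', 'applies to', 'device manufacturers') ['SEC-1', 'SEC-2']
# ('quality system regulation', 'requires', 'complaint files') ['SEC-3']
```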

The right interpretation is therefore not “skip ontology design.” It is “delay heavy ontology commitments until the corpus shows you where structure is actually needed.” For operators, this suggests a staged implementation:

Stage	What to do	What to avoid
Initial ingestion	Extract triplets with source-section links	Pretending raw extraction is already a governed knowledge layer
Normalisation	Merge duplicates, resolve aliases, standardise entities	Letting acronyms and synonyms split the graph
Review loop	Let domain experts correct high-value triplets	Asking lawyers to validate every edge manually until morale dies
Partial schema	Add controlled types where the corpus demands them	Building a cathedral ontology before the first useful answer

The paper’s ontology-free framing is attractive because it lowers the starting cost. Its production value depends on how quickly the organisation adds enough discipline to keep the graph coherent.

Why triplets help strict retrieval

Chunk-based RAG retrieves passages based on semantic similarity. That works well when the user asks broad questions and the relevant passage contains enough overlapping language. It becomes weaker when the answer depends on relationships scattered across provisions: the agency, the obligation, the deadline, the exception, the linked section, and the procedural consequence.

Triplets compress the text into relationship units. They make the “who did what to whom” structure retrievable. In the paper’s example, multiple eCFR sections converge on a shared 15-day appeal timeframe. A chunk retriever may see separate provisions. A triplet graph can expose the shared procedural structure.

That explains the evaluation pattern. At loose thresholds, chunks may win because they retain more lexical and contextual surface area. At strict thresholds, triplets may win because they reduce noise and force the retrieval system to match on the factual core. For compliance work, this distinction matters.

A loose retrieval system is useful for discovery. A strict retrieval system is useful for defensible answers. Most enterprises need both, but they should not confuse them.

User need	Better suited retrieval mode	Reason
“Find broadly related provisions”	Chunk retrieval	Wider context and fuzzy semantic overlap help exploration.
“Answer this specific regulatory question”	Triplet + linked text	Relationship-level matching can isolate the factual core.
“Show what else this clause connects to”	Triplet graph	Shared entities and predicates support lateral navigation.
“Prepare an audit trail”	Triplet provenance + source text	Every structured fact can point back to a section.

This is the sensible architecture: use chunks for context, triplets for structure, and source text for verification. Anyone selling only one of the three is probably selling simplicity, not compliance.
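
One way to combine the modes is to fuse the ranked section lists from a chunk retriever and a triplet retriever, then fetch source text for the fused list. The sketch below uses reciprocal rank fusion, a common rank-merging technique that the paper does not specify; `hybrid_rank`, the constant `k=60`, and the section rankings are assumptions for illustration.

```python
def hybrid_rank(chunk_ranking, triplet_ranking, k=60):
    """Fuse two ranked lists of section ids with reciprocal rank fusion, recording which retriever found each."""
    scores: dict[str, float] = {}
    found_by: dict[str, list[str]] = {}
    for name, ranking in (("chunk", chunk_ranking), ("triplet", triplet_ranking)):
        for rank, section in enumerate(ranking, start=1):
            scores[section] = scores.get(section, 0.0) + 1.0 / (k + rank)
            found_by.setdefault(section, []).append(name)
    # Highest fused score first; ties broken by section id for a stable order.
    order = sorted(scores, key=lambda s: (-scores[s], s))
    return [(section, found_by[section]) for section in order]


# Hypothetical rankings from a chunk retriever and a triplet retriever.
fused = hybrid_rank(["SEC-7", "SEC-2", "SEC-9"], ["SEC-2", "SEC-4"])
for section, sources in fused:
    print(section, sources)
# SEC-2 ['chunk', 'triplet']
# SEC-7 ['chunk']
# SEC-4 ['triplet']
# SEC-9 ['chunk']
```

Sections found by both modes rise to the top, and the `found_by` record tells a reviewer whether a section came from broad context, structured matching, or both.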

The multi-agent design is useful because compliance pipelines break in different places

The phrase “multi-agent” is fashionable enough to deserve suspicion on arrival. In this paper, however, the agent split has a reasonable operational role.

The knowledge-graph construction side includes agents for document ingestion, triplet extraction, normalisation and cleaning, and triplet store/indexing. The QA side includes retrieval, story-building, and final generation. This is not agent theatre if implemented with logs, checkpoints, and validation. It creates separable responsibilities:

Agent role	Operational responsibility	Failure mode it helps isolate
Ingestion	Segment regulatory text and capture metadata	Missing sections, stale documents, poor chunk boundaries
Extraction	Produce SPO triplets	Wrong or missing relationships
Normalisation	Deduplicate and resolve entity variants	Fragmented graph, duplicate facts, alias chaos
Indexing	Embed and store triplets with source links	Retrieval blind spots or metadata loss
Retrieval	Select relevant triplets and sections	Wrong evidence set
Story-building	Assemble linked context for generation	Disconnected or misleading narrative
Generation	Produce final answer from triplets and text	Unsupported synthesis or overstatement
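
The diagnostic value of the split can be sketched with plain functions standing in for agents. Everything here is hypothetical: the stage names mirror the table, the extractor is a toy pattern matcher rather than an LLM, and `run_pipeline` simply logs each stage’s output size and halts at the first empty result.

```python
def ingest(docs):
    # Segment: here each non-empty document string is treated as one section.
    return [(f"SEC-{i}", text.strip()) for i, text in enumerate(docs, start=1) if text.strip()]


def extract(sections):
    # Toy extractor: only recognises "<subject> requires <object>" sentences.
    triplets = []
    for section_id, text in sections:
        if " requires " in text:
            subject, obj = text.split(" requires ", 1)
            triplets.append((subject, "requires", obj.rstrip("."), section_id))
    return triplets


def normalise(triplets):
    # Lowercase entities and drop exact duplicates while keeping order.
    seen, out = set(), []
    for s, p, o, sec in triplets:
        key = (s.lower(), p, o.lower(), sec)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


def run_pipeline(stages, data):
    """Run named stages in order, logging output sizes; halt at the first empty output."""
    log = []
    for name, stage in stages:
        data = stage(data)
        log.append((name, len(data)))
        if not data:
            log.append((name, "halted: empty output"))
            break
    return data, log


STAGES = [("ingest", ingest), ("extract", extract), ("normalise", normalise)]
docs = ["FDA requires submission within 15 days.", "", "Applicant may appeal the decision."]
triplets, log = run_pipeline(STAGES, docs)
print(triplets)  # [('fda', 'requires', 'submission within 15 days', 'SEC-1')]
print(log)  # [('ingest', 2), ('extract', 1), ('normalise', 1)]
```

The log already answers “wrong how?”: two sections went in, one triplet came out, so the missing appeal rule was lost at extraction, not retrieval.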

This matters because the usual enterprise RAG failure report—“the answer was wrong”—is operationally useless. Wrong how? Wrong source? Wrong extraction? Wrong version? Wrong synthesis? Wrong jurisdiction? Wrong because someone asked the model to reason about an exception that was never retrieved?

A modular pipeline gives compliance, legal, and engineering teams a shared diagnostic surface. It does not eliminate errors. It makes them less mystical. Small mercy. Large value.

Subgraph visualisation is not a dashboard ornament

The paper adds interactive visualisation of retrieved subgraphs. This is easy to dismiss as UI decoration. It is not.

In regulatory QA, the user’s second question is often more important than the first. “What rule applies?” becomes “What else references it?” then “Does that deadline also govern this adjacent process?” then “Which section creates the exception?” A retrieved subgraph supports that motion. It turns a one-shot answer into an investigation path.
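
The investigation path can be sketched as a bounded breadth-first walk from the entities in the first answer. The triplets below are invented, loosely echoing the paper’s example of sections converging on a shared 15-day timeframe; `subgraph` and its hop limit `k` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical triplets (subject, predicate, object, source section).
TRIPLETS = [
    ("appeal", "has deadline", "15 days", "SEC-A"),
    ("petition for review", "has deadline", "15 days", "SEC-B"),
    ("petition for review", "is filed with", "review board", "SEC-C"),
    ("device records", "are retained for", "2 years", "SEC-D"),
]


def subgraph(triplets, seeds, k):
    """Collect triplets touched while expanding entities fewer than k hops from the seeds (edges undirected)."""
    adjacency = defaultdict(list)
    for t in triplets:
        subject, _, obj, _ = t
        adjacency[subject].append((obj, t))
        adjacency[obj].append((subject, t))
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    found, seen = [], set()
    while queue:
        node = queue.popleft()
        if depth[node] >= k:
            continue
        for neighbour, t in adjacency[node]:
            if t not in seen:
                seen.add(t)
                found.append(t)
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return found


print([t[3] for t in subgraph(TRIPLETS, ["appeal"], k=2)])  # ['SEC-A', 'SEC-B']
print([t[3] for t in subgraph(TRIPLETS, ["appeal"], k=3)])  # ['SEC-A', 'SEC-B', 'SEC-C']
```

Each extra hop is one follow-up question: the shared “15 days” node surfaces SEC-B, and one more hop reaches the filing requirement in SEC-C.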

This is where the navigation metrics become more interesting than the marginal answer-accuracy gain. The paper reports higher average degree with triplets and a shorter average shortest path. Put less mathematically: related sections appear more connected and reachable when the triplet graph is used. For audit workflows, that can be more valuable than a slightly better final sentence.
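
Both metrics are easy to compute once the graph exists. The sketch below uses one common set of definitions, average degree as 2|E|/|V| on an undirected graph and average shortest path over connected pairs only; the paper’s exact definitions (node type, directedness, handling of disconnected pairs) may differ, and the section graph here is invented.

```python
from collections import defaultdict, deque


def graph_stats(edges):
    """Average degree and average shortest path over connected node pairs of an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    nodes = sorted(adjacency)
    avg_degree = sum(len(adjacency[n]) for n in nodes) / len(nodes)

    def bfs(source):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    lengths = []
    for i, a in enumerate(nodes):
        dist = bfs(a)
        # Count each unordered pair once; unreachable pairs are skipped.
        lengths.extend(dist[b] for b in nodes[i + 1:] if b in dist)
    avg_path = sum(lengths) / len(lengths) if lengths else float("inf")
    return avg_degree, avg_path


chain = [("S1", "S2"), ("S2", "S3"), ("S3", "S4")]
print(graph_stats(chain))  # (1.5, 1.6666666666666667)
print(graph_stats(chain + [("S1", "S3")]))  # (2.0, 1.3333333333333333)
```

A single extra link between related sections moves both numbers in the direction the paper reports: degree up, average path down.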

A compliance analyst does not only need an answer. They need a way to defend why that answer was given, what it depends on, and what neighbouring provisions might change the interpretation. A subgraph makes that review visible.

What the paper directly shows, what we can infer, and what remains open

The distinction matters. Papers show things. Businesses infer things. Vendors blur the two because blur is where budget approvals go to nap.

Category	What belongs here
Directly shown by the paper	A multi-agent architecture can extract SPO triplets from regulatory text, link them to source sections, retrieve them for QA, and report stronger strict-threshold section overlap plus stronger navigation metrics than a no-triplet condition.
Reasonable Cognaptus inference	The architecture is most valuable for auditability, traceability, and follow-up investigation, rather than for a dramatic jump in answer accuracy.
Still uncertain	Robustness across larger corpora, different agencies, changing rules, adversarial questions, expert-graded legal correctness, temporal constraints, and production-scale maintenance cost.

The evaluation is best treated as a design signal, not a final benchmark. The paper explains its methodology: sample regulatory sections, identify related ground-truth mentions, generate questions and reference answers with an LLM, run inference, evaluate section overlap, assess factual correctness, and measure navigational facility. Each test appears to serve a different purpose and supports a different, narrower claim.

Test	Likely purpose	What it supports	What it does not prove
Section-level overlap	Main retrieval evidence	Triplets help under stricter similarity matching	Universal retrieval superiority
Answer accuracy	Main QA evidence	Answers remain highly accurate with triplets and improve slightly	Material improvement in legal correctness
Navigation metrics	Main graph-utility evidence	Triplets improve connectedness and movement across related sections	That every connection is legally meaningful
Subgraph visualisation	Implementation and usability detail	Retrieved evidence can be inspected visually	That users will interpret the graph correctly
Ontology-free construction	Architectural design choice	Faster adaptation to evolving regulatory text	Elimination of schema or data-governance work

The strongest reading is therefore measured: this is a credible architecture for making regulatory QA more inspectable. It is not proof that a knowledge graph automatically makes every answer better.

The business value is auditability before accuracy

For regulated industries, the obvious buyer fantasy is an AI assistant that answers every compliance question correctly. Lovely. Also not the first thing to buy.

The first thing to buy is a system that can show its work in a form humans can inspect. Triplet-RAG is valuable because it creates a structured evidence layer. Every answer can be tied to retrieved relationships and the underlying source sections. That supports three practical workflows.

First, regulatory QA: users ask questions and receive answers grounded in both structured triplets and source text. This reduces reliance on free-form chunk retrieval and gives reviewers more precise evidence units.

Second, audit preparation: teams can export or inspect the chain of retrieved facts. This matters when the question is not “what does the rule say?” but “how did we arrive at this interpretation?”

Third, change impact analysis: if a regulation changes, affected triplets can theoretically be re-extracted or flagged, then downstream answers depending on those triplets can be reviewed. The paper frames incremental update mechanisms as future work, so this is an implementation direction rather than a completed result. Still, it is one of the most commercially interesting directions.
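
Because the paper leaves incremental updates to future work, the following is only a sketch of the bookkeeping such a feature would need, assuming answers record which triplet ids they relied on. All identifiers and the `change_impact` helper are hypothetical.

```python
def change_impact(triplet_sources, answer_evidence, changed_sections):
    """Flag triplets whose source section changed, then the stored answers that relied on them."""
    stale = {tid for tid, section in triplet_sources.items() if section in changed_sections}
    to_review = sorted(aid for aid, used in answer_evidence.items() if stale & set(used))
    return sorted(stale), to_review


# Hypothetical bookkeeping: triplet id -> source section, answer id -> triplet ids it cited.
triplet_sources = {"T1": "SEC-1", "T2": "SEC-2", "T3": "SEC-3"}
answer_evidence = {"Q-101": ["T1", "T3"], "Q-102": ["T2"], "Q-103": ["T3"]}
print(change_impact(triplet_sources, answer_evidence, {"SEC-3"}))
# (['T3'], ['Q-101', 'Q-103'])
```

The triplet-to-section link does the work: an amendment to one section becomes a short, reviewable list of affected facts and answers instead of a full re-read.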

The ROI case is not “0.02 better answer score.” The ROI case is fewer blind spots in review, faster traceability, reusable regulatory structure, and better handoff between legal, quality, compliance, and operations teams.

Where this would fit in a real compliance stack

A production version should not replace document management, legal review, or quality systems. It should sit between the regulatory corpus and the human decision workflow.

A practical stack would look something like this:

Source layer: regulations, guidance, SOPs, policies, inspection findings, and internal controls.
Extraction layer: triplets with source-section provenance, metadata, jurisdiction, effective dates, and document versions.
Governance layer: entity canonicalisation, duplicate handling, expert correction, confidence scoring, and change logs.
Retrieval layer: hybrid retrieval using both text chunks and triplets.
Reasoning layer: generation constrained by retrieved evidence, with explicit unsupported-answer handling.
Review layer: answer, citations, graph walk, conflicting evidence, and exportable audit trail.
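
The reasoning and review layers meet at the output record. A minimal sketch, assuming evidence arrives as scored section-plus-triplet entries: the answer is withheld when nothing clears a support threshold, and the whole package serialises to JSON for the audit trail. `review_record` is hypothetical, and its 0.75 default merely mirrors the strict threshold in the paper’s evaluation; it is not a recommended setting.

```python
import json


def review_record(question, answer, evidence, min_score=0.75):
    """Package an answer with its citations, withholding it when no evidence clears min_score."""
    supported = [e for e in evidence if e["score"] >= min_score]
    record = {
        "question": question,
        "answer": answer if supported else None,
        "status": "supported" if supported else "unsupported: no evidence above threshold",
        "citations": [
            {"section": e["section"], "triplet": e["triplet"], "score": e["score"]} for e in supported
        ],
    }
    return json.dumps(record, indent=2, sort_keys=True)


evidence = [
    {"section": "SEC-1", "triplet": "FDA requires submission within 15 days", "score": 0.82},
    {"section": "SEC-9", "triplet": "sponsor maintains records", "score": 0.41},
]
print(review_record("What is the submission deadline?", "Within 15 days.", evidence))
```

Making “unsupported” an explicit status, rather than letting generation fill the gap, is what turns the audit trail into something a reviewer can check.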

The key word is hybrid. Triplets are not a replacement for text. They are a structured handle on text. In compliance, losing the original wording is dangerous because obligations often live in qualifiers: “unless,” “within,” “except,” “after,” “before,” “may,” “shall,” and other tiny words with invoice-sized consequences.
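
Keeping the source text also enables a cheap safeguard: compare qualifier words in the source sentence against the extracted triplet and flag any that went missing. This is a lexical heuristic sketched under assumptions (the word list comes from the paragraph above; `dropped_qualifiers` is hypothetical), and it will not catch conditions that were paraphrased rather than dropped.

```python
import re

QUALIFIERS = ("unless", "within", "except", "after", "before", "may", "shall")


def dropped_qualifiers(source_text, triplet):
    """Return qualifier words present in the source sentence but missing from the extracted triplet."""

    def words(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    src, trip = words(source_text), words(" ".join(triplet))
    return sorted(q for q in QUALIFIERS if q in src and q not in trip)


source = "The firm shall report the event within 30 days unless an exemption applies"
print(dropped_qualifiers(source, ("firm", "shall report", "event within 30 days")))
# ['unless']
```

A non-empty result is a candidate for the expert review loop: the triplet may have turned a conditional obligation into an unconditional one.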

The limits are not decorative; they define the product

The paper’s own challenge section points to the right production concerns: vocabulary fragmentation, extraction quality, deeper inference, temporal constraints, and large-scale pipeline optimisation. These are not footnotes. They are the work.

A bad triplet is worse than a missing chunk because it looks structured. Structure gives errors a suit and tie. If the extraction agent turns a conditional requirement into a plain obligation, the graph may retrieve it confidently. If entity resolution fails, related provisions split into separate islands. If temporal logic is not represented, an expired requirement may sit beside a current one like nothing happened.
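
Temporal validity can at least be made explicit in the data model. A minimal sketch, assuming each triplet carries an effective-from date and an optional exclusive end date (the `DatedTriplet` class, `as_of` helper, and the dates are invented): retrieval then filters to what was in force on the date that matters.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DatedTriplet:
    subject: str
    predicate: str
    obj: str
    section: str
    effective_from: date
    effective_to: Optional[date] = None  # None: still in force; otherwise the exclusive end date

    def in_force(self, on: date) -> bool:
        return self.effective_from <= on and (self.effective_to is None or on < self.effective_to)


# Hypothetical example: a reporting deadline that was shortened, with invented dates.
facts = [
    DatedTriplet("sponsor", "must report within", "30 days", "SEC-1", date(2015, 1, 1), date(2024, 1, 1)),
    DatedTriplet("sponsor", "must report within", "15 days", "SEC-1", date(2024, 1, 1)),
]


def as_of(triplets, on):
    """Keep only the triplets in force on the given date."""
    return [t for t in triplets if t.in_force(on)]


print([t.obj for t in as_of(facts, date(2025, 6, 1))])  # ['15 days']
print([t.obj for t in as_of(facts, date(2020, 6, 1))])  # ['30 days']
```

Without the dates, both facts would sit side by side in the graph with equal authority, which is exactly the failure described above.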

The system also needs stronger evaluation before high-stakes deployment. The paper does not provide enough detail to treat the reported scores as a production benchmark across regulatory domains. It does not establish expert legal correctness under adversarial questioning. It does not prove that graph connections always represent meaningful legal relationships. It shows a promising architecture and evidence that triplets improve strict retrieval and navigation in the tested setting.

That is enough to take the design seriously. It is not enough to let it approve a submission while everyone goes for coffee.

The correct lesson: chunks answer, triplets investigate

The cleanest takeaway is not that triplets beat chunks. The title says that because titles are allowed to have a little theatre. The operational truth is more precise: chunks are good for retrieving context; triplets are good for retrieving relationships; source sections are necessary for verification.

The paper’s contribution is to combine those pieces inside a multi-agent pipeline and show that the result improves the parts of regulatory QA that ordinary RAG handles awkwardly: strict matching, provenance, and navigation across linked rules.

For compliance leaders, the question is not whether this architecture can produce a fluent answer. Almost everything can now produce a fluent answer. The question is whether the system can reveal the factual skeleton underneath the answer and let a human inspect it before the organisation relies on it.

That is where triplets earn their keep. Not as magic. As plumbing. And in compliance, good plumbing is underrated until something leaks in front of a regulator.

Cognaptus: Automate the Present, Incubate the Future.

¹ Bhavik Agarwal, Hemant Sunil Jomraj, Simone Kaplunov, Jack Krolick, and Viktoria Rojkova, “RAGulating Compliance: A Multi-Agent Knowledge Graph for Regulatory QA,” arXiv:2508.09893, submitted 13 August 2025, https://arxiv.org/abs/2508.09893.
