A park sign says: “No vehicles in the park.”
That seems simple until a child arrives on a small bicycle.
A rule has now become a legal interpretation problem. Does “vehicle” mean any device used for transport? Does it mean motor vehicles? Does a child’s bike count? Should the answer change if the rule was meant to protect pedestrians, prevent noise, preserve grass, or stop cars from entering the park?
This is the deceptively small example that runs through Václav Janeček and Giovanni Sartor’s paper on legal interpretation and AI.[1] It works because it removes the usual fog around legal technology. No giant contract, no thousand-page statute, no dramatic courtroom scene. Just one ordinary word, one practical dispute, and three generations of AI trying to do something humans do every day: decide what a legal text means.
The paper’s central story is not that AI has “learned the law.” That would be a convenient headline and a bad diagnosis. The more useful story is that each generation of AI moved the interpretive burden to a different place.
Expert systems moved interpretation into the knowledge base. Argumentation systems moved it into explicit conflicts among reasons. LLMs moved it into fluent language generation.
That is progress. It is also a warning. When the machine becomes more fluent, the interpretive problem does not disappear. It becomes easier to miss.
The real question is not whether AI can read legal text
Most discussions about AI and law begin too late, usually with a chatbot producing a confident answer to a legal question. That framing makes the problem look like a contest: lawyer versus model, human reasoning versus machine reasoning, tradition versus automation.
The paper is more patient than that. It starts by treating legal interpretation as a recurring activity across legal practice: judges interpret statutes, lawyers interpret contracts, administrators interpret regulations, and citizens interpret rules when they decide what they are allowed to do.
Interpretation is not always needed. Some legal texts are clear enough for ordinary application. But doubt arises quickly, whether from ambiguity, vagueness, context, purpose, consequences, precedent, or social value. “No vehicles in the park” is not hard because the words are rare. It is hard because application depends on what the rule is for.
That point matters for AI design. If legal interpretation were only text classification, then the business problem would be straightforward: collect enough examples, train or prompt a model, and measure accuracy. But interpretation often involves choosing among plausible meanings under uncertainty. The system is not merely finding information. It is helping allocate legal meaning.
That is where the history becomes useful.
Stage one: expert systems made interpretation someone else’s problem
The first generation of AI and law was symbolic. Expert systems represented legal knowledge through rules, concepts, and inference engines. The promise was clean: encode the law as structured knowledge, then apply it consistently.
In the park example, the system might contain a rule like:
If x is a vehicle, then x is prohibited in the park.
That looks harmless. The machine can apply it. The difficulty is deciding whether a bicycle is a vehicle.
An expert system has two basic options. It can ask the user or officer at runtime: “Is this device a vehicle?” Or the knowledge engineer can add another rule in advance:
If x is a bicycle, then x is a vehicle.
The second option looks more automated. It is also where legal politics quietly enters the software.
Once the interpretive rule is embedded, the system will apply it consistently. That sounds attractive until the child on the small Christmas bicycle arrives. A human officer might see that the ordinary purpose of the park rule is not served by banning a slow child’s bike. The system, unless equipped with exceptions or defeasible reasoning, simply chains the rules and reaches the prohibition.
This is the first lesson for legal automation: consistency is not the same as judgment.
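The rule chaining described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the predicate names (`bicycle`, `vehicle`, `prohibited_in_park`) and the `forward_chain` helper are assumptions made up for this example.

```python
# Hypothetical rule base for the park example; all names are illustrative.
# Each rule maps a premise predicate to a conclusion predicate.
RULES = [
    ("bicycle", "vehicle"),             # interpretive rule added by the knowledge engineer
    ("vehicle", "prohibited_in_park"),  # the rule on the park sign
]


def forward_chain(facts, rules):
    """Apply rules repeatedly until no new conclusions can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# The engine has no notion of purpose: a child's bike is still a bicycle,
# so the extra facts about the rider and the speed change nothing.
childs_bike = {"bicycle", "ridden_by_child", "slow"}
conclusions = forward_chain(childs_bike, RULES)
print("prohibited_in_park" in conclusions)
```

Nothing in the rule base refers to `ridden_by_child` or `slow`, so those facts are carried along and ignored. The interpretive choice was made when the first rule was written, not when the child arrived.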
Expert systems are transparent, but they are not neutral. They freeze interpretation at the point of system design. The person who builds the knowledge base gains interpretive power that would otherwise sit with the lawyer, official, judge, or citizen applying the rule in context.
For business use, this is still relevant. Many compliance systems today are not called “expert systems,” but they behave like them. A policy engine, contract ruleset, onboarding checklist, sanctions workflow, or claims-processing rulebase can embed legal interpretations as operational logic. The system may be technically correct under its encoded assumptions and still be overbroad, stale, or insensitive to context.
The expensive part is not always building the rule engine. It is deciding which legal interpretation the engine is allowed to make automatic.
Stage two: argumentation systems treated legal meaning as a dispute
The next stage of AI and law addressed a weakness in expert systems: legal interpretation often involves incompatible but defensible positions.
A deductive knowledge base prefers consistency. Legal argument does not. Courts routinely face competing interpretations, each supported by different canons, purposes, precedents, or principles. Argumentation models made that conflict explicit.
Return to the park. One argument says a bicycle is a vehicle because ordinary language can include transportation devices. Another argument says a child’s bicycle should be excluded because the purpose of the rule is to protect the peaceful and safe use of the park, not to prevent children from playing.
Now the system is no longer merely applying a rule. It is representing a structured dispute.
The paper explains this through interpretive canons: ordinary meaning, technical meaning, context, precedent, statutory analogy, legal concepts, general principles, history, purpose, substantive reasons, and intention. These canons do not mechanically determine outcomes. They provide defeasible reasons. A reason can support an interpretation until a stronger reason defeats it.
That is a more realistic model of legal reasoning. It also makes the engineering problem harder.
A simplified version looks like this:
| Interpretive stage | What the AI system represents | What becomes difficult |
|---|---|---|
| Expert system | A selected interpretation encoded as rules | Avoiding rigid, overbroad automation |
| Argumentation model | Competing interpretive claims and attacks | Modeling canons, strength, defeat, and priority |
| LLM system | Plausible interpretive suggestions in language | Verifying grounding, relevance, consistency, and legal legitimacy |
Argumentation systems are attractive because they preserve disagreement. They can show why one reading survives and another fails. In the paper’s example, the ordinary-language argument for including bicycles can be defeated by a purposive argument excluding children’s bicycles. The point is not that purpose always wins. The point is that the system can represent the fight.
This is closer to what lawyers actually do. They do not merely retrieve law. They arrange reasons.
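One way to make "representing the fight" concrete is an abstract argumentation framework, where arguments attack each other and a semantics decides which survive. The sketch below uses Dung-style grounded semantics as a simplifying assumption; the paper's own formal model may be richer, and the argument labels are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Compute the grounded extension by iterating from the empty set.

    `attacks` is a set of (attacker, target) pairs. An argument is accepted
    when every one of its attackers is attacked by an accepted argument.
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted = set()
    while True:
        defended = {
            a
            for a in arguments
            if all(any((d, b) in attacks for d in accepted) for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended


# Case 1: the purposive argument attacks the ordinary-meaning argument.
args = {"ordinary_meaning", "purpose"}
attacks = {("purpose", "ordinary_meaning")}
print(grounded_extension(args, attacks))

# Case 2 (hypothetical): a further argument that the text contains no
# exception attacks the purposive argument and reinstates ordinary meaning.
args3 = args | {"no_textual_exception"}
attacks3 = attacks | {("no_textual_exception", "purpose")}
print(sorted(grounded_extension(args3, attacks3)))
```

In the first case only the purposive argument survives. In the second, adding one attack flips the outcome, which is the point made above: purpose does not always win, and the hard engineering work lies in deciding which attacks exist and which succeed.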
For enterprise legaltech, this suggests a design principle that is still underused: legal AI should not only answer questions; it should expose the structure of disagreement. A system that says “this clause is probably enforceable” is less useful than one that lays out the competing readings. For the park rule, that output might look like this:
| Claim | Supporting reason | Counter-reason | Required human check |
|---|---|---|---|
| The term may include bicycles | Ordinary language can treat bicycles as vehicles | Purpose may target motorized danger, not children’s play | Confirm statutory definitions, precedent, and local context |
| The term may exclude children’s bicycles | Purpose supports family enjoyment of the park | Text may contain no exception for age or size | Confirm whether purposive reasoning is accepted in the jurisdiction |
| The rule is under-specified | Multiple plausible meanings remain | Authority may already resolve the term | Search binding authority and policy materials |
This is not decorative. It changes workflow. The output becomes a map for review rather than a conclusion disguised as assistance.
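A review map like the table above is easy to carry through a workflow as structured data. The following is a sketch under stated assumptions: the `InterpretiveClaim` type, its field names, and the `open_review_tasks` helper are hypothetical and chosen only to mirror the table's columns.

```python
from dataclasses import dataclass


@dataclass
class InterpretiveClaim:
    claim: str
    supporting_reason: str
    counter_reason: str
    required_human_check: str
    reviewed: bool = False  # meant to be set to True only by a human reviewer


def open_review_tasks(claims):
    """Return the human checks still outstanding, in the order the claims were listed."""
    return [c.required_human_check for c in claims if not c.reviewed]


claims = [
    InterpretiveClaim(
        claim="The term may include bicycles",
        supporting_reason="Ordinary language can treat bicycles as vehicles",
        counter_reason="Purpose may target motorized danger, not children's play",
        required_human_check="Confirm statutory definitions, precedent, and local context",
    ),
    InterpretiveClaim(
        claim="The term may exclude children's bicycles",
        supporting_reason="Purpose supports family enjoyment of the park",
        counter_reason="Text may contain no exception for age or size",
        required_human_check="Confirm whether purposive reasoning is accepted in the jurisdiction",
    ),
]

for task in open_review_tasks(claims):
    print("TODO:", task)
```

The design choice is that every claim carries its own counter-reason and its own review task, so a conclusion cannot be emitted without the checklist that goes with it.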
Stage three: LLMs made interpretation cheap to generate and costly to trust
LLMs enter the story from a different direction. They do not require legal rules to be encoded in advance. They do not require a formal argument graph. A user can paste a clause, statute, or court passage into a chat window and ask the model to paraphrase it, identify ambiguities, generate alternative readings, connect the provision to context, or articulate arguments under recognized canons.
The result can be immediately useful. In the paper, the authors include a chatbot-style answer to the “No vehicles in the park” question. The output notices that “vehicle” is fuzzy, distinguishes everyday speech from official usage, considers different purposes, and gives a practical rule of thumb. In short, it resembles competent legal reasoning.
“Resembles” is doing a lot of work.
The paper’s most important distinction is that LLM competence is primarily linguistic and discursive. LLMs are good at producing language that fits legal-discursive patterns. That does not mean they possess legal knowledge in the way a curated knowledge base does, or legal reasoning in the way an argumentation model tries to formalize, or normative judgment in the way a lawyer must exercise.
This is why the LLM stage is both powerful and dangerous. Earlier systems made their limitations visible. Expert systems were rigid. Argumentation systems were complex. LLMs are smooth. Smoothness is pleasant in user experience and lethal in governance.
A model can list interpretive arguments without knowing whether the cited authority exists. It can explain a legal canon without knowing whether that canon applies in the relevant jurisdiction. It can produce a balanced answer even when the law is settled. It can sound measured while being wrong. Very polite nonsense remains nonsense; it just invoices better.
RAG helps with sources, but it does not solve interpretation
Specialized legal AI systems usually try to close the knowledge gap with retrieval-augmented generation. RAG connects the model to an external knowledge base, retrieves documents or chunks, and places them into the prompt so the LLM can answer using source material.
That is useful. It is not magic.
The paper is careful about the reason. RAG requires the system to decide what to retrieve. In law, relevance is not merely semantic similarity. A statute, amendment, exception, precedent, commentary, administrative guideline, or contractual definition may matter because of hierarchy, jurisdiction, time, procedural posture, or interpretive method.
Retrieval therefore creates a trade-off among speed, relevance, and completeness. A small retrieval set may miss the controlling authority. A broad retrieval set may overload the prompt or bury the key issue. A fast retrieval system may compress the legal database in ways that distort edge cases.
This matters for business deployment because many AI products are currently sold as if “RAG + legal database” were equivalent to “grounded legal reasoning.” It is not. It is grounded text generation, assuming retrieval found the right ground.
For compliance and contract workflows, the practical implication is simple: retrieval should be auditable. The system should show not only the answer but also what it retrieved, what it did not retrieve, and where the human reviewer must confirm completeness.
A legal RAG system that hides retrieval decisions is like a junior associate who says, “I checked the sources,” and then refuses to say which ones. Charming confidence. Unhelpful habit.
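Auditable retrieval can be as simple as returning the exclusions alongside the hits. The sketch below is a toy: it ranks by naive word overlap instead of embeddings, and the corpus, document IDs, jurisdictions, and function names are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalAudit:
    query: str
    retrieved: list = field(default_factory=list)  # (doc_id, score) pairs passed to the model
    excluded: list = field(default_factory=list)   # (doc_id, reason) pairs left out


def retrieve_with_audit(query, corpus, jurisdiction, top_k=1):
    """Rank documents by naive word overlap and record every inclusion and exclusion."""
    audit = RetrievalAudit(query=query)
    query_words = set(query.lower().split())
    scored = []
    for doc_id, doc in corpus.items():
        if doc["jurisdiction"] != jurisdiction:
            audit.excluded.append((doc_id, "wrong jurisdiction"))
            continue
        score = len(query_words & set(doc["text"].lower().split()))
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    audit.retrieved = scored[:top_k]
    audit.excluded += [(doc_id, "below top_k cutoff") for doc_id, _ in scored[top_k:]]
    return audit


# Hypothetical corpus: the definitions entry may control the answer,
# yet it shares no words with the query.
corpus = {
    "park_bylaw": {"jurisdiction": "city_a", "text": "no vehicles in the park"},
    "definitions": {"jurisdiction": "city_a", "text": "vehicle means a motor vehicle"},
    "other_city_bylaw": {"jurisdiction": "city_b", "text": "no vehicles in the park"},
}

audit = retrieve_with_audit("are vehicles allowed in the park", corpus, "city_a")
print("retrieved:", audit.retrieved)
print("excluded:", audit.excluded)
```

In this toy run the definitions document falls below the cutoff even though it could decide the question. A reviewer who sees the `excluded` list can catch that; a reviewer who sees only the answer cannot.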
The paper is a conceptual map, not an experimental benchmark
This is important: the paper is not mainly reporting a new experiment. It is a conceptual and doctrinal synthesis of AI approaches to legal interpretation.
The figures and examples serve different purposes:
| Paper element | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Knowledge-based system diagram | Implementation concept | Shows where human users and knowledge engineers sit in symbolic systems | Does not measure system performance |
| “No vehicles in the park” expert-system rules | Mechanism explanation | Shows how interpretive choices become encoded rules | Does not show all possible exception-handling designs |
| Interpretive argument graph | Formal modeling example | Shows how competing canons and attacks can represent legal disagreement | Does not prove argumentation systems scale cheaply |
| Chatbot answer on the park rule | Illustrative example | Shows how LLMs can generate plausible interpretive discussion from a simple prompt | Does not establish reliability across legal tasks |
| Review of hallucination, RAG, consistency, and benchmark issues | Boundary-setting synthesis | Shows why autonomous legal interpretation remains unsafe | Does not provide a new quantitative benchmark |
That distinction shapes how the paper should be read for business purposes. We should not ask, “What metric did the paper improve?” The paper is not that kind of artifact. Its value is architectural: it helps decision-makers see which part of legal interpretation each AI paradigm handles, and which part it quietly transfers to humans.
The verification-value paradox is where business enthusiasm should slow down
The paper’s most commercially relevant section concerns LLM-enabled interpretation. The attractive claim is obvious: LLMs can save time by drafting, summarizing, retrieving, and proposing arguments. The less attractive reality is that legal professionals must verify the output.
The paper cites reported hallucination rates of 17–34% on benchmarked legal queries. Even without treating that range as universal, it is enough to make the workflow problem clear. In legal interpretation, one fabricated authority or false premise can poison the conclusion.
This creates a verification-value paradox:
| LLM benefit | Required control | Business consequence |
|---|---|---|
| Generates interpretive options quickly | Lawyer checks sources, facts, and logic | Time savings shrink |
| Produces fluent legal argument | Reviewer tests whether argument is legally valid | Fluency cannot be used as a quality signal |
| Retrieves and summarizes sources | Reviewer confirms relevance and completeness | RAG shifts work; it does not remove it |
| Standardizes drafting style | Reviewer checks jurisdiction, context, and normative fit | Templates help only after legal judgment |
| Supports agentic workflows | Human must approve before action | Autonomous action is the danger zone |
The paradox does not mean LLMs are useless. It means ROI depends on task selection.
LLMs are likely more valuable where the interpretive space is broad, the source materials are digitized, and the output is used to expand the reviewer’s options. They are less defensible where the task requires final legal advice, direct action, or authoritative interpretation without human validation.
That is an uncomfortable distinction for vendors, because “brainstorming companion for skilled professionals” sounds less glamorous than “AI lawyer.” It is also much closer to the technology’s current risk profile.
The business use case is interpretive support, not legal autopilot
For legaltech builders, compliance teams, and regulated enterprises, the paper points toward a specific product shape.
The strongest near-term systems will not ask the LLM to “decide the interpretation.” They will ask it to support a workflow that preserves human authority and makes review cheaper.
A practical architecture might look like this:
| Layer | Function | Human-control requirement |
|---|---|---|
| Source layer | Retrieve statutes, contracts, cases, policies, definitions, and guidance | Show retrieval scope and source ranking |
| Text layer | Segment provisions, paraphrase clauses, identify ambiguous terms | Require reviewer confirmation of segmentation |
| Argument layer | Generate competing interpretive readings and canons | Mark outputs as candidate arguments, not conclusions |
| Verification layer | Check citations, authority status, jurisdiction, and consistency | Preserve direct links to original sources |
| Decision layer | Record human-approved interpretation and rationale | Human retains final responsibility |
| Audit layer | Log prompt, retrieved sources, model output, edits, and approval | Enable later review and accountability |
This is where hybrid AI becomes interesting. LLMs can extract and organize language. Logic systems can preserve structured rules. Argumentation frameworks can represent conflict and defeat. Retrieval systems can connect the model to current sources. Human lawyers can supply normative judgment and responsibility.
That combination is less cinematic than a fully autonomous legal agent. Good. Cinema is not a compliance strategy.
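The decision and audit layers in the table above can be sketched as a single record that travels with each interpretation. This is a minimal illustration with assumed names (`InterpretationRecord`, `to_log_line`), not a prescribed schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class InterpretationRecord:
    prompt: str
    retrieved_sources: list
    model_output: str
    human_edit: Optional[str] = None
    approved_by: Optional[str] = None

    def final_text(self):
        """Return the approved interpretation, or None if no human has signed off."""
        if self.approved_by is None:
            return None
        return self.human_edit or self.model_output


def to_log_line(record):
    """Serialize the record as one JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)


record = InterpretationRecord(
    prompt="Does 'vehicle' include a child's bicycle?",
    retrieved_sources=["park_bylaw"],  # hypothetical source ID
    model_output="Candidate reading: bicycles may count as vehicles.",
)
print(record.final_text())  # no approval yet, so there is no final interpretation

record.human_edit = "Children's bicycles are outside the rule's purpose."
record.approved_by = "reviewing lawyer"
print(record.final_text())
print(to_log_line(record))
```

The model output is kept even after a human rewrites it, so a later audit can compare what the system suggested with what the responsible person approved.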
Agentic legal AI is where the warning becomes sharpest
The paper is especially cautious about agentic AI: systems that not only generate interpretations but also act on them. That matters far beyond legal practice.
Imagine an AI system interpreting a contract clause and automatically refusing payment. Or interpreting a platform policy and suspending an account. Or interpreting an insurance exclusion and rejecting a claim. Or interpreting a procurement rule and disqualifying a supplier.
In all of these cases, the interpretive act becomes operational action.
The park example scales cleanly. Little Tom, the child on the bicycle, is not merely told that bicycles may be prohibited. He is fined by an automated system because an LLM-enabled process interpreted “vehicle” to include children’s bicycles without human validation.
That is the moment where legal interpretation stops being a research problem and becomes governance risk.
For enterprises, the control rule should be plain: LLM-enabled interpretation may prepare decisions, but it should not silently become the decision-maker in high-stakes legal contexts. The system can draft, retrieve, compare, flag, and recommend. It should not autonomously impose legal consequences unless the interpretive rule has already been formally approved, audited, and bounded.
Even then, exception handling matters. The old expert-system problem returns wearing a newer interface.
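The control rule above translates into a gate between interpretation and action. The sketch below assumes a hypothetical registry of pre-approved interpretive rules and an `execute_action` function; neither comes from the paper.

```python
class ApprovalRequired(Exception):
    """Raised when an action with legal consequences lacks human sign-off."""


# Hypothetical set of interpretive rules that were formally approved in advance.
PRE_APPROVED_RULES = {"motor_vehicle_is_vehicle"}


def execute_action(action, interpretive_rule, approved_by=None):
    """Run the action only if a human approved it or the rule was approved in advance."""
    if interpretive_rule in PRE_APPROVED_RULES or approved_by is not None:
        return f"executed: {action}"
    raise ApprovalRequired(
        f"'{action}' relies on unapproved rule '{interpretive_rule}'"
    )


# A car in the park falls under a pre-approved rule, so the action proceeds.
print(execute_action("issue fine to driver", "motor_vehicle_is_vehicle"))

# Fining the child depends on an interpretation nobody has approved.
try:
    execute_action("issue fine to child", "bicycle_is_vehicle")
except ApprovalRequired as blocked:
    print("held for review:", blocked)
```

Note that the pre-approved path is itself an encoded interpretation, so it inherits the expert-system problem: the gate bounds the risk but does not remove the need for exception handling.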
What the paper directly shows, and what Cognaptus infers
The paper directly shows a historical and conceptual progression in AI and legal interpretation. It explains how expert systems, argumentation models, and LLMs each approach the interpretive problem differently. It argues that LLMs can support legal interpretation but should not be treated as authoritative legal interpreters.
Cognaptus infers three operational lessons for business use.
First, legal AI products should distinguish interpretive generation from interpretive authority. Generating a plausible reading is not the same as endorsing it.
Second, legal AI workflows should make disagreement visible. The best output is often not one answer but a structured set of candidate interpretations, supporting reasons, counterarguments, source links, and review tasks.
Third, legal AI governance should focus less on whether a model sounds legally competent and more on where interpretive responsibility sits. If responsibility is unclear, the product is not mature. It is just fluent.
What remains uncertain is the cost curve. Better retrieval, citation verification, legal-specific prompting, fine-tuning, and hybrid architectures may reduce review cost. They may also create new layers that require their own validation. The paper does not settle that empirical question, and nobody should pretend otherwise.
The old AI problems did not vanish; they merged
The most useful reading of the paper is not “symbolic AI failed, argumentation was too hard, LLMs won.”
That is the lazy version.
The better reading is that each paradigm captured one necessary part of legal interpretation:
| AI paradigm | What it captured | What it missed |
|---|---|---|
| Expert systems | Rules, consistency, computable application | Context-sensitive interpretation |
| Argumentation systems | Conflict, canons, defeasible reasoning | Scalable modeling and usability |
| LLMs | Language, fluency, breadth, rapid suggestion | Grounded authority, reproducibility, normative judgment |
Law needs all three: rules, arguments, and language. It also needs responsibility.
That is why the future suggested by the paper is hybrid. LLMs may help extract information from legal texts and translate it into computable form. They may be prompted or trained to work with interpretive canons. Their outputs may be checked using logic, formal argumentation, citation verification, or other legal reasoning standards. Workflows may be redesigned so lawyers get the benefits of AI without surrendering judgment to it.
This is not a retreat from AI. It is a more precise ambition.
Conclusion: AI can widen the interpretive field, but it cannot own the judgment
The small bicycle in the park is doing more work than it first appears.
For expert systems, it exposes the rigidity of encoded interpretations. For argumentation systems, it shows why legal meaning is often a contest among reasons. For LLMs, it reveals the gap between producing a plausible legal discussion and possessing the authority, grounding, and judgment required to settle one.
The practical conclusion is not that legal AI should be avoided. It is that legal AI should be placed where it is strongest: expanding the set of possible readings, surfacing ambiguity, retrieving sources, mapping arguments, checking citations, and helping professionals reason more carefully.
The weakest use case is the one vendors are most tempted to imply: an autonomous interpreter that gives reliable legal advice because it sounds like it has read everything.
It has not. It has read enough language to sound useful. That is valuable, but it is not the same thing as legal judgment.
AI has learned to read between the lines. The responsibility for deciding what those lines legally mean still belongs to humans.
Cognaptus: Automate the Present, Incubate the Future.
[1] Václav Janeček and Giovanni Sartor, “Legal interpretation and AI: from expert systems to argumentation and LLMs,” arXiv:2603.05392, forthcoming in International Handbook of Legal Language and Communication: From Text to Semiotics, Springer, 2026. https://arxiv.org/pdf/2603.05392