Reading Between the Lines: How AI Learned to Interpret the Law

A park sign says: “No vehicles in the park.”

That seems simple until a child arrives on a small bicycle.

A simple rule has now become a problem of legal interpretation. Does “vehicle” mean any device used for transport? Does it mean motor vehicles? Does a child’s bike count? Should the answer change if the rule was meant to protect pedestrians, prevent noise, preserve grass, or stop cars from entering the park?

This is the deceptively small example that runs through Václav Janeček and Giovanni Sartor’s paper on legal interpretation and AI.¹ It works because it removes the usual fog around legal technology. No giant contract, no thousand-page statute, no dramatic courtroom scene. Just one ordinary word, one practical dispute, and three generations of AI trying to do something humans do every day: decide what a legal text means.

The paper’s central story is not that AI has “learned the law.” That would be a convenient headline and a bad diagnosis. The more useful story is that each generation of AI moved the interpretive burden to a different place.

Expert systems moved interpretation into the knowledge base. Argumentation systems moved it into explicit conflicts among reasons. LLMs moved it into fluent language generation.

That is progress. It is also a warning. When the machine becomes more fluent, the interpretive problem does not disappear. It becomes easier to miss.

The real question is not whether AI can read legal text

Most discussions about AI and law begin too late, usually with a chatbot producing a confident answer to a legal question. That framing makes the problem look like a contest: lawyer versus model, human reasoning versus machine reasoning, tradition versus automation.

The paper is more patient than that. It starts by treating legal interpretation as a recurring activity across legal practice: judges interpret statutes, lawyers interpret contracts, administrators interpret regulations, and citizens interpret rules when they decide what they are allowed to do.

Interpretation is not always needed. Some legal texts are clear enough for ordinary application. But doubt appears quickly: from ambiguity, vagueness, context, purpose, consequences, precedent, or social value. “No vehicles in the park” is not hard because the words are rare. It is hard because application depends on what the rule is for.

That point matters for AI design. If legal interpretation were only text classification, then the business problem would be straightforward: collect enough examples, train or prompt a model, and measure accuracy. But interpretation often involves choosing among plausible meanings under uncertainty. The system is not merely finding information. It is helping allocate legal meaning.

That is where the history becomes useful.

Stage one: expert systems made interpretation someone else’s problem

The first generation of AI and law was symbolic. Expert systems represented legal knowledge through rules, concepts, and inference engines. The promise was clean: encode the law as structured knowledge, then apply it consistently.

In the park example, the system might contain a rule like:

If x is a vehicle, then x is prohibited in the park.

That looks harmless. The machine can apply it. The difficulty is deciding whether a bicycle is a vehicle.

An expert system has two basic options. It can ask the user or officer at runtime: “Is this device a vehicle?” Or the knowledge engineer can add another rule in advance:

If x is a bicycle, then x is a vehicle.

The second option looks more automated. It is also where the legal politics quietly enters the software.

Once the interpretive rule is embedded, the system will apply it consistently. That sounds attractive until the child on the small Christmas bicycle arrives. A human officer might see that the ordinary purpose of the park rule is not served by banning a slow child’s bike. The system, unless equipped with exceptions or defeasible reasoning, simply chains the rules and reaches the prohibition.

This is the first lesson for legal automation: consistency is not the same as judgment.
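The chaining described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper’s formalism: the predicate names (bicycle, vehicle, prohibited, child_owned, slow) and the one-variable rule format are assumptions made for brevity. It shows why facts about the child or the bike’s speed change nothing: no rule mentions them, so the engine cannot use them.

```python
# Minimal forward-chaining sketch of the park rule base (hypothetical encoding).
# Each rule reads "if x has the premise property, then x has the conclusion property".
RULES: list[tuple[str, str]] = [
    ("bicycle", "vehicle"),     # interpretive rule added in advance by the knowledge engineer
    ("vehicle", "prohibited"),  # the park rule: "No vehicles in the park"
]


def forward_chain(facts: set[str]) -> set[str]:
    """Apply every rule repeatedly until no new property can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in RULES:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# The child's small bicycle. "child_owned" and "slow" are recorded,
# but no rule refers to them, so they cannot block the prohibition.
childs_bike = {"bicycle", "child_owned", "slow"}
print(sorted(forward_chain(childs_bike)))
# ['bicycle', 'child_owned', 'prohibited', 'slow', 'vehicle']
```

Avoiding that outcome would need exception rules with negation or defeasible priorities, neither of which this monotonic sketch supports. Adding them is exactly the step that moves interpretive judgment into the knowledge base.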

Expert systems are transparent, but they are not neutral. They freeze interpretation at the point of system design. The person who builds the knowledge base gains interpretive power that would otherwise sit with the lawyer, official, judge, or citizen applying the rule in context.

For business use, this is still relevant. Many compliance systems today are not called “expert systems,” but they behave like them. A policy engine, contract ruleset, onboarding checklist, sanctions workflow, or claims-processing rulebase can embed legal interpretations as operational logic. The system may be technically correct under its encoded assumptions and still be overbroad, stale, or insensitive to context.

The expensive part is not always building the rule engine. It is deciding which legal interpretation the engine is allowed to make automatic.

Stage two: argumentation systems treated legal meaning as a dispute

The next stage of AI and law addressed a weakness in expert systems: legal interpretation often involves incompatible but defensible positions.

A deductive knowledge base prefers consistency. Legal argument does not. Courts routinely face competing interpretations, each supported by different canons, purposes, precedents, or principles. Argumentation models made that conflict explicit.

Return to the park. One argument says a bicycle is a vehicle because ordinary language can include transportation devices. Another argument says a child’s bicycle should be excluded because the purpose of the rule is to protect the peaceful and safe use of the park, not to prevent children from playing.

Now the system is no longer merely applying a rule. It is representing a structured dispute.

The paper explains this through interpretive canons: ordinary meaning, technical meaning, context, precedent, statutory analogy, legal concepts, general principles, history, purpose, substantive reasons, and intention. These canons do not mechanically determine outcomes. They provide defeasible reasons. A reason can support an interpretation until a stronger reason defeats it.

That is a more realistic model of legal reasoning. It also makes the engineering problem harder.

A simplified version looks like this:

Interpretive stage	What the AI system represents	What becomes difficult
Expert system	A selected interpretation encoded as rules	Avoiding rigid, overbroad automation
Argumentation model	Competing interpretive claims and attacks	Modeling canons, strength, defeat, and priority
LLM system	Plausible interpretive suggestions in language	Verifying grounding, relevance, consistency, and legal legitimacy

Argumentation systems are attractive because they preserve disagreement. They can show why one reading survives and another fails. In the paper’s example, the ordinary-language argument for including bicycles can be defeated by a purposive argument excluding children’s bicycles. The point is not that purpose always wins. The point is that the system can represent the fight.

This is closer to what lawyers actually do. They do not merely retrieve law. They arrange reasons.
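The defeat relation can be made concrete with a small sketch. As an assumption, it swaps in Dung-style abstract argumentation with grounded semantics, plus a simple rule that a conflict becomes a one-way defeat when one argument is preferred; the paper’s own formal model of canons may differ, and the argument names are hypothetical. With a priority for the purposive argument, only that argument is accepted. With no priority, nothing is accepted, which is how the system records that the dispute is still open.

```python
# Sketch: abstract argumentation with preferences and grounded semantics.
Arg = str


def defeats(conflicts: set[tuple[Arg, Arg]],
            preferred: set[tuple[Arg, Arg]]) -> set[tuple[Arg, Arg]]:
    """Turn symmetric conflicts into directed defeats.

    (a, b) in `preferred` means a is strictly stronger than b;
    x defeats y unless y is strictly stronger than x.
    """
    result = set()
    for a, b in conflicts:
        for x, y in ((a, b), (b, a)):
            if (y, x) not in preferred:
                result.add((x, y))
    return result


def grounded_extension(arguments: set[Arg],
                       attacks: set[tuple[Arg, Arg]]) -> set[Arg]:
    """Least fixed point of Dung's characteristic function, starting from the empty set."""
    extension: set[Arg] = set()
    while True:
        # a is acceptable if each of its attackers is attacked by the current extension
        acceptable = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension)
                   for (b, target) in attacks if target == a)
        }
        if acceptable == extension:
            return extension
        extension = acceptable


ARGS = {"ordinary_meaning", "purpose"}
# ordinary_meaning: a bicycle is a vehicle, so the child's bike is banned
# purpose: the rule protects peaceful, safe use of the park, so a child's bike is not a "vehicle" here
CONFLICTS = {("ordinary_meaning", "purpose")}

with_priority = defeats(CONFLICTS, {("purpose", "ordinary_meaning")})
print(grounded_extension(ARGS, with_priority))              # {'purpose'}
print(grounded_extension(ARGS, defeats(CONFLICTS, set())))  # set()
```

A third argument, for example one from the text’s lack of any age or size exception, could be added as another node attacking purpose. Either way, the outcome follows from explicit attacks and priorities that a reviewer can inspect.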

For enterprise legaltech, this suggests a design principle that is still underused: legal AI should not only answer questions; it should expose the structure of disagreement. A system that says “this clause is probably enforceable” is less useful than one that lays out the competing readings, the reasons on each side, and what a human still has to check. For the park rule, that might look like:

Claim	Supporting reason	Counter-reason	Required human check
The term may include bicycles	Ordinary language can treat bicycles as vehicles	Purpose may target motorized danger, not children’s play	Confirm statutory definitions, precedent, and local context
The term may exclude children’s bicycles	Purpose supports family enjoyment of the park	Text may contain no exception for age or size	Confirm whether purposive reasoning is accepted in the jurisdiction
The rule is under-specified	Multiple plausible meanings remain	Authority may already resolve the term	Search binding authority and policy materials

This is not decorative. It changes workflow. The output becomes a map for review rather than a conclusion disguised as assistance.

Stage three: LLMs made interpretation cheap to generate and costly to trust

LLMs enter the story from a different direction. They do not require legal rules to be encoded in advance. They do not require a formal argument graph. A user can paste a clause, statute, or court passage into a chat window and ask the model to paraphrase it, identify ambiguities, generate alternative readings, connect the provision to context, or articulate arguments under recognized canons.

The result can be immediately useful. In the paper, the authors include a chatbot-style answer to the “No vehicles in the park” question. The output notices that “vehicle” is fuzzy, distinguishes everyday speech from official usage, considers different purposes, and gives a practical rule of thumb. In short, it resembles competent legal reasoning.

“Resembles” is doing a lot of work.

The paper’s most important distinction is that LLM competence is primarily linguistic and discursive. LLMs are good at producing language that fits legal-discursive patterns. That does not mean they possess legal knowledge in the way a curated knowledge base does, or legal reasoning in the way an argumentation model tries to formalize, or normative judgment in the way a lawyer must exercise.

This is why the LLM stage is both powerful and dangerous. Earlier systems made their limitations visible. Expert systems were rigid. Argumentation systems were complex. LLMs are smooth. Smoothness is pleasant in user experience and lethal in governance.

A model can list interpretive arguments without knowing whether the cited authority exists. It can explain a legal canon without knowing whether that canon applies in the relevant jurisdiction. It can produce a balanced answer even when the law is settled. It can sound measured while being wrong. Very polite nonsense remains nonsense; it just invoices better.

RAG helps with sources, but it does not solve interpretation

Specialized legal AI systems usually try to close the knowledge gap with retrieval-augmented generation. RAG connects the model to an external knowledge base, retrieves documents or chunks, and places them into the prompt so the LLM can answer using source material.

That is useful. It is not magic.

The paper is careful about the reason. RAG requires the system to decide what to retrieve. In law, relevance is not merely semantic similarity. A statute, amendment, exception, precedent, commentary, administrative guideline, or contractual definition may matter because of hierarchy, jurisdiction, time, procedural posture, or interpretive method.

Retrieval therefore creates a trade-off among speed, relevance, and completeness. A small retrieval set may miss the controlling authority. A broad retrieval set may overload the prompt or bury the key issue. A fast retrieval system may compress the legal database in ways that distort edge cases.

This matters for business deployment because many AI products are currently sold as if “RAG + legal database” is equivalent to “grounded legal reasoning.” It is not. It is grounded text generation, assuming retrieval found the right ground.

For compliance and contract workflows, the practical implication is simple: retrieval should be auditable. The system should show not only the answer but also what it retrieved, what it did not retrieve, and where the human reviewer must confirm completeness.

A legal RAG system that hides retrieval decisions is like a junior associate who says, “I checked the sources,” and then refuses to say which ones. Charming confidence. Unhelpful habit.
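What an auditable retrieval step might record can be sketched as follows. Everything here is a hypothetical illustration: the Source fields, the jurisdiction and in-force filters, the upstream similarity score, and the document IDs are assumptions, not a description of any product or of the paper. The point is that the log names what was left out and why, so a reviewer can see that a low-scoring definitions section, which might control the meaning of “vehicle”, never reached the prompt.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Source:
    doc_id: str
    jurisdiction: str
    in_force_from: date
    in_force_until: Optional[date]  # None means still in force
    score: float                    # similarity score from an upstream retriever (assumed)


@dataclass
class RetrievalLog:
    retrieved: list[str] = field(default_factory=list)
    excluded: dict[str, str] = field(default_factory=dict)  # doc_id -> reason


def retrieve(sources: list[Source], jurisdiction: str, as_of: date, k: int) -> RetrievalLog:
    """Filter by jurisdiction and date, keep the top-k by score, and log every exclusion."""
    log = RetrievalLog()
    eligible = []
    for s in sources:
        if s.jurisdiction != jurisdiction:
            log.excluded[s.doc_id] = f"other jurisdiction ({s.jurisdiction})"
        elif s.in_force_from > as_of or (s.in_force_until is not None and s.in_force_until < as_of):
            log.excluded[s.doc_id] = "not in force on query date"
        else:
            eligible.append(s)
    eligible.sort(key=lambda s: s.score, reverse=True)
    log.retrieved = [s.doc_id for s in eligible[:k]]
    for s in eligible[k:]:
        log.excluded[s.doc_id] = f"below top-{k} cutoff (score {s.score:.2f})"
    return log


sources = [
    Source("park_bylaw_s3", "city_a", date(2019, 1, 1), None, 0.91),
    Source("park_bylaw_s3_old", "city_a", date(2005, 1, 1), date(2018, 12, 31), 0.93),
    Source("city_b_guidance", "city_b", date(2020, 1, 1), None, 0.88),
    Source("definitions_s1", "city_a", date(2019, 1, 1), None, 0.42),
]
log = retrieve(sources, "city_a", date(2024, 6, 1), k=1)
print(log.retrieved)  # ['park_bylaw_s3']
for doc_id, reason in log.excluded.items():
    print(doc_id, "->", reason)
```

A real system would use richer legal metadata than these two filters, but the design choice carries over: exclusions are data the reviewer sees, not decisions buried inside the retriever.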

The paper is a conceptual map, not an experimental benchmark

This is important: the paper is not mainly reporting a new experiment. It is a conceptual and doctrinal synthesis of AI approaches to legal interpretation.

The figures and examples serve different purposes:

Paper element	Likely purpose	What it supports	What it does not prove
Knowledge-based system diagram	Implementation concept	Shows where human users and knowledge engineers sit in symbolic systems	Does not measure system performance
“No vehicles in the park” expert-system rules	Mechanism explanation	Shows how interpretive choices become encoded rules	Does not show all possible exception-handling designs
Interpretive argument graph	Formal modeling example	Shows how competing canons and attacks can represent legal disagreement	Does not prove argumentation systems scale cheaply
Chatbot answer on the park rule	Illustrative example	Shows how LLMs can generate plausible interpretive discussion from a simple prompt	Does not establish reliability across legal tasks
Review of hallucination, RAG, consistency, and benchmark issues	Boundary-setting synthesis	Shows why autonomous legal interpretation remains unsafe	Does not provide a new quantitative benchmark

That distinction matters for a business reading of the paper. We should not ask, “What metric did the paper improve?” The paper is not that kind of artifact. Its value is architectural: it helps decision-makers see which part of legal interpretation each AI paradigm handles, and which part it quietly transfers to humans.

The verification-value paradox is where business enthusiasm should slow down

The paper’s most commercially relevant section concerns LLM-enabled interpretation. The attractive claim is obvious: LLMs can save time by drafting, summarizing, retrieving, and proposing arguments. The less attractive reality is that legal professionals must verify the output.

The paper cites reported hallucination rates of 17–34% on benchmarked legal queries. Even without treating that range as universal, it is enough to make the workflow problem clear. In legal interpretation, one fabricated authority or false premise can poison the conclusion.
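A back-of-the-envelope calculation shows why a per-query rate in that range matters for workflows. The model is a simplifying assumption of ours, not the paper’s: a piece of work relies on n model-generated answers, and each is flawed independently with probability p. Real errors are probably correlated, so treat the numbers as illustrative only.

```latex
\begin{aligned}
P(\text{at least one flawed answer}) &= 1 - (1 - p)^{n} \\
p = 0.17,\; n = 5: \quad 1 - 0.83^{5} &\approx 1 - 0.394 = 0.606 \\
p = 0.34,\; n = 5: \quad 1 - 0.66^{5} &\approx 1 - 0.125 = 0.875
\end{aligned}
```

Under these assumptions, even at the low end a reviewer checking a memo built on five answers should expect to find a problem more often than not.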

This creates a verification-value paradox:

LLM benefit	Required control	Business consequence
Generates interpretive options quickly	Lawyer checks sources, facts, and logic	Time savings shrink
Produces fluent legal argument	Reviewer tests whether argument is legally valid	Fluency cannot be used as a quality signal
Retrieves and summarizes sources	Reviewer confirms relevance and completeness	RAG shifts work; it does not remove it
Standardizes drafting style	Reviewer checks jurisdiction, context, and normative fit	Templates help only after legal judgment
Supports agentic workflows	Human must approve before action	Autonomous action is the danger zone

The paradox does not mean LLMs are useless. It means ROI depends on task selection.

LLMs are likely more valuable where the interpretive space is broad, the source materials are digitized, and the output is used to expand the reviewer’s options. They are less defensible where the task requires final legal advice, direct action, or authoritative interpretation without human validation.

That is an uncomfortable distinction for vendors, because “brainstorming companion for skilled professionals” sounds less glamorous than “AI lawyer.” It is also much closer to the technology’s current risk profile.

The business use case is interpretive support, not legal autopilot

For legaltech builders, compliance teams, and regulated enterprises, the paper points toward a specific product shape.

The strongest near-term systems will not ask the LLM to “decide the interpretation.” They will ask it to support a workflow that preserves human authority and makes review cheaper.

A practical architecture might look like this:

Layer	Function	Human-control requirement
Source layer	Retrieve statutes, contracts, cases, policies, definitions, and guidance	Show retrieval scope and source ranking
Text layer	Segment provisions, paraphrase clauses, identify ambiguous terms	Require reviewer confirmation of segmentation
Argument layer	Generate competing interpretive readings and canons	Mark outputs as candidate arguments, not conclusions
Verification layer	Check citations, authority status, jurisdiction, and consistency	Preserve direct links to original sources
Decision layer	Record human-approved interpretation and rationale	Human retains final responsibility
Audit layer	Log prompt, retrieved sources, model output, edits, and approval	Enable later review and accountability

This is where hybrid AI becomes interesting. LLMs can extract and organize language. Logic systems can preserve structured rules. Argumentation frameworks can represent conflict and defeat. Retrieval systems can connect the model to current sources. Human lawyers can supply normative judgment and responsibility.

That combination is less cinematic than a fully autonomous legal agent. Good. Cinema is not a compliance strategy.
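The layered architecture in the table above can be reduced to a record that travels through the workflow. This is a hypothetical sketch: the class, its fields, the citation ID case_x, and the rule that approval is blocked while any citation is unverified are illustrative assumptions, not a reference design. It captures two controls from the table: model outputs stay candidates until a named human approves one, and every step lands in an audit log.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InterpretationRecord:
    question: str
    retrieved_sources: list[str] = field(default_factory=list)     # source layer
    candidate_readings: list[str] = field(default_factory=list)    # argument layer (candidates only)
    unverified_citations: list[str] = field(default_factory=list)  # verification layer
    approved_reading: Optional[str] = None                         # decision layer
    approved_by: Optional[str] = None
    rationale: Optional[str] = None
    audit_log: list[str] = field(default_factory=list)             # audit layer

    def log(self, event: str) -> None:
        self.audit_log.append(event)

    def approve(self, reading: str, reviewer: str, rationale: str) -> None:
        """Record a human decision; refuse while any citation is still unchecked."""
        if self.unverified_citations:
            self.log(f"approval blocked: unverified {self.unverified_citations}")
            raise ValueError("verify citations before approving an interpretation")
        self.approved_reading, self.approved_by, self.rationale = reading, reviewer, rationale
        self.log(f"approved by {reviewer}: {reading}")


rec = InterpretationRecord("Does 'vehicle' include a child's bicycle?")
rec.retrieved_sources = ["park_bylaw_s3"]
rec.candidate_readings = ["includes all bicycles", "excludes children's bicycles"]
rec.unverified_citations = ["case_x"]
rec.log("model generated 2 candidate readings citing case_x")

try:
    rec.approve("excludes children's bicycles", "reviewer_1", "purpose is safe, peaceful use")
except ValueError as err:
    print("blocked:", err)

rec.unverified_citations.clear()
rec.log("reviewer_1 checked case_x against the original source")
rec.approve("excludes children's bicycles", "reviewer_1", "purpose is safe, peaceful use")
print(rec.audit_log[-1])  # approved by reviewer_1: excludes children's bicycles
```

Note that the blocked attempt is itself logged, so the audit trail shows not only the final decision but also the point at which verification stopped an early approval.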

Agentic legal AI is where the warning becomes sharpest

The paper is especially cautious about agentic AI: systems that not only generate interpretations but also act on them. That matters far beyond legal practice.

Imagine an AI system interpreting a contract clause and automatically refusing payment. Or interpreting a platform policy and suspending an account. Or interpreting an insurance exclusion and rejecting a claim. Or interpreting a procurement rule and disqualifying a supplier.

In all of these cases, the interpretive act becomes operational action.

The park example scales cleanly. Little Tom, the child on the small bicycle, is not merely told that bicycles may be prohibited. He is fined by an automated system because an LLM-enabled process interpreted “vehicle” to include children’s bicycles without human validation.

That is the moment where legal interpretation stops being a research problem and becomes a governance risk.

For enterprises, the control rule should be plain: LLM-enabled interpretation may prepare decisions, but it should not silently become the decision-maker in high-stakes legal contexts. The system can draft, retrieve, compare, flag, and recommend. It should not autonomously impose legal consequences unless the interpretive rule has already been formally approved, audited, and bounded.

Even then, exception handling matters. The old expert-system problem returns wearing a newer interface.
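The control rule, including the exception problem, can be expressed as a routing gate. In this hypothetical sketch, an interpretive rule may trigger automatic action only if it is approved, audited, and bounded, where “bounded” is modeled (as an assumption) as a set of case features the approved reading was reviewed against. Any case with a feature outside that set goes to a human.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterpretiveRule:
    term: str
    reading: str
    approved: bool
    audited: bool
    reviewed_features: frozenset[str]  # case features the approved reading was checked against


def route(rule: InterpretiveRule, case_features: set[str]) -> tuple[str, str]:
    """Return ("automate" | "human_review", reason) for an action based on `rule`."""
    if not rule.approved:
        return "human_review", "interpretation not formally approved"
    if not rule.audited:
        return "human_review", "interpretation not audited"
    unreviewed = case_features - rule.reviewed_features
    if unreviewed:
        # Possible exception: the approved reading was never checked on these facts.
        return "human_review", f"unreviewed features: {sorted(unreviewed)}"
    return "automate", "within approved, audited scope"


rule = InterpretiveRule(
    term="vehicle",
    reading="includes bicycles ridden by adults",
    approved=True,
    audited=True,
    reviewed_features=frozenset({"bicycle", "adult_rider"}),
)
print(route(rule, {"bicycle", "adult_rider"}))  # ('automate', 'within approved, audited scope')
print(route(rule, {"bicycle", "child_rider"}))  # ('human_review', "unreviewed features: ['child_rider']")
```

Feature sets are a crude stand-in for legal context. The design choice that matters is that the default path for anything unreviewed is a human, not the model.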

What the paper directly shows, and what Cognaptus infers

The paper directly shows a historical and conceptual progression in AI and legal interpretation. It explains how expert systems, argumentation models, and LLMs each approach the interpretive problem differently. It argues that LLMs can support legal interpretation but should not be treated as authoritative legal interpreters.

Cognaptus infers three operational lessons for business use.

First, legal AI products should distinguish interpretive generation from interpretive authority. Generating a plausible reading is not the same as endorsing it.

Second, legal AI workflows should make disagreement visible. The best output is often not one answer but a structured set of candidate interpretations, supporting reasons, counterarguments, source links, and review tasks.

Third, legal AI governance should focus less on whether a model sounds legally competent and more on where interpretive responsibility sits. If responsibility is unclear, the product is not mature. It is just fluent.

What remains uncertain is the cost curve. Better retrieval, citation verification, legal-specific prompting, fine-tuning, and hybrid architectures may reduce review cost. They may also create new layers that require their own validation. The paper does not settle that empirical question, and nobody should pretend otherwise.

The old AI problems did not vanish; they merged

The most useful reading of the paper is not “symbolic AI failed, argumentation was too hard, LLMs won.”

That is the lazy version.

The better reading is that each paradigm captured one necessary part of legal interpretation:

AI paradigm	What it captured	What it missed
Expert systems	Rules, consistency, computable application	Context-sensitive interpretation
Argumentation systems	Conflict, canons, defeasible reasoning	Scalable modeling and usability
LLMs	Language, fluency, breadth, rapid suggestion	Grounded authority, reproducibility, normative judgment

Law needs all three: rules, arguments, and language. It also needs responsibility.

That is why the future suggested by the paper is hybrid. LLMs may help extract information from legal texts and translate it into computable form. They may be prompted or trained to work with interpretive canons. Their outputs may be checked using logic, formal argumentation, citation verification, or other legal reasoning standards. Workflows may be redesigned so lawyers get the benefits of AI without surrendering judgment to it.

This is not a retreat from AI. It is a more precise ambition.

Conclusion: AI can widen the interpretive field, but it cannot own the judgment

The small bicycle in the park is doing more work than it first appears.

For expert systems, it exposes the rigidity of encoded interpretations. For argumentation systems, it shows why legal meaning is often a contest among reasons. For LLMs, it reveals the gap between producing a plausible legal discussion and possessing the authority, grounding, and judgment required to settle one.

The practical conclusion is not that legal AI should be avoided. It is that legal AI should be placed where it is strongest: expanding the set of possible readings, surfacing ambiguity, retrieving sources, mapping arguments, checking citations, and helping professionals reason more carefully.

The weakest use case is the one vendors are most tempted to imply: an autonomous interpreter that gives reliable legal advice because it sounds like it has read everything.

It has not. It has read enough language to sound useful. That is valuable, but it is not the same thing as legal judgment.

AI has learned to read between the lines. The responsibility for deciding what those lines legally mean still belongs to humans.

Cognaptus: Automate the Present, Incubate the Future.

¹ Václav Janeček and Giovanni Sartor, “Legal interpretation and AI: from expert systems to argumentation and LLMs,” arXiv:2603.05392, forthcoming in International Handbook of Legal Language and Communication: From Text to Semiotics, Springer, 2026. https://arxiv.org/pdf/2603.05392
