Executive Snapshot
- Client type: Boutique law firm with partners, associates, paralegals, and legal assistants
- Industry: Legal services — civil disputes, contracts, immigration, and corporate compliance
- Core problem: Lawyers and paralegals lost billable time organizing client materials before legal reasoning could begin
- Why agentic AI: The work required document understanding, timeline reconstruction, issue spotting, research routing, drafting, and repeated human review rather than one fixed automation script
- Deployment stage: Prototype-to-pilot design
- Primary result: A human-approved case preparation package replacing scattered folders, informal notes, and repeated memo rework
1. Business Context
The firm handled small and mid-sized matters where clients submitted contracts, court letters, immigration forms, invoices, email chains, scanned PDFs, screenshots, and messaging-app exports through multiple channels. Each new matter required the team to open a file, request missing documents, rename and sort materials, reconstruct what happened, identify deadlines, search for relevant legal authorities, and prepare a first memo for partner review. Errors mattered because a missing notice, unreadable scan, wrong date, or unverified citation could distort early legal advice. Delays also mattered commercially: clients were willing to pay for legal judgment, but less willing to pay for visible administrative preparation.
2. Analytical Point from the Research
The relevant lesson from recent legal-agent and retrieval research is not that AI should act as an autonomous lawyer. The practical value is narrower and more operational: agentic AI can convert fragmented legal inputs into source-grounded intermediate artifacts that humans can verify before advice is given. Multi-agent legal systems such as L-MARS and PAKTON show the usefulness of decomposing legal work into retrieval, evidence filtering, contract or document analysis, and grounded synthesis [1][2]. LawThinker adds an important governance principle: exploration and verification should be treated as paired steps, because errors can otherwise propagate through the reasoning chain [3]. Broader Agentic RAG research supports this because static retrieval pipelines struggle with multi-step, context-dependent tasks [4]. At the same time, empirical work on AI legal research tools shows that hallucination risk remains material, so the case design keeps lawyers responsible for fact validation, citation checking, and final legal judgment [5].
3. Why Simpler Automation Was Not Enough
A rule-based script could rename files or move attachments into folders, but it could not reliably connect a contract clause to a later email, detect contradictions across screenshots, or distinguish a confirmed fact from a client allegation. A chatbot alone was also insufficient because it would produce answers without durable workflow state, source-linked review controls, or approval gates. The matter-preparation workflow branched whenever documents were missing, scans were unreadable, facts conflicted, deadlines appeared, or research results needed jurisdictional verification. The firm therefore needed a stateful preparation layer: agents could prepare structured drafts, but every legally meaningful output stayed in review status until a lawyer approved it.
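The review-status idea can be sketched as a small state machine. This is an illustrative sketch only, not the firm's actual system; the artifact names, statuses, and the `Artifact` class are assumptions introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DRAFT = "draft"          # produced by an agent, not yet human-checked
    IN_REVIEW = "in_review"  # assigned to a human reviewer
    APPROVED = "approved"    # a lawyer signed off


@dataclass
class Artifact:
    """A legally meaningful output, e.g. a timeline, issue map, or memo draft."""
    name: str
    status: Status = Status.DRAFT

    def submit_for_review(self) -> None:
        if self.status is not Status.DRAFT:
            raise ValueError(f"{self.name}: only drafts can enter review")
        self.status = Status.IN_REVIEW

    def approve(self, reviewer_is_lawyer: bool) -> None:
        # Guardrail: only a lawyer can move an artifact to APPROVED.
        if not reviewer_is_lawyer:
            raise PermissionError(f"{self.name}: lawyer approval required")
        if self.status is not Status.IN_REVIEW:
            raise ValueError(f"{self.name}: artifact is not in review")
        self.status = Status.APPROVED


memo = Artifact("preliminary_memo")
memo.submit_for_review()
memo.approve(reviewer_is_lawyer=True)
print(memo.status)  # Status.APPROVED
```

The design choice is that APPROVED is reachable only through IN_REVIEW and only with a lawyer's sign-off, which is what distinguishes a stateful preparation layer from a chatbot that simply emits answers.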
4. Pre-Agent Workflow
- Intake and matter opening. A legal assistant or paralegal received the inquiry, created a preliminary matter record, checked basic client and counterparty details, and asked the responsible lawyer whether the firm could proceed.
- Document collection and filing. The client sent materials through email, messaging apps, cloud folders, scans, and walk-in submissions. Staff manually downloaded, renamed, sorted, and de-duplicated documents.
- Manual fact extraction and timeline building. A paralegal or associate read the documents, extracted dates, parties, obligations, payments, filings, communications, and deadlines, then built a chronological timeline.
- Research and risk spotting. An associate manually searched for statutes, cases, regulations, policy guidance, or internal prior memos, then identified evidentiary gaps, deadline risks, contradictions, and client follow-up questions.
- Memo drafting and review loop. The associate drafted an internal case memo or preliminary client memo from scratch. The partner reviewed facts, citations, risks, and recommendations, then sent revisions back before any client-facing update.
Key pain points:
- Preparation work was sequential, so delay in document organization delayed timeline building, research, and memo drafting.
- Source references were inconsistent, making partner review slower and increasing the chance that a factual claim could not be traced back to a document.
- Research and drafting were often restarted after partner feedback because gaps were discovered late.
- The firm had no systematic way to reuse verified timelines, issue checklists, or approved memo structures from similar matters.
5. Agent Design and Guardrails
The post-agent workflow starts with a secure AI-enabled matter workspace. A human still confirms conflict status, engagement scope, confidentiality controls, and whether the matter is permitted for AI-assisted processing. The system then ingests approved client materials and routes them to five agents:
- Document Review Agent: classifies files, extracts metadata, detects duplicates, flags unreadable scans, and marks potentially sensitive or privileged documents. A paralegal verifies this inventory before downstream work begins.
- Fact Timeline Builder: creates a sourced timeline with confidence labels.
- Risk Issue Spotter: maps procedural, evidentiary, limitation-period, privilege, and client-management risks.
- Precedent Search Assistant: converts reviewed facts into research questions and retrieves candidate authorities or internal prior memos.
- Client Memo Drafter: prepares a draft only after the associate has reviewed the timeline, issue map, and research leads.
The guardrail is simple: agents may prepare, structure, flag, retrieve, and draft; they may not approve legal conclusions, send client advice, file court documents, or represent citation validity as final. Partner approval remains mandatory before any client-facing communication or next legal action.
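This guardrail is easy to express as a deny-by-default allowlist. The sketch below is a hypothetical illustration; the action names are assumptions chosen to mirror the prohibited and permitted steps described above.

```python
# Agents may only run allowlisted preparation tasks; any other requested
# action is denied by default and stays with a human role.
ALLOWED_AGENT_ACTIONS = {
    "classify", "extract_metadata", "build_timeline",
    "spot_issues", "retrieve_authorities", "draft_memo",
}

# Client-facing or legally binding steps that always need a partner.
PARTNER_ONLY_ACTIONS = {"send_client_advice", "file_court_document"}


def agent_may(action: str) -> bool:
    """Deny-by-default check for any action an agent requests."""
    return action in ALLOWED_AGENT_ACTIONS


def requires_partner_approval(action: str) -> bool:
    """Partner sign-off is mandatory before any client-facing step."""
    return action in PARTNER_ONLY_ACTIONS


assert agent_may("draft_memo")
assert not agent_may("send_client_advice")
assert requires_partner_approval("file_court_document")
```

Deny-by-default matters here: a newly added agent capability is blocked until someone consciously adds it to the allowlist, rather than permitted until someone notices.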
| Layer | System role | Human checkpoint |
|---|---|---|
| Intake governance | Open matter workspace and record scope | Lawyer confirms conflict, engagement, confidentiality, and AI-use permission |
| Document understanding | Classify, OCR, index, de-duplicate, and flag documents | Paralegal or associate verifies inventory and sensitive-material flags |
| Fact structuring | Build sourced timeline and uncertainty labels | Associate checks dates, source references, contradictions, and gaps |
| Research support | Retrieve candidate authorities and prior memos | Lawyer verifies jurisdiction, currency, precedential value, and applicability |
| Drafting | Generate preliminary memo and missing-information checklist | Partner approves, revises, or rejects before client use |
| Learning loop | Store approved templates and rejected AI patterns | Firm restricts reuse to anonymized or permitted knowledge assets |
6. One Workflow Walkthrough
A small business client contacted the firm after a supplier allegedly breached a delivery contract. The client sent a signed agreement, several invoice PDFs, email threads, a termination notice, payment screenshots, and WhatsApp messages. The legal assistant opened the matter workspace and confirmed that the documents could be processed under the firm’s AI policy. The Document Review Agent created an inventory and flagged two duplicate invoices, one unreadable scan, and one message export with unclear dates. After a paralegal approved the inventory, the Fact Timeline Builder produced a timeline separating confirmed delivery dates from client allegations. The Risk Issue Spotter flagged a possible notice-period issue and missing proof of receipt. The Precedent Search Assistant suggested research questions on breach, notice, damages, and mitigation. Because one authority was jurisdictionally uncertain, the associate marked it for replacement before memo drafting. The Client Memo Drafter then produced a preliminary memo and missing-information checklist. The partner revised the risk language, approved the client update, and the system archived the final timeline, memo, and review log.
7. Results
Evidence status: Planned pilot design with estimated workflow targets, not production measurement.
| Evaluation layer | Baseline | Pilot target |
|---|---|---|
| Measurement period | Manual review of representative recent matters | Four-week pilot on new low-to-moderate complexity matters |
| Workflow scope | Initial case preparation before substantive advice | Intake, document indexing, timeline drafting, issue map, research leads, and memo draft |
| Process change | Sequential manual preparation | Parallel agent preparation with human verification gates |
| Document organization time | 100% manual effort | 40–60% reduction target for first-pass sorting and indexing |
| Timeline preparation | Built from scratch by paralegal or associate | First draft generated with source references, then reviewed by associate |
| Partner review | Focus split between organization, correction, and judgment | Focus shifted toward fact validation, citation verification, risk framing, and strategy |
| Business effect | Billable hours consumed before legal reasoning became visible to client | More lawyer time allocated to judgment, advice, negotiation posture, and client explanation |
The expected improvement is not a promise that the AI produces final legal work. The measurable pilot goal is narrower: reduce the time needed to create a lawyer-ready preparation package and improve traceability between each memo statement and its source document.
8. What Failed First and What Changed
The first version allowed the memo drafter to use research suggestions before the associate had verified the candidate authorities. This produced a polished but unsafe draft: some citations looked plausible, but one authority was not appropriate for the matter’s jurisdiction. The workflow was changed so memo drafting could only start after the associate marked the timeline, risk matrix, and research leads as reviewed. The system also added a citation-verification field and a revision route: bad authority goes back to the Precedent Search Assistant; unclear facts go back to the Timeline Builder; sensitive-material concerns go to a human reviewer. The remaining limitation is that legal applicability still depends on lawyer judgment.
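The three revision routes can be captured in a small dispatch table. This is a sketch under assumed names (the issue-type strings and agent identifiers are hypothetical); the key property is that anything unrecognized falls back to a human, never to another agent.

```python
# Each known issue type returns to the workflow step that owns it.
REVISION_ROUTES = {
    "bad_authority": "precedent_search_assistant",
    "unclear_fact": "fact_timeline_builder",
    "sensitive_material": "human_reviewer",
}


def route_revision(issue_type: str) -> str:
    """Send a flagged issue back to the step responsible for fixing it."""
    # Unknown issue types default to a human reviewer, never to an agent.
    return REVISION_ROUTES.get(issue_type, "human_reviewer")


print(route_revision("bad_authority"))  # precedent_search_assistant
```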
9. Transferable Lessons
- Do not automate the final legal opinion first. Automate the preparation artifacts that lawyers already need: document index, sourced timeline, issue map, research lead list, and draft memo.
- Treat verification as part of the workflow, not as an afterthought. Each agent output should have a human review owner, source references, confidence labels, and version history.
- Make the management responsibility explicit. The managing partner owns AI-use policy, matter eligibility, data controls, and approval rules; associates and paralegals own review checkpoints; agents own preparation tasks only.
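The verification metadata in the second lesson, review owner, source references, confidence labels, and version history, can be made concrete as a minimal record type. The field names below are illustrative assumptions, not a schema from the case.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewedOutput:
    """Minimal metadata every agent output should carry (illustrative)."""
    content: str
    review_owner: str       # the human accountable for checking this output
    sources: list[str]      # document references each claim traces back to
    confidence: str         # e.g. "confirmed" vs. "client_allegation"
    versions: list[str] = field(default_factory=list)

    def revise(self, new_content: str) -> None:
        # Keep the full version history so partner review can audit changes.
        self.versions.append(self.content)
        self.content = new_content


entry = ReviewedOutput(
    content="Delivery was due on the contract date",
    review_owner="associate",
    sources=["signed_agreement.pdf"],
    confidence="confirmed",
)
entry.revise("Delivery was due on the contract date, per clause review")
assert entry.versions == ["Delivery was due on the contract date"]
```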
This case shows that agentic AI works best in legal operations when it reduces coordination friction before expert judgment, rather than pretending to replace the expert judgment itself.
References
1. Ziqi Wang and Boqin Yuan, "L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search," arXiv:2509.00761, 2025. https://arxiv.org/abs/2509.00761
2. Petros Raptopoulos, Giorgos Filandrianos, Maria Lymperaiou, and Giorgos Stamou, "PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements," arXiv:2506.00608, 2025. https://arxiv.org/abs/2506.00608
3. Xinyu Yang, Chenlong Deng, Tongyu Wen, Binyu Xie, and Zhicheng Dou, "LawThinker: A Deep Research Legal Agent in Dynamic Environments," arXiv:2602.12056, 2026. https://arxiv.org/abs/2602.12056
4. Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, and Athanasios V. Vasilakos, "Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG," arXiv:2501.09136, 2025. https://arxiv.org/abs/2501.09136
5. Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, and Daniel E. Ho, "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools," arXiv:2405.20362, 2024. https://arxiv.org/abs/2405.20362