Executive Snapshot

  • Client type: Privacy-conscious SME with lean IT and legal/compliance capacity
  • Industry: Cross-functional back-office operations handling contracts, proposals, bid specifications, and internal approvals
  • Core problem: Sensitive documents had to stay inside private infrastructure, but the legacy local-LLM workflow still relied on serial handoffs, manual exception routing, and repeated manager intervention
  • Why agentic AI: The work was not just extraction; it required context gathering, task decomposition, policy checks, confidence-based escalation, and controlled downstream actions across multiple systems
  • Deployment stage: Pilot design grounded in the previously implemented workflow and its agentic redesign
  • Expected result: The redesigned workflow is designed to reduce routine human coordination, improve auditability, and make local-model limitations manageable by routing different subtasks to different local models instead of asking one model to do everything

1. Business Context

The client handled a steady stream of sensitive business documents: draft contracts from sales, customer proposals, bid specifications, and related internal notes. These documents could not be sent to public AI services, so the organization used a local LLM environment inside private infrastructure. The problem was that the operating process itself still depended heavily on human coordination. A document would be uploaded, tagged, checked against fixed rules, reviewed by a specialist if risk appeared, then pushed into dashboards and sometimes into Slack, CRM, or accounting tools. Because the files affected legal exposure, pricing commitments, and compliance posture, delays mattered. Even when the first-pass tagging worked, staff still spent disproportionate time deciding which model to use, checking for missing context, chasing approvals, and explaining exceptions to managers.

2. Why Simpler Automation Was Not Enough

A script or dashboard could not solve this workflow because the incoming documents were heterogeneous, the relevant context lived in multiple places, and the acceptable action depended on both content and business policy. A chatbot alone would also have been too shallow: the organization needed the system to interpret a document, retrieve supporting context, decide whether confidence was sufficient, escalate only the right cases, and keep an audit trail of why each action was taken. The privacy constraint made the problem harder. Local LLMs kept data inside the private boundary but were generally less capable than leading public models, and this trade-off hurt most in smaller firms that could not afford to run one large local model for every task. The improvement therefore could not be “use a better model.” It had to be “route each subtask to the right local model and spend heavier inference only where it changes the decision.” [1][2][3]

3. Pre-Agent Workflow

Before the agentic redesign, the business operated through a mostly linear local-AI workflow with heavy human coordination around it.

  1. An operations user uploaded a contract, proposal, or bid file into the governed Cognaptus workspace.
  2. A local model performed first-pass tagging for clauses, red flags, entities, and document intent, usually using a fixed routing configuration chosen in advance.
  3. A rule layer evaluated the extracted tags and suggested actions such as “send to legal,” “flag risky terms,” or “suggest edit.”
  4. Staff reviewed high-risk cases, interpreted ambiguous outputs, and updated the dashboard or downstream systems.
  5. When the workflow missed context or produced recurring errors, a developer or manager manually adjusted prompts, plugins, or routing rules in a refinement loop.
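To make the legacy behavior concrete, the rule layer in step 3 can be pictured as a fixed lookup from extracted tags to suggested actions. The Python below is a minimal sketch under that assumption; the tag names, action labels, and the `suggest_actions` helper are hypothetical and do not come from the client's configuration. Anything the table does not recognize is handed back as a manual-review item, which is where the coordination burden in step 4 accumulated.

```python
# Hypothetical sketch of the legacy rule layer (step 3 of the old workflow).
# Tag names and action labels are illustrative assumptions, not real configuration.
RULES: dict[str, str] = {
    "indemnity_nonstandard": "send to legal",
    "unlimited_liability": "send to legal",
    "payment_terms_over_60_days": "flag risky terms",
    "missing_termination_clause": "suggest edit",
}


def suggest_actions(tags: list[str]) -> list[str]:
    """Map first-pass tags to suggested actions via a fixed lookup table.

    Unrecognized tags are not interpreted; they are returned as
    manual-review items for staff to resolve.
    """
    actions: list[str] = []
    for tag in tags:
        action = RULES.get(tag, f"manual review: unrecognized tag '{tag}'")
        if action not in actions:  # de-duplicate while keeping first-seen order
            actions.append(action)
    return actions


if __name__ == "__main__":
    print(suggest_actions(["indemnity_nonstandard", "unlimited_liability", "odd_governing_law"]))
```

The fixed table is cheap and predictable, but it has no way to pull in prior contract language or approval history, which is the "weak context assembly" pain point listed below.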

Key pain points:

  • Serial coordination burden: The workflow was technically automated in places, but people still coordinated the upload, review, exception handling, status updates, and refinement cycle.
  • Weak context assembly: The system relied too much on the document itself and too little on adjacent enterprise context such as prior contract language, internal rules, approval history, or query logs.
  • One-pass reasoning limits: A single local model often had to both interpret the document and support downstream decisions, which exposed the capability limits of smaller private models.

Figure: Pre-agent workflow

4. Agent Design and Guardrails

The new design kept the privacy boundary but changed the operating logic. Instead of treating the document as a single-pass inference problem, Cognaptus treated it as a governed session with planning, retrieval, execution, and confirmation stages.

  • Inputs: contracts, proposals, bid specs, prior redlines, policy rules, approval logs, metadata, and system-level access/action policies
  • Understanding: document normalization, OCR where needed, structured extraction, classification, hybrid retrieval over dense and keyword-oriented stores, and source-linked context assembly
  • Reasoning: hierarchical task planning, policy-aware ranking, confidence checks, threshold-based escalation, and subtask routing to the most suitable local model or agent
  • Actions: create issue summaries, draft suggested edits, trigger approval tasks, write dashboard state, open downstream tickets, and push approved updates to enterprise tools
  • Memory/state: session-level provenance, searchable case history, retrieved evidence, action logs, and outcome traces for later review and refinement
  • Human review points: initial policy/governance setup; any high-risk, low-confidence, or budget-violating case; final approval before externally consequential actions
  • Out-of-scope actions: unsupervised policy changes, silent outbound data sharing, and irreversible business actions without explicit approval
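One way to read the human review points and out-of-scope actions is as an action policy evaluated before anything leaves the system. The sketch below assumes a three-way verdict (allow, require approval, deny); the action names, the `risk` labels, and the 0.8 confidence threshold are hypothetical placeholders rather than the pilot's actual settings.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical action catalogue; names are illustrative only.
INTERNAL_ACTIONS = {"create_issue_summary", "draft_suggested_edit", "write_dashboard_state"}
CONSEQUENTIAL_ACTIONS = {"open_downstream_ticket", "push_update_to_enterprise_tool"}
OUT_OF_SCOPE = {"change_policy", "share_data_externally"}


@dataclass
class ProposedAction:
    name: str
    risk: str            # assumed labels: "low", "medium", "high"
    confidence: float    # model confidence in [0, 1]
    within_budget: bool  # whether the run stayed inside its cost/latency budget


def check_action(action: ProposedAction, min_confidence: float = 0.8) -> Verdict:
    """Deny out-of-scope or unknown actions; gate risky or consequential ones on a human."""
    if action.name in OUT_OF_SCOPE:
        return Verdict.DENY
    if action.name not in INTERNAL_ACTIONS | CONSEQUENTIAL_ACTIONS:
        return Verdict.DENY  # unknown actions are refused by default
    if (
        action.risk == "high"
        or action.confidence < min_confidence
        or not action.within_budget
        or action.name in CONSEQUENTIAL_ACTIONS  # external effects always need approval
    ):
        return Verdict.REQUIRE_APPROVAL
    return Verdict.ALLOW
```

Denying unknown actions by default keeps the policy closed: a new downstream integration has to be added to the catalogue deliberately before the agent can use it.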

Operationally, the redesign introduced five changes. First, it registered each document in a hybrid retrieval layer so the system could combine semantic matches with exact policy or clause lookup. [1] Second, it enriched the task with enterprise context rather than asking the model to reason from the uploaded file alone. [3] Third, it created an explicit task plan instead of a single prompt chain, allowing the system to break work into subtasks such as extraction, clause comparison, policy matching, summary drafting, and escalation decision. [2][4] Fourth, it compiled that plan into a finer-grained execution graph so independent steps could run in parallel or be pipelined rather than waiting in a fixed sequence. [5] Fifth, it added a task coordinator that tracked quality, latency, and operating cost during execution and could stop, replan, or request human confirmation when thresholds were crossed. [4]
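The third and fourth changes, an explicit plan compiled into an execution graph, can be illustrated with Python's standard `graphlib` module (Python 3.9+). This is a simplified, stage-level sketch: the subtask names mirror the paragraph above, but the dependency structure is an assumption, and the primitive-level dataflow optimization described in the Teola reference is considerably finer-grained. Each returned stage contains steps whose dependencies are already satisfied, so they could be dispatched in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each subtask maps to the set of subtasks it depends on.
PLAN: dict[str, set[str]] = {
    "extract_clauses": {"normalize"},
    "retrieve_context": {"normalize"},
    "compare_clauses": {"extract_clauses", "retrieve_context"},
    "match_policy": {"extract_clauses", "retrieve_context"},
    "draft_summary": {"compare_clauses", "match_policy"},
    "escalation_decision": {"compare_clauses", "match_policy"},
}


def execution_stages(plan: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into stages; everything within a stage is mutually independent."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # also raises CycleError if the plan is not a DAG
    stages: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        stages.append(ready)
        sorter.done(*ready)  # in a real run, mark done as each task finishes
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(PLAN)):
        print(i, stage)
```

Compared with a fixed linear chain of six steps, this plan collapses into four stages, because extraction and retrieval (and later clause comparison and policy matching) do not wait on each other.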

This design mattered especially because of the local-model constraint. The firm could not rely on frontier public models for sensitive documents, but it also could not afford to run a single large private model for every stage. The practical answer was model routing: smaller local models handled routine extraction, classification, and retrieval support; stronger local models were reserved for ambiguous reasoning or draft generation; and humans reviewed only the cases where both automation paths remained uncertain.
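The routing policy in this paragraph behaves like a confidence cascade: a small model first, a stronger model for heavy or uncertain work, and a human only when both fall short. The sketch below assumes each local model is wrapped as a callable returning an answer and a confidence score; the task-type names, the 0.75 threshold, and `run_cascade` itself are hypothetical.

```python
from typing import Callable, Tuple

# A "model" is any callable returning (answer, confidence); wrappers are hypothetical.
Model = Callable[[str], Tuple[str, float]]

# Assumed task types that go straight to the stronger local model.
HEAVY_TASKS = {"conflict_analysis", "draft_generation"}


def run_cascade(task_type: str, payload: str, small: Model, strong: Model,
                threshold: float = 0.75) -> Tuple[str, str]:
    """Return (handler, answer) following a small -> strong -> human cascade."""
    if task_type not in HEAVY_TASKS:
        answer, confidence = small(payload)
        if confidence >= threshold:
            return "small_local_model", answer
    answer, confidence = strong(payload)
    if confidence >= threshold:
        return "strong_local_model", answer
    # Both automation paths are uncertain: hand the strong model's draft to a reviewer.
    return "human_review", answer
```

Routing heavy task types past the small model avoids paying for a small-model call whose output would be discarded anyway.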

Figure: Agentic workflow

5. One Workflow Walkthrough

When a new customer contract arrived with non-standard indemnity language, the system first ingested the file, normalized the content, and registered it in the retrieval layer together with metadata about customer, contract type, and business owner. It then extracted key clauses and compared them against internal legal policy, prior approved language, and similar historical contracts. Instead of asking one model to do everything, the system routed clause extraction and risk labeling to lightweight local specialists, while a stronger local reasoning model handled the conflict analysis between the proposed clause and house standards.
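The case study does not say how the retrieval layer merges semantic matches with exact clause lookup. Reciprocal rank fusion (RRF) is one common, simple choice, and the sketch below uses it purely as an illustrative stand-in; the document IDs in the usage note are hypothetical, and `k = 60` is a commonly used RRF default rather than a value from this pilot.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each list contributes 1 / (k + rank) for every document it contains, so
    items ranked well by both the dense and the keyword retriever rise to the top.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a dense ranking `["contract_2021_acme", "contract_2019_beta", "policy_indemnity_v3"]` with a keyword ranking `["policy_indemnity_v3", "contract_2021_acme", "template_msa"]` puts the two items found by both retrievers first: `contract_2021_acme`, then `policy_indemnity_v3`.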

The coordinator then checked whether the output met confidence and policy thresholds. In this case, the indemnity clause was close to a previously approved exception but not identical. Because the confidence score was below the approval threshold and the potential liability was material, the workflow escalated the case to legal review rather than auto-drafting a final response. A lawyer reviewed the highlighted passage, accepted one suggested redline, rejected another, and approved the outbound recommendation. Only then did the system update the dashboard, create a follow-up task for sales, and log the reasoning path, retrieved precedents, model outputs, and final human decision for audit and future reuse.
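The escalation step can be sketched as a small decision function. Following the human review points listed in section 4, either low confidence or material exposure triggers review, and material exposure routes the case to legal. The 0.85 threshold, the 50,000 materiality limit, and the outcome labels are hypothetical; an `auto_draft` result would still wait for final approval before any externally consequential action.

```python
from dataclasses import dataclass


@dataclass
class CaseAssessment:
    confidence: float           # coordinator's confidence in the proposed handling, 0..1
    potential_liability: float  # estimated exposure in the contract currency (assumed unit)


def decide(case: CaseAssessment,
           approval_threshold: float = 0.85,
           materiality_limit: float = 50_000.0) -> str:
    """Return a routing label for one case; all thresholds are illustrative."""
    material = case.potential_liability >= materiality_limit
    confident = case.confidence >= approval_threshold
    if material:
        return "escalate_to_legal"   # high-risk cases always get legal review
    if not confident:
        return "specialist_review"   # uncertain but low-exposure: a specialist checks it
    return "auto_draft"              # still subject to final approval before outbound actions
```

Under these assumed values, the walkthrough's indemnity case (for instance, a confidence of 0.62 against a six-figure exposure) returns `escalate_to_legal`.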

6. Results

  • Baseline period: Legacy operating model reconstructed from the old workflow
  • Evaluation period: Redesigned agentic pilot specification
  • Workflow scope/sample: Sensitive document intake, triage, compliance/legal escalation, dashboard update, and downstream task triggering
  • Process change: The workflow moved from a serial chain with four recurrent human touchpoints to a design with two primary human control points: governance setup and exception approval
  • Decision/model change: One-pass document tagging was replaced by context-enriched planning, subtask routing across local models, and explicit quality confirmation before action
  • Business effect: Less unproductive communication across analysts, reviewers, and managers; better auditability; more selective escalation; and a higher share of managerial time shifted from chasing approvals to setting policies, risk thresholds, and operating priorities
  • Evidence status: Estimated from workflow redesign and pilot logic, not yet reported as production KPI data

The most important gain was not “full autonomy.” It was structural. In the old workflow, managers and specialists spent energy on communication and governance overhead: clarifying handoffs, rechecking context, interpreting borderline outputs, and deciding whether a case was safe to move forward. In the redesigned workflow, those same people remained in control, but the system assembled context, planned the work, tracked thresholds, and surfaced only the exceptions worth attention. The result was a narrower human role with higher leverage.

7. What Failed First and What Changed

The first redesign attempt still behaved too much like a smarter prompt chain. It improved extraction but did not reliably improve decisions, because ambiguous cases still lacked enough enterprise context and the same local model was being stretched across too many roles. That failure exposed the core operating lesson: local deployment and security do not automatically produce a good workflow. The fix was to separate the work into planned subtasks, enrich each case with rules and prior examples, and route subtasks to different local models according to capability and cost. Even after that change, one limitation remained: small firms still face a hard ceiling on local compute, so the routing policy has to stay disciplined about when stronger inference is truly worth using.
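The discipline described here can be expressed as a budget gate in front of the stronger model. The sketch below assumes compute is metered in GPU-seconds per day and that the small model reports a confidence score; `ComputeBudget`, `choose_path`, the threshold, and the cost figures are all hypothetical. The design choice it illustrates is that heavier inference is spent only when the small model is uncertain and the case is decision-sensitive, and that an exhausted budget falls back to human review rather than to a silent guess.

```python
from dataclasses import dataclass


@dataclass
class ComputeBudget:
    """Hypothetical daily allowance for the stronger local model, in GPU-seconds."""
    limit_gpu_seconds: float
    used_gpu_seconds: float = 0.0

    def try_reserve(self, estimated_cost: float) -> bool:
        """Reserve compute if it fits in the remaining budget; report success."""
        if self.used_gpu_seconds + estimated_cost > self.limit_gpu_seconds:
            return False
        self.used_gpu_seconds += estimated_cost
        return True


def choose_path(budget: ComputeBudget, small_confidence: float,
                decision_sensitive: bool, estimated_cost: float,
                threshold: float = 0.75) -> str:
    """Decide whether a case justifies a call to the stronger local model."""
    if small_confidence >= threshold:
        return "accept_small"          # a strong pass would not change the outcome
    if not decision_sensitive:
        return "accept_small_flagged"  # low stakes: keep the result, flag it for later review
    if budget.try_reserve(estimated_cost):
        return "strong_model"
    return "human_review"              # budget exhausted: escalate instead of guessing
```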

8. Transferable Lesson

  • Do not treat privacy-preserving AI as a single-model problem. In small and mid-sized firms, local security is often affordable before frontier-level local capability is; subtask routing is therefore more practical than trying to make one model do everything.
  • Move human effort upward, not outward. The right goal is not to remove reviewers, but to remove low-value communication and governance work so reviewers spend time only on material exceptions.
  • Plan and confirm before you act. For enterprise workflows, better extraction alone is not enough; the system needs context assembly, explicit task planning, validation, and coordinator-level control over quality, latency, and cost.

This case shows that agentic AI works best when the workflow contains real branching, real policy constraints, and real coordination costs that are too expensive to manage manually but too risky to automate blindly.


  1. Arun S. Maiya, OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit, arXiv:2505.07672. The paper emphasizes local or hybrid deployment for sensitive data, document-ingestion pipelines, hybrid retrieval stores, and extractor/classifier/agent-style pipelines. https://arxiv.org/html/2505.07672

  2. WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models, arXiv:2411.05451. The paper introduces hierarchical thought for workflows and a quality-confirmation stage before downstream use. https://arxiv.org/html/2411.05451

  3. Çağatay Demiralp, Fabian Wenz, Peter Baile Chen, Moe Kayali, Nesime Tatbul, and Michael Stonebraker, Making LLMs Work for Enterprise Data Tasks, arXiv:2407.20256. The paper argues that enterprise performance trails public benchmarks and highlights the need for richer context, rules, and careful latency-cost-quality trade-offs. https://arxiv.org/abs/2407.20256

  4. Orchestrating Agents and Data for Enterprise: A Blueprint Architecture for Compound AI, arXiv:2504.08148. The paper describes task planners, data planners, coordinators, streams, budgets, and QoS-aware replanning/confirmation in enterprise agent systems. https://arxiv.org/html/2504.08148

  5. Teola: Towards End-to-End Optimization of LLM-based Applications, arXiv:2407.00326. The paper proposes primitive-level dataflow graphs, execution-graph optimization, and parallel or pipelined execution instead of coarse linear chains. https://arxiv.org/html/2407.00326