When it comes to automating customer service, generative AI walks a tightrope. It can understand free-form text better than any tool before it, but with a dangerous twist: sometimes it just makes things up. These hallucinations, already infamous in legal and healthcare settings, can turn minor misunderstandings into costly liabilities.
But what if instead of trusting one all-powerful AI model, we take a lesson from bees?
A recent paper by Amer & Amer proposes just that: a multi-agent system inspired by collective intelligence in nature, combining LLMs, regex parsing, fuzzy logic, and tool-based validators to build a hallucination-resilient automation pipeline. Their case study—processing prescription renewal SMS requests—may seem narrow, but its implications are profound for any business relying on LLMs for critical operations.
When Regex Meets Reasoning: The Renewal Agent
The system begins with a Renewal Agent (RA): not an LLM, but a rule-based parser powered by regular expressions and fuzzy logic. This agent attempts to:
- Extract codes and intent (e.g., `1` = renew, `2` = stop)
- Remove polite noise (“please”, “thank you”)
- Score its own confidence based on how complete the keyword extraction is
The result? A structured JSON object with a fuzzy confidence score (high, mid, low) that is both interpretable and grounded. If confidence is high, automation proceeds immediately; if not, the request escalates.
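To make this concrete, here is a minimal sketch of what such a rule-based parser could look like, assuming a simplified two-code scheme. The class name, noise list, and confidence cutoffs are illustrative choices, not details from the paper:

```java
// A minimal sketch of a rule-based Renewal Agent. The keyword scheme,
// noise list, and confidence rules below are illustrative assumptions.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RenewalAgent {

    // Matches an action code at a word boundary: "1" = renew, "2" = stop.
    private static final Pattern CODE = Pattern.compile("\\b([12])\\b");
    // Polite filler stripped before scoring (hypothetical noise list).
    private static final Pattern NOISE =
            Pattern.compile("(?i)\\b(please|thank you|thanks)\\b");

    public static String parse(String sms) {
        String cleaned = NOISE.matcher(sms).replaceAll("").trim();

        Matcher m = CODE.matcher(cleaned);
        String intent = "unknown";
        int hits = 0;
        while (m.find()) {
            intent = m.group(1).equals("1") ? "renew" : "stop";
            hits++;
        }

        // Fuzzy-style confidence: one unambiguous code and nothing else -> high;
        // a code plus leftover text -> mid; nothing recognized -> low.
        String confidence;
        if (hits == 1 && cleaned.matches("\\s*[12]\\s*")) confidence = "high";
        else if (hits >= 1) confidence = "mid";
        else confidence = "low";

        // Structured, interpretable output, in the spirit of the paper's JSON.
        return String.format("{\"intent\": \"%s\", \"confidence\": \"%s\"}",
                intent, confidence);
    }

    public static void main(String[] args) {
        System.out.println(parse("1 please"));        // high confidence: automate
        System.out.println(parse("stop my meds??"));  // low confidence: escalate
    }
}
```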
Not All Customers Are Equal: Evaluator + Customer Importance
Next comes the Evaluator Agent, which fuses two signals:
- RA’s Confidence Score
- Customer Importance Score (computed by a Customer Relationship Agent using fuzzy rules over signals like purchase volume and loyalty)
This dual-fuzzy rule system allows the Evaluator to make context-sensitive decisions:
- Confident + Unimportant → Process directly
- Uncertain + High-value → Call the LLM, but double-check
- Low confidence + Low value → Escalate to human support
This avoids blind trust in either regex or LLMs, and it allocates computational effort where it matters most.
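As a rough sketch of how this fusion might look in code, with crisp labels standing in for fuzzy membership values (the rule table and action names below are assumptions, not the paper’s exact rules):

```java
// A sketch of the Evaluator Agent's fusion of the two fuzzy signals.
// Thresholds and action labels are illustrative, not from the paper.
public class EvaluatorAgent {

    enum Action { PROCESS_DIRECTLY, CONSULT_LLM_AND_VALIDATE, ESCALATE_TO_HUMAN }

    static Action decide(String raConfidence, String customerImportance) {
        boolean confident = raConfidence.equals("high");
        boolean highValue = customerImportance.equals("high");

        if (confident && !highValue) return Action.PROCESS_DIRECTLY;          // cheap and safe
        if (!confident && highValue) return Action.CONSULT_LLM_AND_VALIDATE;  // spend compute here
        if (!confident && !highValue) return Action.ESCALATE_TO_HUMAN;        // LLM risk not worth it
        // Confident + high-value is left implicit in the write-up;
        // processing directly here is our assumption.
        return Action.PROCESS_DIRECTLY;
    }

    public static void main(String[] args) {
        System.out.println(decide("high", "low"));  // PROCESS_DIRECTLY
        System.out.println(decide("mid",  "high")); // CONSULT_LLM_AND_VALIDATE
        System.out.println(decide("low",  "low"));  // ESCALATE_TO_HUMAN
    }
}
```

A production version would defuzzify continuous scores rather than compare string labels, but the routing logic is the same idea.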
Gemini vs. ChatGPT: The Dual-Lens Validator
When the system decides to consult an LLM, it uses both Gemini and ChatGPT, independently queried via LangChain4J. The returned outputs—keywords, complaints, tone—are not taken at face value.
Instead, the Validation Agent runs:
- Keyword-level sanity checks: if an LLM finds a keyword the RA missed, is it real or hallucinated?
- Complaint sanity checks: Gemini reviews ChatGPT’s complaint extraction and vice versa, rating accuracy on a scale from 1–10.
Only when both LLMs agree, or one convincingly validates the other, does the system act. Otherwise, the fallback is clear: send a clarification SMS or escalate.
This mutual hallucination filtering makes the architecture more robust than either agent alone.
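A simplified sketch of that mutual check, assuming each model’s output arrives together with the 1–10 rating the other model gave it; the acceptance threshold of 7 is an illustrative assumption:

```java
// A sketch of the Validation Agent's cross-check between two LLM outputs.
public class ValidationAgent {

    // peerRating = the 1-10 score the *other* model gave this output.
    record LlmResult(String keyword, int peerRating) {}

    static String validate(LlmResult fromChatGpt, LlmResult fromGemini) {
        // Case 1: both models agree outright -> act on the shared answer.
        if (fromChatGpt.keyword().equals(fromGemini.keyword()))
            return fromChatGpt.keyword();

        // Case 2: one output is convincingly validated by the other model.
        // The threshold of 7 is an assumption, not the paper's figure.
        if (fromChatGpt.peerRating() >= 7 && fromGemini.peerRating() < 7)
            return fromChatGpt.keyword();
        if (fromGemini.peerRating() >= 7 && fromChatGpt.peerRating() < 7)
            return fromGemini.keyword();

        // Case 3: no clear winner -> don't act on either answer.
        return "CLARIFY_OR_ESCALATE";
    }

    public static void main(String[] args) {
        var chatGptSaid = new LlmResult("renew", 9); // Gemini rated it 9/10
        var geminiSaid  = new LlmResult("stop", 3);  // ChatGPT rated it 3/10
        System.out.println(validate(chatGptSaid, geminiSaid)); // renew
    }
}
```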
Modular Intelligence: Expert Agents for Specialized Tasks
For actionable tasks like appointment scheduling or pharmacy Q&A, the system delegates to Expert Agents, each powered by a different technology:
- Pharmacist Agent → Forwards message to real pharmacist
- Store Management Agent → RAG over internal docs
- Scheduling Agent → LLM + Tool-based database querying
This modularity ensures that not everything needs an LLM. Some tasks are best handled by symbolic logic, others by APIs, others still by generative reasoning.
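In code, this kind of modular delegation reduces to a small routing layer. The sketch below uses hypothetical task labels and stub handlers just to show the shape of the pattern; the real agents wrap a human pharmacist, a RAG index, and an LLM with database tools:

```java
// A sketch of the dispatch pattern behind Expert Agents: one interface,
// one handler per task type. Task labels and handlers are hypothetical.
import java.util.Map;

public class ExpertRouter {

    interface ExpertAgent { String handle(String request); }

    public static void main(String[] args) {
        Map<String, ExpertAgent> agents = Map.of(
            "medical_question", req -> "forwarded to on-duty pharmacist: " + req,
            "store_info",       req -> "RAG lookup over internal docs for: " + req,
            "appointment",      req -> "LLM + DB tool call to book: " + req
        );

        // Unknown task types fall back to a human, mirroring the
        // escalation-by-default stance of the architecture.
        String task = "appointment";
        ExpertAgent agent = agents.getOrDefault(task,
                req -> "escalated to human support: " + req);
        System.out.println(agent.handle("renew slot next Tuesday"));
    }
}
```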
The Payoff: Hallucination Caught in Action
In stress tests with real-world SMS messages, the system caught several subtle issues:
- ChatGPT once hallucinated a keyword, “renew”, that wasn’t in the text; it was caught and discarded.
- The RA failed to catch “stop” in a vague message, but the LLM did. Risk was then assessed based on medication type before proceeding.
Rather than rely on a brittle binary, the system makes graded, resilient decisions, treating each case with the right level of scrutiny.
Lessons for Business Automation
The real genius of this architecture isn’t in its components—it’s in how they’re stitched together:
- Use regex/fuzzy logic for grounding
- Use LLMs only when necessary—and validate them
- Assign risk weights using business logic (e.g., customer importance)
- Communicate between agents via shared message pools (Kafka), not rigid hierarchies
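The message-pool idea is easy to miniaturize. The toy below uses an in-process queue in place of a Kafka topic, purely to show the decoupling: producers publish structured results, and whichever agent subscribes can pick them up, with no fixed call chain:

```java
// A toy illustration of the shared-message-pool idea using an in-process
// queue; in the actual system the pool is a Kafka topic, which gives the
// same decoupling across processes plus durability and consumer groups.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessagePoolDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> pool = new LinkedBlockingQueue<>();

        // Producer side: the Renewal Agent publishes its structured result.
        pool.put("{\"intent\":\"renew\",\"confidence\":\"mid\"}");

        // Consumer side: the Evaluator Agent subscribes and reacts.
        // With Kafka this would be a consumer polling the same topic.
        Thread evaluator = new Thread(() -> {
            try {
                String msg = pool.take();
                System.out.println("Evaluator picked up: " + msg);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        evaluator.start();
        evaluator.join();
    }
}
```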
It’s a safety-first blueprint that Cognaptus clients can adopt today: start small, make each agent modular, and add LLM reasoning only where truly needed.
As LLMs get better, these scaffolds will remain invaluable—not as crutches, but as guardrails.
Cognaptus: Automate the Present, Incubate the Future