In the age of Digital Twins and hyper-automated warehouses, simulations are everywhere—but insights are not. Discrete Event Simulations (DES) generate rich, micro-level data on logistics flows, delays, and resource utilization, yet interpreting this data remains painfully manual, fragile, and siloed.

This paper from Quantiphi introduces a compelling solution: transforming raw simulation outputs into a Knowledge Graph (KG) and querying it via an LLM agent that mimics human investigative reasoning. It’s a shift from spreadsheet-style summaries to an interactive AI assistant that explains why something is slow, where the bottleneck is, and what needs attention.

From Simulation Logs to Semantics

Traditional warehouse DES outputs resemble a haystack of timestamps and queues. Analysts rely on pre-written scripts or aggregate statistics to find bottlenecks. This approach breaks when problems are subtle, multi-stage, or masked by variability.

The authors instead convert simulation outputs into a domain-specific Knowledge Graph, with:

  • Nodes for entities: Suppliers, Workers, AGVs, Forklifts, Storage Blocks
  • Edges for transitions: e.g., package moved from Worker to AGV
  • Edge properties: time taken, wait duration, etc.

This structure captures temporal and causal relationships, unlocking a deeper form of querying. For example, finding packages with high AGV-to-Forklift delay now becomes a matter of graph traversal.
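As a concrete illustration, the KG can be modeled as a list of transition edges with timing properties; the node names, property keys, and delay threshold below are illustrative assumptions, not values from the paper:

```python
# Minimal sketch of the KG described above: nodes are entities
# (AGVs, forklifts), edges are package transitions with properties.
edges = [
    # (source entity, target entity, transition properties)
    ("AGV_01", "FL_00", {"package": "PKG_7", "wait_s": 2300}),
    ("AGV_02", "FL_00", {"package": "PKG_8", "wait_s": 30}),
    ("AGV_01", "FL_01", {"package": "PKG_9", "wait_s": 25}),
]

# Graph traversal: packages whose AGV-to-Forklift wait exceeds a threshold.
slow = [
    props["package"]
    for src, dst, props in edges
    if src.startswith("AGV") and dst.startswith("FL") and props["wait_s"] > 600
]
print(slow)
```

In an actual graph database, the same traversal would be a short Cypher `MATCH` over transition edges rather than a Python loop.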

Reasoning, Not Just Retrieval

The core novelty lies in the LLM-based reasoning agent layered atop this KG. It supports two paths:

  1. Operational QA Chain: for questions like “Which AGV had the highest utilization?”

    • Decomposes into structured sub-questions
    • Generates Cypher queries (not SQL!)
    • Uses self-reflection after each step to verify accuracy
  2. Investigative Reasoning Chain: for diagnostic prompts like “Why was AuroraFarms’ unloading slow today?”

    • Iteratively formulates sub-questions based on prior answers
    • Refines hypothesis over multiple KG queries
    • Summarizes insights with supporting evidence
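The operational chain can be sketched as three moves: decompose, generate Cypher, self-reflect. The query text, schema labels, and result row below are stand-ins, not the paper's actual prompts or data:

```python
# Hedged sketch of the Operational QA Chain.
question = "Which AGV had the highest utilization?"

# Step 1: decompose into structured sub-questions.
sub_questions = [
    "What is the total busy time per AGV?",
    "Which AGV has the maximum busy time?",
]

# Step 2: generate a Cypher query (illustrative schema).
cypher = """
MATCH (a:AGV)-[t:MOVED]->()
RETURN a.id AS agv, sum(t.move_s) AS busy_s
ORDER BY busy_s DESC
LIMIT 1
"""

# Step 3: self-reflection -- sanity-check the result before accepting it.
def self_reflect(row):
    return bool(row) and "agv" in row and row.get("busy_s", -1) >= 0

row = {"agv": "AGV_01", "busy_s": 5130}  # stand-in for executing `cypher`
assert self_reflect(row)
print(row["agv"])
```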

This design mimics human analysts: starting broad, drilling down based on anomalies, and cross-checking across dimensions (e.g., worker utilization, AGV waiting time, forklift delays).
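The investigative loop itself is compact. Below, `ask` stands in for the LLM-plus-KG step (the paper's real interfaces are not shown here), and the canned script is an invented walkthrough in the spirit of the forklift diagnosis:

```python
# Dependency-free sketch of the Investigative Reasoning Chain.
def investigate(question, ask, max_steps=5):
    """Iteratively refine sub-questions, accumulating (question, answer) evidence."""
    evidence = []
    sub_q = question
    for _ in range(max_steps):
        answer, next_q = ask(sub_q, evidence)  # query KG, then refine hypothesis
        evidence.append((sub_q, answer))
        if next_q is None:                     # agent judges it has enough evidence
            break
        sub_q = next_q
    return evidence

# Canned walkthrough of a "why was unloading slow?" style diagnosis.
script = {
    "Why was unloading slow?": ("Unload time up 40%", "Which stage added delay?"),
    "Which stage added delay?": ("AGV-to-Forklift wait spiked", "Which forklift?"),
    "Which forklift?": ("FL_00, +29s over average", None),
}
trace = investigate("Why was unloading slow?", lambda q, ev: script[q])
for q, a in trace:
    print(q, "->", a)
```

The final summary step would then hand `trace` back to the LLM to produce insights with supporting evidence.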

Real Results, Not Toy Examples

The framework was tested on a simulated warehouse with AGVs, forklifts, and workers handling supplier deliveries. Three stress scenarios were introduced:

| Scenario | Injected Bottleneck | Agent Diagnosis |
|----------|---------------------|-----------------|
| 1 | AGV-to-Forklift delays | Pinpointed AGV-to-FL segment with extreme variance (e.g., 2300s delay) |
| 2 | Supplier-specific slowdowns | Linked to underutilized workers and congested AGVs (e.g., 2.6% worker utilization) |
| 3 | Slow forklift (FL_00) | Identified high wait and operation times (e.g., +29s over avg) |

In each case, the LLM agent not only matched human expert intuition—it went further by producing data-backed causal chains.

Step-wise QA Beats Monolithic Queries

Performance on 25 operational questions showed that traditional LLM querying (Direct QA) struggled with even basic aggregation tasks. The step-wise guided agent, on the other hand, achieved Pass@4 = 1.00 across all categories. It succeeded where others failed, especially on multi-part questions like:

“Which supplier had the shortest discharge time, and how many packages were moved?”

By decomposing into two steps and verifying each, the agent avoided the brittle failures of single-shot prompting.
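That two-step pattern can be sketched over a toy in-memory result set; the supplier names and figures here are illustrative, not results from the paper:

```python
# Sketch of step-wise decomposition with a per-step check.
records = [
    {"supplier": "AuroraFarms", "discharge_s": 310, "packages": 42},
    {"supplier": "NorthBay",    "discharge_s": 180, "packages": 35},
]

# Step 1: which supplier had the shortest discharge time?
fastest = min(records, key=lambda r: r["discharge_s"])
# Verify step 1 before proceeding (the agent's per-step accuracy check).
assert fastest["discharge_s"] == min(r["discharge_s"] for r in records)

# Step 2: how many packages did that supplier move?
count = next(r["packages"] for r in records
             if r["supplier"] == fastest["supplier"])

print(fastest["supplier"], count)
```

A single-shot prompt must get both halves right at once; the decomposed version can catch a wrong step-1 answer before it contaminates step 2.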

Implications for Industry

This isn’t just another AI layer—it’s a leap in simulation analytics. Warehouse planners can now:

  • Query Digital Twins in plain English
  • Identify root causes, not just symptoms
  • Experiment with “what-if” scenarios (e.g., removing an AGV or altering supplier timings)

Moreover, the approach is adaptable: while this paper focuses on unloading, the same KG+LLM architecture could be applied to order picking, replenishment, or cross-docking.

Final Thoughts

The brilliance here isn’t in using an LLM—but in structuring the problem to let the LLM reason. With a well-defined KG, an iterative QA loop, and careful Cypher generation, the authors show that complex simulation data can become an interactive assistant.

At Cognaptus, we see this as a blueprint for AI-native warehouse optimization—where human planners collaborate with reasoning agents to co-design resilient, efficient systems.


Cognaptus: Automate the Present, Incubate the Future.