Path of Least Resistance: Why Realistic Constraints Break MAPF Optimism

Robots do not move through warehouses as clean little dots on a grid. They rotate. They accelerate. They wait behind other robots. They lose time in corners. They obey controllers, not PowerPoint arrows.

This is the small operational fact that makes a large amount of path-planning optimism look slightly overdressed.

Multi-Agent Path Finding, or MAPF, usually asks a neat question: given many agents, each with a start and goal location, can we find collision-free paths for all of them? In the standard version, the world is a graph, time advances in discrete steps, and each robot either moves to a neighboring vertex or waits. It is elegant, measurable, and algorithmically productive. It is also not how a differential-drive robot actually behaves when squeezed through a congested warehouse aisle.

The paper behind this article, Analyzing Planner Design Trade-offs for MAPF under ADG-based Realistic Execution, studies that gap directly.¹ Its core message is not that traditional MAPF is useless. That would be too easy, and also wrong. The message is sharper: the objective that looks good inside a simplified planner can remain useful while still being incomplete, and a planner with a more realistic robot model can beat a planner that is closer to optimal, or even provably optimal, under a less realistic model.

That distinction matters. In operations, “optimal” is not a religious title. It is a claim about which world the optimization problem actually represents.

The planning metric is useful, but it is not the execution metric

The first comparison in the paper is between the classic MAPF objective and actual simulated execution performance.

The classic objective is Sum of Costs, usually abbreviated as SoC. It adds up, over all robots, the number of timesteps each robot needs to reach its goal in the discrete MAPF plan. In benchmark terms, lower SoC usually means a better plan. If every robot were a grid token sliding one cell per timestep, SoC would be a strong candidate for measuring fleet efficiency.
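To make the definition concrete, here is a minimal sketch of SoC next to makespan. The path representation (one vertex per timestep), the function names, and the toy plan are assumptions for illustration, not taken from the paper.

```python
from typing import List, Tuple

Vertex = Tuple[int, int]
Path = List[Vertex]  # path[t] is the robot's vertex at timestep t (assumed layout)


def path_cost(path: Path) -> int:
    """Timesteps until the robot reaches its goal and stays there."""
    goal = path[-1]
    t = len(path) - 1
    # Trailing waits at the goal do not count toward the cost.
    while t > 0 and path[t - 1] == goal:
        t -= 1
    return t


def sum_of_costs(plan: List[Path]) -> int:
    """SoC: add up every robot's individual cost."""
    return sum(path_cost(p) for p in plan)


def makespan(plan: List[Path]) -> int:
    """Makespan: the cost of the slowest robot."""
    return max(path_cost(p) for p in plan)


# Toy plan with made-up coordinates.
plan = [
    [(0, 0), (0, 1), (0, 2)],          # two moves
    [(1, 0), (1, 0), (1, 1), (1, 1)],  # one wait, one move, then rests at the goal
]
print(sum_of_costs(plan), makespan(plan))
```

In this toy plan each robot needs two timesteps, so SoC is 4 while makespan is 2. Neither number says anything about how long a physical robot takes to perform those steps.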

The authors ask a more uncomfortable question: does improving SoC reliably improve realistic execution time?

To test this, they generate multiple plans for the same MAPF instances using MAPF-LNS, an anytime planner that keeps refining solutions within a 60-second runtime limit. They then execute those plans in SMART, a physics-based simulation testbed that uses an Action Dependency Graph, or ADG, to coordinate execution under realistic robot constraints. The experiments span six map types from the MovingAI benchmark: empty, random, room, den312d, maze, and warehouse. The robots are modeled as differential-drive platforms with circular footprints, PID controllers, limited translational and rotational velocities, acceleration constraints, and one-meter grid cells.

This part of the paper is main evidence, not a side ablation. It establishes whether the standard optimization target survives contact with a more realistic execution environment.

The result is not a simple takedown. SoC and Average Execution Time, or AET, show a strong positive linear correlation across maps and robot counts. SoC is not nonsense. It captures a real trend.

But the relationship is not perfectly monotonic. For the same SoC value, AET can vary noticeably. That is the crack in the benchmark wall. Two plans can look equally good under the planner’s cost metric while producing different execution times when robot motion, rotations, dependencies, and congestion are taken seriously.
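One way to see how this can happen is to count turns. The sketch below is illustrative only: it assumes a 4-connected grid with unit steps and a known starting heading, and `count_quarter_turns` is a hypothetical helper, not the paper's feature extraction.

```python
from typing import List, Tuple

Vertex = Tuple[int, int]


def count_quarter_turns(path: List[Vertex], start_heading: Tuple[int, int]) -> int:
    """Count the 90-degree turns needed to follow a 4-connected grid path."""
    heading = start_heading
    turns = 0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        if step == (0, 0):
            continue  # a wait does not change the heading
        if step == (-heading[0], -heading[1]):
            turns += 2  # reversing direction takes two quarter turns
        elif step != heading:
            turns += 1
        heading = step
    return turns


# Two made-up paths between the same endpoints, both four timesteps long.
l_shaped = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
staircase = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)]

print(count_quarter_turns(l_shaped, (1, 0)), count_quarter_turns(staircase, (1, 0)))
```

Both paths contribute the same four timesteps to SoC, yet the staircase asks for three quarter turns where the L-shaped path asks for one. A differential-drive robot pays for each of those turns in execution time.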

The authors then extract additional features from the ADG: Type-1 edges, Type-2 edges, rotations, conflict pairs, and robot counts. Type-1 edges describe the temporal order of actions within each robot’s plan. Type-2 edges capture precedence constraints between different robots to avoid collisions. These are not decorative graph features; they are traces of how much coordination the execution framework must enforce.
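A simplified sketch of how such edges could be derived from a discrete plan follows. It assumes one action per timestep and a basic rule for Type-2 edges (a robot may enter a vertex only after the robot scheduled there earlier has finished leaving it). The function name and data layout are hypothetical, and SMART's actual construction may differ.

```python
from typing import List, Tuple

Vertex = Tuple[int, int]
Action = Tuple[int, int]  # (robot id, index of the step within that robot's path)
Edge = Tuple[Action, Action]


def build_adg(paths: List[List[Vertex]]) -> Tuple[List[Edge], List[Edge]]:
    """Return (type1, type2) edge lists for a simplified ADG.

    Assumption: action (i, k) takes robot i from paths[i][k] to paths[i][k + 1].
    """
    type1: List[Edge] = []
    type2: List[Edge] = []

    # Type-1: consecutive actions of the same robot.
    for i, path in enumerate(paths):
        for k in range(len(path) - 2):
            type1.append(((i, k), (i, k + 1)))

    # Type-2: robot j's entry into a vertex waits for robot i's exit from it.
    for i, pi in enumerate(paths):
        for k in range(len(pi) - 1):
            if pi[k] == pi[k + 1]:
                continue  # a wait action does not vacate the vertex
            for j, pj in enumerate(paths):
                if i == j:
                    continue
                for l in range(k, len(pj) - 1):
                    if pj[l + 1] == pi[k]:
                        type2.append(((i, k), (j, l)))
                        break  # later visits are already ordered by Type-1 edges
    return type1, type2


# Made-up plan: robot 0 crosses (1, 1) one timestep before robot 1 does.
paths = [
    [(0, 1), (1, 1), (2, 1)],
    [(1, 0), (1, 0), (1, 1), (1, 2)],
]
type1, type2 = build_adg(paths)
print(len(type1), type2)
```

Here the single Type-2 edge says that robot 1's move into (1, 1) cannot start until robot 0 has completed its move out of that cell. Counting such edges gives a rough measure of how tightly the robots are coupled.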

The regression results make the point cleanly.

Predictor set for AET	MAPE reported in the paper	Interpretation
All features	0.0342	Execution time is best explained by a bundle of planning and execution-structure features.
SoC + rotations	0.070	Adding a physically meaningful action feature sharply improves prediction over SoC alone.
SoC alone	0.1778	SoC is the strongest single predictor, but still leaves substantial execution variation unexplained.
Type-1 edges alone	0.2414	Intra-robot action structure matters, but does not replace the global path-cost signal.
Type-2 edges alone	0.2826	Inter-robot dependency count matters, but is not sufficient by itself.
Conflict pairs alone	0.3175	Conflict structure is relevant, but weak as a standalone predictor.
Robot count alone	0.4317	Fleet size alone is a crude predictor; congestion structure matters more than headcount.
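For readers who want the metric spelled out, MAPE is the mean of the absolute prediction errors divided by the true values. The sketch below assumes the table reports MAPE as a fraction (so 0.0342 would be about 3.4%), which the article does not state explicitly, and the numbers in the example are made up.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up execution times in seconds, not values from the paper.
observed_aet = [100.0, 200.0, 400.0]
predicted_aet = [110.0, 190.0, 400.0]
print(mape(observed_aet, predicted_aet))
```

The three relative errors are 10%, 5%, and 0%, so the result is approximately 0.05.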

The business interpretation is not “throw away SoC.” The better lesson is: keep SoC, but stop pretending it is the whole operating model.

For a warehouse automation team, this distinction is practical. A planner that reduces path cost may still increase execution friction if it creates more rotations, tighter dependency chains, or fragile passing orders. Conversely, a plan with a slightly worse benchmark cost may produce better throughput if it reduces physical awkwardness during execution. The robot does not invoice you for SoC. It invoices you through cycle time, congestion, idle time, and exception handling.

The realistic model wins execution time, then sends the scalability bill

The second comparison asks whether more accurate MAPF models improve execution performance.

The authors compare three planning models:

standard MAPF;
MAPF with rotation;
MAPF with full kinodynamics, including rotation, speed, and acceleration.

They also test variants using a $k$-robust delay model, which adds bounded temporal slack during planning to tolerate execution delays. This part is best read as a mixture of main evidence and sensitivity testing. The comparison among standard, rotation-aware, and kinodynamic models is central. The $k$-robust variants help probe whether delay tolerance improves realistic execution under different levels of congestion.
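Under one common reading of $k$-robustness, no robot may occupy a vertex within $k$ timesteps after another robot occupied it. The check below is a sketch under that assumption; it covers vertex occupancy only, assumes robots rest at their goals, and uses a hypothetical function name.

```python
from typing import List, Tuple

Vertex = Tuple[int, int]


def is_k_robust(paths: List[List[Vertex]], k: int) -> bool:
    """True if no robot occupies a vertex within k timesteps after another robot."""

    def at(path: List[Vertex], t: int) -> Vertex:
        # Assumption: a robot stays at its goal once its path ends.
        return path[min(t, len(path) - 1)]

    horizon = max(len(p) for p in paths)
    for a, pa in enumerate(paths):
        for b, pb in enumerate(paths):
            if a == b:
                continue
            for t in range(horizon):
                for delay in range(k + 1):
                    if at(pa, t) == at(pb, t + delay):
                        return False
    return True


# Made-up plan: robot 1 enters (1, 1) one timestep after robot 0 was there.
paths = [
    [(0, 1), (1, 1), (2, 1)],
    [(1, 0), (1, 0), (1, 1), (1, 2)],
]
print(is_k_robust(paths, 0), is_k_robust(paths, 1))
```

The plan is collision-free in the standard sense (k = 0) but fails for k = 1, because a one-step delay of robot 0 would put both robots in the same cell. Demanding that extra gap everywhere is exactly the kind of slack that becomes hard to find as the map fills up.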

The main result is unsurprising in direction but important in magnitude: more accurate models generally produce better AET in SMART. The full kinodynamic model achieves the best AET among the tested models, and rotation-aware planning performs much better than standard MAPF.

That sounds like a straightforward case for always using the more realistic model. Naturally, reality refuses to be that tidy.

The cost is scalability. Incorporating more accurate robot models increases computational complexity. The paper reports that modeling rotations during planning can reduce the maximum number of solvable agents by up to 40%. That is not a rounding error. For a warehouse fleet, it is the difference between a planner that works in a research demo and one that survives peak-hour dispatch.

The $k$-robust delay model also behaves differently depending on scale. With fewer agents, its variants perform comparably across different $k$ values. As robot counts increase, however, performance declines significantly, with lower solution quality and reduced scalability. Delay tolerance is not free robustness. It can become planning overhead that hurts when congestion is already high.

The warehouse map adds an important boundary. In that environment, the full kinodynamic model does not significantly improve over the rotation-aware model. The authors use path visualizations to explain why: warehouse paths have fewer shared vertices than some other maps, reducing the opportunity for kinodynamic modeling to create extra benefits.

This is an implementation detail with strategic meaning. The best model depends on the structure of the operating environment. A dense maze-like layout, a random obstacle field, and a warehouse with relatively separated paths do not create the same coordination problem. A planner selected from a benchmark average may be badly matched to a site-specific layout.

Design choice	What the paper shows	Business reading	Boundary
Standard MAPF	Simple and scalable, but ignores physical execution factors.	Useful baseline; risky as the only procurement benchmark.	Can understate rotation, acceleration, and dependency costs.
MAPF + rotation	Improves AET substantially over standard MAPF.	Often a strong practical upgrade because turning is a real cost. Shocking, I know: robots have bodies.	Reduces scalability; not always enough for all environments.
MAPF + kinodynamics	Produces the best AET in tested model comparisons.	Better physical fidelity can improve throughput.	Higher computational burden; limited added benefit in some layouts such as the warehouse case.
$k$-robust planning	Can tolerate bounded delays but weakens under higher robot counts.	Robustness settings should be tuned, not worshipped.	More slack can reduce scalability and solution quality in congestion.

The decision is therefore not “realism good, simplification bad.” The decision is where realism pays for itself.

In a low-congestion layout with predictable routes, full kinodynamic planning may be overkill. In a dense environment where robots frequently share narrow passages, rotation and acceleration constraints may directly determine throughput. The same algorithmic sophistication can be either a performance feature or a computational tax, depending on the floor.
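A back-of-the-envelope sketch shows why acceleration limits matter more when robots stop often. It uses the standard rest-to-rest trapezoidal velocity profile; the speed and acceleration values are made-up assumptions, not the paper's robot configuration, and only the one-meter cell size comes from the experimental setup described earlier.

```python
import math


def straight_time(distance: float, v_max: float, a_max: float) -> float:
    """Rest-to-rest travel time over `distance` with a trapezoidal velocity profile."""
    if distance <= 0:
        return 0.0
    ramp_distance = v_max ** 2 / a_max  # distance spent accelerating and braking
    if distance >= ramp_distance:
        # Reaches v_max, cruises, then brakes.
        return distance / v_max + v_max / a_max
    # Too short to reach v_max: accelerate halfway, brake halfway.
    return 2.0 * math.sqrt(distance / a_max)


V_MAX = 1.0  # m/s, assumed
A_MAX = 0.5  # m/s^2, assumed

one_run = straight_time(4.0, V_MAX, A_MAX)          # four cells without stopping
stop_and_go = 4 * straight_time(1.0, V_MAX, A_MAX)  # four cells, stopping after each
print(round(one_run, 2), round(stop_and_go, 2))
```

With these assumed limits, four cells in a single run take 6 seconds, while four stop-and-go cells take roughly 11.3 seconds before any rotation time is added. A standard MAPF plan counts both cases as four timesteps.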

Optimality loses when it optimizes the wrong world

The third comparison is the paper’s most useful business result: model accuracy versus plan optimality.

Here the authors compare planners using different MAPF models and different optimality levels. To obtain optimal solutions, they use CBS for standard MAPF and CBS with rotation for the rotation model. For suboptimal plans, they use PBS with all three models. They also create a naive anytime planner using Prioritized Planning with random restarts, recording the lowest-SoC plan found within 60 seconds.

The purpose of this experiment is not merely to show another performance chart. It tests a decision that operations teams actually face: when planning time is limited, should you spend computation on making a simplified plan more optimal, or on using a more accurate model even if the solution is not mathematically optimal?

The answer is blunt. Planners using more accurate MAPF models consistently achieve better AET, even when less accurate models produce optimal solutions. In the paper’s tested settings, adding rotation reduces AET by 27–33% compared with standard MAPF, and adding full kinodynamic constraints gives an additional 17–20% improvement.
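How the two improvements combine depends on the baseline of the second figure, which the summary above does not specify. As a purely arithmetic illustration, assuming the additional 17–20% is measured relative to the rotation-aware result, the reductions would compound as follows. This is not a number reported in the paper.

```python
def combined_reduction(first: float, second: float) -> float:
    """Total reduction when `second` applies to what remains after `first`."""
    return 1.0 - (1.0 - first) * (1.0 - second)


# Assumption: the second percentage is relative to the rotation-aware AET.
low = combined_reduction(0.27, 0.17)
high = combined_reduction(0.33, 0.20)
print(f"{low:.1%} to {high:.1%}")
```

Under that assumption the full kinodynamic model would sit roughly 39% to 46% below standard MAPF in AET. If the second figure is instead relative to standard MAPF, the combination would be different, so treat this as a reading aid rather than a result.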

This is the point where benchmark culture should pause for a small wellness check.

An optimal solution under the wrong model is not operationally optimal. It is internally consistent. That is not the same thing. A route plan can minimize discrete timesteps while ignoring turning costs, acceleration limits, and execution dependencies. The resulting plan may be beautiful inside the mathematical abstraction and mediocre on the warehouse floor.

The paper also notes that the optimal planner does not always produce the best AET because SoC is not a perfect metric. That is consistent with the first experiment: if the objective leaves out important execution features, optimizing it harder does not necessarily fix the missing variables.

There is a second bottleneck: the execution framework itself. ADG uses a conservative strategy, assuming robots can be delayed indefinitely. This supports safe execution, but it can compromise solution quality in some cases. In other words, better planning models help, but execution management can still leave performance on the table.
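A toy model makes the bottleneck visible. In the sketch below, an action may start only after every action it depends on has finished, so one robot's delay propagates along the dependency edges. The action names, durations, and edges are invented for illustration, and this is not SMART's implementation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, Hashable, Iterable, Tuple


def finish_times(
    durations: Dict[Hashable, float],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Hashable, float]:
    """Earliest finish time of each action under dependency-gated execution."""
    preds = {action: set() for action in durations}
    for before, after in edges:
        preds[after].add(before)

    finish: Dict[Hashable, float] = {}
    # static_order() yields every action after all of its predecessors.
    for action in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[action]), default=0.0)
        finish[action] = start + durations[action]
    return finish


# Two robots, two actions each; b1 must wait for a1 (a cross-robot dependency).
edges = [("a0", "a1"), ("b0", "b1"), ("a1", "b1")]
nominal = finish_times({"a0": 1.0, "a1": 1.0, "b0": 1.0, "b1": 1.0}, edges)
delayed = finish_times({"a0": 4.0, "a1": 1.0, "b0": 1.0, "b1": 1.0}, edges)
print(nominal["b1"], delayed["b1"])
```

When robot A's first action takes 4 seconds instead of 1, robot B's last action finishes at 6 seconds instead of 3, even though B itself was never slow. The execution stays safe, but one robot's delay becomes every dependent robot's delay.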

That creates a three-layer interpretation:

Layer	What can go wrong	What the paper suggests
Objective	SoC captures trend but misses rotations, dependencies, and other execution-relevant features.	Add execution-aware objective features rather than relying on path cost alone.
Planning model	Simplified MAPF can optimize a world that robots do not physically inhabit.	Include rotation or kinodynamics when they materially affect execution.
Execution framework	Conservative ADG execution can introduce bottlenecks even after better planning.	Improve execution frameworks so they are robust without being unnecessarily restrictive.

For business users, this is a planner-selection framework, not a robotics trivia lesson. If the fleet problem is time-sensitive, the question is not “which planner has the best benchmark guarantee?” The question is “which planner-model-execution combination gives the best operational throughput under the site’s real constraints and planning-time budget?”

That is a less elegant question. It is also the one that affects payback period.

The paper’s evidence is a trade-off map, not a single leaderboard

It is tempting to read the results as a ranking:

full kinodynamics;
rotation-aware MAPF;
standard MAPF.

That ranking is directionally useful, but too thin. The paper is better understood as a map of trade-offs.

Test or result	Likely purpose	What it supports	What it does not prove
SoC versus AET scatter plots	Main evidence on objective validity	SoC is strongly correlated with realistic execution time but not sufficient.	That SoC should be abandoned.
ADG feature regression	Diagnostic evidence for missing execution factors	Rotations, dependency edges, conflict pairs, and robot count add explanatory power.	That the exact regression model is a deployable production predictor.
Model comparison across standard, rotation, and kinodynamics	Main evidence on model fidelity	More accurate models generally improve AET.	That the most accurate model is always worth its computational cost.
$k$-robust variants	Sensitivity test around delay tolerance	Robust delay modeling can lose value as congestion and robot count grow.	That robustness modeling is inherently bad.
Warehouse path visualization	Exploratory explanation	Layout structure can reduce the marginal benefit of full kinodynamics.	That warehouse settings never need full kinodynamic planning.
Optimality versus model accuracy comparison	Main trade-off evidence	A less “optimal” plan under a more realistic model can beat an optimal plan under a simplified model.	That optimality guarantees are irrelevant.

This matters because enterprise buyers often convert technical benchmarks into procurement shortcuts. They ask for the fastest planner, the most optimal planner, or the most scalable planner. The paper suggests those are incomplete categories.

A better evaluation should ask four questions.

First, what is the real execution objective? If the business objective is throughput, average completion time, SLA adherence, energy use, or congestion reduction, SoC may only be a proxy. It should be tested against the actual operational metric.

Second, which physical constraints dominate the site? If turning and acceleration consume meaningful time, ignoring them biases planner evaluation. If paths are mostly separated and traffic conflicts are limited, a simpler model may be enough.

Third, what planning time budget is available? A planner that gives better AET after long computation may be useless in a live system that must replan quickly after disruptions.

Fourth, how conservative is the execution layer? A safe but overly rigid execution framework can erase some benefits from better planning. Planning and execution should be evaluated together, not as separate brochure features.

The operational lesson: measure the bottleneck you actually own

For warehouse, manufacturing, and logistics firms, this paper has an uncomfortable implication: the planner benchmark is not the system benchmark.

A robotics vendor may show strong results on a standard MAPF benchmark. That is useful information. It is not enough. The operational system also includes robot kinodynamics, local controllers, traffic rules, communication delays, recovery behavior, and execution scheduling. A planner that performs well in the abstract may still generate plans that are awkward to execute.

Cognaptus’ inference from the paper is that fleet-planning evaluation should move toward execution-aware scorecards. A practical scorecard would separate three things:

Evaluation dimension	Example metric	Why it matters
Planning quality	SoC, makespan, success rate	Shows whether the planner can find good discrete solutions.
Execution realism	AET, rotation count, dependency depth, conflict structure	Shows whether plans remain efficient when robots move physically.
Operational resilience	replan latency, delay sensitivity, congestion recovery	Shows whether the system survives real warehouse variability.

The paper directly supports the first two dimensions and points toward the third. It does not fully solve operational resilience, but it makes clear why resilience cannot be inferred from simplified optimality.

The ROI implication is also specific. If a planner upgrade reduces AET by improving model fidelity, the business value may appear as higher throughput, lower fleet idle time, fewer congestion cascades, or reduced need for additional robots. But those benefits are conditional. They depend on the facility layout, traffic density, robot hardware, controller behavior, and how often the system must replan under disruption.

That is not a generic limitation disclaimer. It changes the procurement question. The buyer should not ask, “Is this planner optimal?” The buyer should ask, “Optimal under which model, executed by which robot, in which traffic pattern, under which delay regime?”

Annoying question. Excellent question.

Where the paper’s boundary sits

The paper uses SMART simulation with differential-drive robots, selected benchmark maps, and ADG-based execution. The robot configuration includes circular footprints, constrained translation and rotation, acceleration limits, and PID controllers. These choices make the experiments more realistic than standard MAPF benchmarks, but they are still a controlled simulation environment.

That boundary affects how the results should be used.

First, the exact magnitudes should not be blindly transferred to another robot fleet. The 27–33% AET reduction from adding rotation and the additional 17–20% improvement from kinodynamics are results from the tested settings. A facility with different robot geometry, controller quality, aisle width, payload behavior, or traffic rules may see different values.

Second, the warehouse-map result should not be overgeneralized. In the paper, full kinodynamics adds little over rotation in the warehouse map because the visualized paths have fewer shared vertices. Another warehouse with narrower aisles, more bidirectional traffic, or different task assignment patterns could behave differently.

Third, ADG is both the evaluation bridge and a constraint. It enables execution-aware simulation, but its conservative assumptions can themselves limit performance. A deployment using a different execution framework may shift the trade-off.

Fourth, the paper studies planner design choices, not end-to-end warehouse economics. It does not model labor savings, maintenance cost, battery consumption, exception handling, or integration complexity. Those belong in a deployment study, not in a MAPF paper.

The correct use of the paper is therefore not to copy a planner ranking. The correct use is to copy the evaluation logic.

The benchmark should follow the robot, not the other way around

The quiet achievement of this paper is that it moves MAPF evaluation closer to the operational question.

It does not dismiss SoC. It demotes SoC from “the answer” to “one useful signal.” It does not declare that realistic models always win. It shows that model fidelity can improve execution time while imposing scalability costs. It does not say optimality is irrelevant. It shows that optimality under a simplified model can lose to a less perfect plan under a more faithful model.

That is the real lesson for automation strategy: optimization is only as good as the operational world encoded inside it.

A fleet planner should not be rewarded for solving a cartoon version of the warehouse with heroic precision. It should be evaluated by how its plans execute when robots rotate, accelerate, queue, yield, and occasionally make everyone else wait because physics has once again refused to respect the benchmark table.

Path planning still matters. It just has to plan for robots, not for dots.

Cognaptus: Automate the Present, Incubate the Future.

¹ Jingtian Yan, Zhifei Li, William Kang, Stephen F. Smith, and Jiaoyang Li, “Analyzing Planner Design Trade-offs for MAPF under ADG-based Realistic Execution,” arXiv:2512.09736v2, 2026.
