A planner stalls.

Not because the goal vanished. Not because the system lacks compute. Not even because the heuristic is completely wrong. It stalls because the heuristic has temporarily stopped saying anything useful. Every nearby state looks equally unpromising, or worse, misleadingly unpromising. The algorithm is still running, naturally. It is very busy being lost.

This is the situation Daniel Platnick, Dawson Tomasz, Eamon Earl, Sourena Khanzadeh, and Richard Valenzano study in Breadth-First Search vs. Restarting Random Walks for Escaping Uninformed Heuristic Regions.1 The paper examines a deceptively practical question in AI planning: when a greedy search method enters an uninformed heuristic region, should it escape by systematically expanding outward with breadth-first search, or by repeatedly launching bounded random walks until one finds a better state?

The easy answer is ideological. Breadth-first search sounds disciplined. Random walks sound like algorithmic shrugging. So the tidy story would be: serious systems use exhaustive local search; random exploration is what you do when you have run out of cleverness.

The paper’s result is more useful, and therefore less tidy. Random walks are not automatically better. They are also not a desperate fallback. They become rational when the geometry of the stalled region makes systematic enumeration expensive and when a single random walk has a high enough chance of hitting an escape. That is the mechanism. Everything else in the paper — the expected-runtime bounds, the EHC variants, the PDDL experiments, the memory comparison — is there to expose when that mechanism holds and when it does not.

The real enemy is not bad search; it is silent guidance

Greedy Best-First Search and Enforced Hill-Climbing work because they assume the heuristic provides useful direction. In planning, a heuristic is a compressed guess about distance or promise. When it works, search can avoid wasting time on irrelevant states. Charming. Almost suspiciously so.

The trouble starts in what the paper calls an Uninformative Heuristic Region, or UHR. This is a region around a current state where no path produces heuristic improvement until an exit is found. Plateaus and local minima are familiar examples. Inside a UHR, the heuristic does not help distinguish “closer” from “farther”. A greedy method is now trying to navigate a room where all signs point to the same wall.

Enforced Hill-Climbing is designed around this problem. It repeatedly tries to make heuristic progress. When direct progress is unavailable, it runs a local breadth-first search to find an escape state: a state whose heuristic value improves. Once it finds such a state, it commits and repeats.

That design embeds a strong assumption. It assumes the right way to escape silence is to examine states outward in strict depth order. This is safe and predictable. It also has a bill attached.

Breadth-first search must test everything shallower than the first escape depth before it can reach the shallowest escape layer. If the local region branches heavily, the number of shallower states can explode. Systematic exploration becomes systematic taxation. Very principled, very expensive, and exactly the kind of thing that looks noble in a textbook and painful in production.

Restarting random walks pay a different bill. A random walk samples a path through the state space up to a depth limit. If it finds an escape, it succeeds. If not, it restarts from the original state and tries again. This avoids maintaining large open and closed lists, but it may revisit the same states repeatedly. It replaces exhaustive coverage with repeated independent chances.

So the comparison is not “order versus chaos”. It is “enumerate the prefix” versus “buy repeated chances”.

Breadth-first search pays before it gets to be lucky

The paper’s first useful move is to express the difference as expected runtime, measured by the number of goal tests or state generations.

Suppose a search task has $g$ goal states uniformly distributed at the shallowest goal depth $d^\ast$, with $S_{<d^\ast}$ denoting states at depths shallower than $d^\ast$ and $S_{d^\ast}$ denoting states at depth $d^\ast$. For breadth-first search, the expected runtime is:

$$ E[B(T)] = |S_{

The important part is not the fraction. It is the first term.

Breadth-first search always pays $|S_{<d^\ast}|$. It cannot avoid examining the shallower frontier. More goals at the escape depth help, because the search is likely to find one sooner once it reaches that depth. But they do not refund the cost of the shallow layers. BrFS has to walk through the lobby before it can check whether the exits are crowded.

Restarting random walks behave differently. If the probability that a single walk of maximum length $\ell$ succeeds is $p_g$, then the paper shows:

$$ E[R^C_\ell(T)] \leq \frac{\ell}{p_g} + 1 $$

This is the whole argument in miniature. BrFS depends on the number of states it must enumerate before reaching the relevant layer. RRW depends on how expensive each attempt is and how likely each attempt is to work.

Low $p_g$ is fatal. If random walks almost never hit an escape, restarting is just repeatedly being wrong with confidence. But as $p_g$ rises, RRW improves quickly because it does not need to exhaust the shallower region. It only needs one successful trajectory.

The crossover condition in the paper makes this explicit:

$$ p_g \geq \frac{\ell} {|S_{

In plain terms: random walks win when a single walk is likely enough to find an escape relative to the amount of systematic work BrFS must do. That phrase “likely enough” is doing all the work. It turns random exploration from folklore into an algorithm-selection condition.

Why deeper regions can favour randomness

The paper’s directed-tree analysis sharpens the intuition. Consider a tree with branching factor $b$, goals uniformly distributed at depth $d^\ast$, and unbiased random walks. In a constant-branching tree, the number of states at depth $d^\ast$ is $b^{d^\ast}$, while the shallower states total $(b^{d^\ast}-1)/(b-1)$.

For a branching factor of 4 and goal depth 6, there are 4096 states at the goal depth. The paper’s Figure 1 shows the expected-runtime comparison as the number of goals increases. BrFS is much faster when there are very few goals. That is not surprising: random walks have poor odds when escapes are rare. But RRW catches up quickly as goals become more numerous, because each random trajectory has a better chance of landing in a useful place.

This produces a slightly counterintuitive lesson. The absolute number of goals needed for random walks to overtake BrFS increases with goal depth. But the density of goals needed at that depth decreases. In larger tree-like spaces, random walks can become competitive even when useful exits are a small fraction of the frontier, because the cost BrFS pays to enumerate the shallower tree grows so severely.

That is the mechanism worth remembering. Random walks are not better because they are random. They are better when systematic search has to examine a combinatorial prefix while random sampling can reach useful exits with non-negligible probability.

Here is the practical version.

Situation inside the stalled region BrFS behaviour RRW behaviour Likely winner
Shallow escape, few alternatives Checks shallow states without replacement May resample the same wrong states BrFS
Deep escape, many possible exits Pays for all shallower layers Needs one successful trajectory RRW
Many transpositions or repeated states Duplicate detection helps Walks may revisit equivalent states BrFS
Low-memory environment Maintains open and closed lists Keeps only one walk in memory RRW
Escapes can lead to dead ends Shallowest escape may preserve resources Fast escape may be a bad escape Often BrFS, but context-dependent

This is where the popular misconception breaks. Random walks are not a universal upgrade. The paper explicitly identifies cases where BrFS should retain an advantage, especially when the escape depth is shallow or when duplicate detection matters. The sober conclusion is not “replace BFS with randomness”. It is “use randomness when the topology makes exhaustive discipline wasteful”. Less bumper sticker, more engineering.

EHC-RRW is the experiment, not the whole story

The theoretical analysis compares BrFS and constant-depth restarting random walks as escape mechanisms. To test the idea in a planning setting, the authors modify Enforced Hill-Climbing. Standard EHC uses BrFS to escape each UHR. The paper introduces two variants:

  1. EHC-RRW$_\ell^C$, which replaces BrFS with constant-depth restarting random walks.
  2. EHC-RRW$^L$, which uses the Luby restart policy, varying walk lengths according to the classic restart sequence.

The Luby policy matters because choosing the “right” walk depth requires information the searcher usually does not have. If the walk limit is too short, random walks cannot reach the escape. If it is too long, they waste time wandering beyond useful opportunities. Luby restarts hedge across depths: many short walks, occasional longer ones, then longer still. It is not clairvoyant, but it is a disciplined way to be ignorant. Finally, a kind of ignorance management we can all respect.

The theoretical contribution here is also careful. Because EHC-RRW is stochastic, it does not have the same deterministic completeness guarantee as EHC. However, on EHC-complete problems — where there are no dead ends, or all dead ends are recognized by the heuristic — the paper shows finite expected runtime for the RRW variants under appropriate conditions. In bounded-UHR STRIPS planning domains, the paper extends this to polynomial expected runtime: EHC-RRW$_\ell^C$ has polynomial expected runtime if $\ell$ is at least the bounded exit distance, and EHC-RRW$^L$ has polynomial expected runtime through the restart-policy guarantee.

This is not a cosmetic modification to EHC. It shows that replacing local BrFS with RRWs does not simply throw away the theoretical safety net in the settings where EHC is already known to behave well. The guarantee changes from deterministic termination bounds to expected runtime bounds, because randomness insists on bringing probability to the meeting.

What the experiments are actually testing

The empirical section is best read as a topology test, not a horse race.

The authors evaluate EHC and the two EHC-RRW variants using Fast Downward on the optimal and agile Autoscale benchmark suites. They use 17 PDDL domains previously classified under the Hoffmann local-search topology taxonomy, tested with the unit-cost FF heuristic. Each problem has a 10-minute time limit and 3.5 GB memory limit, and results are averaged across five runs. Constant-depth RRW is tested with $\ell = 10, 25, 50$. Luby RRW is tested with base multipliers $m = 1, 2, 4$.

That setup has three distinct purposes.

Evidence component Likely purpose What it supports What it does not prove
Expected-runtime theorems Main mechanism Identifies when RRW beats BrFS in expectation Does not claim RRW is universally superior
EHC-RRW formal properties Theoretical extension Shows RRW escape can preserve finite or polynomial expected-runtime guarantees in relevant EHC settings Does not make stochastic EHC deterministically complete
PDDL coverage on 17 domains Main empirical evidence Tests whether topology predicts relative performance in planning benchmarks Does not establish production performance across all agent systems
Walk length and Luby multiplier variants Sensitivity test Shows constant-depth RRW can be parameter-sensitive and Luby restarts can reduce depth-choice fragility Does not solve adaptive restart selection
Implementation appendix Implementation detail and isolation control Explains why the EHC implementation differs from Fast Downward’s existing version Does not introduce a new planner intended to beat all baselines
Memory comparison Resource-behaviour evidence Confirms the expected memory advantage of RRW variants Does not show memory was the limiting factor in these experiments

This distinction matters because a lazy reading would ask, “Which row wins?” The better question is, “Which topology makes the mechanism visible?”

The benchmark results are mixed in exactly the useful way

The coverage table does not crown a single champion. Good. A clean victory would have been suspicious.

In bounded-UHR domains on the optimal track, all methods solve all 180 attempted problems. That matches the strong theoretical expectations. But on the agile track, bounded-UHR domains remain challenging because large instances and high branching factors can make even shallow escape distances expensive. In the logistics domain, the paper notes a maximum exit distance of 1, yet the branching factor is so high that only tens of states are expanded per second. “Shallow” does not mean “cheap” when the hallway has a thousand doors.

On bounded-UHR agile problems, standard EHC has higher total coverage than the RRW variants: 103.6 versus 99.6, 99.6, and 98.0 for constant-depth RRW, and 97.2, 97.4, and 96.4 for Luby variants. This is consistent with the theory: when escapes are shallow, BrFS often benefits from searching without replacement and from finding the nearest escape.

Unbounded-UHR domains are more interesting. On the optimal track, EHC solves 219.6 out of 240 on average. Constant-depth RRW with $\ell=50$ reaches 223.0, and Luby RRW variants reach 222.6, 223.0, and 223.8. On the agile track, EHC solves 97.2 out of 220, while Luby RRW variants reach 107.8, 107.8, and 107.0. The paper notes that the Luby advantage is largely driven by the two pipesworld domains, and that outside those domains performance is similar to EHC.

That is not a weakness. It is the point. RRW helps when the local state-space structure cooperates. It is not a holy solvent poured over planning.

The constant-depth RRW variants also show parameter sensitivity. In unbounded-UHR optimal problems, $\ell=10$ gives 196.0 coverage, while $\ell=50$ gives 223.0. That is the practical cost of choosing a fixed depth. If your walks are too short, you do not reach the exit. If your walks are longer, you may reach deeper escapes — but too much walking can also waste effort. Luby restarts exist because the right depth is often unknown before search begins.

The incomplete domains are the warning label. These are domains with unrecognized dead ends. EHC performs best overall, though the paper notes this category has only three domains: airport, freecell, and mprime. On the optimal track, EHC solves 41.6 out of 90 on average, while the RRW variants solve fewer. On the agile track, EHC solves 25.0 out of 90, again ahead of the RRW variants.

The authors’ hypothesis is plausible and important: in domains where actions can exhaust resources such as fuel, a shallow escape may be safer because it consumes fewer resources before leaving the UHR. A random walk may escape quickly but choose a “bad” escape that leads into a dead-end region. In other words, not every exit is a good exit. Anyone who has followed a parking garage sign into a dead-end ramp will find this deeply unsurprising.

Memory is the quiet business argument

The memory result is simple but operationally relevant. BrFS maintains open and closed lists. The paper describes worst-case memory for BrFS as $O(B^D)$, while constant-depth RRW needs only $O(\ell)$ because it keeps one walk in memory.

The empirical memory comparison shows clear memory advantages for the EHC-RRW variants on EHC-complete domains, even though none of the methods actually ran out of memory in the tested problems. That boundary matters. The experiment confirms the expected direction of the memory effect; it does not show memory was the bottleneck in this benchmark suite.

For business systems, this is where the result becomes more than planner trivia. Many real AI planning workloads are not run in an academic benchmark harness. They sit inside orchestration engines, agentic workflow systems, robotics controllers, automated scheduling tools, route planners, supply-chain simulators, or operations platforms with resource constraints and latency budgets. In those settings, the right local escape mechanism is not merely an elegance question. It affects tail latency, memory footprint, fallback behaviour, and the ability to keep making progress when heuristics become temporarily useless.

The practical translation is not “install RRW”. It is:

Operational signal Engineering interpretation Sensible response
Stalls occur in broad, branching local regions BrFS may spend too much enumerating shallow states Try RRW or hybrid escape
Escape attempts often succeed with short sampled paths Single-walk success probability is likely high RRW becomes attractive
Many equivalent states or transpositions appear Random walks may repeatedly sample duplicates Prefer BrFS or add duplicate-aware sampling
Shallow exits frequently work BrFS gets useful escapes with low overhead Keep BrFS as default
Local escapes often lead to irreversible resource loss Nearest escape may be safer than random escape Bias toward BrFS or evaluate escape quality
Memory pressure dominates Open/closed lists become operational cost Use RRW or portfolio strategies

This is Cognaptus’ inference from the paper, not a direct empirical claim. The paper studies classical planning benchmarks, not enterprise workflow agents or warehouse robots. Still, the mechanism is portable: when a system must navigate large discrete decision spaces with imperfect heuristics, the topology of stalled regions should influence the escape strategy.

The business value is algorithm selection, not algorithm fashion

The business lesson here is diagnostic. Too many AI system discussions treat search behaviour as an implementation detail buried underneath the model. That is increasingly naïve. As agentic systems become planners rather than mere responders, they need reliable ways to recover when their guidance signals flatten out.

A language-model agent decomposing a procurement workflow can get stuck. A robotics planner can face a local configuration plateau. A scheduling optimiser can encounter many equivalent-looking partial assignments. A logistics simulator can spend most of its time proving that obvious-looking branches are useless. In each case, the question is not whether the system has “intelligence”. The question is whether it has a good escape policy when its local heuristic becomes mute.

This paper suggests a useful design pattern: treat escape from heuristic silence as a portfolio decision.

A production-grade planner might use BrFS first when exit distance appears shallow, duplicate density is high, or escape quality matters. It might switch to RRW when local branching becomes expensive, memory pressure increases, or repeated stalls indicate that systematic enumeration is paying too much prefix cost. A Luby-style restart schedule can hedge when the right walk depth is unknown. And a monitoring layer can record which escape mechanism succeeds in which topology, gradually turning local search behaviour into an observable operational metric.

That last point matters. The most useful business output is not a theoretical preference for randomness. It is instrumentation. Measure local branching, duplicate rates, escape depth, walk success rate, memory growth, and downstream quality of chosen escapes. Then choose the escape mechanism dynamically. Radical stuff: letting evidence decide.

Where the result should not be overextended

The paper’s limits are precise, and they affect how practitioners should use it.

First, the core runtime comparison uses assumptions such as uniformly distributed goals at the goal depth and, in the directed-tree analysis, unbiased walks. These are not guaranteed in real planning landscapes. They are analytical models designed to reveal the crossover mechanism.

Second, RRW’s success depends on the single-walk success probability $p_g$. In production, that value is rarely known directly. It must be estimated from observed escape attempts, local topology features, or offline benchmarking.

Third, BrFS handles transpositions better because it performs duplicate detection. In state spaces with many repeated or equivalent states, RRW can behave as though it is searching a larger tree than BrFS. This is not a small detail. It can erase the sampling advantage.

Fourth, not all escapes are equal. In domains with unrecognized dead ends, a fast escape can be a bad commitment. The paper’s incomplete-domain results are a useful warning: escaping quickly is not the same as escaping wisely.

Fifth, the empirical results are based on PDDL planning benchmarks using the FF heuristic under specific runtime and memory limits. They support the paper’s planning claims and illustrate the mechanism. They do not certify that RRW will improve arbitrary business agents, robotics stacks, or optimisation services without domain-specific validation.

These boundaries do not weaken the paper. They prevent the usual interpretive vandalism where every algorithmic result becomes a universal architecture mandate by lunchtime.

Randomness becomes useful when it is bounded, restarted, and measured

The cleanest way to read this paper is as a study in disciplined randomness.

A random walk by itself is not a strategy. It is a sample path. Restarting turns it into repeated opportunity. A depth limit turns it into controlled exposure. A Luby schedule turns uncertainty about the right depth into a principled sequence. Expected-runtime analysis turns the whole thing into an algorithm-selection rule.

Breadth-first search remains the right tool when the escape is shallow, sparse, duplication-heavy, or when the shallowest improvement is safer than an arbitrary one. Restarting random walks become attractive when the stalled region is large, combinatorial, memory-sensitive, and rich enough in exits that a sampled path has a meaningful chance of success.

That is the useful replacement for the misconception. Random walks are not what planners do when they give up. They are what planners should consider when the cost of being orderly exceeds the value of order.

In business terms, the paper points toward AI planning systems that diagnose their own local search conditions instead of blindly applying a single escape routine. The future of agentic planning is not merely better heuristics. It is better behaviour when the heuristic goes quiet.

Very inconvenient for anyone hoping one neat method would win forever. Fortunately, engineering has always been less interested in neatness than in getting unstuck.

Cognaptus: Automate the Present, Incubate the Future.


  1. Daniel Platnick, Dawson Tomasz, Eamon Earl, Sourena Khanzadeh, and Richard Valenzano, “Breadth-First Search vs. Restarting Random Walks for Escaping Uninformed Heuristic Regions,” arXiv:2511.09549, 2025, https://arxiv.org/abs/2511.09549↩︎