Routing Without Running Out: How Bilevel Optimization Rewires EV Logistics

Routes look clean on a dashboard. A line leaves the depot, touches a sequence of customers, maybe bends toward a charging station, and returns home. The illusion is that route planning is still mostly about drawing the shortest useful line.

Electric fleets ruin that illusion rather quickly.

A diesel truck can treat refueling as an annoying but usually minor detail. An electric vehicle cannot. Battery capacity turns distance into feasibility. Charging stations turn geography into detours. A route that looks efficient before charging may become expensive after charging; a route that looks wasteful may avoid a much uglier charging pattern. This is why the Electric Capacitated Vehicle Routing Problem, or E-CVRP, is not merely the old vehicle-routing problem wearing a green jacket. It is a coupled routing-and-energy problem, and coupling is where algorithms go to lose their innocence.

The paper behind this article, Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem, by Yinghao Qin and co-authors, makes one useful move: it stops pretending that routing and charging are the same decision, but it also refuses to treat them as fully separable.¹ The result is a bilevel formulation and a lightweight search algorithm, b-LAHC, that uses routing cost as a cheap guide while calling the more expensive charging optimizer only when it is likely to matter.

That mechanism is the important part. The benchmark wins are useful; the design pattern is more reusable.

The useful lie: route cost is not the answer, but it is a good first filter

The paper’s central trick is to use a partial truth deliberately.

In E-CVRP, a complete solution has two parts. First, vehicles must be assigned customer-serving routes: which customers each vehicle visits, and in what order. Second, given those routes, charging stations must be inserted so the vehicle can actually complete the trip without violating battery constraints. The full cost is not just the customer route. It is the route plus the detours required by charging.

A simplified way to express the relationship is:

$$ \text{Complete cost} = \text{routing cost} + \text{charging detour cost}. $$

The routing cost is cheap to evaluate. The charging detour cost is not, because even for a fixed route the lower-level charging insertion problem can become combinatorially difficult. The authors refer to this fixed-route charging task as the Fixed Route Vehicle Charging Problem, and treat it as the follower problem inside a bilevel structure.

So the algorithm uses the upper-level routing objective as a surrogate: it first asks, “Is this customer route promising before we worry about charging?” Only when a candidate route is close enough to the best routing solution found so far does the algorithm activate the lower-level charging optimizer.

This is a useful lie. It is useful because short customer routes often lead to good complete EV routes. It is a lie because “often” is not “always,” and the paper’s experiments show precisely where the lie becomes dangerous.

That distinction matters for business systems. A practical logistics optimizer cannot afford to run a deep feasibility and charging calculation on every route mutation. It also cannot afford to rank every route by pre-charging distance and hope battery reality politely agrees. The paper’s bilevel design lives between those two mistakes.

Why EV routing breaks the ordinary routing shortcut

Classical capacitated vehicle routing already has enough unpleasantness: customer partitioning, route sequencing, capacity limits, and local minima everywhere. E-CVRP adds battery feasibility and charging stations, which change the meaning of a “good” route.

The authors contrast their bilevel approach with two common modeling habits.

The first is node replication. Charging stations may need to be visited multiple times, so one exact modeling strategy is to replicate charging nodes enough times to represent repeated visits. Mathematically, this is clean. Computationally, it is the sort of cleanliness that makes the room larger every time you sweep. More replicated nodes mean a much larger search space.

The second is pre-enumerating feasible charging station paths between non-charging nodes, then solving a routing problem over that expanded structure. This can reduce some complexity, but the number of feasible charging paths still grows quickly. The paper notes that even with pruning, earlier formulations can generate hundreds or thousands of feasible charging paths on relatively small instances.

The bilevel formulation avoids making every charging detail visible at the top level. It separates the decision into two layers:

Layer	Decision	Operational question	Why it matters
Upper level	Customer-serving routes	Which vehicle visits which customers, in what sequence?	This controls most of the routing geometry and is cheap enough to search repeatedly.
Lower level	Charging insertion	Where must the vehicle stop to remain battery-feasible, and at what detour cost?	This turns a plausible route into an executable EV route.

This is not just decomposition for elegance. The separation changes what the algorithm can afford to do. Upper-level route moves can be explored frequently; lower-level charging optimization can be used selectively.

The paper also imposes a practical bound: between any two successive non-charging route nodes, at most two charging station visits are allowed. The authors justify this as consistent with short- to medium-range EV logistics, where routes requiring more than two consecutive charging stops between service nodes would often be poor candidates for electric delivery in the first place. That assumption is reasonable for the benchmark world of the paper. It is also one of the first boundaries a real deployment team should check before applauding too loudly.

The surrogate works broadly, then fails exactly where ranking gets expensive

The paper’s most interesting evidence is not the leaderboard table. It is the correlation test between the cheap upper-level routing objective and the complete objective after charging.

The authors collect unique solution pairs encountered during search and compare two rankings: one based on the surrogate routing cost, and one based on the complete EV route cost. Across all 17 benchmark instances, the reported Kendall rank correlations average above 0.97, with 15 of 17 instances above 0.95. At first glance, this looks like permission to use the routing surrogate aggressively.

Then comes the uncomfortable detail.

Top-set recall tells a more selective story. On several large X-instances, Recall@1% drops sharply even when rank correlation remains high. The paper reports Recall@1% of 0.0278 for X573 and 0.3169 for X819, while their Kendall correlations remain 0.9811 and 0.9661 respectively. Translation: the broad ranking is highly aligned, but the very top of the list can disagree.

That is not a contradiction. It is exactly what one should expect in hard optimization. A surrogate can be excellent at saying, “Search in this general region.” It can still be poor at saying, “This exact candidate is the best complete EV route.” The closer the competition among candidate routes, the more charging detours can reshuffle the final ranking.

This is the paper’s core correction to the naive two-stage mindset:

Reader belief	Correction from the paper	Business consequence
“Find the best route first, then add charging.”	Routing cost is highly correlated with full EV cost, but the best surrogate routes do not always contain the best complete routes.	A production optimizer should use routing-only search for filtering, not final selection.
“Charging optimization must be run at every step.”	Full lower-level evaluation is expensive and often unnecessary during early search.	Compute should be spent conditionally, especially at scale.
“More vehicles always means higher cost.”	In EV routing, extra routes can reduce charging detours and improve total cost.	Fleet utilization and route count should be search variables, not fixed prejudices.

The sarcastic version: the shortest-looking route is often smart, but occasionally it is just very confident nonsense.

b-LAHC turns the split into a search discipline

The algorithm, b-LAHC, is built around three phases.

First, it creates an initial upper-level route and runs greedy descent. This quickly reduces routing cost and pushes the search into a promising region. The move operators modify route structure through relocation, swaps, edge replacements, inter-route changes, and one especially important operator, M8, which can move a customer into an empty route and thereby increase the number of routes.

Second, it explores neighborhoods using Late Acceptance Hill Climbing. Standard hill climbing accepts a candidate if it improves the current solution. Late acceptance is more forgiving: it may accept a candidate if it improves upon a solution seen several iterations earlier. This gives the search some room to move through temporarily worse terrain instead of panicking at the first uphill step. Very mature behavior, for a heuristic.

Third, it performs final charging refinement. During search, the lower-level optimizer is triggered conditionally when the candidate routing cost is close enough to the best surrogate cost. At the end, the algorithm invokes a more exhaustive charging optimization on the best-found routing plan to verify and polish the returned bilevel solution.

The architecture can be read as a compute budget policy:

Use cheap route-only evaluation to move quickly.
Use conditional lower-level evaluation when a route is promising.
Use exhaustive charging refinement only at the end, where accuracy matters most.

This is why the paper’s “fixed-parameter” claim matters. The authors are not proposing a baroque adaptive machine with many moving parts and an interpretability invoice attached. They configure three main parameters: history length, maximum attempts, and follower activation threshold. The resulting method is a single-point metaheuristic with a relatively transparent decision rhythm.

For logistics systems, transparency is not decoration. Operators need to understand why a plan changed, why a vehicle was split into a new route, or why a charging detour was accepted. A heuristic that performs well but behaves like a haunted spreadsheet is not a great foundation for operational trust.

The benchmark results support scale, not magic

The experimental design uses the IEEE WCCI-2020 EVRP benchmark: 17 instances, including 7 small E-set cases with no more than 100 customers and 10 large X-set cases ranging from 142 to 1,000 customers. The paper evaluates under both fixed evaluation budget and fixed runtime settings, while treating the fixed evaluation budget as the fairer comparison because runtime can be distorted by hardware, language, compiler, and implementation differences.

Under the Max Evals criterion, b-LAHC performs particularly well on the large instances. The paper reports that it surpasses the best-known solution on 9 of 10 X-set instances and underperforms only on X214, with a +0.35% mean gap. On the small E-set, it beats the best-known solution on E33, ties on E23 and E30, and stays close on the remaining four instances, with the largest deviation reported as +0.71% on E76.

Against HHASA-TS, the second-best method in the reported comparison, b-LAHC records 11 wins, 2 ties, and 4 losses at the mean level. The robustness evidence is also meaningful: b-LAHC has lower standard deviation than HHASA-TS on 12 of 17 instances overall, including 9 of 10 large X-instances. The authors report a 39.5% average reduction in standard deviation overall and 46.2% on the X-instances.

This matters because logistics planning does not only need a clever best run. It needs repeatability. A route optimizer that occasionally produces brilliance and occasionally produces confetti is a research demo, not an operations layer.

The convergence evidence adds another useful nuance. On large X-instances, b-LAHC does not dominate early. The paper says it starts weaker than several competitors in the early phase, likely because it begins from a random solution and relies on greedy descent to reach a useful local optimum. Later, however, it dominates on all large instances except X214. That pattern supports the mechanism-first reading: the method’s value is not superior initialization; it is the bilevel search discipline that pays off after the search has had time to exploit the surrogate-plus-follower structure.

The ablations are not footnotes; they explain the mechanism

The paper includes ablation and sensitivity analyses. They should not be read as extra ornaments. They are the tests that tell us which part of the mechanism is doing the work.

Experiment	Likely purpose	What it supports	What it does not prove
Correlation and top-set recall between routing cost and full cost	Main mechanism validation	Routing cost is a strong broad surrogate, but top candidates can be misaligned.	It does not prove route-only optimization is sufficient.
Comparison with state-of-the-art methods	Main performance evidence	b-LAHC is competitive overall and especially strong on large X-instances.	It does not prove dominance under every real fleet constraint.
Convergence analysis	Behavioral evidence	The method’s advantage appears later, after bilevel search structure has time to work.	It does not prove the initialization strategy is best.
Removing greedy descent	Ablation	Greedy descent is important, especially for large instances.	It does not show greedy descent alone is enough.
Removing final charging refinement	Ablation	Final lower-level polishing improves solution quality, especially at larger scale.	It does not mean exhaustive charging should be used throughout the search.
Disabling the bilevel framework into a naive two-stage pipeline	Ablation of the central thesis	Joint bilevel optimization is materially better than decoupled route-then-charge solving.	It does not settle every possible alternative decomposition.
Hyperparameter sensitivity	Robustness/sensitivity test	Moderate settings balance exploration and evaluation effort.	It does not remove the need for recalibration in new operational environments.

The most important ablation is the one that degenerates the method into a naive two-stage pipeline. The authors report consistent deterioration across nearly all instances, with stronger degradation on large cases. This is the direct empirical answer to the common misconception: yes, route cost is a powerful surrogate; no, that does not license solving the upper level first and treating charging as an afterthought.

The M8 operator is another small but revealing detail. Most move operators preserve or reduce the number of routes. M8 can increase the route count by moving a customer into an empty route. In E30, the authors report that without this kind of move, some runs converged to inferior three-route solutions, while better outcomes used four routes. That sounds counterintuitive only if one is still thinking in fuel-truck logic. In EV routing, an additional route can reduce charging detours enough to lower total cost.

This is the kind of operational lesson that can disappear if one only reads the leaderboard. The algorithm did not merely search routes; it allowed the fleet structure to change when battery constraints made “fewer routes” a false economy.

The business value is conditional expensive computation

For Cognaptus readers, the business lesson is not “use b-LAHC tomorrow.” The lesson is more general: many AI and optimization systems should treat expensive reasoning as conditional, not universal.

In EV logistics, the expensive reasoning is charging optimization. In document automation, it might be retrieval plus verification. In financial monitoring, it might be a deep scenario simulation. In customer operations, it might be human escalation. The same pattern appears again and again: cheap signals are usually good enough to narrow the field, but not good enough to decide the final answer.

The paper’s architecture suggests a practical planning template:

System layer	Cheap action	Expensive action	Trigger logic
Candidate generation	Search customer routes using routing-only cost	None	Run broadly.
Candidate filtering	Keep routes near the best surrogate cost	Run charging optimization	Trigger only when the route is promising.
Final decision	Select best complete solution	Exhaustive lower-level refinement	Trigger at termination or before dispatch.

This is a useful way to think about ROI. The goal is not merely to reduce computation. The goal is to move computation to the point in the decision process where it changes the answer.

A fleet-planning vendor could use this pattern to design tiered optimization: fast route exploration for many candidate plans, conditional charging checks for promising plans, and final feasibility refinement before dispatch. A logistics firm could use it to compare depot layouts, vehicle range policies, and charging-station placement strategies without solving every candidate plan at maximum fidelity from the beginning.

But the inference should stay disciplined. The paper directly shows benchmark performance for E-CVRP under its assumptions. Cognaptus infers that the same conditional-computation pattern is valuable for real logistics software. What remains uncertain is how the method behaves when the real world adds traffic, time windows, driver schedules, charger queues, nonlinear charging curves, heterogeneous vehicle batteries, maintenance constraints, and customers who believe “between 9 and 10” means “whenever I happen to be ready.”

Boundaries before deployment enthusiasm gets ideas

The study is strong as an algorithmic contribution, but it is not a complete fleet-management product. That is not a criticism. Papers are allowed to be papers. The problem starts when readers promote benchmark success into operational certainty.

The first boundary is modeling scope. The paper focuses on E-CVRP with homogeneous EVs, cargo capacity constraints, battery constraints, and charging station insertion. It does not fully model every feature that real delivery firms care about: time windows, stochastic traffic, charger availability, queueing, partial charging policy, electricity pricing, driver labor rules, or heterogeneous fleets.

The second boundary is the two-charging-stops assumption between successive route nodes. The authors present it as practical for short- to medium-range EV use. For many urban and regional delivery contexts, that is plausible. For sparse charging networks, long rural routes, emergency logistics, or immature charging infrastructure, it may need re-examination.

The third boundary is benchmark transfer. The IEEE WCCI-2020 benchmark is useful precisely because it creates common ground for comparison. But common ground is still artificial ground. Real fleet geography has depot constraints, business rules, service priorities, charging contracts, and messy operational exceptions.

The fourth boundary is implementation integration. The method’s elegance depends partly on having a clean separation between upper-level routing and lower-level charging. A real system must integrate data quality, map services, vehicle telemetry, charger status, and dispatch workflows. The algorithm can be the engine, but the engine still needs a vehicle.

None of these boundaries weakens the paper’s main message. They locate it.

The real contribution is not a better route; it is a better way to spend search effort

The paper reports strong benchmark outcomes: 10 new best-known solutions overall, 2 matches, and especially strong performance on large X-instances. Those results deserve attention. But the more durable contribution is the mechanism behind them.

The authors show that routing-only cost is a powerful but imperfect surrogate. They use it to search cheaply, not to decide blindly. They show that lower-level charging optimization should be invoked selectively, not mechanically. They show that final refinement matters. They show that EV routing sometimes rewards using more routes, not fewer, because battery detours can overwhelm old route-count instincts.

That is a sober lesson for optimization in the AI era. The fashionable move is to throw larger models, more agents, or more adaptive machinery at every decision. This paper goes in a quieter direction: decompose the problem, identify the cheap signal, measure where it fails, and spend expensive computation only when the failure could affect the answer.

Not glamorous. Very useful. Logistics, unlike conference slides, still has to arrive.

Cognaptus: Automate the Present, Incubate the Future.

Yinghao Qin, Mosab Bazargani, Edmund K. Burke, Carlos A. Coello Coello, Zhongmin Song, and Jun Chen, “Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem,” arXiv:2604.13013, 2026, https://arxiv.org/abs/2604.13013. ↩︎

The useful lie: route cost is not the answer, but it is a good first filter#

Why EV routing breaks the ordinary routing shortcut#

The surrogate works broadly, then fails exactly where ranking gets expensive#

b-LAHC turns the split into a search discipline#

The benchmark results support scale, not magic#

The ablations are not footnotes; they explain the mechanism#

The business value is conditional expensive computation#

Boundaries before deployment enthusiasm gets ideas#

The real contribution is not a better route; it is a better way to spend search effort#