When Algorithms Command: AI’s Quiet Revolution in Battlefield Strategy

Dispatch is rarely elegant. A road closes, a shipment misses its window, a critical machine fails, a storm changes direction, and suddenly the tidy plan becomes a historical artefact. The manager, commander, operator, or incident lead is not looking for a philosophical meditation on uncertainty. They need options, fast, preferably before the situation develops a personality.

That is the practical world behind Johan Schubert, Patrik Hansen, Pontus Hörling, and Ronnie Johansson’s paper on autonomous generation of courses of action for mechanized combat operations.¹ The paper is military in its scenario and vocabulary, but its deeper contribution is architectural: it shows a prototype decision loop that generates many candidate actions, simulates their consequences, scores them, and then compresses the results into clusters that a human decision-maker can inspect.

This matters because the popular image of “AI in command” is usually wrong in one of two ways. Either it imagines a cinematic autonomous commander making lethal decisions with theatrical confidence, or it imagines a dashboard that politely decorates human judgement with a few coloured risk indicators. This paper is neither. It is a search-and-simulation engine for decision support. Less glamorous, more useful. A familiar trade, unfortunately.

The authors’ prototype works on a mechanized battalion scenario on Rådmansö, outside Norrtälje in Sweden. Red forces move along fixed routes derived from the earlier IFD03 Information Fusion Demonstrator. Blue forces are placed and moved across a graph of fourteen “boxes”, each representing a possible combat location. The system then asks a brutally practical question: given a number of available blue platoons, where should they be placed so that red forces are stopped while blue losses remain limited?

That is the paper’s surface problem. The more transferable question is better: how do we build AI systems for operational decisions when the answer is not a prediction, but a set of usable alternatives?

The system does not command; it manufactures evaluated options

The first useful correction is semantic. “Autonomous generation” does not mean autonomous command. The system does not replace the decision-maker with a battlefield oracle. It generates configurations, evaluates them, and presents structured alternatives. The human is still the buyer of the recommendation. The algorithm is the very energetic analyst in the back room, except it does not need coffee and will happily simulate thousands of awkward possibilities without complaining to HR.

The authors begin with the problem of positioning a mechanized battalion, or part of one, before and during an operation. Blue units may include armoured infantry platoons and tank platoons. Red units include armoured infantry platoons and tank platoons as well, with the paper’s implementation using thirteen red armoured infantry platoons from an Infantry Battalion (BMP-3) and three tank platoons from an Independent Tank Battalion (51xT80). Blue platoon types are aligned to U.S.-style combat value tables, such as Infantry Battalion (M2) and Armor Battalion (M1A1).

The important move is abstraction. The battlefield is not modelled as a full physical world. It is modelled as a graph of boxes and edges. Red platoons advance along fixed paths. Blue platoons are assigned to boxes and move across the graph. Combat occurs when opposing units meet in a box. Terrain and weather are explicitly disregarded. That is not a small omission, but it is an honest one. The model chooses tractability over realism, because the objective is to test a decision-support methodology rather than to simulate war in all its messy, mud-splattered detail.

The mechanism has five major stages:

| Stage | What the paper does | Why it matters beyond defence |
| --- | --- | --- |
| Configuration generation | Creates many possible blue force placements across fourteen boxes | Converts “what should we do?” into a searchable option space |
| Simulation | Runs event-driven movement and combat between red and blue units | Tests consequences rather than ranking options by intuition |
| Valuation | Scores outcomes using remaining red and blue combat values | Makes trade-offs explicit, even if the scoring model is simplified |
| Iterative improvement | Uses rank-order selection and genetic algorithms to find better configurations | Learns where promising options are without brute-forcing the universe |
| Clustering | Groups similar configurations and presents representative options | Keeps human control meaningful by avoiding a swamp of near-duplicates |

This is the article’s central point: the intelligence is not in a single clever prediction. It is in the loop.

First, shrink the impossible search space

The paper’s search problem is enormous. With fourteen boxes and sixteen blue platoons, the number of possible placements reaches $14^{16}$, or about $2.18 \times 10^{18}$, if all platoons are treated as unique. That is not a planning menu. That is a combinatorial landfill.
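The arithmetic is easy to check. A minimal sketch, assuming only the two counts quoted above (fourteen boxes, up to sixteen distinguishable platoons); the function name is illustrative and not taken from the paper:

```python
N_BOXES = 14  # combat locations in the scenario graph


def placements(n_platoons: int, n_boxes: int = N_BOXES) -> int:
    """Count assignments when every platoon is distinct and sits in one box."""
    return n_boxes ** n_platoons


for n in (1, 7, 10, 16):
    print(f"{n:2d} platoons -> {placements(n):.3e} placements")
```

Sixteen platoons give exactly 2,177,953,337,809,371,136 placements. Even the ten-platoon case already runs to hundreds of billions, which is why sampling and search replace enumeration.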

So the authors do not try to inspect everything. They begin with 256 pre-generated configurations using a non-stochastic Nearly Orthogonal Latin Hypercube, specifically an S-NOLH(16, 256) matrix. Each column corresponds to a possible platoon. Each row corresponds to a configuration to be evaluated. If fewer than sixteen platoons are used, the system selects the relevant number of columns. The point of NOLH is coverage: the initial configurations are designed to sample the input space broadly while keeping the columns nearly orthogonal, so that the placements of different platoons in the seed set are close to uncorrelated.
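To make the coverage idea concrete, here is a minimal sketch of a plain randomized Latin hypercube. It is a simpler stand-in, not the non-stochastic S-NOLH(16, 256) design the authors use: it guarantees even coverage per column but does nothing to control correlation between columns. The binning of 256 levels into fourteen boxes, the 0-based box indices, and the function name are all assumptions for illustration.

```python
import random

N_ROWS = 256      # configurations in the seed set
N_PLATOONS = 16   # columns in the design
N_BOXES = 14


def latin_hypercube_seed(rng: random.Random) -> list[tuple[int, ...]]:
    """Return N_ROWS configurations; each column uses every level exactly once."""
    columns = []
    for _ in range(N_PLATOONS):
        levels = list(range(N_ROWS))
        rng.shuffle(levels)  # one independent permutation per platoon
        # Assumed mapping: bin the 256 levels into 14 boxes as evenly as possible.
        columns.append([level * N_BOXES // N_ROWS for level in levels])
    # Transpose so that each row is one configuration (a box index per platoon).
    return [tuple(col[row] for col in columns) for row in range(N_ROWS)]


seed_set = latin_hypercube_seed(random.Random(0))
# With fewer platoons, keep only the first columns of each row.
seven_platoon_seeds = [config[:7] for config in seed_set]
```

Because every column is a permutation of the same levels, each platoon visits every box 18 or 19 times across the seed set, so no box is neglected before the search begins.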

That first seed set is then improved by two methods.

The first is rank-order search. Configurations are selected according to their ranking, with better configurations more likely to be chosen. A platoon is then moved to another box, with nearby boxes more likely than distant ones. If the resulting configuration is unique, the system simulates and evaluates it. The new configuration joins the pool, and the worst configuration is removed.
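One step of that rank-order search can be sketched as follows. The selection weights (1 over rank), the distance-based move weights, and the toy distance and value functions are assumptions; the paper's exact probabilities are not reproduced here.

```python
import random

N_BOXES = 14


def pick_ranked(pool, rng):
    """Pick from a best-first pool; the 1/(rank+1) bias is an assumption."""
    weights = [1.0 / (rank + 1) for rank in range(len(pool))]
    return rng.choices(pool, weights=weights, k=1)[0]


def move_one_platoon(config, rng, distance):
    """Move one random platoon, favouring boxes close to its current one."""
    platoon = rng.randrange(len(config))
    here = config[platoon]
    others = [box for box in range(N_BOXES) if box != here]
    weights = [1.0 / distance(here, box) for box in others]
    there = rng.choices(others, weights=weights, k=1)[0]
    return config[:platoon] + (there,) + config[platoon + 1:]


def rank_order_step(pool, seen, evaluate, rng, distance):
    """One search step; returns True if a new configuration was evaluated."""
    _, parent = pick_ranked(pool, rng)
    child = move_one_platoon(parent, rng, distance)
    if child in seen:
        return False  # not unique, so it is not simulated again
    seen.add(child)
    pool.append((evaluate(child), child))
    pool.sort()  # lower value is better, so the pool stays best-first
    pool.pop()   # drop the worst entry to keep the pool size fixed
    return True


# Toy stand-ins (hypothetical): index gap as distance, sum of indices as value.
rng = random.Random(1)
toy_distance = lambda a, b: abs(a - b)
toy_value = lambda config: float(sum(config))
configs = {tuple(rng.randrange(N_BOXES) for _ in range(5)) for _ in range(20)}
pool = sorted((toy_value(c), c) for c in configs)
seen = set(configs)
start_size, start_best = len(pool), pool[0][0]
for _ in range(200):
    rank_order_step(pool, seen, toy_value, rng, toy_distance)
```

In the real system `evaluate` would be the full event-driven simulation plus valuation, which is where nearly all the runtime goes.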

The second method is a genetic algorithm. With 5% probability, the system mutates one configuration by changing the box assignment of one platoon. With 95% probability, it performs crossover between two ranked configurations, choosing each platoon’s box from one parent or the other. Again, unique configurations are simulated and inserted into the pool if useful.
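The genetic operators are short enough to sketch directly. The 5% and 95% split comes from the paper as described above; the parent-selection weights and the function names are assumptions.

```python
import random

N_BOXES = 14
P_MUTATION = 0.05  # from the paper; crossover takes the remaining 0.95


def mutate(config, rng):
    """Change the box assignment of exactly one platoon."""
    platoon = rng.randrange(len(config))
    choices = [box for box in range(N_BOXES) if box != config[platoon]]
    return config[:platoon] + (rng.choice(choices),) + config[platoon + 1:]


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each platoon's box comes from one parent or the other."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def ga_child(ranked_configs, rng):
    """ranked_configs is best-first; the 1/(rank+1) weights are an assumption."""
    weights = [1.0 / (rank + 1) for rank in range(len(ranked_configs))]
    if rng.random() < P_MUTATION:
        (parent,) = rng.choices(ranked_configs, weights=weights, k=1)
        return mutate(parent, rng)
    # Parents are drawn with replacement; a child identical to an existing
    # configuration is expected to be filtered out by the uniqueness check.
    parent_a, parent_b = rng.choices(ranked_configs, weights=weights, k=2)
    return crossover(parent_a, parent_b, rng)


rng = random.Random(0)
a, b = (0, 1, 2, 3), (4, 5, 6, 7)
child = crossover(a, b, rng)
mutant = mutate(a, rng)
offspring = ga_child([a, b], rng)
```

Crossover recombines placements that already scored well, while the rare mutation keeps the pool from collapsing onto a single family of configurations.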

The authors combine these two methods using a probability parameter. Their experiments on ten blue platoons found that a search probability of $p = 0.4$ and genetic-algorithm probability of $0.6$ produced the most favourable outcome among the tested values. That result should be read correctly. It is not a universal law of planning, handed down from the mountain. It is a tuning result for this scenario and implementation. Useful, but not sacred.

The business translation is straightforward. When the option space is too large, do not pretend the system has “considered everything”. Seed the space intelligently, search iteratively, and preserve enough diversity that the algorithm does not become a very fast tunnel-vision machine.

Then simulate events, not slogans

Once a configuration exists, the system evaluates it through an event-driven simulation. The initial state includes both sides’ units, their positions, the graph geography, and forthcoming events. The event queue contains two basic event types: move-to-a-new-box and end-of-combat.

Red platoons move according to the scenario’s fixed route and arrival schedule. Blue platoons move toward assigned destinations, with shortest paths calculated across the graph. The paper assumes all blue units move at 30 km/h. If a blue unit encounters an enemy before reaching its intended destination, or is drawn into combat prematurely, that movement is considered illegal and receives a severe penalty; the unit is discarded from the evaluation.
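Blue movement reduces to a shortest-path query plus a division by speed. A minimal sketch using Dijkstra's algorithm; the 30 km/h figure is from the paper, but the five-box graph and its edge lengths are invented for illustration and are not the Rådmansö graph.

```python
import heapq

BLUE_SPEED_KMH = 30.0  # from the paper

# Hypothetical edge lengths in kilometres between numbered boxes.
EDGES_KM = {(1, 2): 3.0, (2, 3): 4.5, (1, 4): 6.0, (3, 4): 1.5, (4, 5): 3.0}


def build_graph(edges):
    """Turn an undirected edge list into an adjacency map."""
    graph = {}
    for (a, b), km in edges.items():
        graph.setdefault(a, []).append((b, km))
        graph.setdefault(b, []).append((a, km))
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (list of boxes, distance in km)."""
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, box = heapq.heappop(queue)
        if box == goal:
            break
        if dist > best[box]:
            continue  # outdated queue entry
        for neighbour, km in graph[box]:
            candidate = dist + km
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = box
                heapq.heappush(queue, (candidate, neighbour))
    if goal not in best:
        raise ValueError("goal is not reachable from start")
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


graph = build_graph(EDGES_KM)
path, km = shortest_path(graph, 1, 5)
travel_minutes = km / BLUE_SPEED_KMH * 60
```

In this toy graph the route from box 1 to box 5 runs through box 4, covers 9 km, and takes about 18 minutes. Checking that route against red arrival times is what would reveal an illegal move.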

Combat begins when opposing units occupy the same box. If new units arrive during an ongoing battle, the current combat is interrupted, partial outcomes are calculated, and the combat resumes with updated participants. When combat ends, the system updates the relative combat values of participating units. Units falling below the elimination threshold are removed.
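The event queue and the interruption rule can be sketched with a priority queue. This is a skeleton under stated assumptions: the fixed combat duration, the token trick for invalidating outdated end-of-combat events, and all names are hypothetical, and no losses are calculated.

```python
import heapq
import itertools

COMBAT_DURATION = 1.0  # hours; hypothetical fixed value for illustration


def simulate(moves):
    """moves: list of (time, side, unit, box). Returns a chronological log."""
    counter = itertools.count()  # tie-breaker so the heap never compares payloads
    queue = [(t, next(counter), "move", (side, unit, box))
             for t, side, unit, box in moves]
    heapq.heapify(queue)
    occupants = {}     # box -> {"red": set of units, "blue": set of units}
    combat_token = {}  # box -> token of the only valid end-of-combat event
    log = []
    while queue:
        time, _, kind, payload = heapq.heappop(queue)
        if kind == "move":
            side, unit, box = payload
            sides = occupants.setdefault(box, {"red": set(), "blue": set()})
            sides[side].add(unit)
            if sides["red"] and sides["blue"]:
                # A new arrival during a battle interrupts it; the real model
                # would calculate partial outcomes at this point.
                label = "combat interrupted" if box in combat_token else "combat starts"
                log.append((time, label, box))
                token = next(counter)
                combat_token[box] = token
                heapq.heappush(queue, (time + COMBAT_DURATION, next(counter),
                                       "end_of_combat", (box, token)))
        else:
            box, token = payload
            if combat_token.get(box) != token:
                continue  # superseded by an interruption
            del combat_token[box]
            log.append((time, "combat ends", box))
    return log


log = simulate([(0.0, "blue", "B1", 8), (2.0, "red", "R1", 8), (2.5, "red", "R2", 8)])
```

Here the second red platoon arrives half an hour into the battle, so the original end-of-combat event is ignored and a new one is scheduled from the moment of interruption.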

The combat model uses the “box method” and combat power analysis tables from military field manuals and related historical-analysis sources. A combat value of 1.0 corresponds to a fully capable armoured battalion; smaller or different unit types receive adjusted values. If force ratios fall between table values, the system interpolates. Combat outcomes depend on unit type, force ratio, relative combat values, and whether the encounter is, for example, a meeting engagement, hasty attack, deliberate attack, hasty defence, or deliberate defence.
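Interpolating between table rows is ordinary linear interpolation. In the sketch below the table values are invented placeholders, not figures from any field manual, and clamping outside the table range is an assumption.

```python
from bisect import bisect_right

# Hypothetical placeholder table: force ratio -> attacker loss fraction.
# These numbers are invented for illustration only.
RATIOS = [1.0, 2.0, 3.0, 4.0]
LOSSES = [0.40, 0.30, 0.20, 0.10]


def interpolate_loss(force_ratio: float) -> float:
    """Linearly interpolate between table rows; clamp outside the table."""
    if force_ratio <= RATIOS[0]:
        return LOSSES[0]
    if force_ratio >= RATIOS[-1]:
        return LOSSES[-1]
    upper = bisect_right(RATIOS, force_ratio)  # first row above the ratio
    lower = upper - 1
    share = (force_ratio - RATIOS[lower]) / (RATIOS[upper] - RATIOS[lower])
    return LOSSES[lower] + share * (LOSSES[upper] - LOSSES[lower])


midway = interpolate_loss(2.5)  # halfway between the 2.0 and 3.0 rows
```

A real implementation would hold one such table per unit type and engagement type, which is exactly where the model's assumptions become concrete numbers.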

This is where the paper is both interesting and fragile. The model has enough structure to make automated comparison meaningful, but it is still a stylized combat model. It is not learning battlefield reality from live data. It is applying formalized assumptions. In business terms, this is like a supply-chain simulator that uses carefully chosen disruption rules rather than a full digital twin of the planet. It may be very useful. It may also be confidently wrong if the rules miss the variable that matters.

Valuation turns judgement into a visible trade-off

After simulation, the system calculates remaining combat values for both red and blue forces. The valuation function prioritizes three things: minimizing red breakthroughs, minimizing blue losses, and maximizing red losses. The paper weights blue-loss minimization and red-loss maximization with parameters $\alpha = 0.2$ and $\beta = 0.1$, producing a simplified value function in which lower values are better.
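The paper's exact formula is not reproduced in this article, so the following is only a hypothetical form that has the stated properties: breakthroughs dominate, blue preservation is weighted by 0.2, red losses by 0.1, and lower is better. Every term and name below is an assumption apart from the two weights.

```python
ALPHA = 0.2  # weight on preserving blue combat value (value from the paper)
BETA = 0.1   # weight on red losses (value from the paper)


def configuration_value(red_initial, red_remaining, red_through,
                        blue_initial, blue_remaining):
    """Assumed form, not the paper's formula. Lower is better."""
    breakthrough = red_through / red_initial  # share of red that got past
    # Assumed step penalty so that any breakthrough outweighs the other terms.
    not_halted = 1.0 if red_through > 0 else 0.0
    blue_kept = blue_remaining / blue_initial
    red_lost = (red_initial - red_remaining) / red_initial
    return not_halted + breakthrough - ALPHA * blue_kept - BETA * red_lost


# Illustrative inputs only: combat values are made-up numbers.
halted = configuration_value(1.6, 0.4, 0.0, 1.0, 0.7)
broken = configuration_value(1.6, 1.2, 0.8, 1.0, 0.1)
```

With this assumed form, a halted red force yields a value at or below zero and any breakthrough yields a positive one, which mirrors the ordering of priorities even though the real function will differ in detail.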

In the scenario results, a configuration value below zero means the red force is halted and blue wins. This is useful because it gives the system an operational threshold. It is not merely choosing the least bad option. It can identify when a configuration crosses from failure to success under the model’s assumptions.

For executives, the lesson is not the formula. Please do not import combat-value arithmetic into warehouse scheduling unless you are trying to make procurement genuinely insufferable. The lesson is that decision-support systems need explicit objective functions. When the model says “best”, it must be possible to ask: best according to what?

Here, “best” means stopping red forces first, then preserving blue combat value, then imposing red losses. In a logistics system, “best” might mean protecting customer-critical deliveries first, then minimizing cost, then preserving fleet availability. In a hospital operations system, it might mean clinical urgency first, then staff constraints, then equipment utilization. The hierarchy matters. Without it, “optimization” becomes a decorative word for whatever the software happened to prefer.

The evidence shows feasibility, not battlefield omniscience

The implementation is built in MATLAB using the Parallel Computing Toolbox. In each iteration, the system explores and evaluates twelve configurations in parallel. Unique configurations are added to the current pool of 256 configurations, while the same number of worst-valued configurations is discarded. The algorithm is treated as converged when no new configuration appears among the forty best-valued configurations over seventeen consecutive iterations.
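The pool update and stopping rule can be sketched in a few lines. The four constants are the figures quoted above; everything else is an assumption, including the sequential evaluation (the authors run the twelve in parallel in MATLAB), the iteration cap, and the toy problem.

```python
import random

POOL_SIZE = 256  # configurations kept in the pool
BATCH = 12       # candidates explored per iteration
TOP_K = 40       # best-valued configurations watched for change
PATIENCE = 17    # consecutive unchanged iterations before stopping


def run_until_converged(pool, propose, evaluate, max_iterations=5000):
    """pool: list of (value, config) pairs; lower value is better."""
    pool = sorted(pool)
    size = len(pool)
    seen = {config for _, config in pool}
    quiet = iterations = 0
    while quiet < PATIENCE and iterations < max_iterations:
        iterations += 1
        top_before = {config for _, config in pool[:TOP_K]}
        fresh = []
        for _ in range(BATCH):  # evaluated one by one here, not in parallel
            candidate = propose(pool)
            if candidate not in seen:
                seen.add(candidate)
                fresh.append((evaluate(candidate), candidate))
        pool = sorted(pool + fresh)[:size]  # add the new, drop as many of the worst
        top_after = {config for _, config in pool[:TOP_K]}
        quiet = quiet + 1 if top_after <= top_before else 0
    return pool, iterations


# Toy problem (hypothetical): four platoons, value = sum of box indices.
rng = random.Random(3)
N_BOXES = 14


def toy_propose(pool):
    """Mutate one platoon of a randomly chosen top-ranked configuration."""
    _, parent = rng.choice(pool[:TOP_K])
    i = rng.randrange(len(parent))
    return parent[:i] + (rng.randrange(N_BOXES),) + parent[i + 1:]


start = set()
while len(start) < POOL_SIZE:
    start.add(tuple(rng.randrange(N_BOXES) for _ in range(4)))
initial = [(sum(config), config) for config in start]
final_pool, n_iterations = run_until_converged(initial, toy_propose, sum)
```

Because the pool is always sorted, a caller can read off the current best configurations after any iteration, which is what makes the anytime behaviour possible.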

The results have three main evidentiary roles.

First, the $p = 0.4$ result is a tuning or sensitivity test. The authors vary the balance between rank-order search and genetic algorithm for the ten-platoon case, repeating simulations 100 times for each value. The best outcome occurs when search is used 40% of the time and GA 60% of the time. This supports the chosen implementation setting; it does not prove that hybrid search has a universally optimal ratio.

Second, the platoon-count experiment is the main evidence for feasibility in the Rådmansö scenario. The authors run ten repetitions for each number of blue platoons from one to sixteen. In this scenario, seven blue platoons are the threshold needed to halt the red force. That is a meaningful operational result inside the model: it suggests the system can estimate the minimum force level required under its assumptions.

Third, the runtime and iteration behaviour support the “anytime algorithm” argument. With ten platoons, expected elapsed time is about 456 seconds with a standard deviation of 85 seconds when running on twelve CPU cores. The paper also examines how the best-discovered configuration evolves over iterations for seven and ten platoons. The ten-platoon case finds winning configurations more quickly, while the seven-platoon case has higher early variance because fewer configurations halt the red force.

That last point is subtle and important. The hardest operational case is not always the biggest one. Near a threshold, the search space becomes cruel. There may be only a few winning configurations, and the algorithm must find them before the decision window closes. Every operations leader has seen the business equivalent: when resources are abundant, many plans work; when resources are barely sufficient, only a few plans survive contact with reality.

Clustering is where human control becomes practical

The paper’s most underrated contribution is not the search algorithm. It is the clustering step.

A system could simply present the single best configuration. That would be clean, efficient, and probably dangerous. The authors explicitly note that giving a decision-maker only one algorithmic recommendation leaves them with a narrow choice: accept the system or reject it. In the worst case, that makes the decision-support system easy to ignore.

So the prototype clusters evaluated configurations. The goal is to present a range of effective options without overwhelming the decision-maker with thousands of near-identical placements. The paper considers two similarity measures. The first uses structural similarity: how similarly platoons are placed across boxes. The second combines structural similarity with configuration value, grouping options that are both structurally similar and similarly good or bad. The authors use the second measure.
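A combined similarity measure of that kind can be sketched as a structural distance plus a weighted value gap. The distance definition, the weight, the threshold, and the greedy clustering procedure are all assumptions standing in for the paper's method, and platoon types are ignored for brevity.

```python
from collections import Counter

VALUE_WEIGHT = 1.0  # hypothetical weight on the difference in value
THRESHOLD = 1.5     # hypothetical maximum distance to join a cluster


def structural_distance(a, b):
    """Number of platoons that would have to change box (types ignored)."""
    count_a, count_b = Counter(a), Counter(b)
    boxes = set(count_a) | set(count_b)
    return sum(abs(count_a[box] - count_b[box]) for box in boxes) / 2


def combined_distance(item_a, item_b):
    """Structural distance plus a weighted gap in configuration value."""
    (value_a, config_a), (value_b, config_b) = item_a, item_b
    return (structural_distance(config_a, config_b)
            + VALUE_WEIGHT * abs(value_a - value_b))


def cluster(evaluated):
    """Greedy threshold clustering; each cluster's first member is its best."""
    clusters = []
    for item in sorted(evaluated):  # best (lowest) value first
        for members in clusters:
            if combined_distance(members[0], item) <= THRESHOLD:
                members.append(item)
                break
        else:
            clusters.append([item])
    return clusters


# Illustrative (value, configuration) pairs; the values are made up.
evaluated = [(-0.25, (8, 12, 12)), (1.30, (1, 1, 2)), (-0.24, (8, 12, 14))]
groups = cluster(evaluated)
representatives = [members[0] for members in groups]
```

The two winning placements differ by one platoon and land in the same cluster, while the losing placement forms its own. Showing one representative per cluster is what keeps the option list short without hiding genuinely different plans.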

This turns the interface problem into a governance problem. Human control is not preserved by saying “a human remains in the loop” while showing the human one opaque recommendation. Human control is preserved when the system exposes meaningful alternatives: different clusters, different trade-offs, different vulnerabilities, different ways to win or fail.

The clustering results illustrate this clearly. In the seven-platoon case, the system generates 51 clusters, but only two contain winning configurations. In the ten-platoon case, it generates 44 clusters, with more clusters containing winning configurations. The seven-platoon scenario is therefore not merely “smaller”. It is tighter, more brittle, and more dependent on the right placement.

For seven platoons, the best cluster contains fourteen alternative configurations. The paper’s detailed breakdown shows that the best configuration places one armoured infantry platoon in box 8, five armoured infantry platoons in box 12, and one tank platoon in box 14. More importantly, the cluster view shows that this is not just one magic answer. It is part of a local family of similar high-performing configurations.

That is the part businesses should steal. Not the boxes. Not the platoons. The presentation philosophy.

In a disruption-response system, clustering could show several viable recovery families: reroute through hub A, defer low-priority orders, shift capacity to supplier B, or split fulfilment regionally. In infrastructure operations, it could show several load-shedding or restoration sequences. In manufacturing, it could show alternative production recovery plans. The executive does not need 4,500 variants. They need the best representative options, with enough structure to understand why they differ.

The transferable architecture is “generate, simulate, score, cluster, decide”

The paper is easy to misread as a defence niche study. That would be tidy and wrong. The more useful reading is that it offers a prototype architecture for high-stakes operational AI.

The architecture looks like this:

1. Define the decision space.
2. Generate a broad initial sample of possible actions.
3. Use iterative search to improve candidates.
4. Simulate consequences through an event model.
5. Score results according to explicit priorities.
6. Cluster similar high-performing options.
7. Present representative alternatives to the human decision-maker.
8. Re-run as conditions evolve.
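The steps above fit into a small generic loop. The sketch below is a toy, assuming a hypothetical truck-to-depot problem in place of the paper's scenario; every function name and number is invented, and the point is only that the stages plug together as separate, replaceable parts.

```python
def decision_loop(seed_options, improve, simulate, score, cluster, rounds):
    """Generate, simulate, score, improve, then cluster for the human."""
    scored = {option: score(simulate(option)) for option in seed_options}
    for _ in range(rounds):
        ranked = sorted(scored, key=scored.get)  # lower score is better
        for option in improve(ranked):
            if option not in scored:             # only simulate unseen options
                scored[option] = score(simulate(option))
    return cluster(scored)


# Toy instantiation (all hypothetical): send three trucks to four depots.
DEMAND = {0: 2, 1: 1}  # trucks needed per depot; depots 2 and 3 need none


def simulate(option):
    """Return unmet demand per depot for one truck assignment."""
    return {d: max(0, need - option.count(d)) for d, need in DEMAND.items()}


def score(outcome):
    """Explicit objective: total unmet demand."""
    return sum(outcome.values())


def improve(ranked, top=3, n_depots=4):
    """Yield every single-truck reassignment of the best few options."""
    for option in ranked[:top]:
        for truck in range(len(option)):
            for depot in range(n_depots):
                if depot != option[truck]:
                    yield option[:truck] + (depot,) + option[truck + 1:]


def cluster(scored):
    """Treat trucks as interchangeable; report (representative, score, size)."""
    families = {}
    for option, value in scored.items():
        families.setdefault(tuple(sorted(option)), []).append((value, option))
    summary = [(min(m)[1], min(m)[0], len(m)) for m in families.values()]
    return sorted(summary, key=lambda row: row[1])


options = decision_loop([(3, 3, 3), (2, 2, 2)], improve, simulate, score,
                        cluster, rounds=4)
```

The final step, re-running as conditions evolve, amounts to calling the loop again with updated demand and the previous best options as seeds. Swapping in a better simulator or a different objective changes one function, not the architecture.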

That loop is relevant wherever three conditions hold.

First, the decision has many possible configurations. Routing, staffing, inventory allocation, fleet repositioning, emergency response, grid restoration, and hospital capacity management all qualify.

Second, the organization can build a credible simulator. “Credible” does not mean perfect. It means good enough to compare options under known assumptions, with known blind spots. A weak simulator at the centre of an elegant search engine is still a weak simulator. Lipstick, pig, you know the paperwork.

Third, the decision window is short enough that manual exploration is inadequate but long enough for iterative computation. The paper’s ten-platoon runtime of roughly 456 seconds on twelve CPU cores is not instant. But it is plausible for some execution-phase decisions, especially if partial results can be shown as an anytime algorithm.

The business value is not “AI makes the decision”. The business value is faster structured exploration. The system can reveal that a plan family is robust, that only a few configurations work, that a resource threshold has been crossed, or that the supposedly obvious plan is fragile. This is cheaper than learning by failure, which remains the traditional enterprise method and, frankly, a reliable source of consulting revenue.

The boundaries are not footnotes; they define the product

The paper’s limitations are not generic academic modesty. They directly shape how the result should be interpreted.

The red force follows fixed routes. This makes the simulation tractable, but it removes adaptive adversarial behaviour. In a business analogue, it is the difference between modelling a storm track and modelling a competitor that sees your move and changes its own. Useful, but not the same game.

Terrain and weather are disregarded. For battlefield use, that is a major simplification. For business use, the equivalent would be ignoring labour rules, port constraints, equipment wear, or supplier incentives. Sometimes simplifications are acceptable. Sometimes they are where the real decision lives.

Combat outcomes are based on field-manual-style tables and historical assumptions. That gives the model structure and explainability, but it also means the outputs inherit the assumptions of those tables. The system is not discovering physics from first principles; it is applying a codified doctrine of consequences.

The experiment is one stylized scenario. Rådmansö is a useful testbed, especially because the red movement patterns are grounded in the earlier IFD03 simulation logs, but one scenario cannot validate general battlefield performance. It shows feasibility. It does not certify deployment.

Finally, the human-system interaction remains an open question. The authors explicitly identify future research questions around offline evaluation, using partial outcomes as seeds for dynamic replanning, grouping results for clearer decision support, and designing interaction between the system and the decision-maker. That last item is not cosmetic. In high-stakes AI, the interface is part of the control system.

What Cognaptus infers for operational AI

What the paper directly shows is feasibility: a prototype can generate thousands of blue-force configurations, simulate their outcomes, identify a threshold force level in a stylized scenario, and present clustered alternatives rather than a single answer.

What Cognaptus infers is broader but bounded: the same design pattern can support business operations where decisions are combinatorial, time-sensitive, and simulation-friendly. The immediate candidates are logistics disruption response, emergency resource allocation, manufacturing recovery planning, infrastructure restoration, and complex field-service dispatch.

What remains uncertain is whether organizations can build simulators trustworthy enough for the loop to matter. The search algorithm can be clever, but it cannot rescue a bad model of reality. It can only optimize inside it. This is the quiet cruelty of operational AI: the glamorous part depends on the boring part. Data definitions, event models, constraint mapping, scenario logs, objective functions, and validation discipline. The usual suspects, standing around looking underfunded.

The paper therefore points to a more sober future for AI decision support. Not autonomous command. Not dashboard theatre. Not a chatbot with a clipboard. The useful system is a disciplined option factory: generate, simulate, score, cluster, and let the human decide with more than vibes and a spreadsheet.

That may be less cinematic than “AI commander”. It is also more likely to survive procurement, audit, and reality. A rare hat-trick.

Cognaptus: Automate the Present, Incubate the Future.

¹ Johan Schubert, Patrik Hansen, Pontus Hörling, and Ronnie Johansson, “Autonomous generation of different courses of action in mechanized combat operations,” arXiv:2511.05182, 2025, https://arxiv.org/abs/2511.05182.
