When the Streets Flood, Let the AI Drive: Reinforcement Learning for Climate‑Resilient Cities

A flooded street is not only a drainage problem.

It is a transport problem, a budget problem, an insurance problem, a public-trust problem, and, if the city waits long enough, a very expensive lesson in pretending that yesterday’s weather statistics are still a planning manual.

Copenhagen is a useful place to begin because the paper’s case is not imaginary. In 2011, the city experienced a major cloudburst that flooded streets, disrupted roads and rail, and caused damage estimated at around 6 billion Danish kroner. The new research paper, Artificial Intelligence for Climate Adaptation: Using Reinforcement Learning for Climate Change-Resilient Transport, uses Copenhagen’s inner city as the testbed for a larger question: how should a city decide where, when, and how much to invest in flood adaptation between 2024 and 2100?¹

That question sounds like engineering. It is partly engineering. But the more difficult part is sequencing.

A city can install soakaways, bioretention planters, storage tanks, permeable pavement, and other adaptation measures. It can also do nothing, which is the cheapest policy until it suddenly becomes the most expensive one. The hard problem is not naming possible interventions. The hard problem is deciding the order, timing, and location of interventions when climate outcomes are uncertain, infrastructure effects decay over time, and transport disruption appears through multiple channels.

This is where the paper’s use of reinforcement learning matters. The authors are not using AI mainly to predict the next flood. That would be useful, but it would also be the obvious headline. The more interesting contribution is that they turn climate adaptation into a sequential decision problem. The system learns adaptation pathways: staged portfolios of interventions that trade off investment cost against avoided infrastructure damage, travel delay, and cancelled trips.

In plain business language, this is not “AI predicts disaster.” It is closer to “AI stress-tests capital allocation over 76 years.” Less cinematic, more useful. Annoying how often that happens.

Copenhagen is the case, but the planning dilemma is general

The study models Copenhagen’s inner city as 29 traffic assignment zones. At each decision step, which corresponds roughly to one year of the 2024–2100 horizon, the system can apply adaptation actions across zones. Each action has an effect, cost, maintenance burden, and lifetime. Once deployed, an intervention remains active for its specified lifetime, while its effectiveness can decay.
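The bookkeeping described here, an intervention that stays active for a fixed lifetime while its effect fades, can be sketched in a few lines of Python. This is only an illustration: the linear decay rule, the field names, and every number below are assumptions, since the article does not state the paper's decay function or parameter values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervention:
    """Hypothetical record for one adaptation measure deployed in one zone."""
    name: str
    install_year: int
    lifetime_years: int      # assumed value, not taken from the paper
    initial_effect: float    # assumed fraction of flood impact avoided when new
    annual_decay: float      # assumed linear loss of effect per year of age

    def is_active(self, year: int) -> bool:
        """True while the measure is within its specified lifetime."""
        age = year - self.install_year
        return 0 <= age < self.lifetime_years

    def effect(self, year: int) -> float:
        """Effect in a given year: zero outside the lifetime, decaying inside it."""
        if not self.is_active(year):
            return 0.0
        age = year - self.install_year
        return max(0.0, self.initial_effect - self.annual_decay * age)


# Illustrative numbers only.
soakaway = Intervention("soakaway", install_year=2030, lifetime_years=20,
                        initial_effect=0.30, annual_decay=0.01)

for year in (2029, 2030, 2040, 2049, 2050):
    print(year, soakaway.is_active(year), round(soakaway.effect(year), 2))
```

The point of the sketch is the shape of the state, not the values: once a measure is installed, the planner inherits both its remaining lifetime and its shrinking benefit.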

The paper’s integrated assessment model links four components:

Module	What it does	Why it matters for planning
Rainfall projection	Samples future rainfall events under RCP2.6, RCP4.5, and RCP8.5 climate scenarios	Gives the planning model different climate futures rather than one convenient forecast
Flood model	Maps rainfall into water accumulation and flood depth	Translates climate exposure into physical disruption
Transport simulation	Routes simulated trips across walking, cycling, and driving networks under flooded conditions	Converts water depth into mobility consequences
Impact accounting	Computes infrastructure damage, travel delay, trip cancellation, adaptation cost, and maintenance cost	Makes the adaptation problem comparable in economic terms

The transport simulation is concrete enough to matter. The paper uses 84,000 trips within Copenhagen’s inner city: 2.7% by car, 19.3% by bicycle, and 78.0% on foot. Water depth reduces travel speeds differently by mode. Some routes become slower. Some trips are rerouted. Some become impossible and are treated as cancelled. The paper values cancelled trips at 80% of the original no-flood trip cost, while noting that this is not the full economic loss of the lost activity. It is an estimate of the value of the trip to the traveler.
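To make the trip numbers tangible, here is a minimal sketch that converts the reported mode shares into trip counts and applies the 80% cancellation rule. The function names and the per-trip costs in the usage lines are hypothetical; only the 84,000 trips, the three mode shares, and the 80% factor come from the text above.

```python
TOTAL_TRIPS = 84_000
MODE_SHARE = {"car": 0.027, "bicycle": 0.193, "walk": 0.780}
CANCEL_FRACTION = 0.80  # a cancelled trip is valued at 80% of its no-flood cost


def trips_by_mode(total, shares):
    """Convert mode shares into whole trip counts."""
    return {mode: round(total * share) for mode, share in shares.items()}


def cancellation_cost(no_flood_trip_costs):
    """Value assigned to a set of cancelled trips, given each trip's no-flood cost."""
    return CANCEL_FRACTION * sum(no_flood_trip_costs)


counts = trips_by_mode(TOTAL_TRIPS, MODE_SHARE)
print(counts)

# Hypothetical no-flood costs (in DKK) for three cancelled trips.
print(cancellation_cost([10.0, 20.0, 5.0]))
```

Note how few of the simulated trips are by car: under these shares, most of the mobility consequences in the model fall on pedestrians and cyclists.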

That detail matters because it shows what the model is actually optimizing. It is not optimizing “urban happiness,” “resilience,” or any of the other magnificent nouns that appear in policy decks when the spreadsheet has become tired. It optimizes a defined economic reward:

$$ R = -(\text{Infrastructure Damage} + \text{Travel Delays} + \text{Trip Cancellations} + \text{Action Costs} + \text{Maintenance Costs}) $$

Higher reward means lower total cost. The reinforcement learning agent is rewarded for reducing total long-term economic impact, not for building more infrastructure.

That distinction is central. A naive adaptation policy can spend heavily and reduce flood impacts, while still being a poor policy because the extra investment and maintenance costs exceed the avoided losses. The paper’s random-control baseline demonstrates exactly this problem.
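The reward definition is simple enough to write down directly. The sketch below implements the formula as stated and compares three made-up policies for a single year; the class name, the policy labels, and all cost figures are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearCosts:
    """The five cost terms in the reward, in one common (hypothetical) unit."""
    infrastructure_damage: float
    travel_delays: float
    trip_cancellations: float
    action_costs: float
    maintenance_costs: float


def reward(c: YearCosts) -> float:
    """Negative total cost: higher (less negative) is better."""
    return -(c.infrastructure_damage + c.travel_delays + c.trip_cancellations
             + c.action_costs + c.maintenance_costs)


# Illustrative numbers only.
do_nothing = YearCosts(10.0, 4.0, 2.0, 0.0, 0.0)
overspend = YearCosts(3.0, 1.0, 0.5, 12.0, 3.0)   # low impacts, heavy spending
targeted = YearCosts(6.0, 2.0, 1.0, 3.0, 1.0)     # moderate impacts, modest spending

for name, costs in [("do_nothing", do_nothing), ("overspend", overspend),
                    ("targeted", targeted)]:
    print(name, reward(costs))
```

With these invented figures the overspending policy has the lowest flood impacts and still the worst reward, which is the same pattern the article attributes to the random-control baseline.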

The AI is not a forecaster; it is a policy explorer

The reinforcement learning formulation treats the city as a graph. Nodes represent the 29 traffic assignment zones. Edges encode spatial proximity. Each node carries features related to infrastructure damage, travel delays, trip cancellations, and current intervention status.

This graph structure is not decorative. Flooding and transport disruption are spatial phenomena. Water accumulates in places, spills across terrain, blocks links, and shifts trips through the network. A spreadsheet of zones would miss part of that structure. A graph policy can learn that decisions in one zone may matter because of nearby zones, connected routes, and shared disruption patterns.

The authors use a graph convolutional neural network inside a PPO reinforcement learning agent. The policy maps the current city graph into node-wise action decisions. In effect, the model sees a changing urban network and outputs a spatially distributed intervention plan.

A useful way to read the architecture is this:

Technical design choice	Operational meaning
Graph representation of zones	The city is treated as an interdependent network, not a list of isolated districts
Node-level actions	The policy can choose different interventions in different zones
Action masking	Interventions that are already active in a zone are masked out, so the policy cannot select an infeasible repeat
Intervention lifetimes and decay	Infrastructure is modeled as a long-lived but not magical asset
Reward as negative total cost	The agent is penalized both for flood damage and for overspending on adaptation
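Two of the design choices in the table, neighbour-aware zone features and action masking, can be illustrated without any deep learning library. The sketch below is not the paper's graph convolutional network or its PPO agent: it substitutes a single mean-aggregation step for the graph convolution and a hand-written scoring rule for the learned policy. The four-zone graph, the feature values, and the action list are all hypothetical.

```python
import math

ACTIONS = ["do_nothing", "soakaway", "planter", "storage_tank"]  # illustrative subset

# Toy graph: 4 zones in a line, edges encode spatial proximity (hypothetical layout).
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# One scalar feature per zone, e.g. a normalized flood-cost signal (made-up values).
FEATURES = {0: 0.2, 1: 0.9, 2: 0.4, 3: 0.1}


def aggregate(features, neighbors):
    """Average each zone's feature with its neighbours' (a stand-in for graph convolution)."""
    out = {}
    for zone, nbrs in neighbors.items():
        values = [features[zone]] + [features[n] for n in nbrs]
        out[zone] = sum(values) / len(values)
    return out


def masked_softmax(logits, mask):
    """Softmax over allowed actions only; masked actions get probability 0."""
    peak = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(zone_signal, active):
    """Hypothetical scoring rule: a stronger signal favours heavier measures."""
    logits = [0.0, 1.0 * zone_signal, 1.5 * zone_signal, 2.0 * zone_signal]
    # "do_nothing" is never active, so at least one action always stays allowed.
    mask = [action not in active for action in ACTIONS]
    return masked_softmax(logits, mask)


hidden = aggregate(FEATURES, NEIGHBORS)
active = {0: set(), 1: {"storage_tank"}, 2: set(), 3: set()}  # zone 1 already has a tank
policy = {zone: action_probs(hidden[zone], active[zone]) for zone in NEIGHBORS}

for zone, probs in policy.items():
    print(zone, [round(p, 3) for p in probs])
```

Zone 1 ends up with zero probability on the storage tank because one is already active there, while its aggregated signal is pulled toward its neighbours' values. A real agent would learn the scoring weights; the masking logic would look much the same.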

This is why the paper is more interesting than a generic “AI for climate” story. It is not merely automating a known decision. It creates a planning sandbox in which different sequences of decisions can be explored before a city locks itself into concrete, asphalt, tanks, maintenance contracts, and future regret.

The reduced tests show why static optimization runs out of road

Before applying the model to the full Copenhagen case, the authors compare reinforcement learning with Bayesian Optimization on reduced problem instances. This is not the main policy result. It is a methodological validation test.

That matters because Bayesian Optimization is a strong tool for expensive black-box simulations, but it is not naturally built for a massive sequential spatial policy problem. The authors therefore compare methods only in smaller settings where Bayesian Optimization remains feasible: Experiment A covers 5 years and 10 zones; Experiment B covers 10 years and 29 zones.

The results are reported as total reward, where less negative is better:

Reduced test	Climate scenario	Bayesian Optimization	Reinforcement learning	Interpretation
Experiment A	RCP2.6	-119.58 ± 0.25	-118.84 ± 1.10	RL performs slightly better
Experiment A	RCP4.5	-120.10 ± 0.38	-119.01 ± 0.85	RL performs better
Experiment A	RCP8.5	-121.24 ± 0.30	-119.23 ± 1.28	RL performs better
Experiment B	RCP2.6	-121.19 ± 1.45	-117.87 ± 0.57	RL advantage grows
Experiment B	RCP4.5	-120.51 ± 1.71	-117.09 ± 0.64	RL advantage grows
Experiment B	RCP8.5	-123.71 ± 1.42	-119.88 ± 1.54	RL advantage grows

The paper reports that RL’s advantage ranges from about 73 million to 200 million DKK in Experiment A, and about 332 million to 383 million DKK in Experiment B. The important pattern is not just “RL wins.” The more useful interpretation is that the advantage becomes larger as the problem becomes more spatially and temporally complex.
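The DKK figures can be cross-checked against the table with a short calculation. The sketch assumes one reward unit equals 100 million DKK; the article never states the unit, so this is an inference from the reported gaps rather than a documented fact. Under that assumption the table means give roughly 74 to 201 million DKK for Experiment A and 332 to 383 million DKK for Experiment B, which matches the quoted ranges to within rounding of the two-decimal means.

```python
# (Bayesian Optimization mean, reinforcement learning mean) from the table above.
RESULTS = {
    ("A", "RCP2.6"): (-119.58, -118.84),
    ("A", "RCP4.5"): (-120.10, -119.01),
    ("A", "RCP8.5"): (-121.24, -119.23),
    ("B", "RCP2.6"): (-121.19, -117.87),
    ("B", "RCP4.5"): (-120.51, -117.09),
    ("B", "RCP8.5"): (-123.71, -119.88),
}

# ASSUMPTION: one reward unit = 100 million DKK (inferred, not stated in the article).
DKK_PER_REWARD_UNIT = 100_000_000


def gap_million_dkk(bo_mean, rl_mean):
    """RL advantage over Bayesian Optimization, in millions of DKK."""
    return (rl_mean - bo_mean) * DKK_PER_REWARD_UNIT / 1_000_000


gaps = {key: round(gap_million_dkk(bo, rl)) for key, (bo, rl) in RESULTS.items()}

for (experiment, scenario), gap in gaps.items():
    print(f"Experiment {experiment}, {scenario}: about {gap} million DKK")
```

The calculation uses the means only. The ± spreads in the table are not small relative to the Experiment A gaps, so those differences deserve more caution than the Experiment B ones.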

This is what one would expect if the core value of reinforcement learning is adaptivity. Bayesian Optimization can work well when the search problem is static and relatively contained. But the full adaptation problem is not contained. With 29 zones, 8 possible actions per zone, and 77 annual steps (2024 through 2100 inclusive), even the simplified action-plan space becomes roughly:

$$ (8^{29})^{77} \approx 4 \times 10^{2016} $$

That is not a search space. That is a polite way of telling brute force to go home.
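The size of that number is easy to verify, because Python integers have arbitrary precision. The sketch below counts the plans exactly and then recovers the scientific-notation form; the variable names are ours, and the inputs are the three figures quoted above.

```python
import math

ZONES, ACTIONS_PER_ZONE, YEARS = 29, 8, 77

# Exact count: one of 8 actions in each of 29 zones, repeated for 77 steps.
plans = (ACTIONS_PER_ZONE ** ZONES) ** YEARS
digits = len(str(plans))

# Same quantity via logarithms, to read off mantissa and exponent.
log10_plans = ZONES * YEARS * math.log10(ACTIONS_PER_ZONE)
exponent = math.floor(log10_plans)
mantissa = 10 ** (log10_plans - exponent)

print(f"{digits} digits, about {mantissa:.1f}e{exponent}")
```

For scale, the number of atoms in the observable universe is usually put at around 10^80, so enumeration is not a budget question.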

The full Copenhagen case shows disciplined investment, not infrastructure maximalism

The full case study evaluates Copenhagen’s inner city over 2024–2100. The reinforcement learning policy is compared against two simple baselines under the intermediate RCP4.5 scenario:

Strategy	Role in the experiment	What it tests
No Control	No adaptation measures are deployed	The cost of doing nothing
Random Control	Adaptation actions are selected randomly	Whether spending without coordination helps
Reinforcement Learning	A learned policy selects staged actions by zone and time	Whether adaptive sequencing improves total outcome

The RL policy achieves the highest cumulative reward, exceeding No Control by 22% and Random Control by 408%. The second comparison is especially important. Random Control reduces some flood impacts, but it does so by spending too much, too often, and too incoherently. It is the infrastructure equivalent of panic-buying.

The RL policy behaves differently. It reduces infrastructure damage, travel delays, and trip cancellations while keeping adaptation and maintenance costs more moderate. The paper describes a gradual and targeted investment pattern rather than aggressive early deployment everywhere.

This is the business lesson hiding inside the technical result: the model is not rewarded for appearing proactive. It is rewarded for being cost-effective over time.

That should interest more than city governments. Transport operators, insurers, infrastructure investors, and engineering consultancies all face versions of the same problem. The practical question is rarely “Can we spend money to reduce risk?” Of course we can. Give enough people enough budget and they will reduce something, possibly including the budget itself. The harder question is whether the staged portfolio of interventions produces enough avoided loss to justify its timing, placement, and maintenance obligations.

The learned pathway is spatially selective

The most revealing result is not the total reward. It is the structure of the learned adaptation pathway.

Under RCP4.5, the reinforcement learning strategy focuses mainly on four of the eight available measures:

Adaptation measure	Share of actions in the learned RCP4.5 pathway
Soakaways	57%
Bioretention planters	28%
Storage tanks	13%
Porous asphalt	2%

On average, the policy applies 4.41 actions per zone and 1.68 adaptation measures per year.

This is not a glamorous finding, which is why it is useful. The agent does not produce a heroic citywide rebuild. It discovers a relatively sparse portfolio dominated by specific measures, deployed gradually and unevenly across geography. Bioretention planters and soakaways appear across nearly all regions. Storage tanks concentrate mainly in central zones. Porous asphalt appears mostly in a few specific zones.

The paper’s evidence here is exploratory but valuable. It suggests that the model is not simply learning “more adaptation is better.” It is learning that intervention type, geography, and timing interact.

For a real planning office, that is where decision support begins. A model that says “flood risk will rise” is informative but not operationally sufficient. A model that says “these zones tend to justify earlier soakaways, these central zones tend to justify storage tanks, and some expensive surface interventions appear only in limited places” is closer to a capital planning conversation.

Still, this should not be mistaken for a procurement recommendation. The paper does not prove that Copenhagen should now go buy exactly this bundle of interventions. The results are conditional on the model’s rainfall sampling, flood simulation, transport assumptions, cost parameters, and adaptation-effect estimates. The right business reading is not “copy the pathway.” It is “copy the decision workflow, then localize the data.”

Climate beliefs change the cost of being wrong

The paper’s climate-scenario experiment is one of its most practically useful parts because it separates the climate a policy is trained for from the climate that later materializes.

The authors train policies under one climate belief—RCP2.6, RCP4.5, or RCP8.5—and test them under realized scenarios that may differ. This is a robustness and sensitivity test, not a second main thesis. Its purpose is to examine what happens when planning assumptions are wrong.

Belief scenario used for training	Realized RCP2.6	Realized RCP4.5	Realized RCP8.5	Average reward
RCP2.6	-107.42 ± 1.55	-107.53 ± 1.54	-109.94 ± 1.19	-108.30
RCP4.5	-107.42 ± 1.55	-107.67 ± 1.29	-109.45 ± 1.15	-108.18
RCP8.5	-110.18 ± 1.27	-110.66 ± 0.90	-113.03 ± 1.78	-111.29

The intermediate RCP4.5-trained policy has the best average reward across realized scenarios. The policy trained under the pessimistic RCP8.5 assumption is the most precautionary, but in this table it scores lowest under every realized scenario, including RCP8.5 itself, which is consistent with its higher precautionary investment costs. The optimistic RCP2.6-trained policy works well if the future remains mild, but it degrades under more severe realized conditions.
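The belief-versus-reality table can be read as a small decision matrix. The sketch below recomputes the row averages from the table means and adds a simple regret measure: for each realized scenario, how far a policy falls short of the best-performing policy in that column. Regret is our addition for illustration, not a metric the article reports, and the calculation ignores the ± spreads.

```python
SCENARIOS = ["RCP2.6", "RCP4.5", "RCP8.5"]

# Mean reward by training belief (row) and realized scenario (column), from the table.
MEAN_REWARD = {
    "RCP2.6": [-107.42, -107.53, -109.94],
    "RCP4.5": [-107.42, -107.67, -109.45],
    "RCP8.5": [-110.18, -110.66, -113.03],
}


def average(values):
    return sum(values) / len(values)


def regret(belief):
    """Shortfall versus the best policy in each realized scenario (0 means best)."""
    out = []
    for col in range(len(SCENARIOS)):
        best = max(row[col] for row in MEAN_REWARD.values())
        out.append(round(best - MEAN_REWARD[belief][col], 2))
    return out


averages = {belief: round(average(row), 2) for belief, row in MEAN_REWARD.items()}
best_belief = max(averages, key=averages.get)

print("Row averages:", averages)
print("Best average belief:", best_belief)
for belief in MEAN_REWARD:
    print(belief, "regret by realized scenario:", regret(belief))
```

The recomputed averages match the table. The gap between the RCP2.6-trained and RCP4.5-trained policies is small relative to the reported spreads, so the clear signal in this matrix is the cost of the RCP8.5 belief, not a fine ranking of the other two.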

This is the kind of result business readers should treat carefully. The paper does not prove that “moderate assumptions are always best.” It shows that in this modeled Copenhagen environment, with these costs, interventions, and rainfall scenarios, intermediate-scenario training produced the most cost-effective average result. The broader lesson is not “always pick the middle forecast.” The broader lesson is that scenario belief is itself a strategic input.

That is highly relevant to boards, insurers, infrastructure funds, and public agencies. Climate adaptation is often presented as a moral urgency or technical requirement. It is both, but it is also a belief-management problem. If an organization plans around mild assumptions, it may under-adapt. If it plans around extreme assumptions, it may over-invest too early. Reinforcement learning does not remove that governance judgment. It makes the consequences of the judgment more visible.

What the paper directly shows, and what Cognaptus infers

The paper’s direct contribution is a computational framework. The business interpretation should not outrun that evidence.

Layer	What is supported	What should not be overclaimed
Direct paper result	RL can learn coordinated spatial-temporal adaptation pathways in a Copenhagen flood-transport simulation	RL has not been proven to produce real-world optimal Copenhagen infrastructure policy
Method validation	RL outperforms Bayesian Optimization in reduced tractable settings	The test is not a universal benchmark across all optimization methods
Full case evidence	RL beats no-control and random-control baselines under RCP4.5	Random control is intentionally weak; stronger planning baselines may narrow the gap
Scenario analysis	Climate-belief assumptions change performance and robustness	RCP4.5 is not universally the “correct” planning belief
Business inference	The framework can become a planning sandbox for staged capital allocation	It is not yet an automated city-planning authority, mercifully

The strongest business use case is decision support before irreversible investment.

A city could use this kind of framework to compare adaptation pathways before committing budget. A transport operator could evaluate network vulnerability under different rainfall and intervention assumptions. An insurer could use similar simulations to understand how public adaptation pathways change future exposure. An infrastructure investor could use the method to assess whether resilience capex has credible avoided-loss logic rather than ceremonial sustainability language.

For AI consultants and enterprise teams, the lesson is also architectural. The value does not come from placing a chatbot on top of climate reports. It comes from connecting domain simulators, economic accounting, scenario logic, and sequential decision optimization. In other words, the valuable system is not a talking model. It is a decision environment.

A chatbot can explain the flood. This framework asks what to build before the next one.

The implementation boundaries are not footnotes; they define the product

The limitations are not fatal, but they are structural.

First, the environment is simulation-based. The flood model uses SCALGO Live, a simplified event-based tool. The paper assumes uniform rainfall distribution of unspecified duration over the study area. That is a legitimate modeling choice for the study, but it means learned policies are conditional on the flood model’s behavior.

Second, rainfall scenarios are discrete. The paper evaluates RCP2.6, RCP4.5, and RCP8.5 rather than probabilistic evolving climate trajectories. Real decision-makers do not live inside a neat three-scenario menu. Beliefs update, emissions pathways change, local planning constraints evolve, and climate models improve. The authors explicitly identify probabilistic climate information and adaptive belief updating as future work.

Third, the reward is economically focused. Infrastructure damage, travel delay, trip cancellation, intervention cost, and maintenance cost are measurable and necessary. But adaptation policy also has distributional consequences. Which neighborhoods benefit first? Who bears disruption during construction? Which groups lose access when trips are cancelled? The paper recognizes that future models should include social well-being and equity objectives. That is not decorative ethics; it changes the objective function.

Fourth, training is computationally expensive. The authors note that scaling to larger study areas, more adaptation measures, and richer climate scenarios will require efficiency improvements, possibly through surrogate models or metamodels. This matters for commercialization. A planning sandbox that takes too long or costs too much to run becomes an academic artifact with excellent intentions and poor product-market fit.

Fifth, the current baselines are useful but not exhaustive. No Control and Random Control clarify the value of coordinated adaptation. Bayesian Optimization provides a comparison in reduced settings. But real planning offices use heuristics, engineering rules, budget constraints, political constraints, and expert judgment. A next practical benchmark would compare RL-generated pathways against expert-designed adaptation plans under the same assumptions.

These boundaries do not make the framework weak. They tell us how to use it. It should be treated as an exploratory policy laboratory, not a machine that prints the correct infrastructure plan while everyone else politely resigns.

The management lesson is staged commitment under uncertainty

The managerial value of this paper is not that AI can “solve climate adaptation.” That sentence is too large to be useful and too convenient to be trusted.

The real lesson is narrower and better: reinforcement learning can help organizations reason about staged commitment under uncertainty.

Climate adaptation is expensive because it binds the future. Build too little and floods impose recurring losses. Build too much too early and the city pays for unnecessary assets and maintenance. Build in the wrong place and risk simply moves through the network. Wait too long and options disappear. This is exactly the type of problem where a sequential decision system can add value.

For Cognaptus readers, the paper offers a template for evaluating AI in infrastructure-heavy domains:

Question	Why it matters
Does the AI operate inside a realistic decision environment?	Otherwise it is only forecasting, not planning
Are costs and benefits explicitly represented?	Otherwise “optimization” becomes a slogan
Can the policy adapt over time?	Long-horizon risk rarely arrives in one clean installment
Are spatial dependencies modeled?	Infrastructure networks fail through connections, not isolated dots
Are scenario assumptions visible?	Hidden beliefs become hidden liabilities
Are limitations tied to decisions?	Generic caution is cheap; decision-relevant boundaries are useful

This is also why the case-first reading is stronger than a standard paper summary. The Copenhagen example reveals the real problem: not whether RL is mathematically interesting, but whether it can help planners avoid two bad instincts—doing nothing because uncertainty is uncomfortable, or doing everything because uncertainty is frightening.

The learned policy chooses neither. It spends, but selectively. It adapts, but gradually. It prepares for risk, but still counts the bill.

That is a refreshingly adult version of AI.

Conclusion: let the AI drive the simulation, not the city

The paper’s best contribution is not a claim that reinforcement learning should take over urban planning. The authors are more disciplined than that. They frame the system as exploratory decision support: a way to examine long-term trade-offs, adaptation pathways, and robustness under climate uncertainty.

That framing is exactly right.

Cities need better ways to test infrastructure decisions before those decisions become concrete. Climate risk is too nonlinear for static planning, too expensive for trial and error, and too political for black-box automation. A reinforcement-learning integrated assessment model can help by making the decision space visible: what to build, where to build, when to wait, and how assumptions about the future change the cost of being wrong.

So yes, when the streets flood, let the AI drive.

But only inside the simulation first.

The steering wheel outside still belongs to planners, engineers, citizens, and the people who will have to live with the drainage system after the dashboard demo is over.

Cognaptus: Automate the Present, Incubate the Future.

¹ Miguel Costa, Arthur Vandervoort, Carolin Schmidt, João Miranda, Morten W. Petersen, Martin Drews, Karyn Morrisey, and Francisco C. Pereira, “Artificial Intelligence for Climate Adaptation: Using Reinforcement Learning for Climate Change-Resilient Transport,” arXiv:2603.06278, 2026, https://arxiv.org/abs/2603.06278.
