Opening — Why this matters now

Reinforcement learning has spent the last decade obsessing over better policies, better value functions, and better credit assignment. Physics, meanwhile, has been busy questioning whether time itself needs to behave nicely. This paper sits uncomfortably—and productively—between the two.

At a moment when agentic AI systems are being deployed in distributed, partially observable, and poorly synchronized environments, the assumption of a fixed causal order is starting to look less like a law of nature and more like a convenience. Wilson’s work asks a precise and unsettling question: what if decision-making agents and causal structure are the same mathematical object viewed from different sides?

Background — Two worlds that never talked

In AI, decision-making agents are formalized through partially observable Markov decision processes (POMDPs). Agents act, observe, update memory, and repeat. Causality is implicit, linear, and unquestioned.

In the foundations of physics, especially quantum information, researchers study higher-order processes—objects that don’t just transform data, but transform entire processes. These frameworks explicitly allow for indefinite or non-classical causal order, provided logical consistency is preserved.

Until now, these two traditions developed in parallel. Same word—agency—entirely different mathematics.

Analysis — The key identification

The paper’s core contribution is a clean equivalence:

Every deterministic finite-memory POMDP agent corresponds exactly to a one-input process function.

This is not a metaphor. It is a bijection.

A deterministic agent is defined by:

  • A policy: how memory selects actions
  • A memory update: how observations rewrite memory
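
The two-part definition above can be sketched directly. This is a minimal illustration, not the paper's notation: the memory, action, and observation sets are tiny finite sets chosen for readability.

```python
# Illustrative deterministic finite-memory agent: a policy (memory -> action)
# plus a memory update (memory, observation -> memory). All names are
# hypothetical, chosen for this sketch.

def policy(memory):
    # How memory selects actions: memory 0 -> "left", memory 1 -> "right".
    return "left" if memory == 0 else "right"

def memory_update(memory, observation):
    # How observations rewrite memory: flip the memory bit on observing a 1.
    return memory ^ observation

def run_agent(observations, memory=0):
    """Act, observe, update memory, repeat."""
    actions = []
    for obs in observations:
        actions.append(policy(memory))
        memory = memory_update(memory, obs)
    return actions
```

For example, `run_agent([1, 0, 1])` acts from memory 0, flips to 1 after the first observation, and stays there until the final flip back, producing `["left", "right", "right"]`.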

A one-input process function is defined by:

  • A fixed-point condition guaranteeing consistency
  • A higher-order structure that maps functions to functions

Wilson shows that:

| AI View | Physics View |
|---|---|
| Policy + memory update | Process function |
| Environment queried step-by-step | Local operations inserted |
| Behavioral equivalence of agents | Equality of process functions |

Crucially, two agents are behaviorally identical if and only if they induce the same process function. No hand-waving, no approximation.
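
One direction of this equivalence can be illustrated with a brute-force check over short observation streams: two syntactically different agents (different memory sets, different update rules) that behave identically on every input. The agents and helper below are hypothetical examples, not taken from the paper.

```python
# Two different (policy, memory update) pairs that induce the same behaviour
# on every finite observation stream, checked exhaustively up to length 3.

from itertools import product

def run(policy, update, observations, memory):
    out = []
    for obs in observations:
        out.append(policy(memory))
        memory = update(memory, obs)
    return out

# Agent A: memory in {0, 1}, always acts "go", memory flips on observing 1.
pol_a = lambda m: "go"
upd_a = lambda m, o: m ^ o

# Agent B: a single memory state, also always acts "go".
pol_b = lambda m: "go"
upd_b = lambda m, o: 0

same = all(
    run(pol_a, upd_a, list(obs), 0) == run(pol_b, upd_b, list(obs), 0)
    for n in range(4) for obs in product([0, 1], repeat=n)
)
```

Here `same` is `True`: agent A's memory is redundant, so both agents collapse to the same process function, which is exactly what the paper's equivalence predicts.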

Findings — Why the decomposition matters

A technical but pivotal result is that any one-input process function decomposes into two parts:

  • One component independent of observations (the policy)
  • One component dependent on observations (the memory update)

This mirrors the agent architecture exactly. The higher-order object isn’t exotic—it’s simply a reframing of what finite-memory agents already are.

Where things get interesting is when the paper moves beyond the single-agent case.

Implications — Multi-agent systems without a timeline

For observation-independent decentralized POMDPs, the framework extends naturally to multi-input process functions. These allow agents to coordinate without committing to a predefined causal order.

In plain terms:

  • Agents may influence each other’s actions
  • Without a fixed “who acts first” structure
  • While remaining logically consistent and well-defined

This opens the door to strategies that are provably inaccessible to standard sequential policies.
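
The multi-input consistency condition can be checked by brute force for two parties over bits. This sketch is illustrative: `w` maps the pair of party outputs to the pair of party inputs, and consistency demands a unique fixed point for every choice of local operations. (In the classical process-function literature, deterministic bipartite processes are known to be causally ordered; genuinely non-causal classical examples are believed to require at least three parties, so the passing example below is, unavoidably, an ordered one.)

```python
# Brute-force consistency check for a two-party process function over bits.
# w: (output1, output2) -> (input1, input2). Consistency: for every pair of
# local operations (f1, f2), exactly one fixed point exists.

from itertools import product

BITS = (0, 1)

def consistent(w):
    ops = [dict(zip(BITS, v)) for v in product(BITS, repeat=2)]
    for f1, f2 in product(ops, ops):
        fixed = [
            (o1, o2) for o1, o2 in product(BITS, BITS)
            if (f1[w(o1, o2)[0]], f2[w(o1, o2)[1]]) == (o1, o2)
        ]
        if len(fixed) != 1:
            return False
    return True

# Causally ordered: party 1 receives a constant, party 2 receives party 1's
# output ("1 before 2"). This passes the fixed-point check.
ordered = lambda o1, o2: (0, o1)

# Naive mutual influence: each party directly receives the other's output.
# For identity operations this yields two fixed points, so it is inconsistent.
swap = lambda o1, o2: (o2, o1)
```

Running the checker, `consistent(ordered)` is `True` while `consistent(swap)` is `False`: unrestricted mutual influence breaks logical consistency, which is why the fixed-point condition, not a timeline, is what does the real work.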

From a business and systems perspective, this reframes coordination problems:

| Traditional Multi-Agent RL | Process-Function Perspective |
|---|---|
| Fixed execution order | Order emerges implicitly |
| Coordination via messaging | Coordination via fixed points |
| Time-indexed decisions | Causality as a resource |

Conclusion — A quiet but radical bridge

This paper does not promise immediate performance gains, benchmarks, or demos. What it does instead is more dangerous: it dissolves a boundary.

Decision-making agents and higher-order causal processes are the same object wearing different uniforms. Once that is accepted, questions that sounded philosophical—Can causal structure be optimized? Can time be learned?—become engineering problems.

For AI practitioners building distributed agent systems, the takeaway is simple but profound: your agent architecture already lives in a higher-order world—you just haven’t named it that way yet.

Cognaptus: Automate the Present, Incubate the Future.