A battery pack arrives at an end-of-life processing facility. The easy story says the operator should recover as much value as possible while doing the sustainable thing. The harder story starts five minutes later, when someone has to decide whether to stop, reuse the pack, remove the cover, strip the thermal shield, extract a module, test it, recycle it, or finally admit defeat and dispose of what remains.
This is where circular-economy rhetoric becomes shop-floor arithmetic. Every extra step may reveal more value. Every extra step also burns labour, tools, fixture capacity, diagnostic time, and safety margin. “Recover more” is not a decision rule. It is a slogan with a logistics bill attached.
The paper “State-Augmented Graphs for Circular Economy Triage” by Fox and colleagues attacks exactly this problem [1]. Its contribution is not a new battery-recycling dataset, nor a dazzling neural network pretending to be a forklift. It is a decision framework. Represent disassembly as a graph, augment each state with disassembly history, and at every point recursively compare two choices: stop now and commit to a circular-economy route, or continue disassembling and pay the cost of finding out what comes next.
That sounds modest. It is not. In operational systems, “when to quit” is often the expensive question.
Circular-economy triage is not a sustainability ranking problem
A common misunderstanding is to treat circular-economy triage as a ranking exercise: reuse is better than repurpose, repurpose is better than recycle, recycle is better than disposal, and therefore the system should simply push products upward in the hierarchy whenever possible.
That framing is useful for posters. It is less useful for disassembly planning.
The paper frames triage as a sequential operational decision. At each stage, the operator has only partial information. A product’s condition may be estimated through inspection, sensors, diagnostics, or component-level tests. Some routes are physically impossible until certain access steps have been completed. Some routes are unsafe unless isolation or shielding procedures have already happened. Some routes are economically unattractive once the labour and processing cost of reaching them has been counted.
The right question is therefore not:
Which circular-economy route is theoretically best?
It is:
Given what has already been done, what is now feasible, what is the item’s estimated condition, and is the expected value of continuing higher than the value of stopping?
That single change — from static ranking to sequential choice — is the paper’s main intellectual move.
The mechanism starts by remembering how the object got here
Conventional disassembly sequencing planning usually represents a product as a graph. Nodes represent products, subassemblies, or components. Edges represent feasible disassembly steps. This is already more realistic than a flat checklist because real products have precedence constraints: the thermal shield cannot be removed before the cover; a module cannot be extracted before the pack has been opened; diagnostics may require access that has not yet been created.
But the authors argue that an ordinary graph is still missing something essential: history.
Two paths can arrive at what looks like the same physical component, while having completed different prior steps. In a standard graph, those paths collapse into the same node. That is convenient, and sometimes wrong. The component may be the same, but the decision context is not. One path may have already completed a safety procedure. Another may not. One path may have consumed more labour. Another may have left a route legally or physically inadmissible.
The paper’s solution is a state-augmented graph. Instead of representing a decision state only as a physical node $v$, it represents it as an augmented state:

$$(v, \tau)$$

Here, $v$ is the physical item and $\tau$ is the history of disassembly steps already executed.
This is the quiet engineering trick. Once history is included in the state, the next decision can depend only on the current state. That restores the Markov property, which is what allows dynamic programming or reinforcement-learning-style rollouts to evaluate decisions recursively without dragging the entire past around as informal memory.
The practical translation is simple:
| Ordinary disassembly graph | State-augmented disassembly graph | Operational consequence |
|---|---|---|
| “This is module $M_1$.” | “This is module $M_1$ after isolation, cover removal, extraction, and diagnostics.” | The system knows which routes are now allowed. |
| Prior costs may be implicit or lost. | Prior costs are part of the decision context. | Stop-vs-continue comparisons include actual access cost. |
| Same component states can collapse together. | Same component can appear in multiple decision states. | Safety gates and route eligibility can differ by history. |
| Good for structural sequencing. | Good for adaptive triage. | More suitable for condition-aware circular-economy routing. |
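A minimal sketch of what the augmented state looks like as a data structure. The class name `AugmentedState`, the `after` helper, and the step labels are illustrative assumptions, not names from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentedState:
    """A decision state: the physical item plus the steps already executed."""
    item: str                      # physical item v
    history: tuple[str, ...] = ()  # executed steps tau, in order

    def after(self, step: str) -> "AugmentedState":
        """Same item, with one more step recorded in its history."""
        return AugmentedState(self.item, self.history + (step,))


# Hypothetical step labels: the same module reached with and without diagnostics.
bare = AugmentedState("M1", ("isolation", "cover_removal", "extraction"))
tested = bare.after("diagnostics")

print(bare.item == tested.item)          # True: same physical component
print(bare == tested)                    # False: different decision states
print("diagnostics" in tested.history)   # True: this state can pass a diagnostics gate
```

Because the dataclass is frozen and the history is a tuple, states are hashable, so they can serve as dictionary keys or memoization keys in a recursive planner.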
This is why the paper is better read as a mechanism paper than as a results paper. The numerical examples matter, but they are there to show how the mechanism behaves.
The stop-vs-continue decision is the core algorithm
Once the state includes history, the paper defines two kinds of actions.
The first kind is a primitive disassembly step: remove a cover, isolate high voltage, extract a module, run diagnostics. These actions move the system to another augmented state and impose a cost.
The second kind is a terminal circular-economy route: reuse, repurpose, recycle, or dispose, depending on what is feasible for the current item.
The utility of a circular-economy option is defined as retained value minus cost. Retained value can depend on $H_v$, a diagnostic health score for the item, while the history $\tau$ drives access cost, process cost, and route admissibility. Reuse and repurpose can depend heavily on condition. In the paper’s illustrative example, recycling and disposal are treated as closer to fixed values.
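Written out, the definition has roughly the following shape. The symbols $U_r$, $R_r$, and $C_r$ for a route $r$ are placeholder names assumed here, not necessarily the paper's notation; only $H_v$ and $\tau$ come from the text.

$$
U_r(v, \tau) = R_r(H_v) - C_r(v, \tau)
$$

Under this reading, $R_r$ is the retained value of route $r$ given the item's health, and $C_r$ collects whatever costs the history implies for that route. For routes treated as fixed, $R_r$ would simply not vary with $H_v$.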
At each augmented state, the decision rule compares the value of the best admissible terminal route with the value of the best admissible continuation step, and takes the larger of the two.
This is the “stop or strip” logic. Stop if the best admissible circular-economy option is better. Continue if paying for another disassembly step creates enough future value to justify itself.
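One way to write the comparison, as a sketch in assumed notation. Let $U_r(v,\tau)$ be the utility of terminal route $r$, $\mathcal{R}(v,\tau)$ the admissible terminal routes, $\mathcal{A}(v,\tau)$ the admissible disassembly steps, $c(a)$ the cost of step $a$, $\mathrm{ch}(a)$ the items that step produces, and $\tau \oplus a$ the history extended by $a$. None of these symbols is taken from the paper.

$$
V(v,\tau) = \max\Big\{ \max_{r \in \mathcal{R}(v,\tau)} U_r(v,\tau),\;\; \max_{a \in \mathcal{A}(v,\tau)} \Big[ -c(a) + \sum_{v' \in \mathrm{ch}(a)} V(v', \tau \oplus a) \Big] \Big\}
$$

The first term is the value of stopping and the second is the value of continuing. The sum is there because a step such as module extraction yields more than one item. This form charges step costs as they are incurred, so $U_r$ should count only the route's own processing cost; charging access cost inside $U_r$ through $\tau$ is an alternative bookkeeping, as long as no cost is counted twice.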
The paper also adds an admissibility function. This function decides which actions are allowed at a state. It can encode precedence constraints, access gates, safety and regulatory conditions, and resource limits. That matters because a profitable route on paper may be invalid in reality. Reusing a pack before high-voltage isolation is not “aggressive value recovery.” It is how spreadsheets try to injure people.
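The recursion and the admissibility gate can be sketched together in a few lines. Everything below is a toy under stated assumptions: the step names, costs, and route utilities are hypothetical numbers chosen only to exercise the logic, and the function names are not from the paper.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class State:
    item: str                      # physical item v
    history: tuple[str, ...] = ()  # executed steps tau

# Hypothetical steps: name -> (item it applies to, cost, items it produces).
STEPS = {
    "isolate": ("pack", 10.0, ("pack",)),
    "extract": ("pack", 40.0, ("module_a", "module_b")),
}


def admissible_steps(s: State) -> list[str]:
    """Precedence gate: extraction is only allowed after isolation."""
    out = []
    for name, (item, _cost, _children) in STEPS.items():
        if item != s.item or name in s.history:
            continue
        if name == "extract" and "isolate" not in s.history:
            continue
        out.append(name)
    return out


def admissible_routes(s: State) -> dict[str, float]:
    """Terminal routes allowed at s, with hypothetical utilities."""
    if s.item == "pack":
        return {"recycle_pack": 30.0} if "isolate" in s.history else {}
    if s.item == "module_a":
        return {"reuse_module": 60.0, "recycle_module": 5.0}
    return {"recycle_module": 5.0}


@lru_cache(maxsize=None)
def value(s: State) -> tuple[float, str]:
    """Best achievable utility from s and the action that achieves it."""
    best = (float("-inf"), "none")
    for route, utility in admissible_routes(s).items():   # stop now
        best = max(best, (utility, route))
    for step in admissible_steps(s):                       # continue
        _item, cost, children = STEPS[step]
        nxt = s.history + (step,)
        total = -cost + sum(value(State(c, nxt))[0] for c in children)
        best = max(best, (total, step))
    return best


print(value(State("pack")))                 # (20.0, 'isolate')
print(value(State("pack", ("isolate",))))   # (30.0, 'recycle_pack')
```

With these numbers the un-isolated pack has no admissible terminal route, so the only move is to isolate. After isolation, recycling the pack (30) beats extracting the modules (60 + 5 - 40 = 25), so the planner stops.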
The EV battery example is illustrative evidence, not an ROI study
The paper’s numerical example uses an electric-vehicle battery pack with two modules. The authors are careful about the status of the example: the values are realistic in shape but arbitrary in magnitude, and the unit is utility rather than necessarily money. The purpose is not to estimate the true economics of battery recovery. The purpose is to show how the framework changes decisions when condition, access history, and route constraints change.
That distinction is important. The example is the main evidence for the logic of the framework. It is not empirical validation against industrial data.
The setup is deliberately small. The pack can be reused, repurposed, or recycled. Modules can be reused, recycled, or disposed of. Pack disposal is excluded because the paper assumes a large lithium-ion battery cannot simply be thrown away. The route gates are:
- pack recycling is allowed after high-voltage isolation;
- pack reuse requires isolation and health above 0.9;
- pack repurposing requires isolation, health above 0.7, and thermal shield removal;
- module reuse requires extraction, diagnostics, and health of at least 0.8;
- module recycling and disposal are allowed after extraction.
The disassembly steps carry costs: high-voltage isolation, cover removal, thermal shield removal, module extraction, and module diagnostics. The example also assigns retained-value functions for reuse and repurpose using residual value ratios based on health scores.
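The route gates in this example translate almost directly into an admissibility check. The sketch below is one possible encoding: the function name, the step identifiers such as `hv_isolation`, and the route labels are hypothetical, while the thresholds are the ones quoted above.

```python
def admissible_routes(item_kind: str, health: float, done: set[str]) -> list[str]:
    """Terminal routes allowed for a pack or module, given health and completed steps."""
    routes: list[str] = []
    if item_kind == "pack":
        if "hv_isolation" in done:
            routes.append("recycle")                 # allowed once isolated
            if health > 0.9:
                routes.append("reuse")               # isolation and health above 0.9
            if health > 0.7 and "thermal_shield_removal" in done:
                routes.append("repurpose")           # also needs shield removal
        # Pack disposal is never offered, matching the example's exclusion.
    elif item_kind == "module":
        if "module_extraction" in done:
            routes += ["recycle", "dispose"]         # allowed after extraction
            if "module_diagnostics" in done and health >= 0.8:
                routes.append("reuse")               # needs diagnostics and health >= 0.8
    return routes


# A pack at health 0.82 after isolation, cover removal, and shield removal.
print(admissible_routes(
    "pack", 0.82, {"hv_isolation", "cover_removal", "thermal_shield_removal"}
))  # ['recycle', 'repurpose']
```

Keeping the gates in one function, separate from utilities, means a safety or regulatory rule can be changed without touching the value model.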
The three cases then show three different decisions.
| Case | Diagnostic situation | Optimal logic shown by the paper | What it demonstrates |
|---|---|---|---|
| Case A | High-health pack, reported pack health 0.92 | Reuse the pack; utility 210. Repurpose is feasible but lower at 147 after added access cost. | When the whole item is good enough, deeper disassembly can destroy value by adding unnecessary cost. |
| Case B | Moderate pack health 0.82, one strong module 0.92, one weaker module 0.72 | Pack reuse is invalid. Pack repurpose gives 70. Extracting modules gives 115 for reusing one module and -25 for recycling the other, total 90. | Component-level recovery can beat whole-pack repurposing, but the margin is narrow. |
| Case C | Degraded health; no health-dependent routes are admissible | Recycle the whole pack; utility 30. Module-level recycling is negative in the example. | The economically rational operator choice can be environmentally weaker, exposing a policy gap. |
Case A is the easiest to understand and the easiest to overlook. A high-health pack can be reused after the necessary safety step. Continuing disassembly only adds cost. The framework does not worship disassembly for its own sake. It allows the system to stop.
Case B is the real demonstration. The pack itself is not healthy enough for reuse, but one module is strong. Repurposing the whole pack produces utility 70. Extracting the modules and reusing the healthier one, while recycling the weaker one, produces total utility 90. That is better, but not by much. A small increase in extraction cost, diagnostic cost, or uncertainty could flip the decision back to repurposing the whole pack.
This is exactly the kind of decision humans often make through experience and rough heuristics. The framework makes the trade-off explicit.
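The Case B trade-off can be checked with the utilities reported above. The `extra_cost` parameter and the tie-breaking rule (stop when the two options are equal) are assumptions added for illustration.

```python
PACK_REPURPOSE = 70    # utility of stopping at pack repurpose (Case B)
MODULE_REUSE = 115     # utility of reusing the stronger module
MODULE_RECYCLE = -25   # utility of recycling the weaker module


def decide(extra_cost: float = 0.0) -> str:
    """Stop-vs-continue choice if module-level work costs extra_cost more than assumed."""
    continue_value = MODULE_REUSE + MODULE_RECYCLE - extra_cost
    # Assumption: on a tie, prefer stopping, since it avoids further handling.
    return "extract modules" if continue_value > PACK_REPURPOSE else "repurpose pack"


margin = MODULE_REUSE + MODULE_RECYCLE - PACK_REPURPOSE
print(margin)       # 20
print(decide(0))    # extract modules
print(decide(25))   # repurpose pack
```

Any additional cost of 20 utility units or more on the module path is enough to reverse the recommendation.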
Case C is the policy-relevant one. The model selects whole-pack recycling because module-level recycling is too costly under the assigned values. Economically, that is rational for the operator. Environmentally, it may be suboptimal because it fails to separate materials at the module level. The point is not that operators are villains. The point is that circular-economy outcomes depend on utility gaps. If policy wants deeper disassembly, it has to change the economics of the step where the system rationally quits.
The paper separates three jobs that businesses often confuse
One useful way to read the framework is as a separation of jobs.
First, someone must estimate condition. That may come from sensors, manual inspection, diagnostics, computer vision, vibration analysis, electrical tests, or model-based inference. The paper does not solve that problem. It treats health scores as inputs.
Second, someone must define route admissibility. Which options are allowed after which steps? What health threshold is required? What safety procedure must happen first? Which resource constraints bind? The paper offers a way to encode these rules, not a universal list of them.
Third, the planner must decide whether to stop or continue. That is the core optimization problem the paper formalizes.
This separation is valuable because many AI projects jumble the three together. A company may train a visual model to inspect components and then assume the decision problem is solved. It is not. A health score is not a routing policy. It is one input into a routing policy.
For business implementation, the architecture would look more like this:
| Layer | Business question | Possible system component |
|---|---|---|
| Condition assessment | What state is the item in? | Sensors, inspection models, diagnostic tools, human assessment |
| Admissibility rules | What is legally, physically, and operationally allowed now? | Rule engine, safety checklist, workflow system |
| Utility model | What is each route worth after cost? | Market-price model, processing-cost model, environmental weights |
| Sequential planner | Should we stop or continue? | Dynamic programming, rollout planner, eventually RL under uncertainty |
| Execution layer | Who does the next step? | Human operator, robot cell, work-order system |
The paper directly contributes to the planning and representation layers. It does not claim to deliver the full industrial stack. Good. Papers that solve everything usually solve PowerPoint.
The business value is auditability before autonomy
The obvious automation story is robotic disassembly: let machines inspect, strip, route, and optimize the recovery process. That may eventually happen. But the nearer-term business value is more boring and more useful: auditable decision support.
A recovery operator could use a framework like this to standardize triage decisions across facilities. Instead of relying only on senior technicians’ tacit judgment, the operator can encode route rules, health thresholds, access costs, and utility values. The system can then explain why it recommends reuse, repurpose, recycling, or further disassembly.
That explanation matters. End-of-life processing often sits at the intersection of safety, environmental compliance, resale economics, and labour allocation. A black-box recommendation that says “extract module” is not enough. A useful system says:
- pack reuse is inadmissible because health is below threshold;
- pack repurpose is admissible but has utility 70;
- module extraction plus one reuse and one recycle has expected utility 90;
- the margin is 20 utility units and sensitive to extraction cost;
- therefore continue disassembly only if the facility has the necessary labour and bench capacity.
This is not glamorous AI. It is operational control. Very often, that is where the money is hiding.
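An explanation of that kind is cheap to generate once the planner's inputs are explicit. The sketch below is a hypothetical reporting helper, not part of the paper: the `Option` class and `explain` function are invented for illustration, and the numbers are the Case B utilities.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    name: str
    utility: Optional[float]   # None when the option is inadmissible
    reason: str = ""           # why it is inadmissible, if it is


def explain(options: list[Option]) -> list[str]:
    """Turn evaluated options into human-readable audit lines."""
    lines = []
    for o in options:
        if o.utility is None:
            lines.append(f"{o.name} is inadmissible because {o.reason}")
        else:
            lines.append(f"{o.name} is admissible with utility {o.utility:g}")
    ranked = sorted(
        (o for o in options if o.utility is not None),
        key=lambda o: o.utility,
        reverse=True,
    )
    if len(ranked) >= 2:
        gap = ranked[0].utility - ranked[1].utility
        lines.append(f"margin of {ranked[0].name} over {ranked[1].name} is {gap:g}")
    if ranked:
        lines.append(f"recommend: {ranked[0].name}")
    return lines


report = explain([
    Option("pack reuse", None, "health is below threshold"),
    Option("pack repurpose", 70),
    Option("module extraction", 90),
])
print("\n".join(report))
```

The recommendation line is only the utility ranking. Capacity checks, such as available labour and bench time, would sit in the admissibility layer or in a human review step.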
The policy implication is a computable subsidy gap
Case C gives the most interesting policy interpretation. The model chooses whole-pack recycling because deeper module-level recycling is not attractive under the utility assumptions. If a policymaker wants deeper disassembly, the framework can identify where incentives must intervene.
This matters because circular-economy policy often speaks in route preferences: reuse more, recycle better, recover critical materials, reduce waste. The paper’s framework translates those preferences into decision points. It can show where the operator’s private optimum diverges from the environmental objective.
That creates a practical subsidy question:
How much would the utility of module-level recycling need to improve before the optimal policy changes?
The paper does not estimate real subsidy levels. It does, however, show the formal place where such a calculation would sit. That is useful. Policy design becomes less about moral encouragement and more about changing the value of the step where rational operators currently stop.
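As a sketch of where that calculation sits, suppose a subsidy $\sigma$ adds directly to the utility of the deeper route and changes nothing else. That is a simplifying assumption, and the symbols below are placeholders rather than the paper's notation.

$$
\Delta = U_{\text{stop}} - U_{\text{deeper}}, \qquad \text{the optimal choice switches when } \sigma > \Delta
$$

In Case C, $U_{\text{stop}} = 30$ for whole-pack recycling and $U_{\text{deeper}}$ is negative for module-level recycling, so $\Delta$ exceeds 30 utility units in the illustrative numbers. Since those units are arbitrary, this locates the gap without pricing it.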
What the paper shows, what Cognaptus infers, and what remains uncertain
The paper is careful about its boundaries, and this article should be too. The framework is promising because it formalizes a real decision problem. It is not yet a validated industrial ROI engine.
| Category | What can be said | Boundary |
|---|---|---|
| Directly shown by the paper | A state-augmented disassembly graph can encode history, support Markov-style recursive evaluation, and compare stopping with continued disassembly. | The demonstration is a worked example, not a benchmark against industrial operations. |
| Directly shown by the EV example | Optimal routing can shift from pack reuse to module recovery to pack recycling depending on health and cost assumptions. | The utility values are illustrative and unitless. |
| Business inference | The framework could support auditable triage systems for EV batteries, electronics, medical devices, and machinery recovery. | Real deployment needs calibrated health scores, route values, processing costs, and safety rules. |
| Policy inference | Utility gaps can identify where subsidies or incentives may change operator behaviour. | The paper does not estimate real subsidy amounts or environmental life-cycle impacts. |
| AI inference | RL may become useful once uncertainty and stochastic outcomes are introduced. | The current model is deterministic; RL is future work, not an experimental result in the paper. |
This table is not a polite disclaimer ritual. It protects the real value of the paper. The mechanism is worth understanding precisely because it does not pretend that circular-economy automation is already solved.
Where the framework may be too much — and where it is not enough
State augmentation has a cost. When the same physical component can be reached through multiple histories, the graph duplicates states. That is the point, but also the computational burden. The authors note that state growth can be controlled by merging equivalent trajectory signatures and pruning dominated states. Still, if a process is simple, static, and already well-modelled, a more direct optimization formulation such as mixed-integer linear programming may be sufficient.
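One simple way to merge equivalent histories is to key states on a canonical signature. The sketch below assumes that only the set of completed steps matters, not their order; that is an assumption for illustration and would be wrong wherever costs or admissibility depend on sequence. The function name and step labels are hypothetical.

```python
def signature(item: str, history: tuple[str, ...]) -> tuple[str, frozenset[str]]:
    """Order-insensitive state key: the item plus the set of completed steps."""
    return (item, frozenset(history))


# Three hypothetical histories for the same pack; the first two differ only in order.
histories = [
    ("hv_isolation", "cover_removal", "diagnostics"),
    ("hv_isolation", "diagnostics", "cover_removal"),
    ("hv_isolation", "cover_removal"),
]
merged = {signature("pack", h) for h in histories}
print(len(histories), "->", len(merged))   # 3 -> 2
```

Pruning dominated states is a separate step that this sketch does not attempt.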
There is also a deeper modelling issue: health aggregation. In the example, pack health can be treated as an aggregation of module health. The paper notes that simple aggregation can be crude. A minimum rule, for instance, might condemn an entire assembly because one replaceable component is bad. A weighted average may hide a critical defect. In real operations, “health” is not just a number. It is a contested summary of failure modes, repairability, warranty risk, safety exposure, and resale expectations.
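The two aggregation rules mentioned here can be compared on the Case B module healths. The function names and the equal weights are assumptions for illustration; the paper's own aggregation may differ.

```python
def aggregate_min(healths: list[float]) -> float:
    """Weakest-link rule: the assembly is only as healthy as its worst component."""
    return min(healths)


def aggregate_weighted(healths: list[float], weights: list[float]) -> float:
    """Weighted average of component healths."""
    return sum(h * w for h, w in zip(healths, weights)) / sum(weights)


modules = [0.92, 0.72]   # module healths from Case B
print(aggregate_min(modules))                          # 0.72
print(round(aggregate_weighted(modules, [1, 1]), 2))   # 0.82
```

Against the 0.8 module-reuse threshold, the minimum sits below the bar even though one module clears it, and the equal-weight average clears the bar even though one module does not. The average also coincides with the 0.82 pack health reported for Case B.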
The current framework is deterministic. The authors identify uncertainty as future work: uncertain diagnostic data, stochastic disassembly outcomes, and fuzzy condition assessments. That is where reinforcement learning may become relevant. In the present paper, RL is not the evidence. It is a future solver family made more plausible by the Markov formulation.
The most useful near-term implementation would probably be hybrid: deterministic rules where safety and compliance are non-negotiable, probabilistic models where diagnostics are uncertain, and human review where utility functions encode contested values. No, the robot should not be allowed to discover regulatory compliance through trial and error. Some learning experiences are best left unexperienced.
The managerial lesson: make quitting explicit
Many operational failures come from hidden stopping rules. Workers stop because the next step feels too costly. Managers push deeper recovery because sustainability targets say they should. Finance asks why labour hours exploded. Policy asks why valuable materials remain mixed. Everyone is partly right, which is how organizations become expensive.
The paper’s contribution is to make the stopping rule explicit.
At each state, the system asks whether the best admissible circular-economy route is better than the value of paying for another step. It remembers how the item got there. It respects safety and access gates. It uses health scores without pretending they are the whole decision. It can expose where economics and environmental goals diverge.
For businesses, the immediate lesson is not “deploy reinforcement learning in the recycling plant.” The lesson is more basic: build a triage model before buying the automation theater. Define the states, the histories, the gates, the costs, the health inputs, and the terminal routes. Then decide where AI should help — condition assessment, policy optimization, routing recommendation, or multi-station coordination.
Circular economy will not be operationalized by ranking routes in a brochure. It will be operationalized by thousands of small stop-or-continue decisions, made under safety constraints, uncertain condition data, and very real processing costs.
The paper gives those decisions a formal grammar. That is less flashy than a robot arm. It is also more likely to survive contact with the factory floor.
Cognaptus: Automate the Present, Incubate the Future.
[1] Richard Fox, Rui Li, Gustav Jonsson, Farzaneh Goli, Miying Yang, Emel Aktas, and Yongjing Wang, “State-Augmented Graphs for Circular Economy Triage,” arXiv:2512.15824v2, 2026.