When ERP Meets Attention: Teaching Transformers to Pack, Schedule, and Save Real Money

Furnace loading is not the glamorous side of artificial intelligence. No one gives a keynote about choosing which pile of titanium scrap should enter an induction furnace.

Which is precisely why it is useful.

The paper Enterprise Resource Planning Using Multi-type Transformers in Ferro-Titanium Industry applies a Multi-Type Transformer, or MTT, to two classic combinatorial optimization problems: the Knapsack Problem (KP) and the Job-Shop Scheduling Problem (JSP). It then pushes the method into a real manufacturing allocation case: selecting raw materials for a ferro-titanium furnace batch.¹

The tempting headline is simple: transformers can solve ERP optimization.

The more useful headline is narrower and better: transformers may become practical approximate decision engines when an industrial planning problem can be translated into the right combinatorial form. That translation is not a detail. It is the product.

Start with the furnace, because that is where the abstraction earns its salary

The industrial case in the paper involves an induction furnace used by a ferro-titanium manufacturer. The operational task is easy to describe and annoying to solve: load exactly 1,800 lb of material into the furnace while minimizing raw material cost.

The available inventory includes 14 types of raw material, such as prepared titanium solids, mixed turnings, and other titanium scrap categories. Each material has a unit cost, ranging in the paper from about $0.40 to $1.10 per pound, and each has an availability or usage limit. The production planner needs a batch recipe that fills the furnace and avoids overspending.

That sounds like procurement arithmetic until the combinations start multiplying. Every material type can appear in different amounts. Availability constraints matter. Exact loading matters. Cheaper material is attractive, but only if the resulting combination satisfies operational constraints. Anyone who has watched a spreadsheet become a decision ritual knows the genre.

The paper maps this case into a 0–1 knapsack-style problem. That move is clever, but it is also where the strongest boundary of the paper appears.

A standard 0–1 knapsack problem asks: given a set of items, each with a weight and a value, which items should we select so that total weight does not exceed capacity and total value is maximized? The furnace problem is not naturally that. It is closer to continuous or fractional blending with an exact-fill requirement and a cost-minimization objective.
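For readers who want the textbook definition in executable form, here is a minimal dynamic-programming sketch of the standard 0–1 knapsack, assuming integer weights. It is the kind of exact reference an OR baseline can compute on small instances. It is not the paper's solver, and the function name `knapsack_01` is illustrative.

```python
def knapsack_01(values, weights, capacity):
    """Exact 0-1 knapsack by dynamic programming over integer capacities.

    Returns (best_value, chosen_indices). Runs in O(n * capacity) time.
    """
    n = len(values)
    # best[i][c] = best value achievable with the first i items and capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take item i-1
    # Walk back through the table to recover which items were taken
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


value, items = knapsack_01(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
print(value, items)  # 220 [1, 2]
```

The table grows with capacity times item count. That is one reason a discrete, bounded representation matters before any solver, neural or classical, touches the problem.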

So the authors reshape the problem.

First, they discretize inventory. Instead of treating material piles as continuous quantities, they generate discrete “containers” or batches. Each generated item represents a quantity of a material type. The paper tests item counts from 50 to 100, effectively changing the granularity of the artificial inventory representation.
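The sketch below shows one simple way to picture this discretization. It assumes each material's available pounds are split into roughly equal integer-weight containers, with the number of containers per material proportional to its share of total availability. The allocation rule, the `discretize` and `Item` names, and the material quantities are hypothetical. They are not the paper's generator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    material: str
    unit_cost: float  # $ per lb
    weight: int       # lb in this container


def discretize(inventory, n_items):
    """Split continuous inventory into about n_items discrete containers.

    inventory: {material: (unit_cost, available_lb)}, available_lb an int.
    Assumption: containers per material are proportional to availability
    (at least one each), and each material's pounds are split as evenly as
    integer weights allow. Rounding means the count is approximate.
    """
    total = sum(avail for _, avail in inventory.values())
    items = []
    for material, (cost, avail) in inventory.items():
        k = max(1, round(n_items * avail / total))
        base, extra = divmod(avail, k)
        for j in range(k):
            items.append(Item(material, cost, base + (1 if j < extra else 0)))
    return items


# Hypothetical inventory; unit costs sit inside the paper's $0.40-$1.10 range
inventory = {
    "prepared_solids": (1.10, 900),
    "mixed_turnings": (0.40, 1200),
    "other_scrap": (0.75, 600),
}
items = discretize(inventory, n_items=9)
print(len(items), sorted({i.weight for i in items}))  # 9 [300]
```

Changing `n_items` changes the container size, which is the granularity knob the paper varies from 50 to 100.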

Second, they convert cost minimization into value maximization. Since the MTT model is designed for knapsack-style maximization, the authors define a reference price above the maximum material cost. The virtual value of an item becomes the “saving” from using that item instead of paying the reference price:

$$ v_i = (p_{\mathrm{ref}} - c_i) w_i $$

where $c_i$ is the material’s unit cost and $w_i$ is the item weight.

Lower-cost materials therefore receive higher virtual value. Maximizing virtual value pushes the model toward lower-cost furnace loads. Because all transformed values are positive, the model is also encouraged to fill as much capacity as possible. If it slightly underfills, the paper suggests post-processing, such as adding filler material.
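A short derivation shows why this reformulation works. For a selected set $S$ with total weight $W_S$, summing the virtual values gives $\sum_{i \in S} v_i = p_{\mathrm{ref}} W_S - \sum_{i \in S} c_i w_i$. When $W_S$ is pinned at the furnace capacity, maximizing total virtual value is exactly minimizing total cost. The sketch below checks that identity numerically. The function name, the default margin above the maximum cost, and the container weights are illustrative assumptions.

```python
def virtual_values(unit_costs, weights, p_ref=None, margin=0.05):
    """Turn cost minimization into value maximization via a reference price.

    v_i = (p_ref - c_i) * w_i, the 'saving' from using item i instead of
    paying p_ref per lb. p_ref must exceed every unit cost so all v_i > 0.
    The default margin above the maximum cost is an illustrative choice.
    """
    if p_ref is None:
        p_ref = max(unit_costs) + margin
    if p_ref <= max(unit_costs):
        raise ValueError("p_ref must exceed the maximum unit cost")
    return [(p_ref - c) * w for c, w in zip(unit_costs, weights)], p_ref


costs = [0.40, 0.75, 1.10]   # $/lb
weights = [600, 600, 600]    # lb; hypothetical containers summing to 1,800 lb
values, p_ref = virtual_values(costs, weights, p_ref=1.20)
saving = sum(values)
cost = sum(c * w for c, w in zip(costs, weights))
# For a fixed total load W: sum(v_i) == p_ref * W - cost, so max saving == min cost
assert abs(saving - (p_ref * sum(weights) - cost)) < 1e-9
```

Because every $v_i$ is positive, adding any feasible item raises the objective, which is why the transformed problem pushes toward filling capacity.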

This is the key business lesson. The transformer is not magically understanding metallurgy. It is solving a carefully reformulated combinatorial proxy for a manufacturing decision. The proxy is useful because it preserves enough of the operational structure to generate actionable candidate plans.

That distinction matters because “AI for ERP” is otherwise an excellent way to spend money on demos that cannot survive Tuesday morning production meetings.

The model is not just attention; it is attention that respects entity types

The technical contribution sits behind the factory case.

The paper uses a Multi-Type Transformer architecture, drawing on the idea that many combinatorial optimization problems contain different categories of entities. A standard transformer treats tokens as broadly similar objects distinguished by embeddings. That is powerful for language, but ERP problems are structurally heterogeneous.

A machine is not a job. A job operation is not a precedence edge. A capacity node is not an item. Pretending otherwise is not elegant abstraction; it is structural laziness wearing a GPU.

The paper represents KP and JSP as heterogeneous graph problems:

Problem	Representation	What the model must notice
Knapsack Problem	A bipartite structure between item nodes and a capacity node	Item value, item weight, residual capacity, feasibility of selection
Job-Shop Scheduling Problem	A disjunctive graph of operations, precedence edges, and machine-conflict edges	Job order, machine exclusivity, operation duration, partial schedule state
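To make the two representations concrete, here is one plausible encoding as typed nodes and typed edges. The node keys and edge-type names (`fits_in`, `precedes`, `conflicts`) are hypothetical, not the paper's schema. The point is only that each edge carries a relation type a multi-type model can treat differently.

```python
from collections import defaultdict
from itertools import combinations


def kp_graph(values, weights, capacity):
    """Bipartite KP graph: every item node links to a single capacity node."""
    nodes = {("item", i): {"value": v, "weight": w}
             for i, (v, w) in enumerate(zip(values, weights))}
    nodes[("capacity", 0)] = {"capacity": capacity}
    edges = [(("item", i), "fits_in", ("capacity", 0)) for i in range(len(values))]
    return nodes, edges


def jsp_graph(jobs):
    """Disjunctive JSP graph.

    jobs: list of jobs, each a list of (machine, duration) operations in order.
    'precedes' edges encode job order; 'conflicts' edges join operations
    that compete for the same machine and must not overlap.
    """
    nodes, edges = {}, []
    by_machine = defaultdict(list)
    for j, ops in enumerate(jobs):
        for k, (machine, duration) in enumerate(ops):
            nodes[("op", j, k)] = {"machine": machine, "duration": duration}
            by_machine[machine].append(("op", j, k))
            if k > 0:
                edges.append((("op", j, k - 1), "precedes", ("op", j, k)))
    for ops in by_machine.values():
        for a, b in combinations(ops, 2):
            edges.append((a, "conflicts", b))
    return nodes, edges


jobs = [[("M1", 3), ("M2", 2)], [("M2", 4), ("M1", 1)]]
nodes, edges = jsp_graph(jobs)
print(len(nodes), [rel for _, rel, _ in edges])
```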

For KP, the model must decide which items are worth selecting under capacity. For JSP, it must decide how operations should be sequenced across machines while respecting job precedence and machine non-overlap.

The shared idea is that type-specific attention can learn different relational patterns for different entity interactions. Item-capacity relations are not the same as job-machine relations. Multi-type attention gives the model a way to encode that difference without building a completely separate solver for every ERP subproblem.
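A minimal sketch of relation-typed attention follows, in the general style of heterogeneous graph transformers: each edge type gets its own query, key, and value projections, and each node attends over its incoming edges. This is an illustrative single-head layer with random weights. It is not the paper's MTT layer, and the names `typed_attention`, `fits_in`, and `constrains` are hypothetical.

```python
import numpy as np


def typed_attention(x, edges, params):
    """One layer of relation-typed attention (single head, no residual).

    x:      (num_nodes, d) node features
    edges:  list of (src, relation, dst) with integer node ids
    params: {relation: (Wq, Wk, Wv)}, each (d, d); one set per edge type
    The query, key, and value projections depend on the edge's relation type.
    """
    n, d = x.shape
    out = np.zeros_like(x)
    for dst in range(n):
        incoming = [(s, r) for s, r, t in edges if t == dst]
        if not incoming:
            out[dst] = x[dst]  # isolated node: pass features through
            continue
        scores, msgs = [], []
        for src, rel in incoming:
            wq, wk, wv = params[rel]
            scores.append((x[dst] @ wq) @ (x[src] @ wk) / np.sqrt(d))
            msgs.append(x[src] @ wv)
        scores = np.array(scores)
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[dst] = weights @ np.array(msgs)
    return out


rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=(3, d))  # nodes 0 and 1: items; node 2: capacity
edges = [(0, "fits_in", 2), (1, "fits_in", 2),
         (2, "constrains", 0), (2, "constrains", 1)]
params = {rel: tuple(rng.normal(size=(d, d)) for _ in range(3))
          for rel in ("fits_in", "constrains")}
h = typed_attention(x, edges, params)
print(h.shape)  # (3, 4)
```

Real layers are batched, multi-headed, and trained; the loop form is only for readability. The design choice that matters is the `params[rel]` lookup: item-to-capacity messages and capacity-to-item messages pass through different learned projections.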

This is where the paper’s general ERP argument comes from. If many enterprise planning tasks can be represented as typed graphs, then a model architecture that handles heterogeneous relations may be reusable across procurement, packing, scheduling, routing, and allocation.

That is the promise. The evidence is more selective.

The benchmark says: packing is ready earlier than scheduling

The paper benchmarks MTT on KP and JSP before applying it to the ferro-titanium case. The results are not symmetrical.

For knapsack instances with 50 to 100 items, the reported optimality gaps are very small: roughly 0.0011 to 0.0019. The MTT solution values are close to the OR solver values across all listed problem sizes.

For JSP, the gaps are much larger: roughly 0.024 to 0.039. They also worsen as problem size increases. That does not make the JSP result useless. It does mean scheduling is a tougher beachhead than packing.

The right interpretation is not “MTT solves ERP.” It is “MTT is much more convincing on static packing-style problems than on dynamic sequencing problems.”

Test in the paper	Likely purpose	What it supports	What it does not prove
KP benchmark, 50–100 items	Main evidence for static resource allocation	MTT can closely approximate OR-solver values on generated knapsack instances	That it beats mature knapsack heuristics in runtime, robustness, or implementation cost
JSP benchmark	Stress test and cross-problem generalization evidence	MTT can represent scheduling structure and produce non-random solutions	That neural scheduling is production-ready for complex shop floors
Ferro-titanium mapped KP	Industrial case validation	A real furnace-loading problem can be transformed into a learned knapsack decision pipeline	That the model directly solves continuous blending, full chemistry constraints, or all ERP planning tasks
Inventory granularity from 50 to 100 items	Sensitivity-style exploration	Discretization granularity affects solution quality; 90 items performs best in the reported case	That 90 is a universal best setting for other factories or materials

The distinction between KP and JSP is operationally important.

Knapsack-style allocation is static. You choose a subset under constraints. Many procurement, blending, inventory allocation, budget selection, and capacity assignment problems look like this after reformulation.

Job-shop scheduling is temporal. Decisions interact through time. A bad sequence can ripple through machines, precedence chains, setups, calendars, and disruptions. The paper’s own results reflect this: the JSP gaps are an order of magnitude larger than the KP gaps.
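The ripple effect is easy to see with a tiny decoder. The sketch below turns an operation-based sequence of job ids into a semi-active schedule. Each operation starts as soon as both its job and its machine are free, which enforces precedence and machine non-overlap. The two-job instance is a toy example, not data from the paper.

```python
def makespan(jobs, order):
    """Decode an operation-based sequence into a semi-active JSP schedule.

    jobs:  list of jobs, each a list of (machine, duration) in required order
    order: job ids; the k-th occurrence of job j schedules its k-th operation
    Returns the completion time of the last operation (the makespan).
    """
    next_op = [0] * len(jobs)
    job_ready = [0] * len(jobs)
    machine_ready = {}
    for j in order:
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready.get(machine, 0))
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1
    return max(job_ready)


jobs = [[("M1", 3), ("M2", 2)], [("M2", 4), ("M1", 1)]]
print(makespan(jobs, [0, 1, 0, 1]))  # 6: both jobs start immediately
print(makespan(jobs, [0, 0, 1, 1]))  # 10: job 1 waits for M2, then everything shifts
```

Swapping two entries in the sequence changes the makespan by two thirds. In a knapsack selection, one bad pick costs roughly that item's value; in a schedule, one bad ordering delays every downstream operation.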

So if an ERP team wants to experiment with neural combinatorial optimization, the first target should probably not be the entire shop-floor schedule. The first target should be a repeated, bounded, static allocation problem where approximation can be checked and repaired.

Boring? Yes. Also how enterprise systems actually improve.

The ferro-titanium result is stable, not miraculous

The industrial table is the paper’s most business-relevant result.

After mapping the furnace-loading problem to KP, the authors compare MTT against an OR-Tools baseline across different generated item counts. The reported optimality gaps are stable:

Generated item count	OR solver value	MTT value	Optimality gap	Reported time
50	2728	2647	0.029	9 s
60	2743	2670	0.026	12 s
70	2753	2677	0.027	15 s
80	2765	2693	0.026	18 s
90	2768	2697	0.025	23 s
100	2779	2705	0.026	28 s

The best reported gap occurs at 90 items, around 0.025. The rest remain between roughly 0.026 and 0.029.
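The gap column can be recomputed from the table, assuming the gap is the relative shortfall (OR - MTT) / OR. Truncating to three decimals reproduces every reported entry. The unrounded values show how close the settings are: about 0.0257 at 90 items versus 0.0260 at 80 items, and under half a percentage point between best and worst.

```python
import math

# (generated item count, OR solver value, MTT value) from the table above
rows = [(50, 2728, 2647), (60, 2743, 2670), (70, 2753, 2677),
        (80, 2765, 2693), (90, 2768, 2697), (100, 2779, 2705)]


def relative_gap(reference, candidate):
    """Relative shortfall of a maximization candidate versus the reference."""
    return (reference - candidate) / reference


gaps = {n: relative_gap(or_value, mtt_value) for n, or_value, mtt_value in rows}
for n, gap in gaps.items():
    print(f"{n:>3} items: gap {gap:.4f}, truncated {math.floor(gap * 1000) / 1000:.3f}")
best = min(gaps, key=gaps.get)  # item count with the smallest gap
```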

That is a useful pattern. It suggests the model does not collapse as the generated inventory representation grows from 50 to 100 items. It also hints at a granularity trade-off: finer discretization enlarges the set of feasible loads (the OR solver value itself rises from 2728 at 50 items to 2779 at 100), but it also makes the search harder, and the MTT gap stops improving after 90 items.

But the result should not be exaggerated. The paper does not show that 90 containers is an optimal industrial rule. It shows that, under this specific transformation and generated dataset, 90 items gave the best MTT gap. In another factory, with another inventory structure, another capacity, and another material-cost distribution, the best discretization could change.

The reported time column also needs careful reading. The benchmark table reports times on a single NVIDIA A100 GPU. The paper’s conclusion discusses fast inference, but the runtime reporting is not standardized enough to support a broad speed claim against classical solvers. A fair deployment comparison would need CPU solver time, GPU inference time, batching details, preprocessing cost, post-processing cost, and the cost of maintaining the model.
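A fair comparison starts with timing each stage separately rather than only the model call. Below is a minimal stage timer built on `time.perf_counter` and `contextlib.contextmanager`. The stage names and the stand-in workloads are hypothetical placeholders for discretization, model inference, and validation. For asynchronous GPU work, the device would also need to be synchronized before the clock is read.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(timings, name):
    """Accumulate wall-clock seconds for one pipeline stage into timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


timings = {}
with stage(timings, "preprocess"):
    weights = [i % 37 + 1 for i in range(10_000)]  # stand-in for ERP export + discretization
with stage(timings, "inference"):
    chosen = sorted(range(len(weights)), key=weights.__getitem__)[:100]  # stand-in for a model call
with stage(timings, "validate"):
    feasible = sum(weights[i] for i in chosen) <= 1800  # stand-in capacity check
print({name: round(seconds, 6) for name, seconds in timings.items()})
```

Reporting all stages, not just inference, is what lets a runtime claim survive comparison with a CPU solver that needs no model maintenance.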

The business implication is still positive, but more specific: MTT can provide stable approximate candidate plans for a repeated allocation problem after careful reformulation. That is already valuable. It just is not the same as declaring OR solvers obsolete. The funeral for operations research has been scheduled many times; somehow the body keeps attending meetings.

The useful architecture is a hybrid, not a replacement fantasy

The strongest practical design is not “replace the solver with a transformer.”

It is a hybrid pipeline.

The ERP system provides inventory, cost, capacity, and operational constraints.
A transformation layer converts the planning problem into a typed graph or discrete knapsack-like representation.
The MTT model generates a candidate solution quickly.
A feasibility checker verifies capacity, availability, and operational constraints.
An OR solver, heuristic, or repair routine improves or validates the candidate.
The final recommendation returns to the planner with cost, constraint status, and explanation.
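The steps above can be sketched as a small propose, check, repair loop. In this sketch a greedy cheapest-first rule stands in for the MTT proposal engine, and an unlimited filler material at an assumed price covers any shortfall, echoing the paper's post-processing suggestion. The names `propose`, `check`, `recommend`, and all item data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    material: str
    unit_cost: float  # $ per lb
    weight: int       # lb; each item is one container, usable at most once


def propose(items, capacity):
    """Stand-in for the learned proposal engine: greedy, cheapest $/lb first."""
    chosen, load = [], 0
    for i in sorted(range(len(items)), key=lambda k: items[k].unit_cost):
        if load + items[i].weight <= capacity:
            chosen.append(i)
            load += items[i].weight
    return chosen


def check(items, chosen, capacity):
    """Deterministic feasibility report, independent of how the plan was made."""
    load = sum(items[i].weight for i in chosen)
    return {"unique": len(set(chosen)) == len(chosen),  # availability: no container reused
            "within_capacity": load <= capacity,
            "shortfall_lb": max(0, capacity - load)}


def recommend(items, capacity, filler_cost):
    chosen = propose(items, capacity)
    report = check(items, chosen, capacity)
    if not (report["unique"] and report["within_capacity"]):
        raise ValueError(f"infeasible proposal: {report}")
    # Repair step: top up any shortfall with filler (assumed unlimited supply)
    cost = sum(items[i].unit_cost * items[i].weight for i in chosen)
    cost += report["shortfall_lb"] * filler_cost
    return {"items": chosen, "cost": round(cost, 2), **report}


items = [Item("mixed_turnings", 0.40, 700), Item("mixed_turnings", 0.40, 700),
         Item("prepared_solids", 1.10, 500), Item("other_scrap", 0.75, 300)]
print(recommend(items, capacity=1800, filler_cost=0.90))
```

The useful property is that `check` does not trust `propose`. Swapping the greedy stand-in for a trained model changes the proposals but not the guardrail.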

In this architecture, MTT is not the judge. It is the fast proposal engine.

That role fits the evidence better. The KP benchmark shows very small gaps. The furnace case shows stable gaps around 2.5% to 2.9%. The JSP benchmark shows that harder dynamic scheduling remains less reliable. A hybrid workflow lets the organization exploit the model where it is strong while keeping exact or rule-based safeguards where feasibility matters.

For ERP teams, this has a practical advantage: adoption can begin with decision support rather than full automation. The planner does not need to trust a neural model blindly. The system can show a recommended loading plan, compare it with current practice, highlight constraint satisfaction, and record whether the human accepts or edits the suggestion.

That feedback loop becomes valuable training and governance data. It also makes the deployment politically easier. In manufacturing, the phrase “the model decided” is not comforting. “The model proposed three feasible lower-cost plans, and the solver verified them” is much less likely to get thrown out of the room.

The business value is in repeated decisions with stable structure

The paper’s relevance is not limited to ferro-titanium. The broader pattern is any ERP decision that repeatedly asks:

Which inputs should we use?
Which orders should receive scarce inventory?
Which production batch should be prioritized?
Which material combination gives acceptable cost under capacity and quality constraints?
Which allocation is close enough to optimal that the remaining gap is cheaper than additional computation or human delay?

The best early use cases share four traits.

Trait	Why it matters
Repeated structure	The model benefits when similar problem instances recur over time
Clear feasibility constraints	Outputs can be automatically checked before use
Measurable objective	Cost, value, makespan, or utilization can be compared against baselines
Acceptable approximation	A small gap is tolerable if the decision is faster, more scalable, or easier to reuse

This is why the furnace-loading case is a better anchor than a general ERP slogan. It has a measurable target. It has clear capacity. It has inventory limits. It can be reformulated into a discrete selection problem. The output can be compared with an OR baseline.

That is the kind of narrow operational wedge where AI decision systems become useful without needing mythology.

The more ambitious path is cross-problem generalization: using a shared MTT-style backbone across packing, scheduling, routing, and allocation. The paper gestures in that direction by treating KP and JSP under one multi-type framework. But the experimental results also show why this has to be staged. Packing first. Scheduling later. Dynamic planning last, after someone has done the unglamorous work of constraints, calendars, disturbances, and baseline comparisons.

Where the paper is strongest, and where it still needs tightening

The paper is strongest in three places.

First, it shows a credible representation strategy. Heterogeneous graph formulation is the right language for many ERP problems because ERP entities are typed: materials, capacities, jobs, operations, machines, constraints, orders, suppliers. A transformer architecture that respects type relationships is more plausible than a generic sequence model bolted onto a database export.

Second, it reports a clear difference between KP and JSP performance. That difference is valuable because it prevents the wrong deployment lesson. If both results were marketed as equally strong, the paper would be less useful. The gap between packing and scheduling tells practitioners where to begin.

Third, the ferro-titanium case demonstrates the importance of problem translation. The reference-price transformation is not a mathematical ornament. It is the bridge between cost minimization and value maximization. The discretization of material inventory is not preprocessing trivia. It defines the actual problem the model can solve.

The paper is weaker where many applied neural optimization papers are weak: comparison discipline.

A stronger study would include more classical heuristics, more neural baselines, ablations of the multi-type architecture, clearer runtime reporting, and more detailed robustness tests under distribution shift. For the furnace case, it would also help to separate the effects of item granularity, inventory generation assumptions, exact-fill repair, and any omitted chemical-composition constraints.

These are not ceremonial limitations. They affect deployment interpretation.

If a simpler heuristic reaches the same gap with less engineering, the business case changes. If a small shift in inventory distribution damages the model, monitoring becomes central. If post-processing is required for exact filling, then the operational system is not purely neural. If runtime excludes transformation and validation, the ROI estimate is incomplete.

Still, those concerns do not erase the contribution. They locate it.

What Cognaptus would infer for an ERP pilot

A sensible pilot inspired by this paper would not begin with a grand platform. It would begin with one recurring allocation problem.

For example: choose raw material inputs for a batch, allocate scarce components across orders, or select inventory lots under capacity and cost constraints. The pilot would build three baselines: current human rule, OR solver or strong heuristic, and MTT-generated candidates. It would track objective value, constraint violations, runtime, edit rate by planners, and stability under changing inventory.
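The tracking side of such a pilot is mostly bookkeeping. Here is a minimal sketch that aggregates the listed metrics per method. The `Trial` fields, method labels, and every number in the example are made up purely to exercise the function.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    method: str           # e.g. "human_rule", "or_solver", "mtt" (hypothetical labels)
    cost: float           # objective value of the recommended plan, $
    violations: int       # constraint violations found by the validator
    runtime_s: float      # end-to-end seconds, including pre- and post-processing
    planner_edited: bool  # did the planner change the recommendation?


def summarize(trials):
    """Aggregate pilot metrics per method."""
    summary = {}
    for method in sorted({t.method for t in trials}):
        ts = [t for t in trials if t.method == method]
        summary[method] = {
            "mean_cost": sum(t.cost for t in ts) / len(ts),
            "violation_rate": sum(t.violations > 0 for t in ts) / len(ts),
            "mean_runtime_s": sum(t.runtime_s for t in ts) / len(ts),
            "edit_rate": sum(t.planner_edited for t in ts) / len(ts),
        }
    return summary


# Made-up toy records, only to show the shape of the output
trials = [
    Trial("human_rule", 100.0, 0, 600.0, False), Trial("human_rule", 98.0, 0, 540.0, False),
    Trial("or_solver", 90.0, 0, 12.0, False), Trial("or_solver", 91.0, 0, 14.0, True),
    Trial("mtt", 92.0, 0, 1.0, True), Trial("mtt", 93.0, 1, 1.0, False),
]
for method, metrics in summarize(trials).items():
    print(method, metrics)
```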

The model would not be allowed to send decisions directly to production at first. It would generate recommendations. Every recommendation would pass through deterministic validation. When the model fails, the failure would be classified: infeasible, too costly, underfilled, unstable, or operationally unacceptable for reasons not encoded in the mathematical model.
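The failure taxonomy can be made explicit so that every rejected recommendation lands in exactly one bucket. The sketch below checks hard constraints first and softer signals last. The thresholds (`cost_tolerance`, `fill_tolerance_lb`) and the argument names are illustrative assumptions, not values from the paper.

```python
from enum import Enum


class Failure(Enum):
    INFEASIBLE = "infeasible"
    TOO_COSTLY = "too costly"
    UNDERFILLED = "underfilled"
    UNSTABLE = "unstable"
    OPERATIONALLY_UNACCEPTABLE = "operationally unacceptable"


def classify(*, violations, cost, baseline_cost, load, capacity,
             changed_on_rerun, planner_rejected,
             cost_tolerance=0.03, fill_tolerance_lb=0.0):
    """Return the first matching failure class, or None if the plan passes."""
    if violations:
        return Failure.INFEASIBLE
    if cost > baseline_cost * (1 + cost_tolerance):
        return Failure.TOO_COSTLY
    if capacity - load > fill_tolerance_lb:
        return Failure.UNDERFILLED
    if changed_on_rerun:
        return Failure.UNSTABLE
    if planner_rejected:
        # Passed every encoded check, yet the planner said no: likely a missing constraint
        return Failure.OPERATIONALLY_UNACCEPTABLE
    return None


print(classify(violations=0, cost=1000.0, baseline_cost=1000.0, load=1800,
               capacity=1800, changed_on_rerun=False, planner_rejected=True))
```

Ordering matters here: a plan that is both infeasible and rejected is logged as infeasible, so the last bucket only collects cases the mathematical model could not explain.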

That last category matters. ERP optimization often fails not because the algorithm is bad, but because the model omits the reason the planner keeps saying no.

A good AI system should make those omissions visible.

Over time, the organization can decide whether the learned solver is best used as a recommendation engine, a warm-start generator for OR, a scenario simulator, or a fallback planner when exact methods are too slow. The answer may differ by problem type.

That is the real business pathway from this paper: not “buy transformer, save money,” but “identify repeated combinatorial decisions, encode their structure, learn good candidates, verify them, and gradually move the boundary between human planning and automated optimization.”

Less cinematic. More deployable.

The transformer is useful only after the problem has been made honest

The paper’s most important lesson is not that attention can solve knapsack. We already knew neural models could approximate structured decision problems under the right conditions.

The lesson is that ERP optimization becomes AI-ready only after someone has done the translation work: turning messy operational decisions into typed entities, feasible constraints, measurable objectives, and validation routines.

In the ferro-titanium case, the furnace is the discipline. It forces the model to confront capacity, cost, availability, and exact loading. It exposes the difference between a benchmark and an operating environment. It also reveals why the strongest near-term role for transformer-based optimization is not autonomous command, but structured decision support.

MTT looks promising for static packing and allocation. It is less mature for scheduling. Its industrial case is useful, but still mediated by discretization, objective transformation, and validation. That is not a weakness to hide. It is the map for implementation.

ERP systems do not need more AI theater. They need better decisions at the points where combinatorics quietly tax the business every day.

If transformers can help pack the furnace cheaper, validate the plan faster, and let planners spend less time wrestling spreadsheets, that is already real money.

Not magic. Just attention, finally pointed at something useful.

Cognaptus: Automate the Present, Incubate the Future.

¹ Samira Yazdanpourmoghadam, Mahan Balal Pour, and Vahid Partovi Nia, “Enterprise Resource Planning Using Multi-type Transformers in Ferro-Titanium Industry,” arXiv:2601.20696, https://arxiv.org/html/2601.20696.
