Opening — Why this matters now

Everyone wants AI agents that can “act.” Few can explain what that actually means in a market context. Generating text is trivial. Simulating decisions under constraints—price, inventory, demand elasticity—is where things start to look suspiciously like… economics.

The uncomfortable truth is this: most AI systems today can talk like consumers, but they don’t behave like them. They lack price sensitivity, memory of past purchases, and—perhaps most critically—any coherent response to incentives.

This paper introduces MALLES, a multi-agent LLM economic sandbox designed to close that gap. Not by adding more prompts, but by forcing models to live inside a simulated economy where decisions have structure, feedback, and consequences.

In other words: we’re watching LLMs graduate from chatbots to economic actors.


Background — Context and prior art

Economic simulation is not new. It has simply been… underwhelming.

Traditional approaches fall into three camps:

| Approach | Strength | Weakness |
|---|---|---|
| Rule-based / survey models | Interpretable | Rigid, poor generalization |
| Deep learning demand models | Predictive power | Requires heavy feature engineering |
| Agent-based simulations (ABM) | Rich interactions | Computationally expensive, brittle |

Recent LLM-based systems (e.g., EconAgent, LLM Economist) introduced semantic agents—entities that can reason, negotiate, and explain.

But they suffer from three structural issues:

  1. Data sparsity — Real transaction data is fragmented across categories
  2. OOD failure — Models collapse when encountering new products
  3. Numerical blindness — LLMs treat price like decoration, not signal

In short: they can simulate dialogue, but not demand curves.


Analysis — What MALLES actually does

MALLES takes a different stance: stop treating LLMs as storytellers, and start treating them as decision functions.

Formally, the system tries to approximate:

$$ \hat{a}_i = \hat{D}(X^{obs}_i, Z_i, \hat{\rho}_i; \theta) $$

where the predicted action $\hat{a}_i$ is reconstructed from observable inputs $X^{obs}_i$, a latent consumer profile $Z_i$, and learned preferences $\hat{\rho}_i$, parameterized by $\theta$.
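To make the abstract decision function concrete, here is a minimal Python sketch of the interface it implies. The field names, the linear rule, and the `theta` scaling are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass

# Hypothetical sketch of the decision function a-hat = D(X_obs, Z, rho; theta).
# All names and the toy linear rule below are illustrative, not MALLES's API.

@dataclass
class AgentState:
    observables: dict      # X_i^obs: price, discount, inventory, ...
    latent_profile: dict   # Z_i: demographic / behavioral latents
    preferences: dict      # rho-hat_i: learned preference weights

def decision_fn(state: AgentState, theta: dict) -> float:
    """Toy linear decision rule: purchase propensity from price and preference."""
    price = state.observables["price"]
    sensitivity = state.preferences["price_sensitivity"] * theta["scale"]
    base = state.latent_profile["base_demand"]
    # Higher price times higher sensitivity lowers the propensity to act.
    return max(0.0, base - sensitivity * price)

state = AgentState(
    observables={"price": 10.0},
    latent_profile={"base_demand": 5.0},
    preferences={"price_sensitivity": 0.3},
)
print(decision_fn(state, theta={"scale": 1.0}))  # 5.0 - 0.3 * 10.0 = 2.0
```

The point of the abstraction is that the LLM's role is confined to producing the mapping, not free-form narration.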

1. Cross-category preference alignment

Instead of training on narrow product categories, MALLES aggregates transaction data across thousands of categories.

This creates a transfer effect:

| Training Strategy | Data Size | Generalization | Practical Use |
|---|---|---|---|
| Single-category | Small | High precision, low transfer | Mature products |
| Full-category | Large | Strong transfer | New products / sparse data |

The theoretical intuition is straightforward: more data reduces variance, and cross-category similarity acts as a regularizer, letting sparse categories borrow statistical strength from richer ones.
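A toy illustration of that intuition: shrinking a noisy single-category estimate toward a cross-category pooled mean. The sample sizes, noise level, and shrinkage weight are all illustrative assumptions, not values from the paper:

```python
import random

random.seed(0)

# Toy illustration: pooling transactions across similar categories shrinks
# the variance of a sparse category's demand estimate. All numbers and the
# shrinkage weight are illustrative assumptions, not from the paper.

def estimate(samples):
    return sum(samples) / len(samples)

true_demand = 4.0
sparse = [true_demand + random.gauss(0, 2) for _ in range(3)]    # one thin category
pooled = [true_demand + random.gauss(0, 2) for _ in range(300)]  # cross-category pool

single_est = estimate(sparse)
pooled_est = estimate(pooled)

# Shrink the noisy single-category estimate toward the pooled mean:
# a crude stand-in for the regularization effect of shared structure.
w = len(sparse) / (len(sparse) + 10)
shrunk_est = w * single_est + (1 - w) * pooled_est

print(f"single-category error: {abs(single_est - true_demand):.3f}")
print(f"shrunk estimate error: {abs(shrunk_est - true_demand):.3f}")
```

The shrunk estimate is a convex blend, so a wildly off single-category read gets pulled back toward the (much lower variance) pooled signal.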

2. Multi-agent discussion (yes, actual “thinking”)

Instead of one LLM pretending to be everyone, MALLES splits roles:

  • Dealer (decision maker)
  • Service agent (market interface)
  • Manufacturer (constraints)

They engage in structured dialogue before producing a decision.

This does two things:

  • Compresses high-dimensional inputs into actionable signals
  • Avoids local optima (single-agent tunnel vision)

It’s less “AI thinking,” more organizational simulation.
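The role split above can be sketched as a plain function pipeline, with each function standing in for an LLM call. The role names follow the bullets; everything else (signals, the capping rule) is an illustrative assumption:

```python
# Minimal sketch of the role-split discussion loop: each "agent" is a plain
# function standing in for an LLM call, and the structured exchange compresses
# raw inputs into one decision. Only the role names come from the text above.

def manufacturer(context):
    # Constraints: cap feasible quantity by available supply.
    return {"max_supply": context["inventory"]}

def service_agent(context):
    # Market interface: summarize a demand signal from raw observations.
    avg = sum(context["recent_sales"]) / len(context["recent_sales"])
    return {"demand_signal": avg}

def dealer(constraints, market):
    # Decision maker: order what demand suggests, within supply limits.
    return min(market["demand_signal"], constraints["max_supply"])

context = {"inventory": 80, "recent_sales": [90, 110, 100]}
decision = dealer(manufacturer(context), service_agent(context))
print(decision)  # demand signal 100 capped at supply 80 -> 80
```

Because each role only sees a compressed summary from the others, no single agent has to reason over the full high-dimensional input at once.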

3. Mean-field stabilization

Here’s where it gets quietly sophisticated.

Individual decisions are noisy. Markets are not.

MALLES introduces a mean-field variable $\mu_t$ to represent aggregate market behavior and feeds it back into agents.

| Without Mean-Field | With Mean-Field |
|---|---|
| High-variance decisions | Stabilized outputs |
| Fragmented context | Population-aware decisions |
| Poor consistency | Convergent behavior |

This bridges micro decisions and macro patterns—something most agent systems conveniently ignore.
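The stabilizing effect of feeding $\mu_t$ back to agents can be demonstrated with a toy update rule. The blend weight `alpha` and the specific averaging scheme are assumptions for illustration, not the paper's formulation:

```python
# Toy mean-field feedback loop: individual noisy decisions are aggregated
# into mu_t and fed back, pulling agents toward the population average.
# The blend weight `alpha` is an illustrative assumption.

def mean_field_step(decisions, mu, alpha=0.5):
    # Each agent blends its own decision with the aggregate mu.
    adjusted = [alpha * d + (1 - alpha) * mu for d in decisions]
    new_mu = sum(adjusted) / len(adjusted)
    return adjusted, new_mu

decisions = [2.0, 10.0, 6.0]
mu = sum(decisions) / len(decisions)  # initial aggregate mu_0 = 6.0
for _ in range(5):
    decisions, mu = mean_field_step(decisions, mu)

spread = max(decisions) - min(decisions)
print(spread)  # dispersion halves each round: 8.0 -> 0.25 after 5 rounds
```

The aggregate stays fixed while individual dispersion contracts geometrically, which is exactly the micro-noise / macro-stability bridge described above.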

4. Numerical alignment (the real differentiator)

The system explicitly trains LLMs to care about:

  • Price
  • Discounts
  • Quantity

It does so using attention constraints and auxiliary loss terms, e.g.:

$$ L_{attn} = \mathbb{E}\left[\, \mathrm{KL}(A_i \,\|\, A_i^*) \,\right] $$

which pushes the model's attention distribution $A_i$ toward a target $A_i^*$ concentrated on economically relevant features.

This is subtle but critical: it converts LLMs from language models into economic approximators.
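The attention-alignment idea reduces to a standard KL divergence between two distributions over input tokens. The token split and the example weights below are illustrative assumptions:

```python
import math

# Sketch of an attention-alignment loss of the form E[KL(A_i || A_i*)]:
# penalize attention distributions that ignore economically relevant
# tokens. The token split and target weights are illustrative assumptions.

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Attention over tokens: [price, discount, quantity, filler_text]
model_attn  = [0.10, 0.10, 0.10, 0.70]   # mostly on narrative filler
target_attn = [0.35, 0.25, 0.25, 0.15]   # A*: weight on economic features

loss = kl_divergence(model_attn, target_attn)
aligned_loss = kl_divergence(target_attn, target_attn)  # perfect alignment -> 0

print(loss > aligned_loss)  # misaligned attention incurs a positive penalty
```

Minimizing this term makes price a signal the model is penalized for ignoring, rather than decoration it may skip.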


Findings — What actually improves

The results are not just incremental—they reveal a structural shift in capability.

Core performance comparison

| Model | Hit Rate | Quantity Error | Stability | Time Cost |
|---|---|---|---|---|
| LLM Economist | 0.38 | 0.95 | 0.42 | Low |
| EconAgent | 0.57 | 0.97 | 0.33 | Low |
| FinCon | 0.80 | 1.32 | Low | Medium |
| MALLES (Base) | 0.70 | 0.94 | 0.38 | Low |
| MALLES (Enhanced) | 0.77 | 0.65 | 0.35 | High |

Key observations

  1. Accuracy vs. stability trade-off resolved: MALLES maintains strong hit rates while significantly reducing variance.

  2. Numerical reasoning actually emerges: quantity prediction improves meaningfully, something most LLM systems fail at.

  3. Cross-category training works: models trained on broader datasets generalize better in sparse settings.

Ablation insights

| Component Removed | Impact |
|---|---|
| Post-training | Collapse in price sensitivity |
| Multi-agent discussion | Reduced strategy diversity |
| Mean-field | Higher variance, unstable outputs |

In other words: each component is not optional—it’s structural.


Implications — What this means for business

This is where things get interesting (and slightly uncomfortable).

1. Simulation replaces intuition

Instead of guessing:

  • “Will this discount work?”
  • “How will customers react?”

You simulate it.

At scale.

Repeatedly.

2. Decision-making becomes testable

MALLES introduces a closed-loop system:

| Step | Function |
|---|---|
| Strategy input | Pricing / promotion / inventory |
| Simulation | Multi-agent evaluation |
| Feedback | Performance metrics |
| Adjustment | Iterative refinement |

This is effectively A/B testing for entire strategies, not just UI buttons.
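The closed loop can be sketched as a simple propose-simulate-score-adjust iteration. The toy demand curve, learning rate, and gradient-based adjustment rule are illustrative assumptions standing in for the multi-agent simulation:

```python
# Sketch of the closed-loop iteration: propose a strategy, simulate, score,
# adjust, repeat. The linear demand curve and gradient step are illustrative
# stand-ins for the multi-agent evaluation, not the paper's mechanism.

def simulate_revenue(price):
    # Toy demand curve: quantity falls linearly with price.
    quantity = max(0.0, 100.0 - 4.0 * price)
    return price * quantity

def closed_loop(price, rounds=100, step=0.01):
    for _ in range(rounds):
        # Feedback: finite-difference gradient of simulated revenue w.r.t. price.
        grad = (simulate_revenue(price + 1e-4) - simulate_revenue(price - 1e-4)) / 2e-4
        price += step * grad  # adjustment: move toward higher simulated revenue
    return price

best_price = closed_loop(price=5.0)
print(round(best_price, 2))  # converges to the revenue-optimal 12.5
```

Each pass through the loop is one "A/B test" run entirely inside the simulator, which is why whole strategies, not just UI buttons, become cheap to iterate on.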

3. AI agents become economic infrastructure

Not tools. Not assistants.

Infrastructure.

The same way databases replaced spreadsheets, these systems may replace:

  • Market research surveys
  • Demand forecasting heuristics
  • Manual pricing strategies

4. Risks (because there are always risks)

  • Behavioral manipulation: hyper-accurate preference modeling can be weaponized
  • Data dependency: garbage transaction data → garbage economic agents
  • Overfitting to past markets: structural shifts still break models

In short: powerful, but not neutral.


Conclusion — From language to logic

MALLES is not just another “multi-agent framework.”

It’s a signal that LLMs are evolving from:

systems that describe the world → systems that simulate it

The real breakthrough isn’t that agents can talk to each other.

It’s that their conversations now produce numerically grounded, economically consistent decisions.

And once that happens, the line between simulation and strategy becomes… negotiable.

Cognaptus: Automate the Present, Incubate the Future.