Opening — Why this matters now
Everyone wants AI agents that can “act.” Few can explain what that actually means in a market context. Generating text is trivial. Simulating decisions under constraints—price, inventory, demand elasticity—is where things start to look suspiciously like… economics.
The uncomfortable truth is this: most AI systems today can talk like consumers, but they don’t behave like them. They lack price sensitivity, memory of past purchases, and—perhaps most critically—any coherent response to incentives.
This paper introduces MALLES, a multi-agent LLM economic sandbox designed to close that gap. Not by adding more prompts, but by forcing models to live inside a simulated economy where decisions have structure, feedback, and consequences.
In other words: we’re watching LLMs graduate from chatbots to economic actors.
Background — Context and prior art
Economic simulation is not new. It has simply been… underwhelming.
Traditional approaches fall into three camps:
| Approach | Strength | Weakness |
|---|---|---|
| Rule-based / survey models | Interpretable | Rigid, poor generalization |
| Deep learning demand models | Predictive power | Requires heavy feature engineering |
| Agent-based simulations (ABM) | Rich interactions | Computationally expensive, brittle |
Recent LLM-based systems (e.g., EconAgent, LLM Economist) introduced semantic agents—entities that can reason, negotiate, and explain.
But they suffer from three structural issues:
- Data sparsity — Real transaction data is fragmented across categories
- OOD failure — Models collapse when encountering new products
- Numerical blindness — LLMs treat price like decoration, not signal
In short: they can simulate dialogue, but not demand curves.
Analysis — What MALLES actually does
MALLES takes a different stance: stop treating LLMs as storytellers, and start treating them as decision functions.
Formally, the system tries to approximate:
$$ \hat{a}_i = \hat{D}(X_i^{\mathrm{obs}}, Z_i, \hat{\rho}_i; \theta) $$
where the predicted action $\hat{a}_i$ is reconstructed from observable inputs $X_i^{obs}$, latent consumer profiles $Z_i$, and learned preferences $\hat{\rho}_i$, all parameterized by $\theta$.
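To make the abstraction concrete, here is a minimal sketch of what such a decision function looks like as an interface. The class names, fields, and the rule-based logic are all illustrative assumptions, not the paper's actual (learned) model:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    price: float     # part of X_i^obs: observable product features
    discount: float

@dataclass
class Profile:
    budget: float             # Z_i: latent consumer profile
    price_sensitivity: float  # rho_i: learned preference parameter

def decision_fn(obs: Observation, profile: Profile) -> dict:
    """Toy stand-in for D-hat: map observables and a latent profile to a
    purchase action. The real mapping is learned; this hand-written rule
    only illustrates the input/output contract."""
    effective_price = obs.price * (1.0 - obs.discount)
    if effective_price > profile.budget:
        return {"buy": False, "quantity": 0}
    quantity = max(1, int(profile.budget / effective_price
                          * (1.0 - profile.price_sensitivity)))
    return {"buy": True, "quantity": quantity}
```

The point of the interface is that price and discount enter as numbers the function must respond to, not as decorative text in a prompt.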
1. Cross-category preference alignment
Instead of training on narrow product categories, MALLES aggregates transaction data across thousands of categories.
This creates a transfer effect:
| Training Strategy | Data Size | Generalization | Practical Use |
|---|---|---|---|
| Single-category | Small | High precision, low transfer | Mature products |
| Full-category | Large | Strong transfer | New products / sparse data |
The theoretical intuition is straightforward: more data reduces variance, but cross-category similarity adds a regularization effect.
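That regularization intuition can be sketched as partial pooling: sparse categories borrow strength from the cross-category average, while data-rich categories keep their own estimates. The function name, the elasticity framing, and the shrinkage weight are my illustrative assumptions, not MALLES's training procedure:

```python
import numpy as np

def shrink_to_pooled(category_estimates: dict, counts: dict,
                     tau: float = 10.0) -> dict:
    """Partial pooling: shrink each category's (e.g. elasticity) estimate
    toward the cross-category mean, with shrinkage inversely proportional
    to the category's sample size. Sparse categories lean on the pool;
    data-rich ones keep their own signal."""
    pooled = float(np.mean(list(category_estimates.values())))
    out = {}
    for cat, est in category_estimates.items():
        n = counts[cat]
        w = n / (n + tau)  # weight on the category's own estimate
        out[cat] = w * est + (1 - w) * pooled
    return out
```

With 90 observations a category keeps ~90% of its own estimate; with 10 it splits evenly with the pool, which is exactly the variance-reduction effect the table describes.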
2. Multi-agent discussion (yes, actual “thinking”)
Instead of one LLM pretending to be everyone, MALLES splits roles:
- Dealer (decision maker)
- Service agent (market interface)
- Manufacturer (constraints)
They engage in structured dialogue before producing a decision.
This does two things:
- Compresses high-dimensional inputs into actionable signals
- Avoids local optima (single-agent tunnel vision)
It’s less “AI thinking,” more organizational simulation.
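The role split above can be sketched as a single structured round. The function and the dict-based protocol are assumptions for illustration; in MALLES each callable would wrap an LLM:

```python
def structured_round(dealer, service, manufacturer, market_state: dict) -> dict:
    """One structured discussion round: the service agent compresses
    market signals, the manufacturer states supply-side constraints,
    and the dealer makes the final decision. Each agent is any
    callable (e.g. an LLM wrapper) from dict to dict."""
    summary = service(market_state)           # market interface
    constraints = manufacturer(market_state)  # production constraints
    return dealer({"summary": summary, "constraints": constraints})
```

The compression happens structurally: the dealer never sees the raw high-dimensional market state, only the two summaries, which is what blocks single-agent tunnel vision.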
3. Mean-field stabilization
Here’s where it gets quietly sophisticated.
Individual decisions are noisy. Markets are not.
MALLES introduces a mean-field variable $\mu_t$ to represent aggregate market behavior and feeds it back into agents.
| Without Mean-Field | With Mean-Field |
|---|---|
| High variance decisions | Stabilized outputs |
| Fragmented context | Population-aware decisions |
| Poor consistency | Convergent behavior |
This bridges micro decisions and macro patterns—something most agent systems conveniently ignore.
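A toy version of the feedback loop shows the stabilizing effect. The blending weights and update rule are illustrative assumptions, not the paper's equations:

```python
import numpy as np

def simulate_with_mean_field(signals: np.ndarray, steps: int = 20,
                             beta: float = 0.5, alpha: float = 0.3) -> np.ndarray:
    """Toy mean-field loop: each agent blends its noisy private signal
    with the shared aggregate mu_t, and mu_t tracks the population mean.
    Blending toward mu_t shrinks cross-agent variance while preserving
    the population average."""
    actions = signals.copy()
    mu = float(actions.mean())
    for _ in range(steps):
        actions = beta * signals + (1 - beta) * mu  # population-aware decisions
        mu = (1 - alpha) * mu + alpha * float(actions.mean())
    return actions
```

Individual actions contract toward the aggregate, so micro-level noise shrinks without distorting the macro-level mean, which is the bridge the text describes.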
4. Numerical alignment (the real differentiator)
The system explicitly trains LLMs to care about:
- Price
- Discounts
- Quantity
It does so with attention constraints baked into the training loss, e.g.:
$$ \mathcal{L}_{attn} = \mathbb{E}\big[\,\mathrm{KL}(A_i \,\|\, A_i^*)\,\big] $$
which pushes the model's attention distribution $A_i$ toward a reference $A_i^*$ concentrated on economically relevant features.
This is subtle but critical: it converts LLMs from language models into economic approximators.
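A minimal sketch of that loss term, assuming row-normalized attention maps of shape (batch, tokens); the function name and the numpy formulation are mine, not the paper's implementation:

```python
import numpy as np

def attention_alignment_loss(attn: np.ndarray, target: np.ndarray,
                             eps: float = 1e-12) -> float:
    """KL(A_i || A_i*) averaged over a batch: penalizes attention
    distributions that drift away from a reference `target` putting
    mass on economically relevant tokens (price, discount, quantity).
    Both inputs are (batch, tokens) with rows summing to 1."""
    a = np.clip(attn, eps, None)
    t = np.clip(target, eps, None)
    kl = np.sum(a * np.log(a / t), axis=-1)  # per-example KL divergence
    return float(kl.mean())
```

The loss is zero when attention already matches the reference and grows as the model treats price like decoration, so gradient descent has a direct reason to look at the numbers.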
Findings — What actually improves
The results are not just incremental—they reveal a structural shift in capability.
Core performance comparison
| Model | Hit Rate | Quantity Error | Stability | Time Cost |
|---|---|---|---|---|
| LLM Economist | 0.38 | 0.95 | 0.42 | Low |
| EconAgent | 0.57 | 0.97 | 0.33 | Low |
| FinCon | 0.80 | 1.32 | Low | Medium |
| MALLES (Base) | 0.70 | 0.94 | 0.38 | Low |
| MALLES (Enhanced) | 0.77 | 0.65 | 0.35 | High |
Key observations
- **Accuracy vs. stability trade-off resolved**: MALLES maintains strong hit rates while significantly reducing variance.
- **Numerical reasoning actually emerges**: Quantity prediction improves meaningfully, something most LLM systems fail at.
- **Cross-category training works**: Models trained on broader datasets generalize better in sparse settings.
Ablation insights
| Component Removed | Impact |
|---|---|
| Post-training | Collapse in price sensitivity |
| Multi-agent discussion | Reduced strategy diversity |
| Mean-field | Higher variance, unstable outputs |
In other words: each component is not optional—it’s structural.
Implications — What this means for business
This is where things get interesting (and slightly uncomfortable).
1. Simulation replaces intuition
Instead of guessing:
- “Will this discount work?”
- “How will customers react?”
You simulate it.
At scale.
Repeatedly.
2. Decision-making becomes testable
MALLES introduces a closed-loop system:
| Step | Function |
|---|---|
| Strategy input | Pricing / promotion / inventory |
| Simulation | Multi-agent evaluation |
| Feedback | Performance metrics |
| Adjustment | Iterative refinement |
This is effectively A/B testing for entire strategies, not just UI buttons.
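The closed loop in the table can be sketched as a search procedure. The function, the discount-perturbation heuristic, and the scalar-score interface are illustrative assumptions; `evaluate` stands in for the full multi-agent simulation:

```python
def closed_loop_optimize(evaluate, candidates, rounds: int = 3):
    """Closed-loop strategy search: propose strategies, simulate each
    (evaluate: strategy dict -> scalar performance metric), keep the
    best so far, then refine candidates around it."""
    best, best_score = None, float("-inf")
    for _ in range(rounds):
        for strategy in candidates:
            score = evaluate(strategy)  # simulated market feedback
            if score > best_score:
                best, best_score = strategy, score
        # iterative refinement: perturb the current best strategy
        candidates = [dict(best, discount=best["discount"] * f)
                      for f in (0.9, 1.0, 1.1)]
    return best, best_score
```

Swap the toy `evaluate` for the agent sandbox and the same loop A/B tests whole pricing strategies instead of UI buttons.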
3. AI agents become economic infrastructure
Not tools. Not assistants.
Infrastructure.
The same way databases replaced spreadsheets, these systems may replace:
- Market research surveys
- Demand forecasting heuristics
- Manual pricing strategies
4. Risks (because there are always risks)
- Behavioral manipulation: hyper-accurate preference modeling can be weaponized
- Data dependency: garbage transaction data → garbage economic agents
- Overfitting to past markets: structural shifts still break models
In short: powerful, but not neutral.
Conclusion — From language to logic
MALLES is not just another “multi-agent framework.”
It’s a signal that LLMs are evolving from:
systems that describe the world → systems that simulate it
The real breakthrough isn’t that agents can talk to each other.
It’s that their conversations now produce numerically grounded, economically consistent decisions.
And once that happens, the line between simulation and strategy becomes… negotiable.
Cognaptus: Automate the Present, Incubate the Future.