Opening — Why this matters now
Everyone wants LLMs to think harder. Enterprises, however, mostly need them to think correctly — especially when optimization models decide real money, real capacity, and real risk. As organizations scale, optimization problems grow beyond toy examples. Data spills into separate tables, constraints multiply, and naïve prompt‑to‑solver pipelines quietly collapse.
The paper behind today’s discussion introduces LEAN‑LLM‑OPT, a system that delivers an unfashionable but effective message: large language models do not fail because they are too small — they fail because we ask them to do too much at once.
Background — From prompting to orchestration
Early attempts at LLM‑driven optimization followed a simple logic: describe the problem, ask the model to generate a formulation, solve it. This works — briefly — for small linear programs where all information fits neatly into the prompt.
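To make the failure mode concrete, here is a minimal sketch of that prompt‑to‑solver pattern, assuming a hypothetical `chat()` LLM call (stubbed with a canned response) and Python's open‑source PuLP modeler; the paper does not prescribe this exact implementation.

```python
# Minimal sketch of the naive prompt-to-solver pattern.
# `chat()` is a hypothetical stand-in for a real LLM API call;
# here it returns a canned response so the sketch runs end to end.
import pulp  # open-source LP modeler: pip install pulp

def chat(prompt: str) -> str:
    """Hypothetical LLM call. A real pipeline would hit an API here."""
    return (
        "model = pulp.LpProblem('toy', pulp.LpMaximize)\n"
        "x = pulp.LpVariable('x', lowBound=0)\n"
        "y = pulp.LpVariable('y', lowBound=0)\n"
        "model += 3*x + 2*y\n"   # objective
        "model += x + y <= 4\n"  # constraints
        "model += x <= 2\n"
        "model.solve(pulp.PULP_CBC_CMD(msg=False))\n"
    )

problem = "Maximize 3x + 2y subject to x + y <= 4, x <= 2, x, y >= 0."
code = chat(f"Write PuLP code solving this LP in a variable `model`: {problem}")

ns = {"pulp": pulp}
exec(code, ns)  # generated code runs unvalidated: the silent failure mode
print(pulp.LpStatus[ns["model"].status], pulp.value(ns["model"].objective))
```

This works precisely because everything the model needs sits in one short prompt; there is no validation step, so any formulation error fails silently.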
Once datasets become external, heterogeneous, and large‑scale, performance degrades sharply. Prior work tried to patch this gap via:
| Approach | Core Limitation |
|---|---|
| Prompt engineering | Fragile and non‑scalable |
| Fine‑tuning | Expensive and domain‑specific |
| Solver‑aware models | High training cost, limited portability |
LEAN‑LLM‑OPT takes a different route: system design over model size.
Analysis — What LEAN‑LLM‑OPT actually does
LEAN‑LLM‑OPT separates optimization modeling into what must be reasoned and what can be standardized.
The agentic workflow
Instead of a single monolithic prompt, the system uses:
- Upstream planner agents — construct a step‑by‑step modeling workflow based on problem type
- Downstream generator agent — follows this workflow to produce the final formulation
- Tooling layer — handles data retrieval, parsing, and bookkeeping
This division offloads mechanical tasks and preserves cognitive bandwidth for constraint logic and coefficient placement — precisely where LLMs struggle most.
Crucially, workflows are interpretable, modular, and reusable. No retraining required.
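A minimal sketch of that planner/generator/tooling split follows. All names (`plan`, `run_tool`, `generate`, the workflow template) are illustrative assumptions; LEAN‑LLM‑OPT's actual agents, prompts, and workflow schema are more elaborate.

```python
# Sketch of the planner -> generator -> tools division of labor.
# Function bodies are placeholders, not LEAN-LLM-OPT's real implementation.
from dataclasses import dataclass

@dataclass
class Step:
    action: str   # e.g. "load_table", "define_variables", "add_constraints"
    detail: str   # natural-language instruction for the generator agent

def plan(problem_type: str) -> list[Step]:
    """Upstream planner: map a problem type to a reusable workflow template."""
    templates = {
        "capacity_allocation": [
            Step("load_table", "Fetch demand, capacity, and fare tables"),
            Step("define_variables", "One allocation variable per fare class"),
            Step("add_constraints", "Demand bounds and shared cabin capacity"),
            Step("set_objective", "Maximize total fare revenue"),
        ],
    }
    return templates[problem_type]

def run_tool(step: Step, data_store: dict) -> dict:
    """Tooling layer: deterministic data retrieval and parsing, no LLM involved."""
    if step.action == "load_table":
        return {k: data_store[k] for k in ("demand", "capacity", "fares")}
    return {}

def generate(step: Step, context: dict) -> str:
    """Downstream generator: one narrow LLM call per step (stubbed here)."""
    return f"# {step.action}: {step.detail} | context keys: {list(context)}"

def build_model(problem_type: str, data_store: dict) -> str:
    context: dict = {}
    fragments = []
    for step in plan(problem_type):
        context.update(run_tool(step, data_store))  # mechanical work offloaded
        fragments.append(generate(step, context))   # reasoning kept narrow
    return "\n".join(fragments)

print(build_model("capacity_allocation", {
    "demand": {"Y": 40, "M": 90}, "capacity": 180, "fares": {"Y": 450, "M": 220},
}))
```

The design point is that the generator never sees the raw data dump or the full problem at once; each LLM call is scoped to a single, pre-planned step.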
Findings — Results that actually matter
The results are blunt.
Large‑Scale‑OR benchmark (execution accuracy)
| Model | Overall Accuracy |
|---|---|
| GPT‑4.1 | 14.85% |
| gpt‑oss‑20B | 25.74% |
| Gemini 3 Pro | 52.48% |
| LEAN‑LLM‑OPT (GPT‑4.1) | 85.15% |
| LEAN‑LLM‑OPT (gpt‑oss‑20B) | 80.20% |
The same base models jump from 15–26% execution accuracy to above 80%: production‑grade reliability with no additional training.
Even more interesting: workflow‑only or tools‑only variants collapse to near zero accuracy. Structure and tooling are complements, not substitutes.
Real‑world validation: Airline revenue management
On Singapore Airlines–style fare capacity allocation problems, LEAN‑LLM‑OPT reaches 93% execution accuracy, while base GPT‑4.1 fails entirely. The model is not discovering optimization theory; it is being guided through it.
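For intuition, here is a toy version of a single‑leg fare‑class allocation LP, with made‑up demand, fare, and capacity numbers; the benchmark instances are far larger and pull their data from external tables rather than the prompt.

```python
# Toy single-leg fare-class allocation LP (illustrative numbers only).
import pulp

fares    = {"Y": 450, "M": 220, "Q": 120}  # revenue per seat by fare class
demand   = {"Y": 40,  "M": 90,  "Q": 200}  # forecast demand per class
capacity = 180                              # seats on the leg

model = pulp.LpProblem("fare_allocation", pulp.LpMaximize)
seats = {
    c: pulp.LpVariable(f"seats_{c}", lowBound=0, upBound=demand[c])
    for c in fares
}

model += pulp.lpSum(fares[c] * seats[c] for c in fares)   # maximize revenue
model += pulp.lpSum(seats[c] for c in fares) <= capacity  # shared cabin

model.solve(pulp.PULP_CBC_CMD(msg=False))
for c in fares:
    print(c, seats[c].value())
print("revenue:", pulp.value(model.objective))
```

Nothing here is conceptually hard; the difficulty at benchmark scale is placing thousands of coefficients correctly, which is exactly the bookkeeping the workflow takes off the model's plate.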
Implications — Why this should change how you build AI systems
Three uncomfortable takeaways for AI teams:
- **Model upgrades are not strategy.** Older and smaller models, when properly orchestrated, outperform newer giants used naïvely.
- **Workflow design is now an AI primitive.** Operations research quietly becomes a design manual for agentic AI.
- **Cost, privacy, and control improve together.** Open models + structured workflows reduce both inference cost and governance risk.
This aligns naturally with emerging ideas of assured autonomy — systems that are verifiable, constrained, and auditable by design.
Conclusion — The unglamorous future of useful AI
LEAN‑LLM‑OPT does not make LLMs smarter. It makes them behave. In enterprise optimization, that distinction matters far more than parameter counts or leaderboard wins.
The future of applied AI will look less like a larger brain and more like a better organization chart.
Cognaptus: Automate the Present, Incubate the Future.