The dominant paradigm in LLM agents today is autoregressive reasoning: think step by step, commit token by token. This approach works decently for small tasks — write a tweet, answer a math question — but it quickly falters when the goal requires deep planning, multiple decision branches, or adapting to partially observable environments. Imagine trying to plan a vacation or operate a flight search website while thinking only one move ahead.

Enter SIMURA — the Simulative Reasoning Architecture. Instead of blindly following an LLM’s first thought, it proposes a structured method of thinking ahead, using the LLM itself as a world model to simulate potential outcomes of actions. The difference? SIMURA doesn’t ask, “What should I do next?” It asks, “What would happen if I did this next?”

🧠 Simulation as Deliberation

The inspiration is human cognition. Psychologists like Daniel Kahneman distinguish between System 1 (fast, intuitive responses) and System 2 (slow, deliberative reasoning). SIMURA implements the latter. It splits agentic reasoning into three roles:

  1. Policy proposes several plausible next actions
  2. World Model simulates the outcome of each one
  3. Critic scores those outcomes against the goal and selects the best

Unlike prior agents that either chain-of-thought their way into errors or hardcode control flows, SIMURA turns planning into a self-contained loop of thought experiments. And crucially, every step is carried out in natural language, as discrete summaries of observations, predicted states, and action intentions, which keeps the whole process interpretable and modular.
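To make that loop concrete, here is a minimal Python sketch of one planning step. The `llm()` helper, the prompt wording, and the candidate count are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch of one SIMURA-style planning step. llm() is a stand-in for
# any instruction-following model call; prompts and beam width are assumed.

def llm(prompt: str) -> str:
    """Placeholder for a single LLM call."""
    raise NotImplementedError

def plan_step(belief_state: str, goal: str, n_candidates: int = 3) -> str:
    # 1. Policy: propose several plausible next actions in natural language.
    candidates = [
        llm(f"State: {belief_state}\nGoal: {goal}\nPropose one next action.")
        for _ in range(n_candidates)
    ]
    # 2. World model: simulate the outcome of each candidate action.
    outcomes = [
        llm(f"State: {belief_state}\nAction: {a}\nPredict the resulting state.")
        for a in candidates
    ]
    # 3. Critic: score each predicted outcome against the goal; keep the best.
    scores = [
        float(llm(f"Goal: {goal}\nPredicted state: {o}\nRate progress 0-10."))
        for o in outcomes
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]
```

Because every intermediate artifact is a plain string, each of the three stages can be inspected, logged, or swapped out independently.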

📦 Hierarchies: From Abstract Intents to Executable Actions

SIMURA makes a subtle but powerful architectural decision: it separates simulated actions from actual API calls. Instead of rolling out click-by-click predictions (which are fragile and low-level), SIMURA simulates higher-level intentions like “click the cheapest flight” or “navigate to home page”.

These simulated actions are represented in language and mapped to actual interface commands only at execution time. This makes planning more transferable across tasks and environments, as shown in the table below:

| Layer | Function | Representation |
| --- | --- | --- |
| Observation | HTML, UI tree | Raw text / structure |
| Belief State | Encoded summary of current view | Natural language |
| Policy Output | Candidate next steps (abstract intent) | Natural language |
| World Model | Predicts next state given intent | Natural language |
| Actor | Converts selected intent into actual action | API command |

This hierarchy not only enhances interpretability, but also reduces error propagation and accelerates planning — each high-level simulation can encapsulate multiple concrete actions.
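Here is a hedged sketch of that grounding step: planning happens entirely over abstract intents, and only the Actor ever touches concrete element IDs. `ground_intent()` and the command format are hypothetical names, with `llm()` again a placeholder:

```python
# Hypothetical Actor: grounds an abstract intent against the live interface.
# ground_intent() and the click(element_id=...) format are illustrative.

def llm(prompt: str) -> str:
    """Placeholder for a single LLM call (see earlier sketch)."""
    raise NotImplementedError

def ground_intent(intent: str, observation: str) -> str:
    # Planning never sees element IDs; only this execution-time step does,
    # so the same abstract plan transfers across differently laid-out pages.
    return llm(
        f"Page elements:\n{observation}\n"
        f"Intent: {intent}\n"
        "Emit exactly one executable command, e.g. click(element_id=...)."
    )

# The plan stays abstract ("click the cheapest flight"); grounding resolves
# it against whatever the current page actually contains.
page = "[12] link 'NYC-SFO $312'  [13] link 'NYC-SFO $487'"
command = ground_intent("click the cheapest flight", page)
```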

🧪 Flight Search, Reinvented

To test SIMURA, the authors created FlightQA, a novel benchmark that simulates real-world flight search tasks using live data from sites like Google Flights. Each task adds more constraints (e.g., round-trip, non-stop, under $1000, morning departure), testing how well the agent can plan and adapt.

The baseline, BrowsingAgent, achieved a 0% success rate.

SIMURA with autoregressive planning fared better, reaching 14.4%.

But SIMURA with world model simulation achieved a 32.2% success rate — a 124% improvement over its own autoregressive version.

Even more striking, the agent reduced action errors from 93% to 1.1%, and drastically cut down repetitive failures.

🌐 Toward a Truly General Agent

This isn’t just about web browsing. The bigger idea is to treat LLMs not just as text generators, but as generative simulators of the world. By formalizing a belief state and simulating forward using language, SIMURA turns any environment into a playground for internal planning — even when it’s partially observable or filled with distractions.
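What that belief-state update could look like in code, again as an assumption-laden sketch rather than the paper's implementation:

```python
# Illustrative belief-state update: compress a raw, partial observation into
# natural language so the same world model can simulate forward anywhere.
# llm() is the same placeholder as above; the prompt wording is assumed.

def llm(prompt: str) -> str:
    """Placeholder for a single LLM call."""
    raise NotImplementedError

def update_belief(prev_belief: str, raw_observation: str) -> str:
    return llm(
        f"Previous belief: {prev_belief}\n"
        f"New raw observation: {raw_observation}\n"
        "Summarize everything now known that matters for the task, "
        "ignoring ads and other distractions."
    )
```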

Where most current agents are overfitted templates or prompt-chained scripts, SIMURA is modular, extensible, and increasingly autonomous. With better caching, multimodal perception, and parallelized rollout, this could be the architectural seed for a generalist assistant that genuinely thinks before it acts.

For now, SIMURA is just a research prototype. But its message is clear:

The next leap in agent intelligence won’t come from better prompts — it will come from better planning.


Cognaptus: Automate the Present, Incubate the Future