The chatbot that cannot check the door
A useful AI assistant can write an email, summarize a meeting, explain a regulation, or generate a plan for fixing a server problem. Then something inconvenient happens: the real world disagrees.
The meeting transcript missed one speaker. The regulation changed in one jurisdiction. The server error was not caused by the code but by two services fighting over the same port. The customer sounded satisfied in the chat log but cancelled the contract two days later. The model can still talk. Beautifully, even. But it cannot always live inside the situation long enough to notice that its first answer has become stale, incomplete, or simply wrong.
That is the starting point for Hong Su’s paper, Human Simulation Computation: A Human-Inspired Framework for Adaptive AI Systems [1]. The paper is not arguing that AI should imitate human personality, produce warmer small talk, or wear a digital cardigan and say “as a fellow human” with alarming confidence. Its target is more structural: AI systems need a closed-loop architecture in which thinking, action, learning, reflection, feedback, and scheduling continuously modify one another.
In other words, the paper asks a deceptively practical question: what would it mean for an AI system to stop merely responding and start operating?
The answer is Human Simulation Computation, or HSC. Its central claim is that intelligence should be treated not as a one-shot transformation from input to output, but as an ongoing adaptive process. That process must include action, not as a final execution step, but as a way to acquire missing context, test assumptions, generate feedback, and improve future reasoning.
For business readers, this is the interesting part. HSC does not provide a benchmark victory or a ready-made product architecture. It gives something less flashy and more useful: a control-loop vocabulary for thinking about agents that must keep working after the prompt ends.
The paper’s real object is not “human-like AI”; it is a closed adaptive loop
The easiest way to misunderstand HSC is to read the phrase “human simulation” as anthropomorphism. That would be convenient, because anthropomorphism is easy to mock. Also usually deserved.
But the paper’s “human” reference is not about personality. It is about process. Humans do not solve the world as isolated tasks. They observe, form goals, notice anomalies, act, collect feedback, remember failures, reinterpret goals, reuse routines, and revisit unresolved problems during idle time. The paper argues that AI systems designed for open environments need an analogous structure.
The proposed loop can be summarized as follows:
| HSC component | What it does in the paper | Operational translation |
|---|---|---|
| Thinking | Decides what matters, whether to act, how to act, and when to act | A goal-aware reasoning layer that filters context and selects candidate actions |
| Action | Interacts with the environment and helps reasoning by acquiring feedback | Tool use, probing, querying, testing, simulation, user clarification, or environment manipulation |
| Reflection | Reviews prior thinking, actions, outcomes, and even reflection methods | Post-action evaluation, root-cause analysis, process review, and method revision |
| Learning | Records outcomes, processes, reasons, action sequences, and feedback in a timely way | Memory updates, procedure libraries, experience logs, policy updates, or fine-tuning inputs |
| Activity scheduling | Decides when to think, learn, reflect, or revisit issues | Background jobs, idle-time review, anomaly queues, priority systems, and exploration triggers |
This mechanism-first reading matters because HSC is not mainly a new prompting trick. Prompting tricks improve a model’s answer inside an inference episode. HSC is concerned with what happens across episodes.
A language model can produce a reasoning chain. HSC asks whether the system can later ask: Did that reasoning work? What changed after the action? What should be remembered? Should this issue be revisited during low-load time? Which unresolved questions are still worth attention, and which have become computational swampland?
That last question is underrated. Many agent systems today are designed to be enthusiastic, not disciplined. They keep retrying, expanding, reflecting, and “improving” until the bill arrives wearing a small hat. HSC’s scheduling layer is useful because it treats attention as a scarce resource. The system should prioritize important issues, novel issues, and high-information deviations, but it should also suppress long-unsolved items that keep consuming resources without progress.
That is less glamorous than “agentic intelligence.” It is also closer to operations.
Why language alone cannot teach the feedback loop
The paper’s critique of language-only learning is straightforward. LLMs learn from text, and text records descriptions of thinking, actions, and consequences. But descriptions are not the same as the interactive processes that produced them.
A maintenance manual can describe what happens when a machine overheats. It cannot expose the AI system to every unusual sequence of sensor drift, delayed alarm, operator habit, and environmental condition that turns overheating into a real operational problem. A conversation log can describe an angry customer. It cannot fully encode the difference between a customer who is venting and a customer who is about to disappear.
The paper’s point is not that language is useless. Obviously not. Language is a powerful interface for compressed human experience. The narrower claim is that language alone does not provide intrinsic mechanisms for goal formation, environmental interaction, action-based verification, or long-term adaptation.
The difference is easiest to see in three layers:
| Layer | Language-only system | HSC-style system |
|---|---|---|
| Reasoning | Produces a plausible answer from context | Produces a candidate judgment under goals and constraints |
| Verification | Relies on linguistic consistency or external evaluation | Acts to obtain feedback and compare prediction with outcome |
| Adaptation | Improves only through separate training, prompting, or human correction | Records process, feedback, and reflection for future behavior |
This is why the paper repeatedly treats action as part of cognition. Action is not just “do the thing.” Action is also “find out whether the thing is true.”
For a customer-service agent, action might mean asking for order history before proposing compensation. For a trading assistant, it might mean checking market liquidity and position exposure before suggesting a strategy. For a document-processing workflow, it might mean requesting the original spreadsheet rather than confidently explaining a chart copied into a PDF at suspicious resolution. AI dignity begins with admitting that screenshots are not databases.
The paper’s theoretical section frames this as a verification problem. An agent predicts the consequence of an action; the environment returns feedback; the discrepancy between expected and observed results becomes the correction signal. Without interaction, the system cannot compute that discrepancy. Wrong internal beliefs may persist because nothing forces them to collide with reality.
This is not an experimental result. The paper does not show a benchmark where HSC beats a baseline agent. Its “proof” section is a theoretical justification: interaction expands the available information scope, grounds prediction errors, and allows internal strategies to evolve through learning and reflection. That supports the framework’s logic. It does not yet prove implementation performance.
That boundary matters. HSC gives us an architecture to reason with, not a finished engineering recipe.
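The verification argument in this section can be made concrete with a toy example. The sketch below is not the paper's algorithm: the `Belief` class, the fixed `learning_rate`, and the numeric setting are all assumptions chosen for illustration. It only shows the shape of the idea, in which a prediction collides with observed feedback and the discrepancy becomes the correction signal.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """A single numeric belief the agent holds about its environment (hypothetical)."""
    estimate: float
    learning_rate: float = 0.5  # illustrative constant, not taken from the paper

    def predict(self) -> float:
        # In this toy model the prediction is simply the current estimate.
        return self.estimate

    def correct(self, observed: float) -> float:
        """Move the estimate toward the observation and return the discrepancy."""
        discrepancy = observed - self.predict()
        self.estimate += self.learning_rate * discrepancy
        return discrepancy


def run(belief: Belief, feedback: list) -> list:
    """Feed a sequence of observations to the belief; return absolute errors."""
    return [abs(belief.correct(observed)) for observed in feedback]


# The agent believes a job takes 10 minutes; the environment keeps saying 20.
belief = Belief(estimate=10.0)
errors = run(belief, [20.0, 20.0, 20.0])
print(errors)           # [10.0, 5.0, 2.5]
print(belief.estimate)  # 18.75
```

With an empty feedback list the estimate stays at 10.0 indefinitely, which is the article's point about wrong beliefs persisting when nothing forces them to meet reality. Real feedback is rarely this clean, so a fixed learning rate is a simplification.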
Difference detection is the paper’s quiet computational argument
One of the paper’s stronger ideas is that human-inspired reasoning reduces search by focusing on differences.
Real environments contain too many factors. Exhaustively considering every possible cause, action, and consequence is impossible. Humans cope by noticing deviations: something changed, something failed, something appeared in the wrong place, something persisted longer than expected, something did not match the usual pattern.
HSC formalizes this intuition as main-feature-oriented reasoning. The system should identify the main feature or problem that distinguishes the current situation from normal conditions. If the difference is large enough, it deserves attention. If the current scope is inadequate, the system should expand the scope—often through action.
This gives HSC a practical computational angle. Adaptive intelligence is not just more reasoning. It is better filtering.
| Human-inspired strategy | Mechanism | Business interpretation |
|---|---|---|
| Main-feature reasoning | Focus on deviations from expected conditions | Reduce diagnostic noise in operations, compliance, and customer workflows |
| Wider-scope consideration | Check whether the current context is too narrow | Avoid brittle answers caused by missing business, technical, or user context |
| Continuous thinking before action | Re-evaluate as new information arrives | Support staged decisions rather than premature automation |
| Generalization and degeneralization | Reuse methods, then specialize when reuse fails | Build reusable playbooks without pretending every client is identical |
| Oppositional and holistic thinking | Compare alternatives and inspect system-level patterns | Reduce premature convergence on one explanation |
| Risk avoidance and positive reframing | Evaluate damage and long-term stability | Make agents safer under repeated failure or ambiguous incentives |
| Candidate action planning | Prepare multiple action paths and rescue options | Prefer reversible, testable actions over brittle automation |
This is where the paper becomes more than a philosophical complaint about LLMs. Difference-based reasoning offers an operational design principle: agents should not simply ingest more context; they should learn which differences are worth context expansion.
For businesses, that distinction is expensive. A naive agent may ask for everything, retrieve everything, summarize everything, and still miss the one abnormal signal. A useful agent should detect that the user’s problem is not “the report failed,” but “the report failed only after the data source was migrated,” or “only for one region,” or “only when two approval workflows overlap.”
That is not magic. It is structured attention.
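As a rough illustration of difference-based filtering, the sketch below compares a current snapshot against a baseline and either focuses on the largest deviation or asks for a wider scope. The function names, the relative-change measure, and the `threshold` value are assumptions for this example; the paper describes the strategy but does not prescribe this formula.

```python
def main_differences(baseline, current, threshold=0.5):
    """Return (feature, relative_change) pairs that deviate beyond the threshold,
    largest deviation first."""
    deviations = []
    for feature, expected in baseline.items():
        observed = current.get(feature)
        if observed is None or expected == 0:
            continue  # not comparable here; a fuller system might flag a scope gap
        change = (observed - expected) / abs(expected)
        if abs(change) >= threshold:
            deviations.append((feature, change))
    return sorted(deviations, key=lambda item: abs(item[1]), reverse=True)


def next_step(baseline, current, threshold=0.5):
    """Focus on the main difference, or expand scope when nothing stands out."""
    differences = main_differences(baseline, current, threshold)
    if differences:
        return ("focus", differences[0][0])
    return ("expand_scope", None)


# Hypothetical weekly counts of failed reports per region.
baseline = {"emea": 4, "apac": 5, "amer": 4}
current = {"emea": 5, "apac": 20, "amer": 4}
print(next_step(baseline, current))   # ('focus', 'apac')
print(next_step(baseline, baseline))  # ('expand_scope', None)
```

The second call is the interesting one: when no feature deviates enough, the sketch does not guess. It returns a request to widen the scope, which in HSC terms usually means taking an action to gather more context.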
Action is how the system buys missing context
The paper’s most business-relevant move is to reposition action as a participant in thinking. This is stronger than saying agents need tools. Tool use can still be shallow: call API, paste result, continue talking. HSC asks for a deeper loop: action should change the reasoning state.
Consider a common enterprise case. A finance team asks an AI agent why monthly revenue recognition differs from the forecast. A language-only system may list possible reasons: delayed invoices, changed assumptions, customer churn, currency effects, classification errors. Fine. A useful analyst-agent should do something else. It should inspect the variance by product line, check whether the difference is concentrated in new contracts, compare booking dates with recognition dates, look for one-time adjustments, and then update its hypothesis.
The action is not clerical. It is epistemic. It buys information.
HSC uses this same logic in several forms:
- Actions expand reasoning scope when the current context is insufficient.
- Actions identify hidden objects and background conditions.
- Actions verify candidate methods against environmental feedback.
- Actions create learning material when a new issue lacks enough examples.
- Actions help reflection by collecting evidence about what actually happened.
This point also separates HSC from simple “autonomous execution.” A dangerous agent acts because it can. A useful agent acts because the action is tied to uncertainty reduction, goal alignment, feedback acquisition, or long-term learning.
That distinction should become a design rule. Before giving an AI system a tool, ask what cognitive role the tool serves.
| Tool or action type | Weak design | HSC-style design |
|---|---|---|
| Search | Retrieve more text | Resolve a specific uncertainty or broaden a narrow scope |
| Database query | Pull records | Test a hypothesis about a deviation |
| User clarification | Ask generic follow-up questions | Ask for the missing variable that blocks safe action |
| Simulation | Generate scenarios | Compare candidate actions before irreversible execution |
| Workflow execution | Complete a task | Execute, observe outcome, record process, and update future procedure |
The same tool can be intelligent or decorative. The difference is whether the agent knows why it is acting.
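The "database query as hypothesis test" row can be sketched with the revenue example from earlier in this section. Everything here is hypothetical: the segment names, the figures, the `check_concentration` helper, and the `min_share` cutoff are assumptions for illustration. The query is not run to pull records; it is run to answer one question, namely whether the variance is concentrated in a single product line.

```python
def variance_by_segment(forecast, actual):
    """Per-segment difference between actual and forecast values."""
    return {seg: actual.get(seg, 0) - forecast[seg] for seg in forecast}


def check_concentration(forecast, actual, min_share=0.8):
    """Test the hypothesis 'most of the variance sits in one segment'."""
    variance = variance_by_segment(forecast, actual)
    total_abs = sum(abs(v) for v in variance.values())
    if total_abs == 0:
        return {"supported": False, "segment": None, "share": 0.0}
    segment = max(variance, key=lambda s: abs(variance[s]))
    share = abs(variance[segment]) / total_abs
    return {"supported": share >= min_share, "segment": segment, "share": share}


# Hypothetical monthly recognized revenue, in thousands.
forecast = {"subscriptions": 500, "services": 200, "hardware": 100}
actual = {"subscriptions": 495, "services": 120, "hardware": 105}

result = check_concentration(forecast, actual)
print(result["supported"], result["segment"])  # True services
```

The result changes the reasoning state: the agent can drop the generic list of possible causes and spend its next action on the services line only, for example by comparing booking dates with recognition dates there.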
Reflection should review the process, not just apologize after failure
Many AI products use “reflection” as a polite word for retrying the answer with more tokens. HSC gives reflection a larger role.
Reflection examines the whole process: thinking, action, learning, feedback, and reflection itself. It asks whether unresolved problems remain, whether a better method exists, whether a problem–method pair can be extracted, and whether the result can generalize to other contexts.
That is important because post-hoc correction is not the same as learning. A system can apologize for a mistake and then repeat the same mistake tomorrow with excellent formatting. Reflection only matters if it changes future behavior.
The paper emphasizes several reflective patterns: widening the scope of review, finding reasons behind outcomes, discovering non-obvious associations, and recording reflection results on time. These are mundane ideas until they are missing. Then they become the reason automation keeps failing in exactly the same place.
A business workflow version might look like this:
| After-action question | What it prevents |
|---|---|
| What prediction did the agent make before acting? | Silent drift between reasoning and outcome |
| What feedback arrived after the action? | Ignoring real-world consequences |
| Which part of the context was missing? | Repeated hallucination from incomplete scope |
| Was the method wrong, or was the goal wrong? | Optimizing the wrong behavior |
| Should this become a reusable procedure? | Losing operational learning |
| Should this issue be deprioritized after repeated failure? | Infinite retry loops disguised as persistence |
This is where HSC becomes useful for agent governance. Many organizations discuss AI governance as if it were mainly about policy documents and approval gates. Those are necessary, but they do not give the agent an internal habit of examining its own process. HSC suggests that governance must also be architectural: agents need memory, feedback logging, process review, and scheduled reflection.
Not a vibes-based “let us think deeply” reflection. A boring, inspectable, auditable reflection. The best kind, unfortunately.
Scheduling turns agent intelligence into operations
The activity scheduling section may look secondary at first. It is not. Scheduling is what turns an adaptive loop from a concept into an operating system.
The paper distinguishes trigger sources from activity scheduling. Trigger sources explain why the system initiates thinking or action: goals, environmental feedback, abnormal conditions, or interaction with other agents. Activity scheduling decides when and how thinking, learning, and reflection should be organized.
This matters because real agents cannot reflect on everything. They need attention policy.
HSC proposes several scheduling priorities:
- issues important to basic operation and long-term goals;
- issues that are new or insufficiently understood;
- high-entropy issues that differ meaningfully from normal patterns;
- unresolved questions that may benefit from further analysis;
- background learning and reflection during idle or low-load periods;
- suppression of items that remain unsolved for too long without progress.
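One way to read these priorities is as a scoring rule with a cutoff. The sketch below is an assumption-heavy illustration: the `Issue` fields, the equal weighting of the three scores, and the `max_stalled` limit are invented here, since the paper names the criteria but this article does not report a concrete formula.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    importance: float  # 0..1, relevance to basic operation and long-term goals
    novelty: float     # 0..1, how poorly understood the issue is
    deviation: float   # 0..1, how far it departs from normal patterns
    stalled_reviews: int = 0  # reviews spent on it without progress


def is_suppressed(issue, max_stalled=3):
    """Items that stay unsolved for too long stop competing for attention."""
    return issue.stalled_reviews >= max_stalled


def priority(issue):
    # Equal weights are an arbitrary choice for this sketch.
    return issue.importance + issue.novelty + issue.deviation


def idle_time_queue(issues, budget, max_stalled=3):
    """Pick up to `budget` issues to revisit during a low-load period."""
    active = [i for i in issues if not is_suppressed(i, max_stalled)]
    ranked = sorted(active, key=priority, reverse=True)
    return [i.name for i in ranked[:budget]]


issues = [
    Issue("port conflict", importance=0.9, novelty=0.2, deviation=0.7),
    Issue("flaky export", importance=0.4, novelty=0.1, deviation=0.3, stalled_reviews=5),
    Issue("new vendor format", importance=0.5, novelty=0.9, deviation=0.6),
    Issue("typo in footer", importance=0.1, novelty=0.1, deviation=0.1),
]
print(idle_time_queue(issues, budget=2))  # ['new vendor format', 'port conflict']
```

The `budget` argument matters as much as the score. It makes attention explicitly finite, so the scheduler has to choose rather than review everything.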
Translated into enterprise architecture, this looks very familiar. It resembles incident queues, anomaly detection, background batch jobs, root-cause review, and backlog prioritization. The difference is that HSC places these mechanisms inside the adaptive intelligence loop, not outside it as a human operations layer.
That is a useful shift. Today, many AI systems are reactive services. They wait for a prompt, produce an output, and forget the operational residue unless someone builds a separate memory or monitoring pipeline. HSC implies that serious agents should have a day job and a night shift.
The day job handles active tasks. The night shift reviews strange failures, compresses repeated procedures into routines, revisits unresolved anomalies, and updates what the system treats as normal.
For Cognaptus-style automation, this is directly relevant. A business process agent should not only handle invoices, customer requests, compliance checks, or research summaries. It should notice which tasks repeatedly fail, which exceptions are growing, which user corrections appear often, and which workflows deserve redesign. The practical value is not that the agent becomes “more human.” The value is that it stops treating every Tuesday as its first day at work.
What the paper directly supports, and what Cognaptus infers
Because this paper is conceptual, the line between direct claim and practical inference must be kept clean.
| Claim type | What the paper supports | What it does not prove |
|---|---|---|
| Direct framework claim | HSC integrates thinking, action, learning, reflection, feedback, and scheduling into a closed loop | That a specific implementation will outperform existing agents |
| Theoretical argument | Interaction provides feedback needed to verify and improve reasoning | That feedback-driven updates will always converge safely |
| Human-strategy catalog | Difference detection, scope expansion, oppositional thinking, and on-time learning can guide adaptive agents | That the listed strategies are complete or optimally formalized |
| Action-centered claim | Action can acquire information, test assumptions, and promote learning | That more autonomy is always better |
| Scheduling claim | Reflection and learning should be prioritized by importance, novelty, entropy, and unresolved status | That a particular scheduler will be efficient under real enterprise constraints |
Cognaptus’ business inference is therefore cautious but strong: HSC is best read as a design checklist for moving from prompt products to operational agents.
A prompt product asks: “What should the model say?”
An operational agent asks:
- What goal is active?
- What is abnormal?
- What context is missing?
- What action would reduce uncertainty?
- What outcome was predicted?
- What feedback arrived?
- What should be learned?
- What should be scheduled for later review?
- What should be forgotten, suppressed, or deprioritized?
That is the difference between a talking system and a living workflow.
The uncomfortable implication is that many so-called agents are still chatbots with tool belts. Useful, yes. Transformational, occasionally. But if they do not maintain goals, detect deviations, act to verify assumptions, learn from outcomes, and schedule reflection, they are not very agentic. They are enthusiastic interns with API access. A thrilling phrase, depending on your insurance coverage.
The implementation gap is the real research agenda
The paper’s limitations are not minor. They define the next step.
First, HSC is not empirically validated in the paper. There are no deployment studies, no benchmark comparisons, no ablations, and no cost measurements. The theoretical section explains why interaction and feedback should matter, but it does not measure how much they matter under specific architectures.
Second, the framework depends heavily on mechanisms that are difficult to implement safely: long-term memory, goal updates, reward reinterpretation, action scheduling, environmental probing, and autonomous reflection. Each one can improve adaptation. Each one can also create failure modes. A system that learns goals poorly may rationalize bad behavior. A system that schedules reflection badly may waste resources. A system that acts to gather context may violate privacy, policy, or user expectations if boundaries are not explicit.
Third, the paper treats human thinking strategies as useful computational guides, but the formalization remains broad. Terms such as “living better,” “importance,” “high entropy,” and “positive reframing” require domain-specific definitions before they can become reliable engineering components. In a robot, “living better” may include energy, safety, and task completion. In an accounting workflow, it may mean auditability, correctness, latency, and policy compliance. In a trading assistant, it must include risk limits before profit fantasies start wearing lab coats.
Fourth, feedback is not automatically truthful. The environment can be noisy, delayed, adversarial, or mismeasured. A customer’s silence is feedback, but not always satisfaction. A successful workflow completion is feedback, but not always correctness. HSC correctly emphasizes grounded feedback; implementation must still decide which feedback deserves trust.
These limitations do not weaken the paper’s usefulness. They locate it. HSC should not be sold as a ready-made architecture for enterprise autonomy. It should be used as a conceptual blueprint for designing agents whose intelligence accumulates through operation.
The practical design pattern: make every action leave a trace
The most useful business translation of HSC is simple: every meaningful action should leave a learning trace.
That trace does not need to mean model fine-tuning. Often it should not. It may be a structured log, a procedure update, a memory entry, a failure case, a user correction, a confidence adjustment, or an item added to a review queue.
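A structured log is probably the cheapest of these options. The sketch below shows one possible shape for a trace record, assuming a JSON-lines style log. The `LearningTrace` fields and the `append_trace` helper are hypothetical, and comparing prediction and outcome as strings is a deliberate simplification.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class LearningTrace:
    goal: str
    action: str
    predicted: str
    observed: str
    missing_context: list = field(default_factory=list)
    reusable_procedure: str = ""
    needs_review: bool = False

    def __post_init__(self):
        # Flag the record for later review when prediction and outcome disagree.
        if self.predicted != self.observed:
            self.needs_review = True


def append_trace(log, trace):
    """Serialize the trace as one JSON line and append it to an in-memory log."""
    log.append(json.dumps(asdict(trace), sort_keys=True))
    return log


log = []
trace = LearningTrace(
    goal="pay supplier invoice",
    action="match invoice to purchase order",
    predicted="matched",
    observed="amount mismatch",
    missing_context=["amended purchase order"],
)
append_trace(log, trace)
print(json.loads(log[0])["needs_review"])  # True
```

Storing the prediction next to the outcome is the important design choice. Without the prediction, later reflection can see what happened but not what the agent expected to happen.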
A minimal HSC-inspired enterprise agent could follow this pattern:
- Goal framing: define the operational goal and constraints before acting.
- Difference detection: identify what is abnormal or uncertain in the current case.
- Scope check: decide whether the current context is enough.
- Context action: search, query, ask, simulate, inspect, or test to reduce uncertainty.
- Candidate comparison: evaluate possible actions and risks.
- Execution: act only when the expected benefit justifies the risk.
- Feedback capture: record what happened after the action.
- Reflection: compare prediction with outcome and identify causes.
- Learning trace: store reusable process knowledge, not just the final answer.
- Scheduling: revisit unresolved, novel, high-impact, or high-entropy cases later.
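The steps above can be sketched as a single control-flow skeleton. This is an illustration of the ordering only, not an implementation from the paper: the `run_case` function, the `steps` dictionary of caller-supplied callables, and the `benefit` and `risk` numbers are all assumptions made for this example.

```python
def run_case(case, steps, memory, review_queue):
    """One pass through the loop; each step is a callable supplied by the caller."""
    goal = steps["frame_goal"](case)                    # 1. goal framing
    anomaly = steps["detect_difference"](case)          # 2. difference detection
    if not steps["scope_is_enough"](case, anomaly):     # 3. scope check
        case = steps["gather_context"](case, anomaly)   # 4. context action
    candidates = steps["propose_actions"](goal, case)   # 5. candidate comparison
    best = max(candidates, key=lambda c: c["benefit"] - c["risk"], default=None)
    if best is None or best["benefit"] <= best["risk"]:  # 6. act only if justified
        review_queue.append((case["id"], "no action justified"))
        return None
    outcome = steps["execute"](best)                    # 7. feedback capture
    surprised = outcome != best["predicted"]            # 8. reflection
    memory.append({                                     # 9. learning trace
        "case": case["id"], "goal": goal, "anomaly": anomaly,
        "action": best["name"], "predicted": best["predicted"],
        "observed": outcome, "surprised": surprised,
    })
    if surprised:                                       # 10. scheduling
        review_queue.append((case["id"], "prediction mismatch"))
    return outcome


# Hypothetical stand-ins for real tools, using the failed-report example.
steps = {
    "frame_goal": lambda case: "restore the failed report",
    "detect_difference": lambda case: "fails only for one region",
    "scope_is_enough": lambda case, anomaly: "source" in case,
    "gather_context": lambda case, anomaly: {**case, "source": "migrated"},
    "propose_actions": lambda goal, case: [
        {"name": "rerun report", "benefit": 0.3, "risk": 0.1, "predicted": "ok"},
        {"name": "repoint data source", "benefit": 0.9, "risk": 0.2, "predicted": "ok"},
    ],
    "execute": lambda action: "ok" if action["name"] == "repoint data source" else "failed",
}
memory, review_queue = [], []
print(run_case({"id": "case-1"}, steps, memory, review_queue))  # ok
```

In this sketch an unjustified action is not silently dropped. It goes to the review queue, so declining to act still leaves a trace.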
This pattern is not science fiction. It is closer to disciplined workflow engineering. The paper’s contribution is to connect these pieces into an adaptive loop and explain why language-only AI will not naturally acquire the whole loop from text.
The biggest business value, then, is not “human-like AI.” It is cheaper diagnosis, safer autonomy, and compounding process knowledge.
An AI system that can learn from every invoice exception, every failed support response, every compliance escalation, every bad forecast, and every user correction becomes more valuable over time. Not because it has feelings. Because it has operational memory and feedback discipline.
That is the version of “human simulation” worth taking seriously.
From answer machines to adaptive systems
The old interface taught us to judge AI by the answer in front of us. Was it fluent? Was it correct? Was it useful? Those questions still matter, but they are too small for agents operating in open environments.
The better question is: what happens after the answer?
Does the system act to verify? Does it notice when reality pushes back? Does it learn the reason for failure? Does it update its future procedure? Does it schedule unresolved issues for later review? Does it stop wasting effort on dead ends? Does it know when the missing context is more important than the next paragraph?
Human Simulation Computation does not give all the engineering answers. It does, however, give a sharper frame: adaptive AI is not a model that talks like a human. It is a system that can participate in a loop of thinking, action, feedback, reflection, learning, and scheduling.
The chatbot answers the question.
The agent checks the door, notices the lock is broken, remembers that this happened last month, changes the maintenance routine, and schedules a review before the next failure.
Less glamorous. More alive.
Cognaptus: Automate the Present, Incubate the Future.
[1] Hong Su, “Human Simulation Computation: A Human-Inspired Framework for Adaptive AI Systems,” arXiv:2601.13887, 2026.