A factory line does not need a chatbot with feelings. It needs a control system that can tell the difference between a harmless deviation, a costly delay, and a situation that deserves to interrupt a human operator before the machine becomes expensive sculpture.
That is the useful way to read Computational Concept of the Psyche by Anton Kolonin and Vladimir Krykov.¹ The paper’s title sounds as if we are about to attach a synthetic soul to a machine, perhaps with a dashboard of emotions and a tasteful blue glow. Fortunately, the core argument is more operational than theatrical: an intelligent agent should not only predict the next state of the world; it should manage its own state of needs while acting under uncertainty, risk, and resource limits.
The distinction matters because enterprise AI is drifting from response generation toward autonomous operation. A model that writes a report can be judged by relevance and style. An agent that changes a workflow, escalates a risk, adjusts a machine, or spends compute on a long investigation needs a different internal economy. It needs a way to ask: Which need is urgent, what action could satisfy it, what could go wrong, and what does this cost?
The paper proposes one answer: treat the psyche as an operating system, intelligence as a decision engine, and needs as first-class variables inside the agent’s state space. That sentence sounds grand. The rest of this article translates it into something less mystical and more useful.
The paper is not asking machines to feel; it is asking agents to account for why they act
The easy misconception is that a “computational psyche” means giving AI human-like emotions. That is not where the serious content sits. In this paper, emotions are closer to control signals than personality decorations. A negative state is not there so the agent can sulk in a corner. It is a signal that a need has become unsatisfied and should influence action priority.
The proposed architecture begins with a nested view:
| Layer | Paper’s framing | Practical translation |
|---|---|---|
| System | Intelligent agent | The whole autonomous system operating in an environment |
| Psyche | Operating system | The internal state-management layer that tracks needs, actions, sensations, and memory |
| Intelligence | Decision-making system | The mechanism that chooses future states and actions |
| Subsystems | Fast and slow cognition | Neural/associative models for fast patterning, symbolic structures for interpretable reasoning |
| State space | Sensations, needs, actions | What the system perceives, what it wants to maintain or satisfy, and what it can do |
The paper’s most useful move is to place needs inside the same computational frame as observations and actions. Standard reinforcement learning often starts with states, actions, and rewards. Many enterprise agents start with prompts, tools, and policies. Here, the state is broader: the agent has sensations from the environment, possible actions, and an internal needs matrix whose values can change over time.
That change is not cosmetic. A monitoring agent at 9:00 a.m. and the same monitoring agent during a critical outage may observe similar variables but should not behave with the same urgency. The difference is not merely in the external state. It is in the internal priority of needs: safety, continuity, cost control, operator attention, regulatory exposure, and perhaps exploration when the system encounters unfamiliar behavior.
That is why “need-driven” architecture is not the same as “reward maximization with better branding.” It changes what is being optimized.
Mechanism one: the agent lives in a state space made of sensations, needs, and actions
The paper defines the agent’s psyche space as a combination of three spaces: sensations, needs, and actions. Sensations represent what the agent perceives. Actions represent what it can do. Needs represent the internal variables that make some outcomes meaningful and others irrelevant.
A simple way to read the mechanism is:
Environment → sensations → internal state
Internal needs → motivational pressure
Decision system → selected action
Action → new environment and new internal state
Memory → evidence for future decisions
This is not a decorative diagram; it changes the design problem. If needs are explicit, the agent can evaluate actions not only by whether they are predicted to work, but by which internal and operational pressures they relieve.
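The loop above can be sketched as a minimal Python skeleton. The names `PsycheState` and `step`, and the rule that an action simply adds a delta to each need, are hypothetical illustrations rather than the paper's implementation; the only point is that needs live in the same state object as sensations and actions, and that each step leaves evidence in memory.

```python
from dataclasses import dataclass, field


@dataclass
class PsycheState:
    # Hypothetical container for the three spaces the paper names, plus a log.
    sensations: dict
    needs: dict              # current dissatisfaction per need; 0.0 = satisfied
    actions: list
    memory: list = field(default_factory=list)


def step(state, observe, choose, act):
    """One pass: environment -> sensations -> needs -> action -> memory."""
    state.sensations = observe()                  # environment -> sensations
    action = choose(state)                        # needs exert motivational pressure
    new_sensations, need_delta = act(action)      # the action changes the world...
    for need, delta in need_delta.items():        # ...and the internal state
        state.needs[need] = max(0.0, state.needs[need] + delta)
    # Log what was sensed, what was done, and the resulting needs as evidence.
    state.memory.append((state.sensations, action, dict(state.needs)))
    state.sensations = new_sensations
    return action


# Toy usage: a backlog makes "continuity" dissatisfied, so the agent reroutes.
state = PsycheState(sensations={}, needs={"safety": 0.1, "continuity": 0.6},
                    actions=["wait", "reroute"])
chosen = step(
    state,
    observe=lambda: {"queue_length": 42.0},
    choose=lambda s: "reroute" if s.needs["continuity"] > 0.5 else "wait",
    act=lambda a: ({"queue_length": 10.0}, {"continuity": -0.5}),
)
print(chosen, state.needs)
```

The decision rule here is a single threshold only to keep the sketch short; the later mechanisms replace it with motivation and utility.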
For a business agent, “needs” should not be anthropomorphic. They could be operational variables such as:
| Business agent need | What it could mean operationally | Typical failure if ignored |
|---|---|---|
| Continuity | Keep a process within acceptable operating bounds | The agent optimizes a local task while damaging throughput |
| Safety | Avoid high-severity states even if probability is low | The agent treats rare hazards as statistical noise |
| Cost discipline | Minimize compute, energy, or human attention cost | The agent escalates or reasons endlessly because it can |
| Predictability | Reduce surprise between expected and observed states | The agent fails to notice that its world model is stale |
| Exploration | Learn unfamiliar states when safe and valuable | The agent becomes brittle outside known patterns |
| Compliance | Avoid actions outside policy or regulatory bounds | The agent succeeds technically and fails institutionally, a popular genre of automation comedy |
This mapping is Cognaptus’ inference, not a direct enterprise experiment from the paper. The paper itself is more abstract and AGI-oriented. But the mapping is natural: if an artificial psyche is a state-management layer, then enterprise agents can implement a less romantic version of it as operational need management.
Mechanism two: motivation is the product of long-term priority and current dissatisfaction
The paper formalizes motivation with two key vectors. The first is a long-term priority vector, $x$, which reflects the enduring importance of needs. The second is a current actualization or dissatisfaction vector, $y_t$, which reflects how urgent those needs are at moment $t$. The motivational vector is their element-wise product:
$$z_t = x \odot y_t, \qquad z_{t,i} = x_i \, y_{t,i}$$
The notation is simple, but the design implication is substantial. A need becomes behaviorally important only when it is both important in principle and currently actualized. Safety may have a very high long-term priority, but if no hazard is visible, it may not dominate every action. Exploration may have moderate long-term priority, but when the agent encounters a new regime, its actualization rises.
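A minimal numerical sketch of motivation as the element-wise product of long-term priority $x$ and current actualization $y_t$. The need names and every number are invented for illustration: safety has the highest priority but is calm, while exploration has modest priority but has just been actualized by an unfamiliar regime.

```python
# Hypothetical need names; x and y_t values are invented for illustration.
NEEDS = ["safety", "continuity", "cost", "exploration"]

x = [0.9, 0.7, 0.4, 0.3]     # long-term priority profile (stable)
y_t = [0.05, 0.2, 0.1, 0.9]  # current actualization: no hazard, new regime seen


def motivation(priority, actualization):
    """Element-wise product: a need matters only if important AND actualized now."""
    return [p * a for p, a in zip(priority, actualization)]


z_t = motivation(x, y_t)
dominant = NEEDS[z_t.index(max(z_t))]
print(dict(zip(NEEDS, z_t)), dominant)
```

Exploration dominates this step even though its long-term priority is the lowest of the four, which is exactly the behavior the product form is meant to allow.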
This gives us a cleaner vocabulary for agent configuration. Instead of merely writing rules such as “always avoid risky actions,” we can define:
| Parameter | Meaning | Business design question |
|---|---|---|
| $x$ | Long-term priority profile | What should this agent consistently care about? |
| $y_t$ | Current need actualization | What is currently unsatisfied, threatened, or uncertain? |
| $z_t$ | Motivational pressure | Which needs should drive the next decision? |
The authors describe $x$ as something like a personality profile: relatively stable in the short term, but potentially changeable through long-term learning or belief change. For business use, “personality” is probably the wrong interface label unless one enjoys confusing procurement teams. The more useful term is operating profile.
A warehouse optimization agent, a trading surveillance agent, and a customer-support triage agent should not have the same operating profile. One should care heavily about physical throughput and exception handling. Another should care about false negatives in suspicious activity. Another should balance customer satisfaction against escalation cost. The architecture gives a place to put those differences without pretending they are just prompt style.
Mechanism three: utility includes gains, losses, probability, predictability, and energy cost
The paper then moves from needs to decision-making. It argues that classical scalar reward is too narrow. A serious agent must evaluate possible transitions by combining positive outcomes, negative outcomes, probabilities, risk preferences, predictability, and energy efficiency.
In the paper’s simplified decision illustration, future states have utilities and probabilities. A statistically attractive option may offer higher payoff with moderate probability, while another option may offer smaller payoff with higher certainty. The authors connect this to prospect theory: subjects may prefer safer outcomes even when expected-value arithmetic points elsewhere.
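A small worked example shows how the two views can disagree. It uses a square-root utility as a textbook stand-in for risk aversion; that curve is an assumption for illustration, not the paper's utility function or prospect theory's full value function, and the payoffs are invented.

```python
import math

# Hypothetical options: (payoff, probability of success); failure pays 0.
OPTIONS = {
    "attractive_but_uncertain": (100.0, 0.60),
    "smaller_but_reliable": (45.0, 0.95),
}


def expected_value(payoff, p):
    return p * payoff


def risk_averse_value(payoff, p, utility=math.sqrt):
    """Expected utility under a concave (risk-averse) utility curve."""
    return p * utility(payoff)


ev_choice = max(OPTIONS, key=lambda k: expected_value(*OPTIONS[k]))
ra_choice = max(OPTIONS, key=lambda k: risk_averse_value(*OPTIONS[k]))
print("expected value picks:", ev_choice)        # 60 vs about 42.75
print("risk-averse utility picks:", ra_choice)   # 6.0 vs about 6.37
```

Same options, same probabilities, different choice: the valuation curve, not the arithmetic, decides.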
The practical point is not that AI should inherit every human bias. Please, no. The point is that utility is not a clean scalar handed down from the sky. Different consequences have different weights, and the same gain/loss profile can be valued differently by different agents.
The paper’s decision logic can be summarized as follows:
| Element | Role in the paper | Why it matters for agents |
|---|---|---|
| Utility $U$ | Value of transitioning into a future state | Captures whether a future state satisfies prioritized needs |
| Probability $P$ | Evidence-based likelihood of that transition | Prevents the agent from choosing fantasies with excellent utility |
| Evidence count $C$ | Accumulated experience of state transitions | Allows probability estimates to depend on observed transitions |
| Constraint matrices | Mutual exclusion and dependency among variables | Keeps impossible or jointly required states from being treated as optional |
| Energy efficiency $E(a)$ | Cost of acting | Prevents intelligence from becoming an expensive way to say “maybe” |
| Predictability | Gap between expected and actual state | Turns surprise into a learning signal |
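The elements in the table can be combined into a compact scoring rule. The sketch below assumes that transition probabilities are estimated from evidence counts $C$ with add-one (Laplace) smoothing and that energy cost $E(a)$ enters as a linear penalty scaled by a hypothetical `energy_weight`; the paper's exact formula may differ, and the `allowed` check is a simplified stand-in for its mutual-exclusion matrices. All data are invented.

```python
# Evidence counts C[(state, action)][next_state]: observed transitions (invented).
C = {
    ("overheating", "throttle"): {"stable": 18, "overheating": 2},
    ("overheating", "continue"): {"stable": 3, "overheating": 7, "shutdown": 10},
}
U = {"stable": 1.0, "overheating": -0.3, "shutdown": -2.0}   # utility of each state
E = {"throttle": 0.2, "continue": 0.0}                       # energy cost E(a)
EXCLUSIVE = {("throttle", "continue")}                       # mutually exclusive pair


def prob(state, action, nxt, states):
    """P(nxt | state, action) from counts with add-one smoothing over known states."""
    counts = C.get((state, action), {})
    total = sum(counts.values()) + len(states)
    return (counts.get(nxt, 0) + 1) / total


def score(state, action, energy_weight=1.0):
    """Expected utility of the transition minus weighted energy cost."""
    eu = sum(prob(state, action, s, U) * u for s, u in U.items())
    return eu - energy_weight * E[action]


def allowed(plan):
    """Reject plans that combine mutually exclusive actions."""
    return not any((a, b) in EXCLUSIVE or (b, a) in EXCLUSIVE
                   for a in plan for b in plan if a != b)


best = max(["throttle", "continue"], key=lambda a: score("overheating", a))
print(best, round(score("overheating", "throttle"), 3))
```

Throttling costs energy, yet it wins because the evidence counts make a good transition far more likely; with no observations at all, smoothing would spread probability evenly rather than invent confidence.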
The most business-relevant part is the explicit separation between three kinds of feedback:
- Explicit reinforcement: did the action directly satisfy or fail a need?
- Predictive reinforcement: did reality match expectation?
- Energy reinforcement: did the action consume resources efficiently?
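The three channels can be kept as separate signals rather than collapsed into one reward. In this sketch the predictive term is the negative mean relative gap between expected and observed values, and the energy term is the negative resource cost; both formulas and the `Feedback` name are illustrative assumptions, not definitions from the paper.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    explicit: float     # did the action satisfy (+) or fail (-) a need?
    predictive: float   # 0.0 when reality matched expectation, negative otherwise
    energy: float       # negative resource cost of acting


def feedback(need_before, need_after, expected, observed, cost):
    """Split one step's outcome into explicit, predictive, and energy reinforcement."""
    explicit = need_before - need_after               # reduction in dissatisfaction
    # Relative gaps so that signals measured in different units stay comparable.
    gaps = [abs(expected[k] - observed[k]) / max(abs(expected[k]), 1e-9)
            for k in expected]
    predictive = -sum(gaps) / len(gaps)               # surprise as a negative signal
    return Feedback(explicit, predictive, -cost)


fb = feedback(need_before=0.8, need_after=0.3,
              expected={"latency_ms": 120.0, "error_rate": 0.01},
              observed={"latency_ms": 180.0, "error_rate": 0.01},
              cost=0.05)
print(fb)
```

Here the action clearly helped (explicit feedback is positive), but latency came in 50% above expectation, so the predictive channel flags that the world model may be going stale.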
That is a useful decomposition. Many deployed AI workflows measure only the first category, and even that poorly. A support bot is judged by resolution rate. A forecasting agent is judged by forecast error. A workflow agent is judged by task completion. But autonomous systems also need to know when their model of the environment is decaying and when the cost of thinking or acting is no longer justified.
A task-only agent asks: “Did I complete the instruction?”
A need-aware agent asks: “Did this action improve the system state enough, with acceptable risk and cost, given what I expected to happen?”
The second question is longer. It is also closer to how real operations are managed.
Mechanism four: memory is a stack, not a pile of chat history
The paper proposes a four-layer memory architecture:
| Memory layer | Paper’s description | Enterprise analogy |
|---|---|---|
| Long-term episodic memory | Full logs of agent-environment interaction | Event logs, traces, audit records, historical cases |
| Model memory | Learned approximation or symbolic invariant patterns | Neural model, rules, graph, process model, anomaly model |
| Short-term memory | Current operational context | Active case context, current workflow state, recent observations |
| Attention focus | Current situation | The immediate query, alert, sensor event, or decision point |
This is where the paper becomes surprisingly practical. It explicitly compares the stack to retrieval-augmented generation: a database or log store, a model, a context window, and a prompt or current focus. But the paper’s framing is richer because the memory stack is connected to action, needs, and evidence counts rather than only retrieval.
The long-term episodic layer matters because it preserves evidence. The model layer matters because raw history is too large to act on directly. Short-term memory matters because decisions depend on local context. Attention matters because the agent cannot process everything with equal priority.
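One way to read the stack as code is a class with one attribute per layer. The class name, the `consolidate` step, and the use of transition counts as the model layer are assumptions made for illustration; a production model layer might be a neural network, a rule set, or a process graph.

```python
from collections import Counter, deque


class MemoryStack:
    """Hypothetical four-layer memory: episodic log, model, short-term, attention."""

    def __init__(self, short_term_size=5):
        self.episodic = []                               # full interaction log
        self.model = Counter()                           # compressed transition counts
        self.short_term = deque(maxlen=short_term_size)  # recent operational context
        self.attention = None                            # the current decision point

    def record(self, state, action, next_state):
        event = (state, action, next_state)
        self.episodic.append(event)       # keep the evidence verbatim
        self.short_term.append(event)     # older context falls off automatically
        self.attention = next_state       # focus moves to the newest situation

    def consolidate(self):
        """Rebuild the model layer from the episodic log."""
        self.model = Counter(self.episodic)

    def evidence(self, state, action, next_state):
        """Evidence count C for one transition, read from the model layer."""
        return self.model[(state, action, next_state)]


mem = MemoryStack(short_term_size=2)
for s, a, n in [("idle", "start", "running"), ("running", "stop", "idle"),
                ("idle", "start", "running")]:
    mem.record(s, a, n)
mem.consolidate()
print(len(mem.episodic), len(mem.short_term), mem.attention,
      mem.evidence("idle", "start", "running"))
```

The episodic layer keeps all three events while short-term memory holds only the last two, and the evidence count comes from the consolidated model rather than from rescanning raw history.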
The useful lesson is not “RAG is like cognition,” which is the sort of sentence that sounds clever until someone has to maintain the system. The useful lesson is that agent memory needs governance across three limits the paper names: retention horizon, scope of modalities, and precision. In business language: how much history is kept, which signals are included, and how accurately they are stored.
Those are not backend details. They determine whether the agent can notice recurring failures, whether it forgets old but important cases, and whether it can justify its decisions after the fact.
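The three limits can be written down as an explicit policy object instead of being left as accidental storage settings. The field names and the filter below are hypothetical; they only show how retention horizon, modality scope, and precision become reviewable parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryPolicy:
    retention_steps: int        # retention horizon: how much history is kept
    modalities: frozenset       # scope: which signal types are stored
    decimals: int               # precision: how accurately values are stored


def apply_policy(events, policy, now):
    """Keep events inside the horizon and scope, rounded to the stated precision."""
    kept = []
    for t, modality, value in events:
        if now - t > policy.retention_steps or modality not in policy.modalities:
            continue
        kept.append((t, modality, round(value, policy.decimals)))
    return kept


policy = MemoryPolicy(retention_steps=100,
                      modalities=frozenset({"temp", "vibration"}), decimals=1)
events = [(5, "temp", 71.234), (150, "temp", 72.888), (160, "audio", 0.42),
          (170, "vibration", 0.0549)]
kept = apply_policy(events, policy, now=180)
print(kept)
```

Note how `decimals=1` turns a vibration reading of 0.0549 into 0.1: a precision setting is a decision about what the agent will later be able to notice.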
The ping-pong experiment is preliminary evidence, not proof of artificial psyche
The paper’s empirical section uses a minimal single-player ping-pong environment. The agent learns to play against a wall. The environment is deliberately small; nobody should confuse it with enterprise autonomy or general intelligence.
The experiment is best read as preliminary evidence for the proposed learning mechanism, not as an ablation study, benchmark comparison, or robustness test. It asks whether a needs-based learning setup can be implemented and whether changing the priority profile affects learning behavior.
The agent operates in a four-dimensional need space:
| Need dimension | Meaning in the experiment | Evidence role |
|---|---|---|
| Happy | Positive reinforcement from bouncing or hitting successfully | Direct gain signal |
| Sad | Avoidance of negative reinforcement from failure | Direct loss-avoidance signal |
| Novelty | Detection of new states | Exploration incentive |
| Expectedness | Predictability of experienced situations | World-model confidence signal |
The key reported observation is simple but important: when the priority profile gives equal weight to positive feedback and avoidance of negative feedback, learning slows down, and under some models, board configurations, and reinforcement delays, game-skill acquisition becomes impossible. Negative feedback suppresses exploration because attempts at new strategies are punished. When positive feedback is prioritized over negative feedback, learning remains stable across the reported experimental conditions.
That finding is not a universal law. The paper does not provide a large benchmark suite, confidence intervals, or enterprise-scale comparisons. It does not prove that “positive reinforcement always beats balanced feedback.” What it does support is narrower and more interesting: in a need-weighted learning system, the operating profile changes whether the agent explores enough to learn.
That is a result worth carrying into business design. Over-penalized agents can become safe in the least useful way: they stop trying. A compliance-heavy automation system may avoid mistakes by escalating everything. A forecasting agent may avoid novel hypotheses because novelty increases error risk. An operations agent may cling to known procedures long after the process regime has shifted. Perfect caution is not intelligence; it is sometimes just failure wearing a helmet.
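A deliberately tiny toy can show the direction of the reported effect, though not its magnitude. Here an agent decides whether to try an untested strategy by weighing expected gain ("happy"), expected loss ("sad"), and a novelty bonus under two priority profiles. This scoring rule and every number are hypothetical; it is not the paper's ping-pong learner, and it ignores learning over time.

```python
def explore_score(weights, p_success, gain=1.0, loss=1.0, novelty=0.2):
    """Motivation to try an untested strategy under a given priority profile."""
    w_happy, w_sad, w_novelty = weights
    return (w_novelty * novelty
            + w_happy * p_success * gain
            - w_sad * (1.0 - p_success) * loss)


P_SUCCESS_NEW = 0.3   # early attempts at a new strategy usually fail

balanced = (1.0, 1.0, 1.0)         # equal weight on gains and loss avoidance
positive_first = (1.0, 0.3, 1.0)   # gains prioritized over loss avoidance

for name, w in [("balanced", balanced), ("positive_first", positive_first)]:
    s = explore_score(w, P_SUCCESS_NEW)
    print(f"{name}: score={s:+.2f} -> {'explore' if s > 0 else 'stick to known play'}")
```

Under the balanced profile the expected punishment for early failures outweighs everything else, so the agent never starts practicing the strategy it would need to learn.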
What the paper directly shows, and what business readers should infer
The article becomes clearer if we separate the evidence from the extrapolation.
| Claim | What the paper directly supports | Cognaptus business interpretation | Boundary |
|---|---|---|---|
| Needs can be modeled as part of agent state | The formal architecture includes a needs matrix/vector inside the state space | Enterprise agents can encode operational priorities explicitly rather than hiding them in prompts | The paper does not provide an enterprise implementation |
| Motivation can be computed from priority and dissatisfaction | The paper defines motivation through long-term priority and current actualization | Agent configuration should distinguish stable operating priorities from current urgency | Real priority tuning remains a design and governance problem |
| Decision-making should include utility and probability | The formalism selects future states using utility and transition probability | Agents should evaluate both business value and likelihood, not only predicted next action | The exact utility function is not validated across domains |
| Predictability and energy efficiency are meaningful internal needs | The paper includes expectation-reality gap and energy cost in reinforcement | Agents should track surprise and cost as first-class metrics | Measuring these well can be non-trivial in messy workflows |
| Negative reinforcement can suppress exploration | The ping-pong experiment reports slower or failed learning under equal positive/negative weighting | Excessive penalty design can make business agents brittle and passive | The evidence is preliminary and environment-specific |
| Four-layer memory can support lifelong learning | The architecture links episodic logs, models, context, and attention | Agent systems need event logs, learned models, active context, and current focus as governed layers | The paper sketches architecture rather than delivering a production memory system |
This table is the difference between useful reading and AGI theatre. The paper is strongest as a conceptual and computational architecture. It is weaker as empirical proof. That is acceptable, as long as we do not pretend otherwise.
The business value is not synthetic emotion; it is need-aware control
For Cognaptus readers, the most practical application area is not a general-purpose digital mind. It is control under competing objectives.
The paper itself points to process control, industrial automation, smart homes, and human-machine interfaces. That choice is sensible. These domains already have continuous state streams, control signals, constraints, and expensive failure modes. They also often resist pure black-box delegation because operators need to understand why a system recommends an intervention.
A need-aware architecture could support three business functions.
First, alert prioritization. Many systems already produce alerts. The problem is that alerts are cheap to generate and expensive to interpret. A needs-based agent could score alerts not only by anomaly magnitude but by which operational needs they threaten: safety, uptime, cost, compliance, or data quality.
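A minimal sketch of that scoring, assuming each alert carries an anomaly magnitude and a hypothetical mapping to the needs it threatens; the `PRIORITY` weights play the role of the long-term priority profile and are invented.

```python
# Hypothetical operating profile: long-term priority of each operational need.
PRIORITY = {"safety": 1.0, "uptime": 0.7, "compliance": 0.8, "cost": 0.3,
            "data_quality": 0.4}

alerts = [
    {"id": "A1", "magnitude": 0.9, "threatens": {"data_quality": 1.0}},
    {"id": "A2", "magnitude": 0.4, "threatens": {"safety": 0.8, "uptime": 0.5}},
    {"id": "A3", "magnitude": 0.7, "threatens": {"cost": 0.6}},
]


def alert_score(alert):
    """Anomaly magnitude scaled by how strongly it threatens prioritized needs."""
    need_pressure = sum(PRIORITY[n] * severity
                        for n, severity in alert["threatens"].items())
    return alert["magnitude"] * need_pressure


ranked = sorted(alerts, key=alert_score, reverse=True)
print([a["id"] for a in ranked])
```

A2 has the smallest anomaly magnitude but ranks first because it threatens safety; a magnitude-only ranking would have put it last.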
Second, operator recommendation. In a human-machine interface, the agent does not need to seize full control. It can recommend actions with a traceable explanation: the current state, the relevant need, the predicted transition, the evidence count, the expected utility, and the cost. That is less glamorous than autonomous AGI. It is also much easier to sell to adults with liability exposure.
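The traceable explanation can be a plain record whose fields mirror the items just listed. The `Recommendation` type, its `explain` method, and the example values are hypothetical; the point is that every field is something an operator can check.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    state: str
    need: str
    action: str
    predicted_transition: str
    evidence_count: int
    expected_utility: float
    cost: float

    def explain(self):
        """Render a one-line rationale an operator can audit."""
        return (f"In state '{self.state}', need '{self.need}' is pressing. "
                f"Recommend '{self.action}': expected {self.predicted_transition} "
                f"(seen {self.evidence_count} times), "
                f"utility {self.expected_utility:.2f}, cost {self.cost:.2f}.")


rec = Recommendation("line 3 vibration rising", "safety", "reduce spindle speed",
                     "vibration -> normal", 27, 0.74, 0.12)
print(rec.explain())
```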
Third, agent governance. Modern AI governance often focuses on policies: do not do X, escalate Y, log Z. Policies are necessary, but they do not fully describe trade-offs. Need profiles could make governance more operational by specifying what the agent should optimize, what it should avoid, when it should explore, and when uncertainty should trigger human review.
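A need profile written as configuration might look like the following. The keys, thresholds, and the review rule (escalate forbidden actions, thin evidence, or near-ties between the best and second-best option) are hypothetical choices, shown only to illustrate how trade-offs can be made explicit and reviewable.

```python
# Hypothetical governance profile for one agent.
PROFILE = {
    "optimize": {"uptime": 0.7, "cost": 0.3},   # weights used by the scoring layer (not shown)
    "avoid": ["unapproved_setpoint_change"],
    "explore_when": {"min_safety_margin": 0.5},
    "review_when": {"min_evidence": 10, "min_utility_margin": 0.1},
}


def may_explore(safety_margin, profile=PROFILE):
    """Allow exploration only when the safety margin is comfortably wide."""
    return safety_margin >= profile["explore_when"]["min_safety_margin"]


def needs_human_review(action, evidence_count, best_u, runner_up_u, profile=PROFILE):
    """Escalate forbidden actions, thin evidence, or near-ties between options."""
    rules = profile["review_when"]
    return (action in profile["avoid"]
            or evidence_count < rules["min_evidence"]
            or best_u - runner_up_u < rules["min_utility_margin"])


print(needs_human_review("reroute", evidence_count=25, best_u=0.80, runner_up_u=0.50))
print(needs_human_review("reroute", evidence_count=25, best_u=0.62, runner_up_u=0.58))
```

The second call escalates not because the action is risky but because the agent cannot tell its top two options apart, which is one concrete meaning of "uncertainty should trigger human review."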
The architecture also offers a cleaner ROI framing. The paper’s “survival energy” concept is biological in origin, but in business systems it maps onto ordinary accounting of resource cost and operational value. Compute cost, human attention, downtime risk, energy use, and delay can all be treated as entries in an internal economy. The point is not to reduce everything to money immediately. The point is to force the agent to account for cost rather than treating intelligence as free.
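A hedged sketch of that internal economy: before each extra round of investigation, the agent compares the expected gain of the next step against a bundle of costs. The cost categories come from the paragraph above; the unit prices, the gains, and the assumption that gains diminish step by step are invented.

```python
# Hypothetical unit prices that convert heterogeneous costs into one ledger.
PRICE = {"compute_s": 0.002, "operator_min": 1.5, "delay_min": 0.8}


def step_cost(usage):
    """Total cost of one investigation step under the price table."""
    return sum(PRICE[k] * v for k, v in usage.items())


def investigate(expected_gains, usage_per_step):
    """Keep investigating only while the next step's expected gain exceeds its cost."""
    steps = 0
    for gain in expected_gains:          # gains are assumed to diminish step by step
        if gain <= step_cost(usage_per_step):
            break
        steps += 1
    return steps


gains = [5.0, 3.0, 2.5, 1.2, 0.4]
machine_only = {"compute_s": 300, "delay_min": 2}
with_operator = {"compute_s": 300, "delay_min": 2, "operator_min": 1}
print(investigate(gains, machine_only), investigate(gains, with_operator))
```

Adding one minute of operator time per step makes each round more expensive, so the same agent stops after one round instead of three.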
That small act of accounting would already improve many agent designs.
Where the paper is still thin
The limitations are not decorative; they materially affect interpretation.
The first boundary is empirical scale. The experiment is a minimal ping-pong learner. It demonstrates an effect of reinforcement weighting in a small environment, but it does not validate the architecture in real industrial systems, enterprise workflows, multi-agent settings, or open-world environments.
The second boundary is specification difficulty. The paper says needs can be represented as vectors, matrices, or tensors. That is formally flexible, but flexibility is not the same as implementation guidance. In production, the hard question is not whether to define a needs matrix. It is how to define one without making it arbitrary, unstable, or politically convenient.
The third boundary is utility calibration. If the agent’s long-term priority vector is wrong, the system may optimize the wrong thing very confidently. The paper’s framework gives us a place to encode priorities, but it does not solve the social and managerial problem of choosing those priorities.
The fourth boundary is interpretability. The paper proposes a hybrid neuro-symbolic architecture, with possible transfer between associative neural models and symbolic representations. That is an attractive direction, and the memory stack supports it conceptually. But the paper does not show a mature method for extracting reliable, human-configurable logic from learned transition models. The authors list this as future work, and that is exactly where it belongs.
Finally, the paper’s AGI framing is larger than its evidence. That does not make the paper useless. It simply means the strongest reading is architectural, not triumphal.
A useful artificial psyche may look boring from the outside
The most interesting version of this idea may not look like a talking machine with a simulated inner life. It may look like an automation layer that knows when uptime matters more than exploration, when surprise deserves investigation, when compute should be conserved, and when avoiding every possible error creates a larger operational failure.
That is the paper’s practical contribution: it gives agent design a mechanism for internal priority, not just external instruction. Needs become variables. Motivation becomes computable. Utility becomes risk-aware. Memory becomes layered evidence rather than an overloaded chat transcript. The experiment, small as it is, reminds us that the wrong balance of reward and punishment can make a learner less capable, not more disciplined.
The next generation of business agents will not be judged only by whether they can answer questions. They will be judged by whether they can manage trade-offs over time. Prediction is useful. Purpose is harder. Accounting for needs is one way to begin.
And if that sounds less romantic than building a machine psyche, good. Useful systems usually begin when metaphors are forced to pay rent.
Cognaptus: Automate the Present, Incubate the Future.
¹ Anton Kolonin and Vladimir Krykov, “Computational Concept of the Psyche,” arXiv:2603.15586v2, 2026. https://arxiv.org/abs/2603.15586