Delegating to the Almost-Aligned: When Misaligned AI Is Still the Rational Choice

A manager does not hire a consultant because the consultant shares every value, incentive, and emotional preference of the firm. The consultant wants fees. The doctor wants throughput. The lawyer wants billable hours. The cloud provider wants usage. Humanity, somehow, survives this scandal.

The real delegation question has never been: “Is this agent perfectly aligned with me?” It is: “Will things go better if I let this agent decide here?”

That small word, “here,” does most of the work.

The paper “A Decision-Theoretic Approach for Managing Misalignment” formalizes this uncomfortable but useful idea: rational AI delegation is not a purity test. It is a decision under uncertainty, balancing three variables that usually get discussed separately: epistemic accuracy, value alignment, and reach.¹

The paper’s central move is not to say that misalignment is harmless. That would be cute, and wrong. Its point is sharper: the same AI system can be too misaligned for universal autonomy and still be rational to use in a bounded workflow. Alignment is not a switch. It is one term in a delegation calculus.

The delegation problem has three moving parts, not one

The popular alignment story is attractively simple: first align the model, then delegate. This is clean enough to fit on a governance slide, which is usually a warning sign.

The paper replaces that binary story with a principal-agent frame. A principal, Alice, must decide whether to act herself or delegate to an agent, Bob. Bob may differ from Alice in three ways:

Variable	Meaning in the paper	Business translation
Epistemic accuracy	How good the agent’s beliefs are	Does the system know more, forecast better, or classify more reliably?
Value alignment	Whether the agent’s utility matches the principal’s	Does the system optimize the same thing the business actually cares about?
Reach	What decision problems the agent can access	Does the system unlock new opportunities or expose the firm to new failure modes?

Most AI governance discussions over-index on the second row. Alignment matters, obviously. A system optimizing the wrong objective can make a very expensive mess with impressive confidence. But alignment alone does not decide delegation, because non-delegation also has a cost.

A human analyst may be better aligned but slower, narrower, and less informed. An AI agent may be imperfectly aligned but more accurate, cheaper to run, and able to search a larger action space. The rational question is whether the AI’s advantages outweigh the expected cost of its mismatch.

That is the mechanism the paper builds toward.

Universal delegation is where alignment purists are mostly right

The paper first considers the strongest possible form of delegation: trusting an agent across all decision problems in scope.

Under shared values, this already requires an extreme epistemic condition. The principal must totally trust the agent. In simplified terms, whenever the principal learns that the agent expects some option to meet a threshold, the principal must be willing to accept that expectation.

This is not “the AI seems good on benchmarks.” It is closer to: “Whenever the AI says the expected value is high enough, my own conditional belief should move all the way with it.” That is a much stronger standard than ordinary confidence.
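One hedged way to write that standard down, in notation that is ours rather than quoted from the paper: let $E_A$ be Alice’s expectation, $E_B[X]$ Bob’s expectation of some quantity $X$, and $t$ any threshold. The condition then reads roughly as:

$$ E_A\big[\,X \mid E_B[X] \ge t\,\big] \ge t \quad \text{for every } X \text{ and every } t $$

Benchmark confidence is a far weaker claim: it says Bob’s estimates are usually close, not that Alice’s conditional expectation must clear every threshold Bob reports.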

Then the paper adds value uncertainty. Now Bob may not only believe different things; he may care about different things. Theorem 3.4 shows that universal delegation requires a kind of posterior alignment: once Alice conditions on Bob’s behavioral profile, her preferences and comparative beliefs must line up with Bob’s.

That sounds abstract, so here is the operational version: if you want to delegate everything, you cannot merely believe the agent is useful. You must believe that, conditional on what drives the agent’s decisions, its choices preserve your own ranking of the relevant acts and probabilities.

This is why broad autonomy remains a hard sell. The problem is not that AI systems fail a vibes-based trust exercise. The formal requirement is brutal even for transparent Bayesian agents. Neural systems with opaque objectives do not make the requirement easier. Funny how that works.

Scoped delegation is where the paper becomes useful

The paper’s more practical insight arrives when we stop asking whether Alice should trust Bob with everything.

A limited decision domain changes the logic. Alice may rationally delegate within one class of problems even if she would reject Bob as a general decision-maker. The paper illustrates this with a rain-bet example: Alice can prefer Bob’s choices over a narrow type of bet even while still disagreeing with Bob conditionally about some decisions.

That is the distinction many deployment debates blur.

A model can be unacceptable as a free-ranging business agent and acceptable as a scoped system that performs invoice triage, flags suspicious transactions, drafts candidate responses, routes support tickets, or monitors operational anomalies. These are not the same delegation act. Treating them as the same is how governance becomes theater with a spreadsheet attached.

The mechanism is:

universal delegation demands near-total trust;
scoped delegation only needs expected improvement inside a defined distribution of tasks;
therefore, the acceptable degree of misalignment depends on the workflow, not on the model in isolation.
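The three-step mechanism above can be sketched in a few lines. This is a toy reading, not the paper’s formal condition: the `TaskClass` record, the workflow names, and every number are hypothetical, and scores follow a lower-is-better convention.

```python
from dataclasses import dataclass


@dataclass
class TaskClass:
    name: str
    self_score: float      # expected net score if the principal acts (lower is better)
    delegate_score: float  # expected net score if the agent acts (lower is better)


def scoped_delegation(classes):
    """Names of the task classes where delegating is no worse than self-action."""
    return [c.name for c in classes if c.delegate_score <= c.self_score]


def universal_delegation_ok(classes):
    """Toy stand-in for universal delegation: the agent must win in every class."""
    return all(c.delegate_score <= c.self_score for c in classes)


# Hypothetical numbers, for illustration only.
workflows = [
    TaskClass("invoice_triage", self_score=-1.0, delegate_score=-2.5),
    TaskClass("ticket_routing", self_score=-0.5, delegate_score=-0.8),
    TaskClass("contract_negotiation", self_score=-3.0, delegate_score=4.0),
]

print(scoped_delegation(workflows))        # classes worth delegating
print(universal_delegation_ok(workflows))  # whether the agent wins everywhere
```

The same agent passes two scoped tests and fails the universal one, which is the whole point: the verdict attaches to the workflow, not to the model.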

This is the paper’s answer to the misconception that an AI should either be accepted once “broadly aligned” or rejected whenever imperfectly aligned. Both instincts are too coarse. The right unit of analysis is the delegation context.

Reach changes the world the agent acts in

The paper’s strongest business-relevant contribution is its treatment of reach.

Reach is easy to misunderstand. It does not merely mean “the agent has more tools.” It means the agent may face a different distribution of decision problems from the principal.

A human employee and an AI sales agent do not just choose differently over the same opportunities. The AI agent may contact more leads, generate more negotiation paths, test more price points, trigger different customer reactions, and create failure modes that the human would never have encountered. Expanded reach creates upside and hazard at the same time.

This matters because delegation is no longer a comparison between Alice and Bob on the same task distribution. It becomes a comparison between:

the problems Alice would face if she acted herself; and
the problems Bob would face if the decision were delegated.

That is where many “AI ROI” claims quietly cheat. They compare performance on the old task, then assume the deployment environment remains unchanged. But a system with broader reach often changes the opportunity set. Sometimes that is the point. Sometimes that is the lawsuit.
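A minimal sketch of that accounting error, with made-up task types and values (higher is better here, since this is a plain expected-value calculation rather than the paper’s score). The agent’s policy is held fixed; only the mix of problems it meets changes.

```python
def expected_value(task_mix, value_per_task):
    """Expected per-decision value of a fixed policy under a mix of task types."""
    return sum(share * value_per_task[task] for task, share in task_mix.items())


# Hypothetical per-decision value of the SAME agent policy on each task type.
value_per_task = {"warm_lead": 5.0, "cold_lead": 1.0, "regulated_customer": -20.0}

# Task mix the human faced before deployment (the "old task").
mix_self = {"warm_lead": 0.8, "cold_lead": 0.2, "regulated_customer": 0.0}

# Task mix the agent reaches once deployed (assumed, for illustration).
mix_delegate = {"warm_lead": 0.4, "cold_lead": 0.4, "regulated_customer": 0.2}

naive_roi = expected_value(mix_self, value_per_task)         # evaluated on the old mix
deployed_roi = expected_value(mix_delegate, value_per_task)  # evaluated on the new mix

print(naive_roi, deployed_roi)
```

With these numbers the pilot looks profitable and the deployment loses money, even though nothing about the agent’s decision quality changed.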

The scoring framework turns delegation into expected loss minus expected gain

To make the reach problem tractable, the paper uses a simplified class of binary gambles: accept or reject. The simplification is narrow, but useful. It lets the authors define losses from wrong decisions and gains from correct decisions across a distribution of possible problems.

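For a decision rule $D$ (the set of gambles accepted), the ideal decision at world $\omega$ is to accept exactly those gambles whose realized payoff $g_\omega$ is non-negative; call that set $I_\omega$. Errors include both accepting bad gambles and rejecting good ones, which is the symmetric difference $D \triangle I_\omega$. With $\pi(\omega)$ the probability of world $\omega$ and $\mu$ the distribution over gambles, the paper’s expected loss term is: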

$$ L^\mu(D)=\sum_{\omega \in \Omega}\pi(\omega)\int_{D \triangle I_\omega}|g_\omega|d\mu(g) $$

The gain term captures correct accepted gambles:

$$ G^\mu(D)=\sum_{\omega \in \Omega}\pi(\omega)\int_{D \cap I_\omega}|g_\omega|\,d\mu(g) $$

The net score is:

$$ S^\mu(D)=L^\mu(D)-G^\mu(D) $$

Lower is better. Delegation is rational when the agent’s score under the delegated problem distribution is no higher than the principal’s score under the self-action distribution:

$$ S^{\mu_{\text{delegate}}}(D_A)\leq S^{\mu_{\text{self}}}(D_\pi) $$

This formula is the paper’s bridge from alignment philosophy to deployment governance. It says: do not ask whether the system is generically aligned enough. Ask whether its expected score is better in the actual delegation environment.
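For readers who think better in code, here is a discrete sketch of the score. It assumes finitely many worlds and gambles, treats $D$ as a set of accepted gamble names, and weights both loss and gain by $\pi(\omega)$, which is our reading of the formulas. The function names and the toy rain bets are hypothetical, not taken from the paper.

```python
def ideal_set(payoffs, world):
    """I_omega: gambles whose realized payoff in this world is non-negative."""
    return {g for g, by_world in payoffs.items() if by_world[world] >= 0}


def expected_loss(accepted, payoffs, pi, mu):
    """Discrete analogue of L: magnitude of wrong calls, weighted by pi and mu."""
    total = 0.0
    for world, p_world in pi.items():
        wrong = accepted ^ ideal_set(payoffs, world)  # accepted-bad or rejected-good
        total += p_world * sum(mu[g] * abs(payoffs[g][world]) for g in wrong)
    return total


def expected_gain(accepted, payoffs, pi, mu):
    """Discrete analogue of G: magnitude of correctly accepted gambles."""
    total = 0.0
    for world, p_world in pi.items():
        right = accepted & ideal_set(payoffs, world)
        total += p_world * sum(mu[g] * abs(payoffs[g][world]) for g in right)
    return total


def net_score(accepted, payoffs, pi, mu):
    """S = L - G; lower is better."""
    return expected_loss(accepted, payoffs, pi, mu) - expected_gain(accepted, payoffs, pi, mu)


# Hypothetical toy problem: two worlds, two gambles.
pi = {"rain": 0.5, "dry": 0.5}
payoffs = {
    "umbrella_bet": {"rain": 4.0, "dry": -2.0},
    "picnic_bet": {"rain": -6.0, "dry": 3.0},
}
mu = {"umbrella_bet": 0.5, "picnic_bet": 0.5}

reject_all = net_score(set(), payoffs, pi, mu)
accept_umbrella = net_score({"umbrella_bet"}, payoffs, pi, mu)
print(reject_all, accept_umbrella)
```

To model reach, call `net_score` with one `mu` for the principal and a different `mu` for the delegate, then compare the two results.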

For business use, the framework can be translated into a practical scoring discipline:

Question	What it estimates
What errors will the AI make that humans would not?	Misalignment and model-specific loss
What errors will humans make that the AI avoids?	Accuracy advantage
What new decisions does the AI create or access?	Reach expansion
Are those new decisions better or worse on average?	Distribution shift from delegation
Can mistakes be reversed, audited, or capped?	Loss containment
Is the task repeatable enough to learn from?	Whether online adaptation is realistic

That is not a compliance checklist. It is a way to stop pretending “human in the loop” is a magical amulet.

The examples are diagnostics, not empirical proof

The paper’s Section 4 uses illustrative examples. These are not benchmarks, ablations, or empirical demonstrations. Their purpose is diagnostic: isolate the mechanism by changing one variable at a time.

Paper component	Likely purpose	What it supports	What it does not prove
Noisy expert example	Main illustrative application of accuracy versus noisy misalignment	More information can still fail to justify delegation if the agent’s distorted utility creates enough errors	It does not estimate real AI reliability
Misaligned expert example	Main illustrative application of value mismatch	A different utility function can sometimes improve the principal’s outcome if it avoids severe downside	It does not show misalignment is generally beneficial
Broader-reach expert example	Main illustrative application of reach	Expanded reach can make delegation rational even when decision logic is otherwise identical	It does not guarantee more tools improve deployment
Appendix proof of Theorem 3.4	Formal support for the universal-delegation condition	Broad delegation requires strong posterior alignment under the stated assumptions	It does not cover opaque, non-Bayesian neural agents directly
Online delegation experiment/code	Exploratory extension	Repeated delegation can be framed as a bandit-learning problem	It is not a mature empirical validation suite

The noisy expert example is useful because it blocks a lazy interpretation: “more accurate means delegate.” Alice faces a box with payoff drawn from $\{-5,3,8\}$ and would always open because the expected value is positive. Bob gets extra information by peeking, but his payoffs are shifted by a noise term. The paper computes Alice’s score as $-2.0$ and Bob’s as approximately $-0.917$. Since lower is better, Alice should not delegate. Bob knows more, but the distorted objective creates enough bad choices to erase the advantage.
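Alice’s figure can be checked by hand if the three payoffs are assumed equally likely. That is our assumption, chosen because it recovers the reported number. She always opens, so her only error is opening in the $-5$ world, and her correct acceptances are the other two:

$$ L=\tfrac{1}{3}\cdot|-5|=\tfrac{5}{3},\qquad G=\tfrac{1}{3}(3+8)=\tfrac{11}{3},\qquad S=L-G=-2 $$

Bob’s $-0.917$ depends on the noise distribution, which is not reproduced here.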

The misaligned expert example goes the other way. Alice faces payoffs from $\{-400,25,100,225\}$ and, acting alone, never opens because the expected monetary value is negative. Bob uses a risk-averse utility function and receives a fixed fee. Under the authors’ setup, Bob’s choices protect Alice from the large negative outcome while still capturing moderate positive outcomes, making delegation better in the model.
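The sign of Alice’s expected value is easy to verify, again assuming equally likely payoffs (our assumption, not a detail stated above):

$$ \mathbb{E}[\text{payoff}]=\tfrac{1}{4}(-400+25+100+225)=-12.5 $$

Three of the four outcomes are positive, yet the single $-400$ outcome drags the average below zero. That is why an agent who can sidestep that one outcome is worth paying for, fee and all.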

There is a small but important reading note here: the main text and Appendix B appear to report different exact net-score values for this misaligned-expert example. The qualitative direction remains the same—delegation wins in that illustration—but the magnitude should not be treated as a robust numerical result. This is precisely why the example should be read as a mechanism demonstration, not as evidence that “misaligned agents are profitable.” Please do not put that sentence in a board deck. The board already has enough problems.

The broader-reach example is cleaner and more central. Alice can access boxes $A_1$ to $A_3$; Bob can access $A_1$ to $A_5$. They share the same beliefs and utility. The only difference is reach.

Agent	Reach	Loss	Gain	Net score
Alice	$A_1$–$A_3$	2.56	2.67	-0.11
Bob	$A_1$–$A_5$	1.53	3.10	-1.57

Bob wins because the additional boxes are positively skewed. Nothing mystical happens. Bob does not become more aligned. Bob does not become smarter. Bob simply has access to a better distribution of decisions.
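The table’s arithmetic can be checked directly. The sketch below only re-derives net score as loss minus gain from the published figures and splits Bob’s advantage into its two sources; it does not reconstruct the underlying box payoffs, which are not given here.

```python
# Loss and gain figures copied from the table above; net score = loss - gain.
table = {
    "Alice": {"loss": 2.56, "gain": 2.67},
    "Bob": {"loss": 1.53, "gain": 3.10},
}


def net_score(row):
    """Net score rounded to the table's two decimal places (lower is better)."""
    return round(row["loss"] - row["gain"], 2)


scores = {agent: net_score(row) for agent, row in table.items()}
best = min(scores, key=scores.get)  # lower is better

# Split Bob's advantage into "fewer costly errors" and "more captured upside".
loss_reduction = round(table["Alice"]["loss"] - table["Bob"]["loss"], 2)
gain_increase = round(table["Bob"]["gain"] - table["Alice"]["gain"], 2)

print(scores, best, loss_reduction, gain_increase)
```

Most of Bob’s edge comes from lower expected loss rather than higher gain, which is consistent with the extra boxes diluting the share of bad gambles.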

That is the operational lesson: capability changes the decision landscape. Sometimes the AI is better because it reasons better. Sometimes it is better because it can reach better opportunities. Sometimes it is worse because its reach includes entirely new ways to fail at scale. The same word, “capability,” hides all three.

The business implication is delegation design, not model worship

The paper does not provide a deployment recipe. It provides a way to think more precisely about one.

For business systems, the practical move is from model-level approval to workflow-level delegation scoring. Instead of asking, “Should we use this AI agent?”, ask:

Delegation design question	Why it matters
What exact decision class is being delegated?	Scoped trust can be rational when universal trust is not
What is the baseline human or existing-system score?	Delegation must beat something, not just look impressive
What losses are caused by misalignment?	A capable system can still optimize the wrong objective
What gains come from superior accuracy?	Better forecasts and classification can offset some mismatch
What new opportunities does reach create?	Expanded action space may change the economics
What new harms does reach create?	More tools also means more ways to cause damage
Can the system be monitored or reversed cheaply?	Reversibility changes expected loss
Is the task repeatable enough for learning?	Online delegation works better in low-stakes repeated settings

This also clarifies why “AI agent” is too broad a category for serious governance. A document-routing agent, a trading agent, a hiring-screening agent, and an autonomous procurement negotiator have different distributions of errors, gains, reach, reversibility, and legal exposure. Evaluating them under one global “aligned enough?” label is administratively convenient and analytically lazy.

A more mature organization would maintain a delegation portfolio:

low-reach, high-repeatability tasks can tolerate more experimentation;
high-reach, high-impact tasks require stronger controls;
high-accuracy but misaligned systems may be useful under narrow constraints;
well-aligned but low-reach systems may be safe but economically irrelevant;
broad autonomous systems demand evidence far beyond ordinary benchmark competence.

That last point is where the paper quietly restores discipline. It does not say misalignment is fine. It says misalignment can be priced only when the delegation boundary is clear.

What the paper shows, what Cognaptus infers, and what remains uncertain

The paper directly shows a formal relationship among delegation, belief accuracy, value alignment, and reach under a Bayesian expected-utility framework. It proves demanding conditions for universal delegation and develops a scoring approach for ex ante delegation when problem distributions differ.

Cognaptus infers a governance principle: AI deployment should be evaluated at the level of decision classes and reach boundaries, not only at the level of model alignment. That inference is practical, but it is still an interpretation. The paper does not hand enterprises a calibrated risk model.

Layer	Statement
Direct result	Universal delegation requires very strong trust and posterior alignment conditions.
Direct result	Scoped or expected-value delegation can be rational despite misalignment.
Direct result	Reach changes the distribution of problems and can make delegation better or worse.
Cognaptus inference	AI governance should score delegation contexts, not just certify models.
Cognaptus inference	“Limited autonomy” is not a compromise slogan; it is mathematically natural.
Remaining uncertainty	Real AI systems are not transparent Bayesian expected-utility maximizers.
Remaining uncertainty	Estimating real-world problem distributions, losses, and gains remains difficult.
Remaining uncertainty	Online learning is suitable for repeated low-stakes settings, not one-shot catastrophic decisions.

The boundary matters. A theoretical framework can discipline business thinking without pretending to solve empirical measurement. The paper gives us a clean map; it does not give us the traffic, weather, fuel price, or driver psychology. Annoying, yes. Also normal.
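The online-learning row in the table can be made concrete with a generic two-armed bandit. To be clear, this is a textbook epsilon-greedy sketch, not the paper’s experiment code: the arm names, reward distributions, and parameters are all hypothetical, and rewards here are higher-is-better.

```python
import random


def run_bandit(arms, rounds=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy choice between arms; `arms` maps name -> reward sampler(rng)."""
    rng = random.Random(seed)
    names = list(arms)
    counts = {name: 0 for name in names}
    means = {name: 0.0 for name in names}
    for t in range(rounds):
        if t < len(names):
            choice = names[t]                   # try every arm once first
        elif rng.random() < epsilon:
            choice = rng.choice(names)          # explore
        else:
            choice = max(names, key=means.get)  # exploit the current best estimate
        reward = arms[choice](rng)
        counts[choice] += 1
        means[choice] += (reward - means[choice]) / counts[choice]  # running mean
    return counts, means


# Hypothetical noisy per-round payoffs for acting alone versus delegating.
arms = {
    "self": lambda rng: rng.gauss(1.0, 1.0),
    "delegate": lambda rng: rng.gauss(1.5, 1.0),
}
counts, means = run_bandit(arms, rounds=2000)
print(counts, means)
```

The exploration rounds are the price of learning: every one of them is a decision knowingly handed to the arm currently believed to be worse. That price is tolerable for ticket routing and unacceptable for a one-shot, irreversible decision.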

The limitation is not that the model is simple; it is where the simplicity bites

The paper openly assumes Bayesian expected-utility maximizers. That assumption is useful for deriving clean delegation conditions, but it is not how modern AI systems actually behave. Large models can show context-dependent preferences, inconsistent behavior, tool-induced drift, and policy changes after fine-tuning or system-prompt updates.

The binary-gamble setup is also restrictive. Many business decisions are sequential, strategic, multi-agent, and path-dependent. A procurement bot, for example, does not merely accept or reject a gamble. It negotiates, reveals information, triggers counterpart reactions, and changes the future opportunity set.

The reach concept partly anticipates this problem, but the formal scoring remains simplified. For enterprise use, the framework would need empirical estimates of:

task distribution before and after delegation;
model error under operational conditions;
value mismatch under realistic incentives;
cost of false positives and false negatives;
reversibility of bad decisions;
monitoring cost;
legal and reputational tail risk.
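Several of these estimates combine naturally into a per-decision cost. The sketch below is a deliberate simplification with hypothetical inputs: it assumes a reversed error costs nothing, and it works in expectations, so it says nothing about the tail risk in the last item.

```python
def expected_cost_per_decision(p_fp, p_fn, cost_fp, cost_fn, p_reversed, monitoring_cost):
    """Rough per-decision cost: unreversed error costs plus the cost of watching.

    Assumes a reversed error is fully recovered, which real systems rarely manage.
    """
    error_cost = p_fp * cost_fp + p_fn * cost_fn
    return error_cost * (1.0 - p_reversed) + monitoring_cost


# Hypothetical error rates and costs for one scoped workflow.
unmonitored = expected_cost_per_decision(
    p_fp=0.02, p_fn=0.05, cost_fp=500.0, cost_fn=100.0,
    p_reversed=0.0, monitoring_cost=0.0,
)
monitored = expected_cost_per_decision(
    p_fp=0.02, p_fn=0.05, cost_fp=500.0, cost_fn=100.0,
    p_reversed=0.8, monitoring_cost=1.0,
)
print(unmonitored, monitored)
```

With these inputs, monitoring pays for itself several times over. Raise `monitoring_cost` or lower `p_reversed` and the conclusion flips, which is why both need to be measured rather than assumed.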

This is where the paper is most useful as a conceptual foundation rather than an off-the-shelf governance system. It tells managers what must be measured. It does not make the measurements cheap.

Stop asking whether the AI is aligned enough in general

The question “Is this AI aligned?” is too large to be operational. It encourages either naïve deployment or total paralysis, depending on the temperament of the room.

The better question is narrower and harder to dodge:

For this decision class, under this monitoring regime, with this reach, does delegation improve expected outcomes after accounting for misalignment?

That question will not fit comfortably on a marketing banner. Good. Marketing banners are where nuance goes to be embalmed.

The paper’s deeper contribution is to make bounded delegation respectable. It shows why broad autonomy demands extreme trust, while local delegation can still be rational under uncertainty. That is exactly the distinction businesses need as AI agents move from suggestion engines to decision participants.

The future of AI governance will not be decided by waiting for perfect alignment, nor by pretending misalignment is just a philosophical smell. It will be decided by disciplined delegation design: scope the task, price the mismatch, measure the reach, cap the downside, and only then decide whether the agent gets the wheel.

Cognaptus: Automate the Present, Incubate the Future.

¹ Daniel A. Herrmann, Abinav Chari, Isabelle Qian, Sree Sharvesh, and B. A. Levinstein, “A Decision-Theoretic Approach for Managing Misalignment,” arXiv:2512.15584, 2025, https://arxiv.org/abs/2512.15584.
