Opening — Why this matters now

Security teams are being asked to do more with less, while the attack surface keeps expanding and adversaries automate faster than defenders. Large language models promise relief: summarize logs, suggest response actions, even draft incident playbooks. But there’s a catch that every practitioner already knows—LLMs are confident liars. In security operations, a hallucinated action isn’t just embarrassing; it’s operationally expensive.

This paper asks a refreshingly practical question: What if the LLM doesn’t have to be right every time—only safe when it acts? Instead of forcing reliability through better prompts, the authors redesign the decision loop itself.

Background — From prompt engineering to controllable risk

Most LLM-based security tooling today relies on prompt engineering layered onto frontier models. The assumption is simple: smarter model + longer context = better decisions. In practice, this gives you eloquent suggestions with no guarantees. When the model is wrong, it is wrong decisively.

Parallel research in AI reliability has explored self-consistency, conformal abstention, and in-context learning (ICL). These ideas reduce error rates, but they are rarely wired into end-to-end operational workflows—especially not with formal guarantees.

What’s missing is a framework that treats hallucination as a risk to be managed, not a bug to be eliminated.

Analysis — A loop that verifies before it trusts

The core contribution is an iterative planning framework that embeds an LLM inside a verification-and-refinement loop rather than letting it act unilaterally.

At each step:

  1. Candidate generation: The LLM proposes multiple possible actions (not one).
  2. Lookahead prediction: For each action, the LLM estimates its downstream effect—specifically, how long recovery would take if that action were chosen.
  3. Consistency scoring: If these predictions strongly disagree, the model is effectively contradicting itself.
  4. Conformal abstention: When inconsistency exceeds a threshold, the system refuses to act.
  5. External feedback: The rejected action is evaluated externally (e.g., via a digital twin or expert judgment).
  6. In-context learning: That feedback is injected back into the prompt, refining the next round of proposals.

The subtle shift is this: the LLM is no longer trusted for correctness, only for generating hypotheses. Execution is conditional.
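
To make the control flow concrete, here is a minimal Python sketch of one planning round. The callables (`propose`, `lookahead`, `external_feedback`), the parameter defaults, and the tie-breaking rule are illustrative assumptions, not the paper's implementation:

```python
import math
from statistics import mean

def consistency(predicted_times, beta):
    """Dispersion-based consistency: exp(-(beta/N) * sum_i (T_i - mean)^2)."""
    mu = mean(predicted_times)
    return math.exp(-(beta / len(predicted_times)) * sum((t - mu) ** 2 for t in predicted_times))

def plan_step(propose, lookahead, external_feedback, context,
              n_candidates=5, beta=0.1, threshold=0.8, max_rounds=3):
    """One verification-and-refinement round: generate, score, act or abstain, refine.

    `propose`, `lookahead`, and `external_feedback` are caller-supplied callables
    (LLM wrappers, a digital twin, an expert queue); this sketch fixes only the
    control flow, not their implementations.
    """
    for _ in range(max_rounds):
        actions = propose(context, n_candidates)              # 1. candidate generation
        times = [lookahead(context, a) for a in actions]      # 2. lookahead prediction
        score = consistency(times, beta)                      # 3. consistency scoring
        if score >= threshold:
            # Predictions agree: execute the action with the shortest predicted recovery.
            return min(zip(times, actions), key=lambda p: p[0])[1]
        # 4. abstain, 5. request external feedback, 6. inject it back into the prompt
        feedback = external_feedback(actions, context)
        context = f"{context}\nExternal feedback: {feedback}"
    return None  # still inconsistent after max_rounds: abstain rather than act
```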

A simple but powerful consistency signal

Instead of task-specific rules, the framework uses a dispersion-based consistency function:

$$ \lambda(A_t) = \exp\left(-\frac{\beta}{N} \sum_{i=1}^N (T_{t+1}^i - \bar{T}_{t+1})^2 \right) $$

Here $T_{t+1}^i$ is the predicted recovery time under the $i$-th candidate, $\bar{T}_{t+1}$ is their mean, and $\beta$ controls how sharply disagreement is penalized. If predicted outcomes cluster tightly, $\lambda$ stays near 1 and the action set is considered coherent. If they scatter, $\lambda$ collapses toward 0 and the system abstains. This single scalar becomes a dial for hallucination risk.
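
As a rough illustration of how the dial behaves (with an assumed $\beta = 0.1$, not a value taken from the paper), tightly clustered lookahead predictions keep $\lambda$ near 1 while scattered ones drive it toward 0:

```python
import math
from statistics import mean

def dispersion_consistency(times, beta=0.1):  # beta is an assumed value, for illustration
    """lambda(A_t) = exp(-(beta/N) * sum_i (T_i - T_bar)^2)."""
    mu = mean(times)
    return math.exp(-(beta / len(times)) * sum((t - mu) ** 2 for t in times))

# Tightly clustered predicted recovery times -> high consistency, act.
print(dispersion_consistency([11, 12, 12, 13]))   # ~0.95
# Scattered predictions -> consistency collapses, abstain.
print(dispersion_consistency([5, 12, 25, 40]))    # ~2e-8
```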

Findings — What actually improves, and by how much

The framework is evaluated on 25 real incidents across four public IDS datasets. Compared to frontier models (Gemini 2.5 Pro, OpenAI O3, DeepSeek-R1), the results are consistent:

| Metric | Frontier LLMs | Proposed Framework |
| --- | --- | --- |
| Average recovery steps | ~16–19 | ~12 |
| Ineffective actions | High variance | Lowest |
| Failed recoveries | Non-trivial | Near zero |

In concrete terms, recovery plans were up to 30% shorter, with fewer wasted actions. Ablation studies show that removing any one component—lookahead, abstention, or ICL—degrades performance noticeably.

Perhaps more importantly, the authors show that the hallucination probability can be explicitly bounded by calibrating the consistency threshold on historical failures. This turns reliability from a hope into a parameter.
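
A plausible reading of that calibration step, sketched in the spirit of conformal prediction (the quantile recipe, function names, and numbers below are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def calibrate_threshold(failure_scores, alpha=0.05):
    """Pick a consistency threshold from historical failures (assumed recipe).

    `failure_scores` are lambda values recorded on past episodes where the
    executed action turned out to be wrong. Accepting only actions whose
    consistency exceeds the (1 - alpha) quantile of these scores means
    roughly an alpha fraction of failure-like cases would slip through.
    """
    return float(np.quantile(np.asarray(failure_scores), 1.0 - alpha))

# Example: lambda values observed on historical hallucinated actions (made-up data).
past_failure_lambdas = [0.41, 0.55, 0.62, 0.37, 0.70, 0.48, 0.66, 0.59]
tau = calibrate_threshold(past_failure_lambdas, alpha=0.1)
print(f"act only when lambda >= {tau:.2f}")
```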

Implications — Designing agents that know when not to act

For practitioners, the lesson is not “use this exact formula.” It’s architectural:

  • LLMs should be generators, not deciders.
  • Abstention is a feature, not a failure.
  • Feedback loops beat larger models.

This design maps cleanly onto regulated or high-stakes domains—finance, infrastructure, healthcare—where incorrect autonomy is worse than delayed action. It also reframes agentic AI: intelligence is not just choosing actions, but choosing when not to choose.

Conclusion — Reliability by design, not persuasion

This paper doesn’t claim to solve hallucinations. It sidesteps them. By embedding LLMs in a loop that verifies consistency, enforces abstention, and learns from feedback, it delivers something rare in AI systems: predictable behavior under uncertainty.

If agentic AI is going to operate critical systems, this is the direction it will have to take—less bravado, more discipline.

Cognaptus: Automate the Present, Incubate the Future.