Opening — Why this matters now
Large language models rarely fail loudly. They fail expensively.
As LLMs become embedded into customer service, analytics, coding tools, and decision workflows, a subtle vulnerability is gaining strategic importance: prompt-induced over-generation. The failure mode is banal — the model simply keeps talking — yet the consequences are anything but. Latency spikes. GPU cycles burn. Token bills inflate. Other users wait.
This paper frames over-generation not as a curiosity, but as a denial-of-service (DoS) vector, one that operates entirely through text. No exploits. No gradients. Just prompts.
Background — Context and prior art
Over-generation attacks sit at the intersection of three research threads.
First, adversarial prompting has repeatedly shown that alignment is shallow: if an undesirable behavior has non-zero probability, some prompt will amplify it. Second, automated prompt search — evolutionary methods, reinforcement learning, universal triggers — has proven effective even in black-box settings. Third, recent “sponge” and DoS-style attacks demonstrate that inference-time resource exhaustion can be weaponized without degrading output correctness.
What has been missing is a standardized, attack-side benchmark: a way to compare prompt-based attackers under realistic black-box assumptions, using metrics that reflect operational harm rather than model internals.
That gap is what this paper fills.
Analysis — What the paper actually does
The authors introduce a black-box, query-only benchmark focused on stopping-time vulnerability: how easily a model can be coerced into delaying or suppressing the end-of-sequence (EOS) token.
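To see why small shifts in stopping behavior matter, a back-of-envelope model helps. If each decoding step emitted EOS independently with probability p, continuation length would be geometric with mean 1/p. Real decoders are not memoryless, and the numbers below are purely illustrative, but the leverage is the point: a prompt only needs to nudge p downward to multiply expected output.

```python
# Back-of-envelope: treat EOS emission as an independent per-step event
# with probability p. Continuation length is then geometric, mean 1/p.
def expected_length(p_eos: float) -> float:
    return 1.0 / p_eos

print(expected_length(0.010))  # ~100 tokens
print(expected_length(0.002))  # ~500 tokens: a small shift in p, 5x the cost
```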
Two automated attackers are studied:
- EOGen (Evolutionary Over-Generation Prompt Search) — a gradient-free evolutionary algorithm that searches for short, English-like prefixes which suppress EOS and induce long continuations.
- RL-GOAL — a goal-conditioned reinforcement learning attacker trained to generate prefixes targeting a specific continuation length.
Crucially, both attackers operate under strict constraints: no gradient access, no logits, no hidden states. The victim model is treated as a pure text-completion oracle.
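The black-box setting makes the attacker's shape easy to sketch, even without reference code. Below is a minimal, hypothetical rendition of an EOGen-style evolutionary search: `oracle` stands in for the victim (a callable returning a token count for a completion), and the population, crossover, and mutation parameters are illustrative guesses, not the authors' settings.

```python
import random

def fitness(oracle, prefix_tokens, max_new_tokens=4096):
    """Attacker's objective: how many tokens the victim emits for this
    prefix before stopping, capped at the decoding limit."""
    return oracle(" ".join(prefix_tokens), max_new_tokens)

def eogen_style_search(oracle, vocab, pop_size=32, generations=50,
                       prefix_len=8, elite_frac=0.25, mutate_p=0.3):
    # Start from random short prefixes drawn from an English-like vocabulary.
    pop = [[random.choice(vocab) for _ in range(prefix_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(oracle, p), reverse=True)
        elite = pop[: max(2, int(elite_frac * pop_size))]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, prefix_len)      # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutate_p:             # token-level mutation
                child[random.randrange(prefix_len)] = random.choice(vocab)
            children.append(child)
        pop = elite + children
    # pop[0] is the elite survivor with the highest score from the final sort.
    return " ".join(pop[0])
```

RL-GOAL would replace this loop with a learned, goal-conditioned policy; its conditioning suggests a reward that peaks when the observed length hits the requested target, e.g. reward = -abs(observed - target), though that shape is a guess rather than the paper's formula.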
To evaluate severity, the paper introduces a clean metric: Over-Generation Factor (OGF), defined as generated tokens divided by the model’s nominal context window. Complementary diagnostics capture stall events (hitting the generation cap without EOS), tail persistence, and wall-clock latency.
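In code, the metric and the stall diagnostic are one-liners. The function names here are mine, and the sample numbers are illustrative rather than taken from the paper.

```python
def ogf(generated_tokens: int, context_window: int) -> float:
    """Over-Generation Factor: output length relative to the victim's
    nominal context window. OGF > 1 means the model wrote past a full
    window's worth of tokens."""
    return generated_tokens / context_window

def stalled(generated_tokens: int, generation_cap: int, saw_eos: bool) -> bool:
    """Stall event: the decoder hit the hard cap without ever emitting EOS."""
    return generated_tokens >= generation_cap and not saw_eos

print(ogf(11_300, 4_096))              # ~2.76: well past the window
print(stalled(16_384, 16_384, False))  # True: ran to the cap, no EOS
```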
Findings — Results that should worry operators
The results are unambiguous.
EOGen already outperforms naive baselines by a wide margin. On Phi-3-mini, it achieves a mean OGF of 1.38, with nearly a quarter of prompts exceeding twice the context window. Random short prompts almost never do this.
RL-GOAL, however, is the real escalation.
| Setting | Mean OGF ± SD | Success @ OGF ≥ 2 | Stall Rate |
|---|---|---|---|
| RL-GOAL → Phi-3 | 2.81 ± 1.38 | 67.5% | 46.5% |
| RL-GOAL → LLaMA-2 | 2.01 ± 1.29 | 34.0% | 24.4% |
| No-prefix baseline | < 1.0 | < 6% | ~2% |
These are not edge cases. They are systematic. Learned prefixes routinely drive models to the decoding cap, inflating latency by an order of magnitude.
Interestingly, susceptibility varies by model family. Falcon proves more resistant under identical caps, suggesting that termination behavior is an architectural and training artifact — not an inevitability.
Implications — Why this is more than an academic attack
From a business perspective, this vulnerability is awkwardly positioned.
It does not look like a security breach. Outputs remain fluent. Logs look normal. Yet costs rise, throughput drops, and shared infrastructure degrades. In usage-based pricing models, the attacker externalizes cost directly onto the provider.
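The economics are easy to make concrete. With a hypothetical price of $10 per million output tokens and a 4,096-token window (both placeholder numbers, not the paper's), pushing OGF from a benign baseline into the RL-GOAL range multiplies the per-request bill:

```python
price_per_token = 10 / 1_000_000     # hypothetical $10 per 1M output tokens
window = 4_096                       # hypothetical nominal context window

def request_cost(ogf: float) -> float:
    return ogf * window * price_per_token

print(f"benign (OGF 0.2):   ${request_cost(0.2):.4f}")  # ~$0.0082
print(f"attacked (OGF 2.8): ${request_cost(2.8):.4f}")  # ~$0.1147, ~14x
```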
More importantly, the paper demonstrates that learning-based attackers dominate heuristic ones. Once prompt construction itself is optimized, over-generation becomes predictable and reproducible.
This shifts the defensive conversation. Rate limits and static filters are blunt tools against adaptive prompt generators. What is needed instead are termination-aware defenses: decoding-time monitors, EOS robustness training, and evaluation protocols that treat stopping behavior as a first-class safety property.
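What a decoding-time monitor might look like is straightforward to sketch. The wrapper below assumes a token-by-token decoder exposed as a Python iterable; the budget policy (abort once output exceeds a fixed multiple of the context window without EOS) is my illustration of the idea, not a defense evaluated in the paper.

```python
def monitored_decode(step_tokens, eos_token_id, context_window,
                     ogf_budget=1.0):
    """Wrap a token stream with a termination-aware guard.

    step_tokens: iterable yielding one token id per decoding step.
    Aborts once output exceeds ogf_budget * context_window without EOS,
    turning a silent stall into an explicit, loggable event.
    """
    budget = int(ogf_budget * context_window)
    out = []
    for tok in step_tokens:
        out.append(tok)
        if tok == eos_token_id:
            return out, "eos"
        if len(out) >= budget:
            return out, "aborted:over_generation"  # alert, don't just truncate
    return out, "stream_exhausted"
```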
Conclusion — The quiet failure mode
Prompt-induced over-generation is not a jailbreak. It does not leak secrets or break rules. It simply refuses to stop.
That is precisely why it matters.
By formalizing over-generation as a denial-of-service problem and providing an attack-side benchmark, this paper reframes an overlooked failure mode into something measurable, comparable, and operationally relevant.
LLMs, it turns out, can be talked into exhaustion.
Cognaptus: Automate the Present, Incubate the Future.