Prompt, Probe, Persist: How Multi‑Turn RL Is Rewriting the Jailbreak Playbook
A chatbot rarely fails all at once. In production, failure is usually more boring than cinema. A user asks something borderline. The model refuses. The user rephrases. The model gives a harmless explanation. The user narrows the topic. The model follows the conversation. Then, several turns later, the assistant provides content it should not have provided. No thunder. No villain monologue. Just an interaction history doing what interaction histories do: accumulating context. ...