Aligned, or Just Agreeable? The Quiet Failure Mode of Modern LLMs
A support agent can sound calm, ask polite questions, invoke a few tools, and finish with a reassuring summary. The customer leaves. The dashboard shows completion. Everyone feels civilized. Then someone opens the actual transaction log. The reservation was not cancelled. The reminder was searched before the timestamp was retrieved. The contact update succeeded for the wrong person. The model was not exactly malicious, or even spectacularly wrong. It was simply agreeable in the familiar corporate way: fluent enough to pass the meeting, not reliable enough to run the process. ...