DRIFT-BENCH: When Agents Stop Asking and Start Breaking
A user says, “Update the record with a sensible value.” That sentence is small. The damage may not be. For a normal chatbot, the worst outcome might be a vague answer wearing a confident expression. Annoying, yes, but usually recoverable. For an agent connected to a database, file system, workflow platform, or API service, the same ambiguity becomes operational. The model may update the wrong row, call the wrong endpoint, overwrite a file, or politely explain its mistake after making it. Charming, in the same way a self-driving forklift is charming. ...