The Tail That Wags the Model: Why p99 Latency Should Run Your LLM
A demo can survive a slow answer. A production service cannot survive the slow answer that arrives just often enough to make users stop trusting the product. That is the quiet problem behind p99 latency. The average response time tells you how the service feels on a normal day. p99, the latency that 99 percent of requests come in under, tells you what happens to the unlucky one percent: the support agent waiting in front of a customer, the analyst refreshing a dashboard, the employee whose workflow now includes watching a spinner and reconsidering their life choices. ...
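To make the gap between the average and the tail concrete, here is a minimal sketch in Python. The latency numbers are invented for illustration and are not measurements from any real service, and `percentile` is a hypothetical helper that uses the simple nearest-rank definition; monitoring tools often interpolate instead.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Assumed data: 980 requests that finish in 0.8 s and 20 that take 12 s.
latencies = [0.8] * 980 + [12.0] * 20

mean_latency = statistics.mean(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)

print(f"mean: {mean_latency:.2f} s, p50: {p50:.2f} s, p99: {p99:.2f} s")
```

With this made-up sample, the mean lands just over one second and the median stays at 0.8 seconds, so a dashboard built on either looks healthy. The p99 is 12 seconds, which is the number the unlucky user actually experiences.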