Quality Guarantees

When teams push large text workloads through LLMs (contract triage, lead deduping, safety filtering), they face a brutal choice: pay for the “oracle” model, which is accurate but pricey, or accept quality drift from a cheaper “proxy”. Model cascades promise both: use the proxy when it is confident and escalate uncertain items to the oracle. In practice, though, cascades have been fragile. SUPG and similar heuristics often over- or under-sample, rely on asymptotic central-limit-theorem (CLT) approximations, and miss their targets when sample sizes are small.

The BARGAIN framework addresses this by combining task-aware adaptive sampling with tighter finite-sample tests, certifying the target while maximizing utility (cost saved, recall, or precision). The authors report up to 86% more cost reduction than SUPG on accuracy-target (AT) workloads, and similarly large gains in the precision-target (PT) and recall-target (RT) settings, all while keeping rigorous high-probability guarantees that the target is met. ...
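To make the cascade idea concrete, here is a minimal Python sketch of an accuracy-target cascade. It is not BARGAIN's algorithm: it swaps in a plain one-sided Hoeffding bound with a Bonferroni split over a fixed grid of candidate thresholds, and it assumes a perfect oracle, a proxy that returns a (label, confidence) pair, and a uniform random sample labeled by the oracle. All function names (`hoeffding_lower_bound`, `clt_lower_bound`, `pick_threshold`, `run_cascade`) are hypothetical. Because oracle-handled items are assumed correct, certifying the proxy's accuracy on the items it keeps is a sufficient (conservative) condition for the whole output meeting the target. The CLT bound is included only to show how a normal approximation can overclaim on small samples.

```python
import math
from statistics import NormalDist


def hoeffding_lower_bound(successes: int, n: int, delta: float) -> float:
    """One-sided Hoeffding bound for [0, 1] values: mean >= result w.p. >= 1 - delta."""
    if n == 0:
        return 0.0
    return successes / n - math.sqrt(math.log(1 / delta) / (2 * n))


def clt_lower_bound(successes: int, n: int, delta: float) -> float:
    """Normal-approximation (CLT) bound; only asymptotically valid."""
    if n == 0:
        return 0.0
    p = successes / n
    z = NormalDist().inv_cdf(1 - delta)
    return p - z * math.sqrt(p * (1 - p) / n)


def pick_threshold(sample, target, delta, candidates):
    """Lowest confidence threshold whose proxy accuracy is certified >= target.

    sample: (proxy_confidence, proxy_was_correct) pairs from an oracle-labeled
    uniform random sample (hypothetical input format). Splitting delta across
    candidates (Bonferroni) bounds by delta the chance that any threshold is
    falsely certified. Returns None if nothing passes: send everything to the oracle.
    """
    per_test_delta = delta / len(candidates)
    passing = []
    for tau in candidates:
        accepted = [ok for conf, ok in sample if conf >= tau]
        lb = hoeffding_lower_bound(sum(accepted), len(accepted), per_test_delta)
        if lb >= target:
            passing.append(tau)
    # The lowest passing threshold routes the most items to the cheap proxy.
    return min(passing) if passing else None


def run_cascade(items, proxy, oracle, tau):
    """Keep the proxy label when its confidence >= tau; otherwise call the oracle."""
    labels, oracle_calls = [], 0
    for x in items:
        label, conf = proxy(x)
        if tau is not None and conf >= tau:
            labels.append(label)
        else:
            labels.append(oracle(x))
            oracle_calls += 1
    return labels, oracle_calls


if __name__ == "__main__":
    # Ten correct predictions out of ten: the CLT bound claims perfect accuracy.
    print(clt_lower_bound(10, 10, 0.05))        # 1.0
    print(hoeffding_lower_bound(10, 10, 0.05))  # about 0.61
```

A larger labeled sample or a coarser candidate grid tightens each per-threshold bound, which lets lower thresholds pass and sends more items to the proxy. Generic bounds like Hoeffding's are loose, which is the gap the source attributes to BARGAIN's tighter, task-aware tests.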