Cheap Thrills, Hard Guarantees: BARGAINing with LLM Cascades
A familiar enterprise AI story goes like this: the expensive model works, the cheap model almost works, and the finance team would very much like “almost” to become a procurement strategy. That is where the trouble starts.

For large-scale document processing, classification, filtering, extraction, and review queues, teams rarely want to call the best available LLM on every record. It is too slow, too expensive, and occasionally a lovely way to convert a data pipeline into a billing incident. The obvious compromise is a model cascade: use a cheaper proxy model when it seems confident, and escalate the uncertain cases to a stronger oracle model. ...
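To make the cascade idea concrete, here is a minimal sketch of the routing logic described above: keep the proxy's answer when its confidence clears a threshold, otherwise pay for the oracle. Everything named here (`cascade`, `CascadeResult`, `toy_proxy`, `toy_oracle`, and the assumption that the proxy returns a confidence score in [0, 1]) is hypothetical and chosen for illustration; it is not any particular library's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical signatures: a proxy returns (label, confidence in [0, 1]);
# an oracle returns only a label. Neither is a real library API.
ProxyFn = Callable[[str], Tuple[str, float]]
OracleFn = Callable[[str], str]


@dataclass
class CascadeResult:
    label: str
    used_oracle: bool


def cascade(records: Iterable[str], proxy: ProxyFn, oracle: OracleFn,
            threshold: float) -> List[CascadeResult]:
    """Keep the proxy's answer when it is confident enough; otherwise escalate."""
    results = []
    for record in records:
        label, confidence = proxy(record)
        if confidence >= threshold:
            # Cheap path: trust the proxy and skip the oracle call.
            results.append(CascadeResult(label, used_oracle=False))
        else:
            # Expensive path: the proxy is unsure, so ask the oracle.
            results.append(CascadeResult(oracle(record), used_oracle=True))
    return results


# Toy stand-ins for the two models (assumptions for illustration only).
def toy_proxy(text: str) -> Tuple[str, float]:
    # Pretend the cheap model is confident only on short records.
    label = "spam" if "offer" in text else "ham"
    confidence = 0.95 if len(text) < 20 else 0.55
    return label, confidence


def toy_oracle(text: str) -> str:
    return "spam" if "offer" in text or "prize" in text else "ham"


records = [
    "special offer",
    "you have won a prize, click here now",
    "lunch at noon?",
]
results = cascade(records, toy_proxy, toy_oracle, threshold=0.9)
oracle_calls = sum(r.used_oracle for r in results)
```

With the threshold set to 0.9, only the long middle record is escalated, so the oracle is called once for three records. Note that the threshold here is hand-picked: nothing in this sketch says how accurate the accepted proxy answers actually are.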