
SAGA, Not Sci‑Fi: When LLMs Start Doing Science

Opening — Why this matters now

For years, we have asked large language models to explain science. The paper behind SAGA asks a more uncomfortable question: what happens when we ask them to do science instead? Scientific discovery has always been bottlenecked not by ideas, but by coordination — between hypothesis generation, experiment design, evaluation, and iteration. SAGA reframes this entire loop as an agentic system problem. Not a chatbot. Not a single model. A laboratory of cooperating AI agents. ...
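
To make the coordination loop concrete, here is a minimal sketch of a hypothesis → experiment → evaluation → iteration cycle run by cooperating agent stubs. All names and interfaces below are hypothetical illustrations of the loop structure, not SAGA's actual API.

```python
# Hypothetical sketch of an agentic discovery loop. Agent names and
# interfaces are illustrative only, not SAGA's actual design.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    statement: str
    score: float = 0.0

def propose(history: list[Hypothesis]) -> Hypothesis:
    """Hypothesis-generation agent (stub): would call an LLM in practice."""
    return Hypothesis(statement=f"candidate #{len(history) + 1}")

def design_and_run(h: Hypothesis) -> dict:
    """Experiment-design agent (stub): returns simulated results."""
    return {"effect": (0.1 * len(h.statement)) % 1.0}

def evaluate(h: Hypothesis, results: dict) -> float:
    """Evaluation agent (stub): scores the evidence for the hypothesis."""
    return results["effect"]

history: list[Hypothesis] = []
for _ in range(5):                      # iteration: the coordination loop
    h = propose(history)
    h.score = evaluate(h, design_and_run(h))
    history.append(h)

best = max(history, key=lambda h: h.score)
print(best.statement, best.score)
```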

December 29, 2025 · 3 min · Zelina

When Bandits Get Priority: Learning Under Scarce, Tiered Capacity

Opening — Why this matters now

Large Language Models, edge computing platforms, and cloud inference systems all share a quiet but inconvenient truth: resources are scarce, and not everyone is equal. Some tasks pay more. Some users matter more. Some workloads jump the queue. Yet much of the bandit literature still assumes a polite world — where arms dispense rewards independently, capacity is either infinite or fixed, and every pull is treated equally. That abstraction collapses the moment you introduce priorities, stochastic capacity, and multiple simultaneous plays. ...
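
As a rough illustration of the setting (not the paper's algorithm), a UCB-style policy here must pick several arms per round, rank them by a priority-weighted index, and respect a capacity that is drawn fresh each round. The tier weights, reward model, and index scaling below are all assumptions for the sketch.

```python
# Illustrative sketch: priority-weighted UCB with multiple plays under
# stochastic capacity. NOT the paper's algorithm; it only shows the setting.
import math
import random

n_arms = 5
priority = [3.0, 2.0, 1.5, 1.0, 1.0]    # hypothetical tier weights
counts = [0] * n_arms
means = [0.0] * n_arms

for t in range(1, 1001):
    capacity = random.randint(1, 3)      # stochastic capacity this round

    def index(a: int) -> float:
        # UCB index, scaled by priority so high-tier arms jump the queue
        if counts[a] == 0:
            return float("inf")
        bonus = math.sqrt(2 * math.log(t) / counts[a])
        return priority[a] * (means[a] + bonus)

    chosen = sorted(range(n_arms), key=index, reverse=True)[:capacity]
    for a in chosen:                     # multiple simultaneous plays
        reward = random.gauss(0.5 + 0.1 * a, 0.1)  # toy reward model
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]
```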

December 29, 2025 · 4 min · Zelina

Cheap Thrills, Hard Guarantees: BARGAINing with LLM Cascades

When teams push large text workloads through LLMs (contract triage, lead deduping, safety filtering), they face a brutal choice: pay for the “oracle” model (accurate but pricey) or accept quality drift with a cheaper “proxy”. Model cascades promise both—use the proxy when confident, escalate uncertain items to the oracle—but in practice they’ve been fragile. SUPG and similar heuristics often over‑ or under‑sample, rely on asymptotic CLT assumptions, and miss targets when sample sizes are small. The BARGAIN framework fixes this by combining task‑aware adaptive sampling with tighter finite‑sample tests to certify targets while maximizing utility (cost saved, recall, or precision). The authors report up to 86% more cost reduction vs. SUPG for accuracy‑target (AT) workloads, and similarly large gains for precision‑target (PT) and recall‑target (RT) settings—with rigorous guarantees. ...
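
The cascade pattern itself is simple to sketch: route each item by proxy confidence, escalate the rest. Everything below is a placeholder (the stub models, the `THRESHOLD` value); BARGAIN's contribution is certifying that threshold with task-aware sampling and finite-sample tests, which this sketch deliberately does not implement.

```python
# Minimal cascade sketch: accept the cheap proxy when confident, escalate
# uncertain items to the oracle. THRESHOLD is a placeholder; BARGAIN's
# certified threshold selection is NOT implemented here.
from typing import Callable

THRESHOLD = 0.9  # placeholder; BARGAIN would select this to certify a target

def cascade(items: list[str],
            proxy: Callable[[str], tuple[str, float]],
            oracle: Callable[[str], str]) -> tuple[list[str], int]:
    """Label items with the proxy when confident, else the oracle.

    Returns the labels and the number of (expensive) oracle calls avoided.
    """
    labels, saved = [], 0
    for x in items:
        label, confidence = proxy(x)
        if confidence >= THRESHOLD:
            labels.append(label)         # accept the cheap answer
            saved += 1
        else:
            labels.append(oracle(x))     # escalate the uncertain item
    return labels, saved

# Toy usage with stubbed models
items = ["doc-a", "doc-b", "doc-c"]
labels, saved = cascade(
    items,
    proxy=lambda x: ("safe", 0.95 if x != "doc-b" else 0.5),
    oracle=lambda x: "unsafe",
)
print(labels, f"oracle calls avoided: {saved}")
```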

September 6, 2025 · 5 min · Zelina