
From Tokens to Teaspoons: What a Prompt Really Costs

TL;DR for operators: A Google paper on Gemini Apps reports that the median text prompt in May 2025 used 0.24 Wh of energy, generated 0.03 gCO2e, and consumed 0.26 mL of water under a comprehensive production-serving measurement boundary.[1] That is small. Very small. Less “boil the kettle” and more “squint at a television for nine seconds.” ...

August 24, 2025 · 17 min · Zelina

Agents on the Wire: Protocols, Memory, and Guardrails for Real-World Agentic AI

TL;DR for operators: An agent demo usually fails in production for boring reasons. Not because the model suddenly forgot how to reason. Because the agent cannot reliably discover another agent, remember the right state, expose a stable contract, validate risky outputs, or execute generated code without turning the server into an involuntary escape room. ...

August 18, 2025 · 17 min · Zelina

From Chaos to Choreography: The Future of Agent Workflows

TL;DR for operators: A new survey on agent workflows is not useful because it tells us agents are becoming important. Anyone still surprised by that has probably been trapped in a quarterly innovation committee. Its value is more practical: it turns the messy agent-tool-platform landscape into a comparison map for deciding what kind of workflow infrastructure a business is actually buying or building.[1] ...

August 9, 2025 · 18 min · Zelina

From Tadpole to Titan: How DEVFT Grows LLMs Like a Brain

TL;DR for operators: Federated LLM fine-tuning sounds attractive until someone asks the rude operational question: who is actually paying for the compute, memory, and communication on the devices? The paper behind DevFT proposes a useful answer: do not fine-tune the full model end-to-end from the first round. Start with a compact submodel, train it federatively, transfer the learned LoRA parameters forward, then expand the model in stages until it reaches the full target size.[1] The authors call this Developmental Federated Tuning, and yes, the developmental psychology metaphor is a little enthusiastic. Fortunately, the mechanism is more interesting than the metaphor. ...

August 4, 2025 · 16 min · Zelina

Merge Without Mayhem: How Orthogonal Deltas Could Revolutionize Model Composition

TL;DR for operators: Model composition usually sounds harmless until someone asks the obvious production question: “Can we remove that client-specific update without retraining the whole thing?” At that point, many elegant AI stacks quietly become sedimentary rock. The MDM-OC paper proposes a cleaner model lifecycle: keep a shared base model, express every fine-tuned specialist as a task delta, orthogonalize those deltas so they interfere less, merge them with tuned coefficients, and subtract a selected delta later when a capability, customer, or data source needs to be removed.[1] The important claim is not “we found another averaging recipe.” The claim is that model updates can be treated as separable components in parameter space. ...

August 2, 2025 · 20 min · Zelina

Reasoning at Scale: How DeepSeek Redefines the LLM Playbook

TL;DR for operators: DeepSeek-R1 is not a story about one model suddenly becoming clever because someone found the secret lever labelled “reason harder”. It is a systems story: take a strong base model, reward it on problems where correctness can be checked, let longer reasoning traces emerge, repair the ugly parts with cold-start data and alignment, then distil the resulting behaviour into smaller models where deployment economics actually matter.[1] ...

July 15, 2025 · 14 min · Zelina

The Outlier Is a Lie: Quantization Breakthroughs with OSP

TL;DR for operators: If your deployment plan depends on squeezing a language model into cheap inference hardware, this paper is worth reading because it changes the timing of the quantization problem. Most quantization work asks: “How do we repair a model after training so it survives 4-bit inference?” Outlier-Safe Pre-Training asks a more irritating question: “Why did we train a quantization-hostile model in the first place?”[1] ...

June 25, 2025 · 18 min · Zelina