Cover image

Cache Me If You Can: Why Enterprise AI Needs Latent Working Memory

A codebase is not a paragraph. Neither is a litigation folder, a clinical case file, a customer-support history, a policy archive, or the slow-motion disaster known as “all meeting notes since March.” Yet many enterprise AI systems still treat long context as a heroic prompt-engineering problem: push more text into the model, pray the key detail survives attention, and call the bill “innovation.” ...

June 10, 2026 · 15 min · Zelina
Cover image

Graph Expectations: Why Context Compression Needs Structure, Not Just Similarity

Opening — Why this matters now The AI industry has developed a charmingly expensive habit: when models struggle with long documents, we buy them larger windows and pretend the problem has been solved. It has not. Long-context LLMs are useful, but longer context is not the same as better context. A model can accept a very large input and still miss the crucial paragraph buried in the middle, over-attend to duplicated evidence, or lose the argumentative spine of a document. The result is familiar to anyone building AI tools for legal review, finance research, policy analysis, procurement, consulting, compliance, or enterprise knowledge work: the model has “read” everything, yet somehow understands the wrong thing. Very modern. Very expensive. ...

May 1, 2026 · 12 min · Zelina
Cover image

Trees That Think Faster: Adaptive Compression for the Long-Context Era

Long context is a lovely product promise until the invoice arrives. Every enterprise AI demo eventually wants the same magic trick: read the whole contract archive, remember every customer interaction, inspect every ticket, keep all meeting notes alive, and answer as if the model has a tidy brain instead of a very expensive attention matrix. The sales slide says “128K context.” The infrastructure team hears “latency, memory, and GPU burn.” Both are correct. One is merely dressed better. ...

December 7, 2025 · 17 min · Zelina