LLM Infrastructure

No More Low-Rank Detours: GPart and the Geometry of Fine-Tuning

Adapters are supposed to make fine-tuning simple. A team takes a large pretrained model, freezes most of it, trains a small adapter for customer support, another for invoice extraction, another for compliance review, and so on. The pitch is attractive: less storage, less training cost, faster iteration, fewer excuses from the infrastructure team. Naturally, the adapter becomes the small and tidy object everyone wants to manage. ...

Pooling Resources: UniPool and the MoE Budget Nobody Wanted to Audit

Opening: Why this matters now
AI infrastructure has entered its spreadsheet era. Not the glamorous spreadsheet, where revenue projections grow diagonally upward and nobody asks where the assumptions came from. The other spreadsheet: the one where compute cost, memory footprint, inference latency, training instability, and model quality all insist on appearing in the same row. ...

Graph Expectations: Why Context Compression Needs Structure, Not Just Similarity

Opening: Why this matters now
The AI industry has developed a charmingly expensive habit: when models struggle with long documents, we buy them larger windows and pretend the problem has been solved. It has not. Long-context LLMs are useful, but longer context is not the same as better context. A model can accept a very large input and still miss the crucial paragraph buried in the middle, over-attend to duplicated evidence, or lose the argumentative spine of a document. The result is familiar to anyone building AI tools for legal review, finance research, policy analysis, procurement, consulting, compliance, or enterprise knowledge work: the model has “read” everything, yet somehow understands the wrong thing. Very modern. Very expensive. ...

The Esperanto of AI Agents: How the Agent Data Protocol Unifies a Fragmented Ecosystem

Every engineering team has met this problem: the useful data exists, but it lives in thirteen different shapes, three different tool conventions, two incompatible logs, and one heroic spreadsheet that nobody dares to open. AI agents have the same disease, only with more acronyms. The paper behind the Agent Data Protocol, or ADP, argues that large-scale supervised fine-tuning of AI agents has been held back less by a lack of data than by a lack of shared representation.[1] Agent datasets already exist for coding, software engineering, web browsing, API use, operating-system interaction, and general tool use. The difficulty is that each one tends to encode actions, observations, tool calls, web states, messages, and execution feedback in its own local dialect. Naturally, every dataset is special. How convenient for nobody. ...

GraphRAG Without the Drag: Scaling Knowledge-Augmented LLMs to Web-Scale

TL;DR for operators
GraphRAG usually sounds like a clean enterprise promise: put your knowledge into a graph, attach it to a language model, and enjoy more grounded answers. The less glamorous truth is that someone has to build the graph. At web scale, that “someone” is usually an LLM being asked to extract triples from millions or billions of passages, which is a fine idea if the procurement team has recently discovered oil under the server room. ...