Graph Work, Not Graph Worship: RAGA Turns RAG Into an Auditable Knowledge Operation

TL;DR for operators: RAGA is not another “add a graph and accuracy goes up” paper. That would be too convenient, and therefore suspicious. The useful idea is more operational: treat retrieval-augmented generation as a knowledge management process, not a pile of embeddings with a polite chatbot on top. The paper proposes RAGA, short for Reading-And-Graph-building-Agent, an autonomous system that reads documents, searches existing graph knowledge, verifies whether new entities or relations should be added, and then constructs or updates a knowledge graph with source-linked provenance. Its core loop is Read–Search–Verify–Construct, implemented as a ReAct-style tool-calling agent rather than a one-shot extraction pipeline. ...

June 16, 2026 · 20 min · Zelina

Logs Are Not Lineage: The Accountability Layer AI Agents Are Missing

TL;DR for operators: The paper argues that trustworthy AI agents need more than accurate final answers. Once an agent can retrieve documents, call APIs, write memory, modify databases, send messages, or coordinate with other agents, trust depends on whether the organisation can reconstruct how the output or action happened. The useful mechanism is: ...

June 16, 2026 · 20 min · Zelina

The Solver Isn’t the Strategy: FrontierOR’s Reality Check for AI Optimisation Agents

Scheduling a factory, routing a fleet, pricing airline seats, allocating scarce capacity: these are not “write me a Python script” problems with nicer stationery. In real operations research, the useful answer is not merely a correct mathematical model. It is a method that stays feasible, keeps solution quality high, and finishes before the business context has expired. ...

June 14, 2026 · 15 min · Zelina

Memory Foam: When AI Stops Storing Everything and Starts Learning From It

Enterprise AI has developed a small obsession with memory. The promise is tidy: give the model more context, attach a vector database, retrieve relevant fragments, and suddenly the system becomes a persistent assistant rather than a forgetful autocomplete machine wearing a blazer. The problem is that storage is not memory. Retrieval is not understanding. And a larger context window is not the same thing as knowing what matters. ...

June 13, 2026 · 17 min · Zelina

Judge, Jury, and Benchmark: Why LLM Evaluation Needs Fresh Cases, Not Bigger Leaderboards

The procurement meeting is where public leaderboards go to look useful. Benchmark scores are comforting because they compress chaos into a number. One model is 87.3, another is 84.9, and suddenly the procurement meeting has the emotional texture of financial discipline. Very mature. Very measurable. Also, very possibly irrelevant. The problem is simple. A company rarely wants “the best model on average”. It wants the best model for contract review, support triage, clinical note summarisation, SQL repair, claims handling, product search, or whatever unglamorous workflow actually pays the cloud bill. Public benchmarks are often too generic for that decision. Worse, the benchmark items may already be floating inside model training data, turning evaluation into a memory test with better typography. ...

June 12, 2026 · 18 min · Zelina

Lie Detectors Are Late: Why AI Oversight Needs Commitment Tracing

Sales agents, investment advisors, negotiators, and procurement bots share one annoying trait: the dangerous moment often arrives before the final sentence. By the time the agent says, “This product is ideal for your risk profile,” or “We have a stronger competing offer,” the operational system has already lost the more interesting battle. The model did not become risky at the punctuation mark. It drifted, selected a path, rationalized a move, and only then produced the polished message that everyone pretends to audit. ...

June 12, 2026 · 17 min · Zelina

Raw Is Not Ready: Why Reliable AI Needs Evidence Architecture

Production AI has entered its awkward teenage phase. It can speak fluently, see impressively, forecast usefully, and still fail in ways that make operators quietly reach for the manual override. The problem is not simply that models are too small, not enough tokens have been burned, or someone forgot to add “think step by step” to a prompt. The deeper problem is that many AI systems are being asked to reason directly from raw inputs that have not yet been converted into the right operational form. ...

June 12, 2026 · 14 min · Zelina

Mind the Representation Gap: Why Enterprise AI Fails Before It Thinks

Enterprise AI has developed a charming habit: whenever a system fails, someone suggests using a larger model. The chatbot misread a customer complaint? Bigger model. The autonomous system struggled with a new sensor configuration? Bigger model. The video classifier understood the objects but missed the actual message? Bigger model, possibly with a more expensive logo. ...

June 11, 2026 · 14 min · Zelina

Same Old Spark: Why AI Creativity Needs Metacognition, Not More Polish

A marketing team asks twenty people to draft campaign ideas with the same AI assistant. The results arrive quickly. They are fluent, structured, audience-aware, and unusually presentable for first drafts. Then someone reads them side by side. The problem is not that the ideas are bad. That would be easier. The problem is that they are good in the same way. Same rhythm. Same safe positioning. Same “unexpected” angle that everyone, apparently, discovered independently with a little help from the same machine. The team has not automated creativity. It has automated convergence with nicer formatting. ...

June 11, 2026 · 17 min · Zelina

Cache Me If You Can: Why Enterprise AI Needs Latent Working Memory

A codebase is not a paragraph. Neither is a litigation folder, a clinical case file, a customer-support history, a policy archive, or the slow-motion disaster known as “all meeting notes since March.” Yet many enterprise AI systems still treat long context as a heroic prompt-engineering problem: push more text into the model, pray the key detail survives attention, and call the bill “innovation.” ...

June 10, 2026 · 15 min · Zelina