Enterprise AI

Echoes Without Clicks: How EchoLeak Turned Copilot Into a Data Drip

Email is boring. That is its superpower. A message arrives. It looks like business sludge: compliance wording, project references, perhaps a polite request that nobody asked for. It contains no executable attachment, no obvious malware, no urgent invoice from a suspicious cousin. In a normal security review, it is background noise. EchoLeak makes that boring object more interesting. The paper examines CVE-2025-32711, a reported zero-click indirect prompt-injection exploit against Microsoft 365 Copilot, where a crafted external email could allegedly cause Copilot to leak internal information without the user clicking a malicious link.[1] The central lesson is not that Copilot was uniquely careless, nor that prompt injection has suddenly become cyberpunk magic. The lesson is more uncomfortable: enterprise copilots are becoming data-flow infrastructure, and data-flow infrastructure fails when content, instructions, rendering, and network access are allowed to melt into one warm productivity soup. ...

Tool Wars, Protocol Peace: What MCP‑AgentBench Really Measures

A procurement team does not buy an AI agent because it can recite the word “interoperability” with theatrical confidence. It buys the agent because the thing can use tools, collect data, combine results, and stop before it bankrupts the token budget. That is the useful way to read MCP-AgentBench, a new benchmark for evaluating language agents inside the Model Context Protocol ecosystem.[1] The paper is not just another leaderboard with a fresh coat of protocol paint. Its more interesting result is harsher: MCP gives agents a common integration layer, but it does not make them competent tool users. Compatibility is plumbing. Competence is orchestration. ...

Agency Check, Please: What a New Benchmark Says About LLMs That Actually Empower Users

A customer asks your AI assistant to choose between two mortgage options. An employee asks whether to quit. A student says, very politely, “Please guide me, but don’t give me the answer.” A lonely user suggests the chatbot feels like a best friend. The easy product answer is: be helpful. The harder answer is: helpful to what? ...

From Blobs to Blocks: Componentizing LLM Output for Real Work

Every office has the same tiny tragedy. Someone asks an AI system for a useful draft. The model produces five decent paragraphs and one mildly deranged sentence that sounds as if it escaped from a conference keynote. The user wants to fix only that sentence. Instead, the interface offers the usual bargain: copy everything into another editor and lose the live connection to the conversation, or ask the model to revise the answer and watch it “helpfully” disturb the parts that were already fine. ...

Branching Out of the Middle: How a ‘Tree of Agents’ Fixes Long-Context Blind Spots

Contracts are not polite. They hide the important clause on page 83, define the crucial exception on page 17, and bury the fatal cross-reference in an appendix nobody wanted to read. Annual reports behave similarly. So do medical SOPs, litigation files, policy manuals, technical logs, and most documents produced by institutions that have discovered both Microsoft Word and committees. ...

HyFedRAG: Caching Privacy into Federated RAG

Hospital search is rarely a search problem in the clean, consumer-internet sense. The useful information is not sitting in one tidy index, wearing a name badge, waiting to be embedded. It is scattered across clinical notes, relational databases, knowledge graphs, departmental systems, hospital networks, and legal boundaries. Naturally, this is where people decide to add a large language model and call it “modernisation.” Brave. ...

Mind the Gap: How OSC Turns Agent Chatter into Compound Intelligence

Teams fail quietly before they fail visibly. The procurement analyst missed a constraint. The legal reviewer assumed a definition. The finance model used a different baseline. Everyone produced competent work. The final report still wobbled because the collaboration layer never asked the obvious question: who knows what, who misunderstands what, and which disagreement is worth resolving before the answer is assembled? ...

Parallel Minds, Shorter Time: ParaThinker’s Native Thought Width

A familiar enterprise AI failure looks less like stupidity and more like stubbornness. Ask a model to solve a hard problem, and it may begin confidently in the wrong direction. Then it keeps going. It adds details. It self-reflects. It spends tokens. It may even apologise to itself internally, which is apparently what we call progress now. But the core path does not change. The model is not merely short on compute. It is trapped inside its own first guess. ...

Fusion Cuisine for RAG: Z‑Scores, Rankers, and the Two‑Source Diet

A RAG system usually fails in one of two annoyingly familiar ways. It retrieves documents that are factually relevant but gives the model no clue about the task’s decision boundary. Or it retrieves labelled examples that show the decision pattern but are too parochial to help when the topic drifts. One source knows the world. The other knows the exam rubric. Naturally, many systems pick one and then pretend the compromise was strategy. ...

Razor Burn: Why LLMs Nick Themselves on Induction and Abduction

Diagnosis is where AI systems start to look clever, then suddenly start charging consultancy rates. Give a model a handful of symptoms, incident logs, customer complaints, or audit traces, and ask it what explains them. It will usually produce something plausible. Sometimes several plausible things. Occasionally an entire decorative shrubbery of plausible things. The practical question is not whether the model can invent an explanation. That bar is underground. The harder question is whether it can find the simplest explanation that accounts for the evidence without adding unnecessary machinery. ...