Agents

USB‑C for Agents, Stress‑Tested: What MCP‑Universe Really Reveals

The pitch: a unified plug, and a tougher test

The Model Context Protocol (MCP) is often described as the “USB‑C of AI tools”: one standardized way for agents to talk to external services (maps, finance data, browsers, code repositories, and so on). MCP‑Universe, a new benchmark from Salesforce AI Research, finally stress‑tests that idea against real MCP servers rather than toy mocks. It judges success from execution outcomes, not multiple‑choice guesswork, which is exactly what enterprises need before they trust automation. ...
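At the wire level, the “standard plug” is concrete: MCP messages are JSON-RPC 2.0, and an agent invokes a server tool with a `tools/call` request that carries the tool's `name` and its `arguments`. The minimal Python sketch below builds such a request and pairs it with a toy execution-based check that grades the tool's actual returned content rather than a picked answer. The tool name `list_open_issues`, its arguments, the fake server result, and the `outcome_passes` helper are all hypothetical illustrations, not part of MCP‑Universe or any MCP SDK.

```python
import json
from typing import Callable


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


def outcome_passes(result: dict, check: Callable[[str], bool]) -> bool:
    """Hypothetical execution-based grader: inspect what the tool really returned.

    `result` is assumed to follow the MCP tool-result shape: a `content` list
    of typed items plus an optional `isError` flag.
    """
    if result.get("isError", False):
        return False  # a failed tool call can never count as success
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    return check("\n".join(texts))


# Hypothetical tool name and arguments, for illustration only.
request = make_tool_call(1, "list_open_issues", {"repo": "example-org/example-repo"})
print(request)

# Hypothetical server result, shaped like an MCP tool result.
fake_result = {"content": [{"type": "text", "text": "open issues: 3"}], "isError": False}
print(outcome_passes(fake_result, lambda text: "open issues" in text))  # True
```

Grading the returned content, rather than which option a model selected, is the spirit of execution-based evaluation. A real harness would also need MCP's initialization handshake and a transport, both of which this sketch omits.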

Memory With Intent: Why LLMs Need a Cognitive Workspace, Not Just a Bigger Window

TL;DR: Today’s long-context and RAG systems scale storage, not thinking. Cognitive Workspace (CW) reframes memory as an active, metacognitive process: curate, plan, reuse, and consolidate. In tests, CW reports ~55–60% memory reuse and 17–18% net efficiency gains despite a 3.3× operation overhead, precisely because it thinks about what to remember and why.

The Setup: Context ≠ Cognition

Over the past 18 months we’ve cheered context windows of more than 1M tokens and slicker attention kernels. But piling tokens into a context is like dumping files on a desk: storage without stewardship. In knowledge work, what moves the needle is not how much you can “see” but how well you organize, recall, and reuse information, and do so with intent. ...
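To make “memory reuse” concrete, here is a toy Python sketch under one loudly stated assumption: that reuse means a request answered from content already held in the workspace instead of being recomputed. The `Workspace` class, its `recall_or_compute` method, and the summary keys are hypothetical. This is plain memoization with a reuse counter, not the Cognitive Workspace method, which additionally plans and curates what to keep.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Workspace:
    """Hypothetical toy workspace: reuses stored results and tracks reuse."""
    store: Dict[str, str] = field(default_factory=dict)
    lookups: int = 0
    hits: int = 0

    def recall_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        self.lookups += 1
        if key in self.store:
            self.hits += 1           # reuse: served from existing memory
            return self.store[key]
        value = compute()            # miss: do the expensive work once...
        self.store[key] = value      # ...then consolidate it for later reuse
        return value

    @property
    def reuse_rate(self) -> float:
        """Fraction of lookups served from memory (0.0 before any lookup)."""
        return self.hits / self.lookups if self.lookups else 0.0


compute_calls = 0


def summarize_section_1() -> str:
    """Stand-in for an expensive step (e.g. a model call); counts invocations."""
    global compute_calls
    compute_calls += 1
    return "summary of section 1"


ws = Workspace()
for _ in range(3):
    ws.recall_or_compute("summary:section-1", summarize_section_1)
ws.recall_or_compute("summary:section-2", lambda: "summary of section 2")

print(ws.reuse_rate)  # 0.5 (2 hits out of 4 lookups)
```

The trade-off this illustrates: every lookup pays a small bookkeeping cost, but hits skip the expensive `compute` entirely, which loosely mirrors how extra memory operations can still yield a net efficiency gain.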

Paging Dr. Model: When AI Runs the Workup

What if the AI didn’t just answer a question—it ordered the right tests, asked for the right observations, and stopped when it had enough to call the case? A new paper introduces DxDirector-7B, a 7B-parameter medical LLM trained to act as the director of care, not the assistant. Instead of waiting for a physician to assemble clean inputs, the model starts from the patient’s vague chief complaint (e.g., “tummy pain and tired”) and then plans the diagnostic pathway, requesting only those clinician actions that software cannot perform (physical exams, labs, imaging). The goal is twofold: maximize diagnostic accuracy and minimize human workload. ...

The Retrieval-Reasoning Tango: Charting the Rise of Agentic RAG

In the AI race to make large language models both factual and capable of sound reasoning, two camps have emerged: one focused on retrieval-augmented generation (RAG) to fight hallucination, the other on long-chain reasoning to mimic logic. But neither wins alone. This week’s survey by Li et al. (2025), Towards Agentic RAG with Deep Reasoning, delivers the most comprehensive synthesis yet of the field’s convergence point: synergized RAG–Reasoning. The question is no longer whether retrieval helps generation or reasoning helps retrieval, but how tightly the two can co-evolve, often under the coordination of autonomous agents. ...

Wall Street’s New Intern: How LLMs Are Redefining Financial Intelligence

The financial industry has always prided itself on cold precision. For decades, quantitative models and spreadsheets dominated boardrooms and trading desks. But that orthodoxy is now under siege, not from another statistical breakthrough but from something surprisingly human-like: large language models (LLMs). Recent research shows a dramatic shift in how AI, particularly LLMs such as GPT-4 and LLaMA, is being integrated across financial workflows. Far from just summarizing news or answering earnings-call questions, LLMs are now organizing entire investment pipelines, being fine-tuned on proprietary data, and even collaborating as autonomous financial agents. A recent survey by Mahdavi et al. (2025) categorized over 70 state-of-the-art systems into four distinct architectural frameworks, offering a lens through which to assess the future of financial AI. ...