Autonomous Agents

When Maps Start Thinking: Teaching Agents to Plan in Time and Space

Opening: Why this matters now. AI can already write poetry, debug code, and argue philosophy. Yet ask most large language models to plan a realistic trip (respecting time, geography, traffic, weather, and human constraints) and they quietly fall apart. Real-world planning is messy, asynchronous, and unforgiving. Unlike in a math problem, you cannot hallucinate a charging station that does not exist. ...

Browsing Without the Bloat: Teaching Agents to Think Before They Scroll

Opening: Why this matters now. Large Language Models have learned to think. Then we asked them to act. Now we want them to browse, and suddenly everything breaks. Deep research agents are running head-first into a practical wall: the modern web is not made of tidy pages and polite APIs. It is dynamic, stateful, bloated, and aggressively redundant. Give an agent a real browser and it drowns in tokens. Don’t give it one, and it misses the most valuable information entirely. ...

The Invariance Trap: Why Matching Distributions Can Break Your Model

Opening: Why this matters now. Distribution shift is no longer a corner case; it is the default condition of deployed AI. Models trained on pristine datasets routinely face degraded sensors, partial observability, noisy pipelines, or institutional drift once they leave the lab. The industry response has been almost reflexive: enforce invariance. Align source and target representations, minimize divergence, and hope the problem disappears. ...

When the Paper Talks Back: Lost in Translation, Rejected by Design

Opening: Why this matters now. Academic peer review is buckling under scale. ICML alone now processes close to ten thousand submissions a year. In response, the temptation to insert LLMs somewhere into the review pipeline (screening, triage, or scoring) is understandable. Efficiency, after all, is a persuasive argument. Unfortunately, efficiency is also how subtle failures scale. This paper asks an uncomfortable but necessary question: what happens when the paper being reviewed quietly talks back to the model reviewing it? Not loudly. Not visibly. Just enough to tip the scales. ...

MIRAGE-VC: Teaching LLMs to Think Like VCs (Without Drowning in Graphs)

Opening: Why this matters now. Venture capital has always been a strange mix of narrative craft and network math. Partners talk about vision, conviction, and pattern recognition, but behind the scenes, outcomes are brutally skewed: most startups fail quietly, a few dominate returns, and almost everything depends on who backs whom, and in what order. ...

Regrets, Graphs, and the Price of Privacy: Federated Causal Discovery Grows Up

Opening: Why this matters now. Federated learning promised a simple trade: keep data local, share intelligence globally. In practice, causal discovery in federated environments has been living off a polite fiction: that all clients live in the same causal universe. Hospitals, labs, or business units, we are told, differ only in sample size, not in how reality behaves. ...

Replay the Losses, Win the Game: When Failed Instructions Become Your Best Training Data

Opening: Why this matters now. Reinforcement learning for large language models has a dirty secret: most of the time, nothing happens. When tasks demand perfect instruction adherence (formatting, style, length, logical constraints), the model either nails everything or gets a zero. Binary rewards feel principled, but in practice they starve learning. Aggregated rewards try to help, but they blur causality: different mistakes, same score, same gradient. The result is slow, noisy, and often misdirected optimization. ...

Think Wide, Then Think Hard: Forcing LLMs to Be Creative (On Purpose)

Opening: Why this matters now. Large language models are prolific. Unfortunately, they are also boring in a very specific way. Give an LLM a constrained task (generate a programming problem, write a quiz, design an exercise) and it will reliably produce something correct, polite, and eerily similar to everything it has produced before. Change the temperature, swap the model, even rotate personas, and the output still clusters around the same conceptual center. ...

Many Minds, One Decision: Why Agentic AI Needs a Brain, Not Just Nerves

Opening: Why this matters now. Agentic AI has officially crossed the line from clever demo to operational liability. We are no longer talking about chatbots that occasionally hallucinate trivia. We are deploying autonomous systems that decide, act, and trigger downstream consequences, often across tools, APIs, and real-world processes. In that setting, the old comfort blanket of “the model said so” is no longer defensible. ...

OrchestRA and the End of Linear Drug Discovery

Opening: Why this matters now. Drug discovery has a reputation problem. It is slow, expensive, and structurally brittle. Despite exponential growth in biomedical data and modeling tools, R&D productivity has declined for decades. The core reason is not lack of intelligence, human or artificial, but fragmentation. Biology, chemistry, and pharmacology still operate like loosely coupled departments passing half-finished work downstream. ...