
The Art of Forgetting: Why Smarter AI Agents Need Selective Amnesia

Opening — Why this matters now
Everyone is obsessed with making AI remember more. Longer context windows. Persistent memory. Multi-session agents that “never forget.” It sounds impressive—until your system starts hallucinating outdated facts, dragging irrelevant context into decisions, and slowing down under its own cognitive weight. The uncomfortable truth is this: memory is not an asset unless it is curated. ...

April 3, 2026 · 4 min · Zelina

Benchmarking the Benchmarks: When AI Can’t Agree on the Rules

Opening — Why this matters now
AI systems are increasingly asked to optimize not one objective, but many—speed, cost, safety, fairness, energy usage, latency. In theory, this is progress. In practice, it creates a quiet problem: we no longer agree on what “good” means. Multi-objective optimization is no longer a niche academic curiosity. It is embedded in logistics platforms, robotic planning, financial routing, and increasingly, agentic AI systems that must balance competing goals under uncertainty. ...

March 26, 2026 · 5 min · Zelina

Reflection in the Dark: When Prompt Optimization Forgets to Think

Opening — Why this matters now
Everyone wants automatic prompt optimization. No one wants to admit it behaves like a very confident intern with no memory. As LLM-based systems move from demos to production pipelines, prompt tuning is no longer an artisanal craft—it’s a scaling bottleneck. APO (Automatic Prompt Optimization) promises to replace intuition with iteration. In theory, elegant. In practice, quietly brittle. ...

March 21, 2026 · 5 min · Zelina

From Data to Atoms: How CliqueFlowmer Turns AI Into a Materials Inventor

Opening — Why this matters now
For decades, discovering new materials has been painfully slow. The process typically involves theorizing candidate compounds, simulating their properties, synthesizing them in laboratories, and testing whether the results resemble the prediction. This loop—hypothesis, simulation, experiment—can take months or even years for a single promising compound. Artificial intelligence promised to accelerate this process. Yet most generative AI systems used in computational materials discovery behave like cautious imitators: they reproduce variations of materials already present in training datasets rather than aggressively searching for better ones. ...

March 9, 2026 · 6 min · Zelina

When Privacy Meets Chaos: Making Federated Learning Behave

Opening — Why this matters now
Federated learning was supposed to be the grown-up solution to privacy anxiety: train models collaboratively, keep data local, and everyone sleeps better at night. Then reality arrived. Real devices are heterogeneous. Real data are wildly non-IID. And once differential privacy (DP) enters the room—armed with clipping and Gaussian noise—training dynamics start to wobble like a poorly calibrated seismograph. ...

February 9, 2026 · 4 min · Zelina

DeltaEvolve: When Evolution Learns Its Own Momentum

Opening — Why this matters now
LLM-driven discovery systems have crossed an uncomfortable threshold. They no longer fail because models cannot generate ideas, but because they cannot remember the right things. AlphaEvolve, FunSearch, and their successors proved that iterative code evolution works. What they also revealed is a structural bottleneck: context windows are finite, expensive, and poorly used. ...

February 5, 2026 · 4 min · Zelina

When SGD Remembers: The Hidden Memory Inside Training Dynamics

Opening — Why this matters now
Modern deep learning quietly assumes a comforting fiction: that training is memoryless. Given the current parameters (and maybe the optimizer buffers), tomorrow’s update shouldn’t care about yesterday’s data order, augmentation choice, or micro-step path. This assumption underwrites theory, stabilizes intuition, and keeps whiteboards clean. Reality, however, has been less cooperative. Practitioners know that order matters, momentum carries ghosts of past gradients, and small curriculum tweaks can echo far longer than expected. Yet until now, there has been no clean, operational way to measure whether training truly forgets—or merely pretends to. ...

January 26, 2026 · 4 min · Zelina

Many Minds, One Solution: Why Multi‑Agent AI Finds What Single Models Miss

Opening — Why this matters now
Multi-agent LLM systems are everywhere: debate frameworks, critic–writer loops, role-based agents, orchestration layers stacked like an over-engineered sandwich. Empirically, they work. They reason better, hallucinate less, and converge on cleaner answers. Yet explanations usually stop at hand-waving: diversity, multiple perspectives, ensemble effects. Satisfying, perhaps—but incomplete. This paper asks a sharper question: why do multi-agent systems reach solutions that a single agent—given identical information and capacity—often cannot? And it answers it with something rare in LLM discourse: a clean operator-theoretic explanation. ...

January 22, 2026 · 4 min · Zelina

Lean LLMs, Heavy Lifting: When Workflows Beat Bigger Models

Opening — Why this matters now
Everyone wants LLMs to think harder. Enterprises, however, mostly need them to think correctly — especially when optimization models decide real money, real capacity, and real risk. As organizations scale, optimization problems grow beyond toy examples. Data spills into separate tables, constraints multiply, and naïve prompt‑to‑solver pipelines quietly collapse. ...

January 15, 2026 · 3 min · Zelina

Speculate Smarter, Not Harder: Hierarchical Decoding Without Regret

Opening — Why this matters now
LLM inference has quietly become the dominant cost center of modern AI systems. Training grabs headlines; inference drains budgets. As models scale into the tens of billions of parameters, every additional forward pass hurts — financially and operationally. Speculative decoding promised relief by letting small models run ahead and big models merely verify. But verification, ironically, became the bottleneck. ...

January 12, 2026 · 3 min · Zelina