Cover image

When Rewards Learn Back: Evolution, but With Gradients

Opening — Why this matters now Reinforcement learning has always had an uncomfortable secret: most of the intelligence is smuggled in through the reward function. We talk about agents learning from experience, but in practice, someone—usually a tired engineer—decides what “good behavior” numerically means. As tasks grow longer-horizon, more compositional, and more brittle to specification errors, this arrangement stops scaling. ...

December 16, 2025 · 4 min · Zelina
Cover image

Textual Gradients and Workflow Evolution: How AdaptFlow Reinvents Meta-Learning for AI Agents

From Static Scripts to Living Workflows The AI agent world has a scaling problem: most automated workflow builders generate one static orchestration per domain. Great in benchmarks, brittle in the wild. AdaptFlow — a meta-learning framework from Microsoft and Peking University — proposes a fix: treat workflow design like model training, but swap numerical gradients for natural language feedback. This small shift has a big implication: instead of re-engineering from scratch for each use case, you start from a meta-learned workflow skeleton and adapt it on the fly for each subtask. ...

August 12, 2025 · 3 min · Zelina