
Failures, Taxonomized: How Multi‑Level Reflection Turns Agents Into Self‑Learners
TL;DR Most reflection frameworks still treat failure analysis as an afterthought. SAMULE reframes it as the core curriculum: synthesize reflections at micro (single trajectory), meso (intra‑task error taxonomy), and macro (inter‑task error clusters) levels, then fine‑tune a compact retrospective model that generates targeted reflections at inference. It outperforms prompt‑only baselines and RL‑heavy approaches on TravelPlanner, NATURAL PLAN, and Tau‑Bench. The strategic lesson for builders: design your error system first; the agent will follow. ...