Prompt Optimization

Experience Is Not Memory: Why Learning Agents Need a Better Feedback Loop

A support ticket goes wrong. A workflow agent chooses the wrong tool. A finance assistant misses a procedural step. The usual response is familiar: add the failure to memory, rewrite a prompt, perhaps ask the agent to “reflect” before trying again. This is useful, in the same way that putting a sticky note on a broken machine is useful. It may prevent the same mistake next time. It does not prove the machine has learned how to improve. ...

Reflection in the Dark: When Prompt Optimization Forgets to Think

A prompt fails. The optimizer reflects. The prompt changes. The score moves. This is the part where everyone is supposed to feel comforted. A self-improving system has looked at its mistake and revised itself. Very modern. Very agentic. Very convenient. The less comforting possibility is that the system has not understood the mistake at all. It has simply rewritten the prompt around the nearest explanation it can imagine. The score may improve, stagnate, or fall, but the optimizer still cannot answer the most basic operational question: what exactly did we just fix? ...

Thoughts in Motion: From Static Prompts to Self-Optimizing Reasoning Graphs

A workflow looks harmless until it starts waiting on itself. One LLM call asks for a plan. Another evaluates the plan. A third revises the result. A fourth retrieves evidence. Somewhere in the middle, three subtasks could have run at the same time, two repeated calls could have been reused, and one prompt should probably have been tuned before anyone proudly called the system “agentic.” Instead, the whole thing runs as a neat little chain: expensive, slow, and quietly brittle. Very elegant, in the way a traffic jam is elegant if viewed from a drone. ...

ResMAS: When Multi‑Agent Systems Stop Falling Apart

Agent teams fail in a very ordinary way. One agent misreads a question. Another repeats the wrong answer with more confidence. A third receives both versions, performs a tiny ceremony of “collaboration,” and returns something that looks more polished than the original error. Management sees five agents instead of one and assumes redundancy has arrived. It has not. Sometimes it is just a committee with better stationery. ...

Numbers Need Narration: Making LLMs Do Reasoning‑Intensive Regression

TL;DR for operators: Many AI workflows do not need a yes-or-no judgment. They need a number: how well did this answer follow the instruction, how far into this reasoning trace did the reasoning remain valid, how much better is answer A than answer B, how strong is this essay, how risky is this case, how close is this support call to escalation? ...