LLM-Agents

Talking to Yourself, but Make It Useful: Intrinsic Self‑Critique in LLM Planning

“Please double-check your work” is one of the least expensive quality-control systems ever invented. It is also one of the least dependable. A person who overlooked a constraint the first time may overlook it again. A language model is no different, except that it can produce a longer and more persuasive explanation of why the overlooked constraint was never important. ...

Silent Scholars, No More: When Uncertainty Becomes an Agent’s Survival Instinct

RAG is a very polite librarian. It fetches documents, quotes passages, and helps an agent look less ignorant in public. Then the agent closes the book, answers the user, and leaves no trace except a chat log, a cache entry, or perhaps another small pile of private “reflections” that no one else will ever see. ...

When Reflection Needs a Committee: Why LLMs Think Better in Groups

A review meeting has one obvious purpose: prevent one person’s mistake from becoming everyone’s plan. That sounds mundane until we remember how many LLM agent systems are currently designed like a one-person review meeting. The same model attempts the task, explains why it failed, writes advice to itself, stores that advice in memory, and then tries again. It is actor, evaluator, critic, therapist, and occasionally courtroom stenographer. Efficient, yes. Also a little suspicious. ...

When Agents Agree Too Much: Emergent Bias in Multi‑Agent AI Systems

Credit review is not supposed to work like a group chat. A bank cannot defend a biased lending workflow by saying, “each analyst looked fair on their own.” The decision process matters. Who sees whose opinion matters. Whether dissent survives matters. Whether the final answer comes from independent judgment or from a politely self-reinforcing committee definitely matters. ...

Don’t Tell the Robot What You Know

Directions are easy when both people see the same room. “Move left.” “Go toward the table.” “The apple is beside the sofa.” These are perfectly reasonable instructions if speaker and listener share the same visual world. They become less reasonable when one of them is staring at a wall, cannot see the table, and has no reason to believe the sofa exists. At that point, the problem is no longer navigation. It is epistemology, with furniture. ...

Model First, Think Later: Why LLMs Fail Before They Reason

The schedule looked reasonable. That was the problem. Imagine asking an AI agent to build a weekly medical schedule. It produces a neat plan. The steps are numbered. The tone is confident. The explanation is calm enough to sedate a committee. Then someone checks the details. A medication interval is violated. A resource is assigned twice. A prerequisite appears after the action that depends on it. Nothing looks absurd sentence by sentence, but the plan is broken as a system. ...

When Rewards Learn Back: Evolution, but With Gradients

Rewards are where many agent projects go to become expensive folklore. A team wants an AI agent to complete long workflows: search, reason, call tools, check constraints, recover from mistakes, and produce a useful answer. The model can talk. The tools work. The benchmark demo is acceptable. Then reinforcement learning enters the room, and someone has to decide what “good” means at every step. ...

When Agents Loop: Geometry, Drift, and the Hidden Physics of LLM Behavior

Agents are rarely dangerous because they answer once. They become interesting, and occasionally annoying, when they loop. A customer-support agent drafts a reply, critiques it, revises it, checks policy, rewrites the tone, and sends the result back into another reasoning step. A research agent summarizes papers, updates its plan, searches again, and revises its own assumptions. A coding agent edits a file, reads the error, patches the patch, and keeps going until either the tests pass or the repository looks like an archaeological site. ...

When Tokens Become Actions: A Policy Gradient Built for Transformers

Tool calls are not tokens. Neither are paragraphs, reasoning blocks, spreadsheet edits, web searches, code executions, or the awkward little detours an agent takes before finally answering the user. Yet much of reinforcement learning for language models still behaves as if it must choose between two unsatisfying extremes. At one end, every token is treated as a tiny action. At the other, the whole answer is treated as one indivisible action. The first view is mathematically tidy and operationally noisy. The second is practical for verifiable tasks, but it compresses an entire reasoning process into one final score, which is a bit like reviewing an employee only by checking whether the office building is still standing. ...

Teach Me Once: How One‑Shot LLM Guidance Reshapes Hierarchical Planning

Teach Me Once, Then Please Stop Calling the API

A familiar enterprise automation story starts with a competent but expensive expert in the loop. At first, the expert is useful. They interpret messy instructions, break tasks into sensible stages, and recover when something goes wrong. Then the workflow scales. Suddenly the expert is being called for every transaction, every exception, every tiny decision that could probably have been handled by a trained local process. What began as intelligence becomes latency, cost, and operational dependency. Very elegant. Very billable. Not always very deployable. ...