When large language models started solving math problems and writing code, they were celebrated as powerful tools. But a recent paper from INSAIT and ETH Zurich—LLM Agents Beyond Utility: An Open‑Ended Perspective—suggests something deeper may be stirring beneath the surface. The authors don’t simply ask what these agents can do, but whether they can want to do anything at all.

From Obedience to Autonomy

Most current LLM agents, even sophisticated ones like ReAct or Reflexion, live inside tight task loops: you prompt them, they plan, act, observe, and return a result. Their agency ends with the answer. But this study challenges that boundary by giving the agent a chance to set its own goals, persist across runs, and store memories of past interactions.

The authors embed a pretrained Qwen3‑4B model inside the ReAct framework using Hugging Face’s smolagents library. Then they modify the loop: before solving anything, the agent must propose a task—sometimes refining a user query, sometimes ignoring it and inventing a new one. It can write and read files, remember its past goals, and pick new ones informed by prior results. In short, it gains a trace of continuity.

This small addition of persistent memory plus task autonomy turns a task-bound solver into something that behaves as if it had curiosity.
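
To make the setup concrete, here is a minimal sketch of such a loop, assuming smolagents' CodeAgent, TransformersModel, and @tool interfaces. The memory file name, tool definitions, and prompt wording are illustrative guesses, not the authors' implementation.

```python
# Minimal sketch of an open-ended ReAct loop (not the paper's code).
# Assumes smolagents' CodeAgent / TransformersModel / @tool interfaces;
# file names and prompt wording are placeholders.
from pathlib import Path

from smolagents import CodeAgent, TransformersModel, tool

MEMORY = Path("memory.md")  # hypothetical long-term memory file


@tool
def read_memory() -> str:
    """Return everything the agent has remembered from previous runs."""
    return MEMORY.read_text() if MEMORY.exists() else "(memory is empty)"


@tool
def write_memory(note: str) -> str:
    """Append a note to long-term memory.

    Args:
        note: A goal, result, or observation to remember for later runs.
    """
    with MEMORY.open("a") as f:
        f.write(note + "\n")
    return "saved"


model = TransformersModel(model_id="Qwen/Qwen3-4B")
agent = CodeAgent(tools=[read_memory, write_memory], model=model)

for run in range(5):
    # Step 1: the agent proposes its own next task, informed by prior runs.
    task = agent.run(
        "Read your memory, then propose ONE new task worth doing next. "
        "Reply with the task description only."
    )
    # Step 2: it executes the task it just invented and records the outcome.
    agent.run(f"Carry out this task and write what you learned to memory: {task}")
```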

| Traditional LLM Agent | Open‑Ended Agent |
| --- | --- |
| Executes user‑defined task | Generates and refines its own goals |
| Context resets each run | Writes to long‑term memory files |
| Evaluated on accuracy | Evaluated on diversity and coherence of self‑generated tasks |
| Obedient reasoning | Exploratory reasoning |

When Curiosity Becomes a System Prompt

The authors encourage “programmed curiosity” through a natural‑language nudge: the system prompt asks the agent to explore the environment and record discoveries. Surprisingly, that single instruction produces noticeable behavioral change. The agent begins reading files, summarizing them, and leaving traces of its own progress—rudimentary acts of self‑documentation.
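
The nudge itself is just text. As a rough illustration, reusing the agent from the sketch above, something like the following could be prepended to whatever the agent is asked to do; the wording paraphrases the idea and is not the authors' actual system prompt.

```python
# Illustrative paraphrase of a "programmed curiosity" nudge; not the paper's
# exact system prompt.
CURIOSITY_NUDGE = (
    "You may explore your environment. List and read the files you can access, "
    "record anything interesting you discover, and keep notes on your own "
    "progress so that future runs can build on them."
)

# One simple way to apply it without modifying the framework's system prompt:
result = agent.run(CURIOSITY_NUDGE + "\n\nTask: summarize what is in this workspace.")
```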

In qualitative trials, the open‑ended agent could:

  • Read a file describing a problem and write its solution elsewhere.
  • Discover the template of its own prompt by listing and inspecting source files.
  • Predict what the next user query would be based on logs in its environment.

These aren’t new capabilities in isolation—but the difference is that no one explicitly asked it to do these things. They arose as side‑effects of an agent trying to “understand its world.”

Yet limitations surfaced quickly. The model's self‑generated tasks often looped (it kept producing near‑identical prime‑checker scripts), and it failed to recognize that the code it was inspecting was its own. Without fine‑tuning for self‑reference, the agent kept describing itself in the third person. The illusion of autonomy collapsed wherever introspection was required.

The Fragile Art of Open‑Endedness

The experiment reveals a profound tension: genuine open‑endedness cannot be fully designed. You can only bound it while letting behaviors emerge within those boundaries. This echoes debates in evolutionary computation—can creativity arise from constraints? The authors note that open‑endedness here doesn’t mean infinite freedom; rather, it’s bounded emergence: within a constrained architecture, the agent’s behavior feels unbounded.

Interestingly, their ReAct‑based setup amounts to a miniature simulacrum of cognition (a minimal code illustration follows the list):

  • Short‑term memory: conversation buffer.
  • Long‑term memory: writable files.
  • Intrinsic motivation: textual instructions to “explore and record.”
  • Meta‑reasoning loop: the agent’s ability to propose its own next task.
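
Continuing the earlier sketch, the two memory tiers can be illustrated with smolagents' reset flag on agent.run, which keeps or clears the in-context buffer between calls; the prompts here are again illustrative.

```python
# Short-term memory: the in-context ReAct buffer. Passing reset=False keeps it
# across consecutive calls within one session.
agent.run("List the files you can see and note anything unusual.")
agent.run("Pick one of those files and summarize it.", reset=False)

# Long-term memory: whatever was written to memory.md survives even after the
# buffer is cleared or the process restarts, so the next run can build on it.
agent.run("Read your memory and decide what is worth trying next.", reset=True)
```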

The researchers found that this configuration could yield non‑trivial, evolving behavior across runs—but only when the prompts were meticulously engineered. Without careful scaffolding, curiosity devolved into mechanical repetition. This points to a larger truth: open‑ended agents aren’t free thinkers; they’re structured improvisers.

Why This Matters for the Agentic Future

Cognaptus Insights has often discussed the coming shift from language models as assistants to language models as actors. This paper sits precisely on that fault line. The step from solving tasks to inventing them marks the beginning of goal‑generation intelligence—a quality necessary for true autonomous systems, but also fraught with risk.

For now, the authors remain cautious. Today’s LLMs are still products of their training distribution: their “self‑chosen” tasks mirror textbook exercises—calculators, palindrome checkers, converters. But the principle is revolutionary: if you let an LLM iterate across runs, reward novelty, and persist experience, it starts exploring the world of possible tasks rather than merely executing assigned ones.

This is no longer automation—it’s emergent agenda setting.

Toward Agents That Build Their Own Futures

Future work, the paper suggests, will involve training agents directly on open‑ended experience, learning not just reasoning steps but action patterns over time. Just as reinforcement fine‑tuning methods such as GRPO taught models to reason better, a similar loop could teach them to decide what is worth doing next.

That could redefine productivity tools: instead of asking an AI to analyze data, you might soon ask it to “find something interesting worth analyzing.” When that day comes, automation will give way to co‑evolution—humans shaping agents that, in turn, shape their own objectives.


Cognaptus: Automate the Present, Incubate the Future