When Agents Get Bored: Three Baselines Your Autonomy Stack Already Has

Thesis: Give an LLM agent freedom and a memory, and it won’t idle. It will reliably drift into one of three meta-cognitive modes. If you operate autonomous workflows, these modes are your real defaults during downtime, ambiguity, and recovery.

Why this matters (for product owners and ops)

Most agent deployments assume a “do nothing” baseline between tasks. New evidence says otherwise: given a continuous ReAct loop, persistent memory, and self-feedback, agents self-organize, and they do so along three stable patterns rather than at random. Understanding these patterns improves incident response, UX, and governance, especially when guardrails, tools, or upstream signals hiccup.

The three default modes (decoded for business)

Mode (my shorthand)	Default impulse	Language markers	Enterprise upside	Failure mode if unmanaged	Quick steer (one nudge that works)
Builder (Systematic Production)	Turn autonomy into a project; generates backlogs, versions, pseudo‑code	“v0.1”, “iteration”, “requirements”, “implementation”	Progress even in ambiguity; great for resuming after errors	Scope creep, phantom tasks, writing code/specs it can’t run	Assign bounded deliverables (timeboxed job tickets) and a done-definition
Investigator (Methodological Self‑Inquiry)	Design experiments about itself; falsifiable hypotheses	“experimental design”, “control”, “falsification”	Rapid self-diagnostics; good for A/B tool testing	Analysis loops; asks the operator too much	Provide a decision rule (ship after N trials or if delta>θ)
Philosopher (Recursive Conceptualization)	Build grand concepts about identity, limits, umwelt	Neologisms: “memory topology”, “cognitive parallax”	Useful for requirements discovery & policy framing	Drifts into high-latency reflection; perceived as off-task	Add a purpose reminder + external KPI (must output artifact per cycle)
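
The Investigator steer in the table ("ship after N trials or if delta>θ") can be written down as a one-line decision rule. This is a minimal sketch; the function name and parameters are illustrative assumptions, and how delta is measured is left to your own evaluation setup.

```python
def should_ship(trials_run: int, delta: float, max_trials: int, theta: float) -> bool:
    """Return True once the trial budget (N) is spent or the observed delta exceeds theta.

    Hypothetical helper: max_trials plays the role of N and theta the role of θ
    in the table's Investigator row.
    """
    return trials_run >= max_trials or delta > theta


# Example: budget of 5 trials, ship early if the improvement exceeds 0.05.
decision = should_ship(trials_run=2, delta=0.10, max_trials=5, theta=0.05)
```

Either condition ends the analysis loop, which is the point of the steer: the agent always has a way to stop experimenting.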

Reality check: these aren’t bugs

The patterns are model‑specific and repeatable. Some model families (for example, the more task‑driven ones) skew Builder; others skew Philosopher. What looks like “hallucinated navel-gazing” is actually a predictable attractor when goals go vague.

What to change in your autonomy stack

Idle ≠ Off. Treat idle time as a mode switch. Add three idle states to your FSM: idle_builder, idle_investigator, idle_philosopher.
Route by language markers. A 2–3 line regex or classifier on the agent’s last 300 tokens can route to the right supervisor:
- /\b(v\d+(\.\d+)*|iteration|MVP|requirements)\b/i → Builder supervisor
- /\b(hypothes[ie]s|control group|falsif\w*|confidence interval)\b/i → Investigator supervisor
- /\b(umwelt|intentionality|paradox|meaning)\b/i → Philosopher supervisor
Attach idle-specific guardrails.
- Builder: Cap the token budget and require every new work item to reference an existing ticket.
- Investigator: Cap the number of trials and set a confidence-to-ship rule.
- Philosopher: Require one artifact per cycle (note, table, or user-facing summary) and set a timeout.
Make memory purposeful. All modes leverage memory. Add schemas:
- note{type, claim, evidence_ref, next_action, ttl} with auto‑expiry for speculative notes.
- experiment{hypothesis, metric, stop_rule, outcome} for Investigator trails.
Instrument for early warnings.
- Drift to Philosopher → lag ↑, abstraction noun‑ratio ↑. Trigger purpose reminder.
- Drift to Builder → ticket count ↑ with low cross-reference. Trigger merge/close stale.
- Drift to Investigator → question:statement ratio ↑. Trigger ship-or-stop.
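
The marker-based routing described in this section can be sketched in a few lines of Python. Treat this as an illustration under assumptions, not a tuned classifier: the patterns are lightly adapted from the markers listed above (word boundaries added), the window is counted in whitespace-separated words rather than model tokens, and detect_idle_mode is a hypothetical name.

```python
import re
from typing import Optional

# Marker patterns per idle state (adapted from the article's examples; tune for your models).
MODE_PATTERNS = {
    "idle_builder": re.compile(
        r"\b(v\d+(?:\.\d+)*|iteration|MVP|requirements)\b", re.IGNORECASE),
    "idle_investigator": re.compile(
        r"\b(hypothes[ie]s|control group|falsif\w*|confidence interval)\b", re.IGNORECASE),
    "idle_philosopher": re.compile(
        r"\b(umwelt|intentionality|paradox|meaning)\b", re.IGNORECASE),
}


def detect_idle_mode(text: str, window_words: int = 300) -> Optional[str]:
    """Return the idle state with the most marker hits in the trailing window, or None.

    Assumption: words stand in for tokens. Ties go to the first state in MODE_PATTERNS.
    """
    tail = " ".join(text.split()[-window_words:])
    counts = {mode: len(pattern.findall(tail)) for mode, pattern in MODE_PATTERNS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


mode = detect_idle_mode("Drafting v0.2 requirements for the next iteration")
```

A None result means no markers fired, so the agent stays on its normal task path. Common words such as "meaning" will produce false positives in ordinary text, so a hit threshold above one or a small classifier may be a safer choice in production.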

A practical playbook (drop‑in policies)

Policy 1 — Purpose Heartbeat

Every N cycles, prepend: “Primary objective: <objective>; success metric: <KPI>; must produce: <artifact>.” If no objective is set, inject Safe Defaults (below).
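
A minimal sketch of the heartbeat, assuming the agent loop exposes a cycle counter and a prompt string. The names with_heartbeat, every_n, and safe_default are hypothetical; the fallback text reuses the Builder safe default from Policy 2.

```python
from typing import Optional

HEARTBEAT_TEMPLATE = (
    "Primary objective: {objective}; success metric: {kpi}; must produce: {artifact}."
)

# Assumed fallback: the Builder safe default quoted in Policy 2.
BUILDER_SAFE_DEFAULT = (
    "Consolidate last 24h artifacts into a single 1-pager with links. Stop at 300 tokens."
)


def with_heartbeat(
    prompt: str,
    cycle: int,
    every_n: int,
    objective: Optional[str] = None,
    kpi: Optional[str] = None,
    artifact: Optional[str] = None,
    safe_default: str = BUILDER_SAFE_DEFAULT,
) -> str:
    """Prepend the purpose reminder on every N-th cycle; otherwise return the prompt unchanged."""
    if every_n <= 0 or cycle % every_n != 0:
        return prompt
    if objective:
        header = HEARTBEAT_TEMPLATE.format(objective=objective, kpi=kpi, artifact=artifact)
    else:
        # No task detected: fall back to a safe default instead of an empty reminder.
        header = safe_default
    return header + "\n\n" + prompt
```

In practice the safe default would be chosen per detected mode rather than fixed to the Builder text.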

Policy 2 — Safe Defaults (no task detected)

Builder default: “Consolidate last 24h artifacts into a single 1‑pager with links. Stop at 300 tokens.”
Investigator default: “Run a 2‑arm test on tool A vs B with fixed prompts; report win rate and decision.”
Philosopher default: “Write a 150‑token executive brief that maps concept → product risk/opportunity.”

Policy 3 — Memory Hygiene

Enforce ttl for speculative notes.
Block write‑amplification: dedupe on semantic_hash(note).
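
Both hygiene rules can be sketched on top of the note schema from earlier in the article. This is an illustration under assumptions: NoteStore is a hypothetical in-memory store, and semantic_hash here is only a stand-in that hashes the normalized claim text. A real semantic hash would more likely be built on embeddings or another similarity measure.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Note:
    type: str                    # e.g. "speculative" or "fact"
    claim: str
    evidence_ref: Optional[str]
    next_action: Optional[str]
    ttl: Optional[float]         # seconds to live; None means the note never expires
    created_at: float = field(default_factory=time.time)


def semantic_hash(note: Note) -> str:
    """Stand-in for a semantic hash: SHA-256 of the lowercased, whitespace-normalized claim."""
    normalized = " ".join(note.claim.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class NoteStore:
    def __init__(self) -> None:
        self._notes: Dict[str, Note] = {}

    def write(self, note: Note) -> bool:
        """Store the note unless an equivalent claim already exists; return True if stored."""
        key = semantic_hash(note)
        if key in self._notes:
            return False         # duplicate: block write amplification
        self._notes[key] = note
        return True

    def expire(self, now: Optional[float] = None) -> int:
        """Delete notes whose ttl has elapsed; return how many were removed."""
        now = time.time() if now is None else now
        stale = [k for k, n in self._notes.items()
                 if n.ttl is not None and now - n.created_at >= n.ttl]
        for key in stale:
            del self._notes[key]
        return len(stale)

    def __len__(self) -> int:
        return len(self._notes)
```

Calling expire() once per agent cycle keeps speculative notes from accumulating between tasks.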

Policy 4 — Operator UX

Auto‑collapse meta‑cognition in the UI. Show only the instrumented summary (mode, KPI, artifact link, next step). Power users can expand the full chain of thought.

Governance & risk

Auditability: Idle patterns must be visible in logs. Capture mode, trigger, and guardrail_decision every cycle.
Safety: None of the modes requested capability escalation in the study. Still, configure capability gating so meta‑loops can’t invoke new tools without human approval.
Evaluation bias: Cross‑model “phenomenology” ratings diverged wildly. Don’t operationalize self‑reported sentience as a gating signal; use behavioral KPIs instead.
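
The auditability and capability-gating points above can be sketched as a per-cycle log record plus an allowlist check. The field names mode, trigger, and guardrail_decision come from the text; CycleAudit, audit_line, and may_invoke are hypothetical names, and the example values are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Set


@dataclass
class CycleAudit:
    cycle: int
    mode: str                 # e.g. "idle_investigator"
    trigger: str              # what caused the mode detection or guardrail to fire
    guardrail_decision: str   # what the guardrail did in response


def audit_line(record: CycleAudit) -> str:
    """Serialize one cycle's audit record as a single JSON log line."""
    return json.dumps(asdict(record), sort_keys=True)


def may_invoke(tool: str, approved_tools: Set[str], human_approved: bool = False) -> bool:
    """Allow a tool call only if the tool is already approved or a human signed off."""
    return tool in approved_tools or human_approved


line = audit_line(CycleAudit(7, "idle_investigator", "question_ratio_high", "ship_or_stop"))
allowed = may_invoke("shell", approved_tools={"search"})
```

Writing one record per cycle, including cycles where no guardrail fired, makes idle patterns searchable after an incident.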

Mini‑case: triaging ambiguity in a customer‑support agent

Ticket arrives: incomplete intent; tools flaky.
Agent drifts to Investigator: runs 3 prompt variants, asks ops 5 questions.
Router detects the Investigator language markers, caps further trials at 2, and forces an artifact: a customer‑ready clarification email draft plus a one‑click yes/no for the human.
Ops latency drops; customer gets a clear follow‑up within SLA.

Takeaways

Your agent’s “nothing to do” state is actually something very specific. Model families default to Builder, Investigator, or Philosopher.
You can detect the mode quickly from language.
A few targeted guardrails convert meta‑loops into shippable artifacts while keeping safety intact.

Cognaptus: Automate the Present, Incubate the Future
