The agent did not fail because it was stupid
An AI agent can summarize the market, search the web, draft a memo, call an API, and still be almost useless in professional work.
Not because the model is weak. Not because the workflow lacks one more tool integration. Not because someone forgot to add a longer system prompt beginning with “You are a world-class analyst,” the oldest spell in the modern prompt-engineering grimoire.
The deeper problem is that most domain expertise does not live in instructions.
A senior equity analyst does not merely know valuation formulas. A lawyer does not merely know legal doctrine. A strategy consultant does not merely know frameworks with elegant two-by-two grids. Professionals carry case memory, judgment scars, pattern recognition, private heuristics, and a long list of “I know it when I see it” distinctions that rarely appear in formal documentation.
The paper Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization proposes a useful correction to the usual agent-building story.[1] Instead of treating the agent as software that must be fully configured before deployment, the paper argues that domain-expert agents should be grown through use. The agent begins with minimal scaffolding, works with a practitioner in real tasks, accumulates experience, and periodically converts messy conversational fragments into structured knowledge assets.
That last step is the important one. This is not “just chat with the agent until it gets better.” That would be therapy with a GPU bill.
The paper’s core claim is methodological: operational conversation becomes raw knowledge material, and deliberate crystallization turns that material into reusable skills, case memories, and decision frameworks.
So the question is not whether an agent has memory. The question is whether the organization has a disciplined mechanism for turning memory into expertise.
The old agent lifecycle assumes expertise can be packaged upfront
Most agent projects still follow one of two familiar development patterns: code-first or prompt-first. Set beside the nurture-first alternative, they compare as follows.
| Paradigm | How expertise is encoded | Main strength | Main weakness |
|---|---|---|---|
| Code-first | Rules, pipelines, APIs, deterministic logic | Reliable and reproducible | Poor at capturing judgment-heavy tacit expertise |
| Prompt-first | System prompts, examples, personas, instructions | Fast to start and easy to modify | Fragile as complexity grows; static once written |
| Nurture-first | Conversation, memory, crystallization, skill references | Captures evolving practitioner judgment | Requires sustained use, curation, and quality control |
Code-first development is attractive because it gives managers something they understand: a build phase, a deployment phase, and a system that behaves predictably. For repeatable tasks with stable logic, this is still the correct approach. Invoice classification, standard data extraction, simple compliance checks, and routine workflow automation do not need to be raised like children. They need to be engineered properly.
Prompt-first development is attractive because it lowers the entry barrier. A practitioner can put a large amount of procedural knowledge into a system prompt and obtain a surprisingly competent assistant. But the approach runs into a ceiling. As the prompt grows, it becomes harder to maintain, harder to debug, and easier for the model to ignore or misapply. A long prompt is not a knowledge system. It is often just a beautifully formatted junk drawer.
Both paradigms share the same hidden assumption: development comes before deployment.
The Nurture-First Development model attacks that assumption. In domains where expertise is tacit, personal, and continuously changing, the agent should not be “finished” before it is used. It should become useful enough to enter practice, then improve through practice.
That sounds soft until we follow the mechanism.
Nurture-first development is a pipeline from tacit judgment to structured assets
The paper’s strongest contribution is not the metaphor of “raising” an agent. The metaphor is memorable, but metaphors do not run organizations. The stronger contribution is the proposed mechanism: the Knowledge Crystallization Cycle.
A simplified version looks like this:
Practitioner judgment
↓
Operational conversation
↓
Experiential memory
↓
Deliberate crystallization
↓
Reusable skills, case libraries, and error patterns
↓
Better future collaboration
The movement matters. Tacit knowledge does not become useful to an agent merely because it was spoken once. A correction in a chat window is not yet a capability. A useful insight buried in a daily log is not yet a decision framework. A memory file is not yet expertise.
Nurture-first development depends on a conversion process.
The paper calls this conversion knowledge crystallization: fragmented, contextual, conversational knowledge is periodically consolidated into structured, reusable knowledge assets. This is the difference between an agent that vaguely remembers what the user once said and an agent that has absorbed a working analytical method.
The cycle has four phases.
| Phase | What happens | Business interpretation |
|---|---|---|
| Conversational immersion | Practitioner and agent work together on real tasks | The agent observes judgment in context, not as abstract instruction |
| Experiential accumulation | Conversations, corrections, cases, and reasoning traces are logged | The organization builds a raw corpus of expert behavior |
| Deliberate crystallization | Patterns are extracted, validated, generalized, and integrated | Tacit knowledge becomes reusable assets |
| Grounded application | Crystallized knowledge is used in future work | The next interaction starts from a higher baseline |
This is where the paper usefully separates itself from ordinary memory-augmented agent design. Many agent systems can store interaction logs. The nurture-first claim is that storage alone is insufficient. The developmental step is the periodic transformation of conversational residue into organized skill references, decision rules, case libraries, and error-pattern databases.
Put less politely: memory without crystallization is just hoarding.
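To make the difference between storing and crystallizing concrete, here is a minimal sketch. It mines a log for patterns that recur across distinct sessions and promotes only those a practitioner has approved. The names (`LogEntry`, `find_candidates`, `crystallize`) and the `min_sessions` threshold are assumptions for illustration; the paper does not define these structures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    session_id: str   # which working session the entry came from
    tag: str          # hypothetical label for the observed pattern
    note: str         # the raw conversational fragment


def find_candidates(log, min_sessions=3):
    """Return tags seen in at least `min_sessions` distinct sessions."""
    sessions_by_tag = defaultdict(set)
    for entry in log:
        sessions_by_tag[entry.tag].add(entry.session_id)
    return sorted(
        tag for tag, sessions in sessions_by_tag.items()
        if len(sessions) >= min_sessions
    )


def crystallize(log, approved_tags, min_sessions=3):
    """Promote recurring patterns, but only those the practitioner approved."""
    skills = {}
    for tag in find_candidates(log, min_sessions):
        if tag in approved_tags:  # human validation gate
            skills[tag] = [e.note for e in log if e.tag == tag]
    return skills


# Illustrative data only; these entries are invented for the example.
log = [
    LogEntry("s1", "capex-cycle", "FCF yield matters more in semis"),
    LogEntry("s2", "capex-cycle", "Capex compressed free cash flow again"),
    LogEntry("s3", "capex-cycle", "Reweighted factors for a fab-heavy name"),
    LogEntry("s3", "one-off", "Ignore this quarter's tax item"),
]

skills = crystallize(log, approved_tags={"capex-cycle"})
print(sorted(skills))  # ['capex-cycle']
```

The one-off comment stays in the log, where it can still be searched, but it never becomes a skill. Nothing is promoted without approval either: passing an empty `approved_tags` set returns an empty dictionary, however often a pattern recurs.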
The three-layer architecture tells us where knowledge should live
A practical agent needs different storage zones for different types of knowledge. The paper proposes a Three-Layer Cognitive Architecture organized by volatility and personalization.
| Layer | Volatility | Typical content | How it should be used |
|---|---|---|---|
| Constitutional layer | Low | Identity, principles, operating boundaries, stable preferences | Loaded consistently; kept concise |
| Skill layer | Medium | Task methods, analytical frameworks, reference files, reusable procedures | Loaded when relevant; updated after crystallization |
| Experiential layer | High | Logs, case memories, corrections, outcomes, reasoning traces | Searched, mined, and periodically curated |
This layering is more than tidy taxonomy. It prevents a common failure in agent projects: putting everything into the most visible place.
Teams often overuse the system prompt because it feels authoritative. They keep adding principles, examples, task instructions, style preferences, decision rules, and domain notes until the prompt becomes a wall of instruction no one wants to inspect. The model, which is polite but not obedient in the way software is obedient, then applies this wall unevenly.
The paper’s architecture suggests a better separation.
Stable principles belong in the Constitutional Layer. Detailed frameworks belong in the Skill Layer. Raw experience belongs in the Experiential Layer. The Constitutional Layer can point to deeper references without carrying all their details. The Skill Layer contains the crystallized knowledge that the agent should use repeatedly. The Experiential Layer remains write-heavy and searchable, serving as the raw material for future crystallization.
For business users, this is a governance insight disguised as architecture. The question is not “Can we store more context?” The question is “Which knowledge deserves to be always active, which should be retrieved on demand, and which is still too raw to trust?”
That distinction matters when the agent is used in consequential work. A user correction after one bad answer should not instantly become a universal rule. A repeated pattern across many cases may deserve promotion. A stable risk preference may belong in the agent’s core operating principles. These are different knowledge-status levels, not just different file names.
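One way to picture the three layers is as a loading policy: what is always in context, what is loaded when relevant, and what is only searched. The sketch below is a simplified illustration under that reading. `Layer`, `KnowledgeItem`, and `assemble_context` are hypothetical names, and the keyword match stands in for whatever retrieval a real system would use.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    CONSTITUTIONAL = "constitutional"  # low volatility, always loaded
    SKILL = "skill"                    # medium volatility, loaded when relevant
    EXPERIENTIAL = "experiential"      # high volatility, searched on demand


@dataclass(frozen=True)
class KnowledgeItem:
    layer: Layer
    topic: str
    text: str


def assemble_context(items, task_topics, query, max_experiences=2):
    """Pick what to load for one task, following the three-layer split."""
    always = [i.text for i in items if i.layer is Layer.CONSTITUTIONAL]
    skills = [
        i.text for i in items
        if i.layer is Layer.SKILL and i.topic in task_topics
    ]
    # Naive keyword search; a placeholder for real retrieval.
    matches = [
        i.text for i in items
        if i.layer is Layer.EXPERIENTIAL and query.lower() in i.text.lower()
    ]
    return always + skills + matches[:max_experiences]


# Illustrative store; the contents are invented for the example.
store = [
    KnowledgeItem(Layer.CONSTITUTIONAL, "core", "Flag uncertainty explicitly."),
    KnowledgeItem(Layer.SKILL, "semiconductors", "Weight FCF yield more heavily."),
    KnowledgeItem(Layer.SKILL, "banks", "Check net interest margin trends."),
    KnowledgeItem(Layer.EXPERIENTIAL, "log", "Capex correction on a chip maker."),
    KnowledgeItem(Layer.EXPERIENTIAL, "log", "Discussed a retail earnings call."),
]

context = assemble_context(store, task_topics={"semiconductors"}, query="capex")
print(len(context))  # 3
```

The banking skill and the retail log entry exist in the store but never reach the context for a semiconductor task. That is the governance point in code form: the system prompt stays short because most knowledge is fetched rather than permanently loaded.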
Crystallization is the expensive step, so it should be treated as development
The paper identifies several kinds of experiential knowledge: operational records, reasoning traces, pattern observations, error records, contextual annotations, and insight fragments.
These categories are useful because they do not have equal value.
An operational record tells us what happened. A reasoning trace tells us why someone thought it made sense. An error record tells us what went wrong and what principle should prevent recurrence. An insight fragment may be immediately valuable, but it still needs placement: is it a general rule, a sector-specific heuristic, or a one-time observation produced by unusual circumstances?
This is why crystallization cannot be reduced to summarization.
A summary compresses. Crystallization restructures.
In the paper’s process, crystallization involves selecting relevant experiential entries, detecting patterns, validating them with the user, generalizing them, checking them against the broader corpus, and updating the appropriate knowledge layer. The human validation step is especially important. Without it, the agent may promote statistical noise, user bias, or context-specific improvisation into a reusable rule. Congratulations: the agent has learned, just not necessarily anything true.
The business equivalent is familiar. Organizations already hold large amounts of tacit knowledge in emails, meeting notes, CRM comments, investment memos, Slack threads, and after-action reviews. Most of that knowledge dies quietly because it is never converted into an operating system.
Nurture-first development says the agent can become both the collector and the beneficiary of that conversion. But the conversion still needs discipline.
A useful crystallization checkpoint should ask:
| Question | Why it matters |
|---|---|
| What pattern has appeared across multiple interactions? | Prevents one-off comments from becoming doctrine |
| What evidence supports the pattern? | Separates useful expertise from confident anecdote |
| What scope does the pattern apply to? | Avoids overgeneralization |
| Should this become a skill reference, case memory, error pattern, or constitutional principle? | Places knowledge in the correct layer |
| How will future interactions test or refine it? | Keeps crystallized knowledge falsifiable |
This is the part most organizations will underestimate. They will want the benefits of a nurtured agent without paying the cost of nurturing. Very human. Also very unlikely to work.
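The checkpoint questions can be treated as a review form that a candidate pattern must complete before promotion. The following is a rough sketch of that idea, not the paper's tooling. The `CheckpointReview` fields, the `VALID_TARGETS` set, and the `min_interactions` threshold are all assumptions chosen to mirror the five questions above.

```python
from dataclasses import dataclass

# Hypothetical asset types, mirroring the placement question in the checklist.
VALID_TARGETS = {
    "skill_reference",
    "case_memory",
    "error_pattern",
    "constitutional_principle",
}


@dataclass
class CheckpointReview:
    pattern: str
    supporting_interactions: int  # how many interactions showed the pattern
    evidence: str                 # what supports it
    scope: str                    # where it applies
    target: str                   # which asset type it should become
    retest_plan: str              # how future work will test or refine it


def review_problems(review, min_interactions=2):
    """Return the reasons a candidate is not ready; an empty list means ready."""
    problems = []
    if review.supporting_interactions < min_interactions:
        problems.append("pattern seen too few times")
    if not review.evidence.strip():
        problems.append("no supporting evidence recorded")
    if not review.scope.strip():
        problems.append("scope not stated")
    if review.target not in VALID_TARGETS:
        problems.append("unknown target asset type")
    if not review.retest_plan.strip():
        problems.append("no plan to test or refine")
    return problems


# Illustrative candidate: seen once, with no scope written down yet.
candidate = CheckpointReview(
    pattern="Sector-blind factor application",
    supporting_interactions=1,
    evidence="Analyst correction on a semiconductor name",
    scope="",
    target="error_pattern",
    retest_plan="Revisit at next checkpoint",
)
print(review_problems(candidate))
# ['pattern seen too few times', 'scope not stated']
```

Returning a list of reasons rather than a yes/no keeps the review auditable: a rejected pattern stays in the experiential log with a note on what it still lacks.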
The dual-workspace pattern separates daily use from structural editing
The paper’s operational model uses two workspaces.
The Nurturing Workspace is where the practitioner and agent interact during normal work. This is the conversational environment: daily analysis, corrections, task execution, reflection, and case discussion.
The Surgical Workspace is where the agent’s knowledge base is inspected, refactored, and crystallized. It has access to files, memory structures, skill definitions, logs, and scripts. It is where messy experience becomes durable structure.
This separation is practical. Daily work requires continuity and fluency. Crystallization requires precision and batch processing. Mixing the two creates friction: the user wants to discuss a client case or market event, not pause every five minutes to reorganize a memory directory.
A good nurture-first workflow therefore alternates between use and maintenance.
Bootstrap → Initial nurturing → Crystallization checkpoint
→ Structured nurturing → Deeper crystallization
→ Mature operation → Routine refinement
The paper calls this a Spiral Development Model. The term is slightly grand, but the idea is sensible. Each loop creates a more capable baseline. The agent does not merely accumulate more memories; it becomes better organized around the practitioner’s actual work.
There is also an uncomfortable implication for AI adoption budgets. If the agent is supposed to mature, someone must own the maturation process. That person may not be a software engineer. In many cases, the primary developer is the domain practitioner, supported by tooling that makes crystallization easier.
This is one reason the paper’s “agent nurturer” role is more interesting than it first appears. The scarce capability may not be prompt writing. It may be the ability to recognize which fragments of practice deserve to become durable knowledge.
The financial research case shows feasibility, not proof
The paper illustrates the framework through a financial research agent for U.S. equity analysis. The case is well chosen because equity research satisfies the paper’s applicability conditions: the work is judgment-heavy, personal, conversational, pattern-based, and sensitive to changing market regimes.
The analyst in the case had more than five years of U.S. equity-market experience, around 400 historical research notes across 18 months, and a partially articulated multi-factor evaluation framework. The initial bootstrap phase created basic skills for market data retrieval, earnings analysis, and sector comparison, while historical notes were migrated into memory format.
The paper reports that preliminary pattern extraction found recurring analytical themes, judgment errors, and strategic approaches. During the first three weeks of daily interaction, the agent captured undocumented elements of the analyst’s framework: dynamic factor weighting under different macro conditions, interpretation of earnings-call language, and corrections to the agent’s misreadings of historical data.
The interesting part is not that the agent received more information. The interesting part is the type of information it received.
In one episode, the agent evaluated a semiconductor company too mechanically, overemphasizing revenue growth and gross margin expansion. The analyst corrected it by pointing to the capital expenditure cycle: in semiconductors, high capex can compress free cash flow for years, so free-cash-flow yield deserves heavier weighting in capex-intensive sectors.
A weak memory system would store that as a note. A better system would retrieve it next time a semiconductor stock appears. A nurture-first system crystallizes it into sector-conditional factor weighting, records the error pattern as “sector-blind factor application,” and updates the evaluation skill so the same kind of mistake becomes less likely.
That is the mechanism in miniature.
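A small worked sketch shows what sector-conditional factor weighting could look like once crystallized. Every number here is hypothetical: the base weights, the 2.0 multiplier on free-cash-flow yield, and the factor scores are invented for illustration and do not come from the paper or the analyst's framework.

```python
BASE_WEIGHTS = {  # hypothetical starting weights, summing to 1.0
    "revenue_growth": 0.4,
    "gross_margin": 0.3,
    "fcf_yield": 0.3,
}

SECTOR_MULTIPLIERS = {  # hypothetical sector-conditional adjustments
    "semiconductors": {"fcf_yield": 2.0},
}


def sector_weights(sector):
    """Scale base weights by sector multipliers, then renormalize to sum to 1."""
    multipliers = SECTOR_MULTIPLIERS.get(sector, {})
    raw = {f: w * multipliers.get(f, 1.0) for f, w in BASE_WEIGHTS.items()}
    total = sum(raw.values())
    return {f: w / total for f, w in raw.items()}


def score(factor_scores, sector):
    """Weighted sum of factor scores using the sector's adjusted weights."""
    weights = sector_weights(sector)
    return sum(weights[f] * factor_scores[f] for f in weights)


# A company with strong growth and margins but weak free cash flow.
company = {"revenue_growth": 0.9, "gross_margin": 0.8, "fcf_yield": 0.2}

print(round(score(company, "software"), 2))        # 0.66
print(round(score(company, "semiconductors"), 2))  # 0.55
```

With the sector adjustment, the same company scores lower as a semiconductor name because weak free-cash-flow yield now counts for more. The mechanical reading that overemphasized growth and margins is corrected by a rule in the skill layer, not by a remembered remark.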
The paper also reports progression metrics across the agent’s development lifecycle.
| Metric | Weeks 1–3 | Post-Checkpoint 1 | Weeks 9–12 | Post-Checkpoint 2 |
|---|---|---|---|---|
| Useful analyses | 38% | 52% | 71% | 74% |
| Case recalls | 2 | 5 | 12 | 15 |
| Bias flags | 0 | 1 | 4 | 5 |
| Skill references populated | 2 | 4 | 6 | 8 |
| Error patterns | 6 | 8 | 10 | 12 |
| Daily log entries | 21 | — | 60+ | — |
These numbers should be read carefully. They are not a benchmark against other agent-development methods. They are not proof that nurture-first development outperforms code-first or prompt-first alternatives. The “useful analyses” metric reflects subjective practitioner assessment, and the case has no control group.
Still, the table is not meaningless. Its likely purpose is feasibility demonstration: showing how the proposed cycle could produce observable changes in agent behavior and knowledge structure over time. The increases in case recalls, bias flags, skill references, and error patterns are especially aligned with the paper’s mechanism. They show the agent becoming more anchored in the practitioner’s prior reasoning and more capable of reusing crystallized knowledge.
So the correct interpretation is modest but useful: the case study supports plausibility, not superiority.
That distinction matters. Business readers should not walk away thinking the paper proves a new universal agent methodology. It does something narrower: it gives a coherent operating model for building agents in domains where conventional upfront specification is structurally weak.
The business value is compounding judgment, not cheaper prompting
The obvious business reading is that nurture-first development helps companies build better AI copilots. True, but too shallow.
The more interesting value is that the method changes what kind of asset the company is building.
A normal AI implementation produces a workflow. A good one produces productivity gains. A nurture-first implementation can produce a growing knowledge asset: a structured record of how expert judgment is applied, corrected, generalized, and reused.
That has several practical consequences.
First, the agent becomes a reflection tool for the practitioner. The paper’s financial analyst reportedly discovered inconsistencies in their own decision framework while explaining reasoning to the agent. This is not a side effect. It is central to the method. Externalizing tacit knowledge forces the practitioner to notice contradictions between stated principles and actual practice.
Second, the agent becomes a personalized institutional memory. In advisory, investment, legal, consulting, compliance, and technical support work, many valuable judgments are buried in past cases. A nurtured agent can recall how the practitioner handled similar situations before, including what went wrong. This is less glamorous than autonomous planning, but probably more valuable.
Third, the organization gains a new form of reusable documentation. Instead of asking experts to write a perfect playbook upfront, the company lets the playbook emerge from observed work and periodic crystallization. This does not remove the need for documentation. It makes documentation less fictional.
Fourth, AI adoption becomes less dependent on generic model capability. Foundation models remain the engine, but the differentiator becomes the firm’s accumulated domain memory and crystallized operating knowledge. In industries where everyone can access similar models, this may become one of the few defensible layers.
A practical implementation path might look like this:
| Stage | What the firm does | Output |
|---|---|---|
| Minimal scaffold | Define agent identity, basic tools, core boundaries, initial skills | A usable but shallow agent |
| Nurturing period | Use the agent in real work; capture corrections, decisions, and reasoning traces | Experiential corpus |
| First crystallization | Extract repeated patterns and practitioner-specific rules | Initial skill references and memory summaries |
| Structured operation | Apply crystallized knowledge in more serious tasks | Better collaboration and richer logs |
| Deeper crystallization | Build case libraries, error-pattern databases, and specialized frameworks | Compounding knowledge asset |
| Governance layer | Review quality, ownership, bias, and transferability | Controlled deployment path |
This is not the cheapest way to create a demo. It may be one of the better ways to create an agent that remains useful after the demo.
Where nurture-first development applies, and where it does not
The paper is clear that Nurture-First Development is not universally superior. That restraint is important.
The approach fits domains with five traits:
| Applicability condition | Business examples |
|---|---|
| Expertise is tacit | Investment judgment, legal strategy, executive advisory, clinical reasoning |
| Expertise is personal | Portfolio style, negotiation approach, writing voice, research philosophy |
| Expertise evolves | Markets, regulation, competitive strategy, policy environments |
| Work is conversational | Advisory, research, coaching, planning, review workflows |
| Case memory matters | Litigation, investing, consulting, incident response, customer success |
The approach is less suitable when the task is stable, formalizable, and impersonal. Tax form processing, standard invoice routing, deterministic data validation, and fixed compliance checklists do not need nurture-first development as the primary paradigm. They need robust software, clean data, and boring reliability. Boring reliability remains undefeated in many business processes.
There is also a transferability problem. Nurtured agents are deeply personalized. That is the point, but it limits portability. A skill reference based on one analyst’s framework may be valuable to that analyst and misleading to another. An organization that wants to scale nurture-first development must decide which knowledge remains personal, which becomes team-level practice, and which can be standardized.
This will create governance questions that the paper identifies but does not solve completely: knowledge ownership, conflict resolution among experts, objective quality metrics, and organizational aggregation of individually nurtured agents.
The hardest issue may be quality assurance.
Code can be unit-tested. Prompts can be benchmarked, imperfectly. Nurtured knowledge is harder to evaluate because it contains judgment, preference, and accumulated context. The agent may crystallize genuine expertise. It may also crystallize bias with excellent formatting.
That does not invalidate the method. It tells us where the next layer of tooling is needed: crystallization review, pattern provenance, memory audits, contradiction detection, and quality metrics for knowledge assets.
The paper’s real contribution is a development discipline
The article’s title says “Don’t build the agent — raise it.” That is catchy, but the phrase can be misunderstood.
The paper is not saying engineering disappears. It is not saying prompts are useless. It is not saying all agents should learn indefinitely from every conversation, like an overconfident intern with write access.
The better interpretation is this:
In judgment-heavy domains, the hardest part of agent development is not initial configuration. It is the disciplined conversion of situated expert interaction into structured, reusable knowledge.
That is a development discipline.
The code-first paradigm still matters for deterministic workflows. The prompt-first paradigm still matters for fast initialization. Nurture-first development adds a third category for domains where the most valuable knowledge cannot be fully specified upfront.
The paper’s evidence remains early. Its financial case study is illustrative, single-user, and subjective in parts. There are no controlled comparisons against prompt-first or code-first baselines. The formal model is conceptually helpful but not yet an empirical law. The “non-decreasing value” claim depends on validated crystallization, which is doing a lot of work in that sentence.
But the framework is still useful because it names a problem many practitioners already encounter: the agent is capable, but not yet theirs. It can perform tasks, but it does not understand the user’s accumulated judgment. It remembers fragments, but it has not converted them into method.
The paper gives that missing middle layer a name and a process.
Conclusion: the next agent advantage may be cultivated, not installed
The first wave of agent building focused on tools. The second wave focused on workflows. The next serious wave may focus on knowledge maturation.
For businesses, the lesson is not to abandon engineering and start having sentimental conversations with software. Please do not put that in a transformation roadmap.
The lesson is more precise: when the target work depends on tacit, evolving, practitioner-specific judgment, the agent should be designed to learn from operational use, and the organization should schedule the work of crystallizing that experience into durable knowledge assets.
In that world, the valuable artifact is not just the agent. It is the growing relationship between practitioner, memory, skill, and repeated use.
The agent is not built once.
It is scaffolded, used, corrected, crystallized, and used again.
That may sound slower than writing a heroic prompt. It is. It is also much closer to how expertise actually forms.
Cognaptus: Automate the Present, Incubate the Future.
[1] Linghao Zhang, “Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization,” arXiv:2603.10808v1, 11 March 2026, https://arxiv.org/abs/2603.10808.