Thinking in Libraries: Why Humans (and AI) Solve Hard Problems by Rewriting the Search Space

Templates are usually sold as a convenience feature. Save time. Avoid repetition. Make the next task faster.

That is not wrong. It is just a little shallow, which is how many productivity slogans prefer to travel.

A better way to think about a template, helper function, saved workflow, reusable prompt, or internal operating procedure is this: it changes the search space. It does not merely shorten the final sequence of actions. It changes what counts as an available move.

That distinction matters because hard work is often not hard at the moment of execution. It is hard at the moment of discovery. The expensive part is not always typing the final answer, drawing the final shape, writing the final report, or running the final script. The expensive part is finding a path through too many possible paths.

The arXiv paper “Online library learning in human visual puzzle solving” studies this mechanism in a controlled visual puzzle task where people can create and reuse “helpers”: intermediate constructions that persist across future puzzles.¹ The paper’s contribution is not that people like shortcuts. We knew that, approximately since the first person copied a spreadsheet tab and called it “final_v3_real.” The contribution is more precise: people appear to build reusable abstractions online, under uncertainty, and those abstractions reshape future problem-solving effort.

The business interpretation is not “every company needs more templates.” Companies already have too many templates. Some are useful; some are where good intentions go to become bureaucratic sediment. The sharper lesson is that reusable structures should be judged by how they reduce future search, not only by how many clicks they save in the current task.

The hard part is not the long program; it is finding the program

Many readers will naturally assume that a hard problem is hard because its solution is long. A long procedure means more steps, more time, more chances to make errors. That intuition is sometimes correct, but it misses the central mechanism in this paper.

The authors frame visual puzzle solving as a kind of program induction. A target pattern can be generated by a program composed from primitive shapes and transformations. In the Pattern Builder Task, participants worked on a 10×10 grid using five primitive shapes, three binary transformations, and four unary operations. They could combine primitives into intermediate structures, and crucially, they could save those intermediate structures as helpers for later use.

A helper is not just a saved output. It becomes a new available operand. Once saved, it joins the working vocabulary of the task. A participant who has built an X-shaped pattern does not need to reconstruct the X from primitive diagonals every time. The X can become a unit.

That is the mechanism:

primitive operations
        ↓
intermediate construction
        ↓
saved helper
        ↓
expanded vocabulary
        ↓
compressed future search

The final solution may still look complex. But the learner is no longer searching only in the original primitive space. They are searching in a modified space that now contains previously useful chunks. The puzzle has not become objectively smaller. The solver’s representation of the puzzle has become more useful.

This is why the paper’s title uses “library learning.” A library, in this sense, is a learned collection of reusable abstractions. It is not a folder of documents. It is a changing action vocabulary.
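
To make "a new available operand" concrete, here is a minimal Python sketch. It is not the paper's implementation. The 10×10 grid comes from the task description, but the primitive (`diagonal`), the unary `mirror`, the binary `overlay`, and the `Library` class are all illustrative assumptions, chosen only to reproduce the X example.

```python
from typing import Dict, FrozenSet, Tuple

Cell = Tuple[int, int]
Pattern = FrozenSet[Cell]
SIZE = 10  # grid size reported for the Pattern Builder Task


def diagonal() -> Pattern:
    """Hypothetical primitive: the main diagonal of the grid."""
    return frozenset((i, i) for i in range(SIZE))


def mirror(p: Pattern) -> Pattern:
    """Hypothetical unary operation: reflect left to right."""
    return frozenset((r, SIZE - 1 - c) for r, c in p)


def overlay(a: Pattern, b: Pattern) -> Pattern:
    """Hypothetical binary operation: superimpose two patterns."""
    return a | b


class Library:
    """The set of operands a solver can currently pick from."""

    def __init__(self, primitives: Dict[str, Pattern]) -> None:
        self.operands: Dict[str, Pattern] = dict(primitives)

    def save_helper(self, name: str, pattern: Pattern) -> None:
        # Saving does not store a finished answer; it adds a new move.
        self.operands[name] = pattern


lib = Library({"diag": diagonal()})

# Building the X from primitives takes two operations: mirror, then overlay.
x_shape = overlay(lib.operands["diag"], mirror(lib.operands["diag"]))
lib.save_helper("X", x_shape)

# Later puzzles can start from "X" directly, as if it were a primitive.
print(sorted(lib.operands))
```

The point of the sketch is the last step: after `save_helper`, the X sits in the same dictionary as the primitives, so any later construction can use it in one move instead of two.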

Pattern Builder makes abstraction visible instead of merely inferred

A common difficulty in studying abstraction is that the interesting process happens inside the learner’s head. Researchers may observe the final answer, the time taken, or the error rate, but not the intermediate representational choices. The Pattern Builder Task is designed to expose those choices.

Participants solved 14 target patterns in a fixed order of increasing difficulty. They could create helpers during the task, remove helpers, and reuse helpers across later trials. After the target-solving phase, they entered a free-play phase, where they could create patterns without a target and submit them to a gallery.

This design matters because it separates several things that are often mixed together:

Observed behavior	What it reveals	Why it matters
Creating a helper	The participant chooses to externalize an intermediate structure	Abstraction is treated as an action, not a post-hoc explanation
Reusing a helper	The saved structure becomes part of later problem solving	The representation changes future search
Converging on similar helpers	Different participants discover similar reusable chunks	Some abstractions are strongly invited by the task structure
Free-play reuse	Helpers shape exploration even without a target	Abstraction is not merely reactive to immediate task pressure

That last point is easy to underrate. In many business workflows, reusable components are justified only by near-term efficiency: “Will this save time on the next task?” The free-play phase suggests a broader role. Once people acquire a compositional vocabulary, they may use it to explore new outputs, not only to complete assigned tasks. In business language, libraries do not merely accelerate execution. They can expand the space of things employees can imagine doing.

That is useful. It is also dangerous if the library is bad. A poorly designed template library does not just save the wrong steps. It teaches the organization to see the wrong moves as natural.

Early learners save broadly; experienced learners reuse selectively

The paper reports an online behavioral experiment with 34 recruited participants, of whom four were excluded based on pre-registered disengagement criteria. The final sample contained 30 participants. Each participant solved 14 patterns, giving 420 target-solving trials.

The headline behavioral result is straightforward: participants used helpers, and they used them increasingly efficiently.

Across all trials, participants achieved 92.4% accuracy. Across successful trials, they created 470 helpers in total, averaging 15.7 helpers per participant. Most participants—28 out of 30—created at least one helper.

But the important part is how this changed over time. Early in the task, participants created many helpers. Helper creation peaked at the beginning and again when a new diagonal structure was introduced. Later, participants increasingly reused saved helpers rather than creating new ones.

The proportion of solution steps using saved helpers rose from 21% in the first pattern to 80% in pattern 9 and 87% in pattern 14. A linear regression found a positive trend of 3.54 percentage points per trial.

This is a useful behavioral signature. At the beginning, the learner does not know which abstractions will pay off. So the rational strategy is often broad externalization: save more candidate chunks, keep options open, avoid rebuilding everything from scratch. As the task distribution becomes clearer, the learner can become more selective.

In other words, the learning process has two phases:

Library expansion: create enough candidate abstractions to make future reuse possible.
Library discipline: reuse what proves valuable and stop treating every intermediate object as sacred infrastructure.

A company implementing AI workflow automation often gets this sequence backwards. It tries to standardize the library before users have explored the work. Then everyone pretends the official workflow matches reality, because meetings are cheaper than admitting the map is fictional.

The paper’s evidence suggests a better sequencing principle: allow broad helper creation early, then prune and standardize after reuse patterns become visible.

The key evidence is the search-cost result

The most important result is not that participants created helpers. That part is interesting, but not enough. The stronger result is that human effort tracked model-estimated search complexity better than raw program length.

The authors built computational models for the Pattern Builder Task. A bottom-up search model enumerates programs from primitives, using observational equivalence to prune programs that produce the same output. The library-learning variants add previously solved patterns into the available primitive set, allowing later searches to reuse earlier solutions as atomic components.

The study compares two computational metrics:

Metric	Meaning	Practical interpretation
Program length	Number of primitives in the shortest solution found by Short+Library	How long the final compressed solution is
Nodes expanded	Number of candidate programs evaluated during search	How large the effective discovery process is

The distinction is the center of the paper. Program length is about the final answer. Nodes expanded is about the search needed to find it.
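
The difference between the two metrics is easier to see in a toy search. The sketch below is a simplified bottom-up enumerator with observational-equivalence pruning, written under loud assumptions: a 4×4 grid instead of 10×10, two made-up unary operations (`mirror`, `flip`), one binary operation (overlay as set union), and a hand-picked "earlier solution" added to the primitive set. It does not reproduce the paper's Baseline, Short, Library, or Short+Library models; it only shows how adding a solved pattern changes the count of nodes expanded.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterator, List, Optional, Tuple

Pattern = FrozenSet[Tuple[int, int]]
Entry = Tuple[Pattern, str]  # (output, program that produced it)
N = 4  # toy grid, smaller than the task's 10x10


def mirror(p: Pattern) -> Pattern:
    return frozenset((r, N - 1 - c) for r, c in p)


def flip(p: Pattern) -> Pattern:
    return frozenset((N - 1 - r, c) for r, c in p)


UNARY = {"mirror": mirror, "flip": flip}  # hypothetical operations


def candidates(frontier: List[Entry], known: List[Entry]) -> Iterator[Tuple[str, Pattern]]:
    """Every program that uses at least one pattern found in the last round."""
    for pat, prog in frontier:
        for name, op in UNARY.items():
            yield f"{name}({prog})", op(pat)
    for (p1, g1), (p2, g2) in product(frontier, known):
        yield f"overlay({g1}, {g2})", p1 | p2


def bottom_up_search(
    primitives: Dict[str, Pattern], target: Pattern, budget: int
) -> Tuple[Optional[str], int]:
    """Return (program or None, nodes expanded)."""
    seen: Dict[Pattern, str] = {}
    nodes = 0
    for name, pat in primitives.items():
        nodes += 1
        if pat == target:
            return name, nodes
        seen.setdefault(pat, name)
    frontier = list(seen.items())
    while frontier:
        known = list(seen.items())
        new: Dict[Pattern, str] = {}
        for prog, out in candidates(frontier, known):
            nodes += 1
            if out == target:
                return prog, nodes
            # Observational equivalence: keep one program per distinct output.
            if out not in seen and out not in new:
                new[out] = prog
            if nodes >= budget:
                return None, nodes
        seen.update(new)
        frontier = list(new.items())
    return None, nodes


dot: Pattern = frozenset({(0, 0)})
top_pair: Pattern = frozenset({(0, 0), (0, N - 1)})  # stands in for an earlier solution
corners: Pattern = frozenset({(0, 0), (0, N - 1), (N - 1, 0), (N - 1, N - 1)})

base_prog, base_nodes = bottom_up_search({"dot": dot}, corners, budget=10_000)
lib_prog, lib_nodes = bottom_up_search({"dot": dot, "top_pair": top_pair}, corners, budget=10_000)
print("primitives only:", base_nodes, "nodes;", "with library:", lib_nodes, "nodes")

# Under a tight budget, only the search with the library entry reaches the target.
tight_base, _ = bottom_up_search({"dot": dot}, corners, budget=40)
tight_lib, _ = bottom_up_search({"dot": dot, "top_pair": top_pair}, corners, budget=40)
print(tight_base, tight_lib)
```

In this toy run the library version reaches the target after fewer expansions, and under the tight budget it is the only version that reaches it at all. That mirrors the shape, not the numbers, of the paper's comparison: the target did not get simpler, the search that finds it got smaller.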

Participants spent an average of 84 seconds per puzzle, with a median of 44.1 seconds. They completed patterns with a median of 3.0 steps. Mean solution time and mean steps were strongly correlated across patterns, with $r = .89$.

When the authors compared human behavior with the Short+Library model, nodes expanded strongly predicted mean solution time ($r = .82$) and mean number of steps ($r = .79$). Trial-level mixed-effects models supported the same relationship: log-transformed nodes expanded significantly predicted both steps and solution time.

Program length behaved differently. It correlated negatively with success rate ($r = -.67$): puzzles with shorter programs were solved more often, so length still mattered for whether people could solve a puzzle at all. But program length did not reliably predict solution time or number of steps ($r = -.20$, not statistically reliable).

That is the paper’s cleanest correction to the common misconception. A problem can have a compact final representation while still being difficult to discover. Conversely, a reusable abstraction can reduce search even if the final visible output does not look dramatically shorter.

For AI systems, this maps directly onto a familiar engineering problem. A workflow agent that has no library must rediscover procedures from scratch: which API to call, which exception matters, which report format the client expects, which data cleaning step is always needed, which judgment call has already been settled. A workflow agent with a good library does not merely produce shorter outputs. It starts from a better hypothesis space.

The model comparisons are main evidence, not decorative benchmarking

The paper includes several model variants: Baseline, Short, Library, and Short+Library. This is not a leaderboard in the usual AI sense. The comparison helps isolate what library learning contributes.

The Baseline and Short models search from the original primitives. The Library and Short+Library models add solved patterns into the primitive set. The reported result is stark: participants and library models could solve all 14 puzzles by reusing past solutions, while Baseline models could solve only the first six puzzles within the computational budget.

The likely purpose of this comparison is not to claim that the simple library model is a perfect cognitive model. It is a mechanism check. If adding reusable abstractions makes later patterns tractable under a fixed budget, and humans increasingly behave as if they are using reusable abstractions, then library learning becomes a plausible explanatory mechanism.

Paper component	Likely purpose	What it supports	What it does not prove
Baseline vs Library models	Main mechanism comparison	Reuse can make later puzzles tractable within a budget	Humans use exactly the same algorithm
Helper usage trend across trials	Main behavioral evidence	Participants increasingly rely on saved abstractions	The trend would generalize to every task domain
Nodes expanded vs solution time	Main human-model comparison	Human effort tracks estimated search cost	Nodes expanded is the only cognitive cost
Free-play phase	Exploratory extension	Learned helpers shape open-ended exploration	Free play predicts workplace creativity directly
Saving final target patterns	Behavioral convergence test	Participants converge toward reusable complete-pattern helpers	Partial helpers are unimportant

The last row deserves attention. The authors note that participants increasingly saved target patterns themselves: 50.8% in the first seven trials versus 78.5% in the last seven trials. This convergence resembles the library model’s simple heuristic of adding final solutions as new primitives.

But humans may also create partial, anticipatory helpers. The paper explicitly notes that its computational model is limited because it uses complete patterns as new primitives. A richer model would need to capture partial abstraction: saving a useful component before knowing exactly how it will be reused.

That limitation is not a flaw to wave away. It is where the business analogy gets more interesting.

In organizations, the most valuable reusable component is often not the completed deliverable. It is a reusable intermediate: a diagnostic checklist, a data cleaning routine, a due-diligence question set, a prompt scaffold, an escalation rule, a chart-generation script, or a client-specific interpretation frame. Completed reports are examples. Intermediate abstractions are tools.

Free play shows that libraries guide exploration, not only execution

After the 14 target patterns, participants entered a free-play phase. They had at least five minutes, could continue longer, and could submit creations to a gallery.

The results are small but revealing. Participants spent an average of 6.32 minutes in free play. Twenty-seven of 30 submitted at least one pattern, producing 80 creations in total. Of those, 64 were given custom names. Seventeen participants created at least one new helper during free play, creating 146 helpers in total.

The authors also found that the number of helpers created during the task phase predicted whether participants created helpers during free play. The logistic regression coefficient was $\beta = 0.29$, with $p = .005$.

This is not the strongest causal evidence in the paper, but it gives the mechanism a wider shape. Helpers did not disappear when the target disappeared. Participants continued to operate inside the learned compositional space. Some designs were symmetric; others were named figurative patterns such as “city sky line” and “THUMBS UP.”

For business readers, the useful inference is modest but important. A reusable library does not only reduce the cost of assigned work. It can also structure exploration. When analysts have reusable chart templates, data loaders, interpretation frames, and domain-specific prompts, they are more likely to try variations. When operators have modular automation routines, they are more likely to assemble new workflows. When product teams have reusable research and evaluation components, they are more likely to test adjacent ideas.

The library becomes a medium for thinking.

This is why the quality of the library matters. A library full of shallow prompt tricks will make shallow prompt tricks feel like the natural language of the organization. A library full of tested diagnostic routines and domain-specific abstractions will make better reasoning cheaper. That is the difference between “knowledge management” and a shared drive with excellent folder names.

What this means for AI workflow design

The paper directly studies human participants solving visual puzzles. It does not test enterprise AI agents, business-process automation, or knowledge-management systems. Still, the mechanism travels well because many AI workflows face the same structural problem: repeated search under uncertainty.

A useful business translation is this:

Paper mechanism	Business analogue	Design implication
Primitive operations	Raw tools, APIs, prompts, data queries, manual steps	Necessary but insufficient; too primitive a vocabulary forces rediscovery
Helper creation	Saving reusable intermediate structures	Let users externalize useful patterns before formal standardization
Helper reuse	Applying saved abstractions in later tasks	Measure reuse frequency and search reduction, not just asset count
Helper convergence	Many users independently saving similar structures	Candidate for standard workflow component or product feature
Free-play creation	Exploration using learned components	Good libraries support innovation, not only compliance

This suggests a different ROI logic for AI workflow systems.

The obvious ROI metric is execution time: how many minutes did the automation save on this task? That is still useful. But the deeper metric is search reduction: how much less diagnosis, configuration, prompting, reformatting, and decision branching is required next time?

In practical terms, a company should ask:

Which intermediate steps are repeatedly reconstructed from scratch?
Which saved components are reused across different tasks, not only within one report?
Which helper-like assets converge across teams without central instruction?
Which templates reduce decision ambiguity rather than merely enforcing formatting?
Which libraries become too large, causing selection cost to rise again?

That final question matters. The paper’s introduction explicitly notes the trade-off: too many helpers can streamline individual solutions while increasing the cost of maintaining and selecting among them. This is the classic failure mode of corporate knowledge bases. They begin as memory. They end as archaeological sites.

A good AI workflow library therefore needs governance, but not premature governance. Early exploration should be permissive. Later consolidation should be ruthless.
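
Two of the questions above, cross-task reuse and cross-team convergence, can be read off a plain usage log. The following is a minimal sketch under stated assumptions: the `Use` record, the team, task, and asset names, and the thresholds are all hypothetical, and the sketch says nothing about search reduction itself, which would need timing or step data.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Use(NamedTuple):
    team: str   # who used the asset
    task: str   # which piece of work it was used in
    asset: str  # the saved helper, template, or script


def reuse_report(log: Iterable[Use]) -> Dict[str, Dict[str, int]]:
    """Count distinct tasks and distinct teams per asset."""
    tasks: Dict[str, Set[str]] = defaultdict(set)
    teams: Dict[str, Set[str]] = defaultdict(set)
    for use in log:
        tasks[use.asset].add(use.task)
        teams[use.asset].add(use.team)
    return {a: {"tasks": len(tasks[a]), "teams": len(teams[a])} for a in tasks}


# Hypothetical log; none of these names come from the paper.
log = [
    Use("finance", "q1_report", "date_cleaner"),
    Use("finance", "q2_report", "date_cleaner"),
    Use("ops", "vendor_audit", "date_cleaner"),
    Use("finance", "q1_report", "q1_cover_page"),
    Use("ops", "vendor_audit", "risk_checklist"),
    Use("ops", "site_review", "risk_checklist"),
]

report = reuse_report(log)

# Assumed thresholds: reused in 2+ tasks AND by 2+ teams -> candidate to standardize.
converged = sorted(a for a, s in report.items() if s["tasks"] >= 2 and s["teams"] >= 2)
# Used in exactly one task -> candidate to review or retire.
single_use = sorted(a for a, s in report.items() if s["tasks"] == 1)
print(converged, single_use)
```

Counting distinct tasks and teams, rather than raw uses, is a deliberate choice in this sketch: an asset opened fifty times inside one report is not evidence of a reusable move.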

The practical lesson is library discipline, not library enthusiasm

There are three practical lessons worth taking from the paper.

First, abstraction should be treated as a first-class workflow event. When a user repeatedly builds the same intermediate object, the system should notice. In an AI product, that may mean detecting recurring prompt patterns, repeated data transformations, common exception-handling chains, or similar analytic frames across projects.

Second, not every saved object deserves promotion. Early helper creation may be broad because the future task distribution is uncertain. That does not mean every early helper is strategically valuable. The useful signal is later reuse, convergence across users, and reduction in search cost.

Third, AI systems should support partial abstractions, not only completed outputs. Saving a final report as a template is fine. Saving the diagnostic path that produced the report is often more valuable. Saving the evaluation rubric, the data-cleaning assumptions, the transformation pipeline, and the client-specific reasoning frame may reduce future search more than copying the report shell.

This is where many “AI knowledge base” products remain too document-centered. They store outputs. They retrieve outputs. Then they hope retrieval will behave like reasoning. Sometimes it does. Often it behaves like a very confident intern with a filing cabinet.

A library-learning perspective points toward something more operational: capture reusable procedures, intermediate representations, and decision structures. The asset is not the document. The asset is the reusable move.
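
The first lesson, noticing when users keep rebuilding the same intermediate object, can be sketched as a search for step sequences that recur across workflow traces. This is one possible heuristic, not a method from the paper: the trace format, the step names, and the choice of contiguous n-grams are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Chunk = Tuple[str, ...]


def recurring_chunks(
    traces: Sequence[Sequence[str]], n: int = 2, min_traces: int = 2
) -> List[Tuple[Chunk, int]]:
    """Return contiguous n-step chunks that appear in at least min_traces traces."""
    counts: Counter = Counter()
    for trace in traces:
        # A set, so a chunk repeated inside one trace is counted once for that trace.
        chunks = {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}
        counts.update(chunks)
    return [(chunk, c) for chunk, c in counts.most_common() if c >= min_traces]


# Hypothetical traces of steps taken in three separate pieces of work.
traces = [
    ["load_csv", "drop_nulls", "normalize_dates", "plot"],
    ["load_csv", "drop_nulls", "normalize_dates", "summarize"],
    ["query_api", "drop_nulls", "normalize_dates", "plot"],
]

chunks = recurring_chunks(traces)
for chunk, count in chunks:
    print(" -> ".join(chunk), "appears in", count, "traces")
```

What surfaces here is an intermediate pair of steps, not any finished report, which is the kind of partial abstraction the third lesson argues for. Whether such a chunk deserves promotion still depends on later reuse, per the second lesson.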

Boundaries: a small visual-puzzle study is not an enterprise productivity theorem

The paper is careful enough, and we should be too.

The study is exploratory, single-condition, and small: 30 final participants after exclusions. The task order was fixed. The domain was visual pattern construction, not business work. The computational model is intentionally simple: its library mechanism adds complete solved patterns as new primitives, while human participants may create partial and anticipatory helpers. The free-play result is suggestive, not a direct model of innovation.

So the direct claim is limited:

People in this task created and reused helpers.
Helper use increased over trials.
Participants converged on similar helper strategies.
Library-based models solved later puzzles that baseline models could not solve within budget.
Human time and steps tracked model-estimated search cost more strongly than raw program length.

The Cognaptus inference is broader but should remain labeled as inference:

In business and AI workflows, reusable intermediate abstractions may reduce future search costs.
The value of a workflow library should be evaluated by reuse, convergence, and search reduction, not merely by asset count.
Good libraries may support exploration by expanding what users can conveniently try.

What remains uncertain is the transfer function. Visual puzzle helpers are clean, visible, and compositional. Enterprise work is messier: incentives distort documentation, teams disagree on vocabulary, client contexts vary, and many “helpers” carry hidden assumptions. The mechanism is promising; the implementation is where optimism usually goes to be audited.

Rewriting the search space is the real productivity gain

The paper’s quiet lesson is that abstraction is not a decorative layer on top of problem solving. It is part of problem solving itself.

When people build helpers, they are not merely saving time on a repeated step. They are changing the set of things they can easily think with. A later problem is approached with a different vocabulary from the one available at the beginning. The solver has not only learned answers. The solver has learned better moves.

That is also the real promise of AI workflow automation when it is done well. Not a chatbot that generates yet another plausible paragraph. Not a template gallery with motivational naming conventions. Not a knowledge base that rewards whoever uploaded the most PDFs.

The useful system is one that watches work, identifies reusable intermediate structures, helps people test them, promotes the ones that reduce future search, and retires the ones that merely look organized.

Hard problems are not solved only by working faster inside the same space. Often, they are solved by rewriting the space.

That is what humans appear to do in the Pattern Builder Task. It is what good AI systems should help organizations do deliberately.

Cognaptus: Automate the Present, Incubate the Future.

¹ Pinzhe Zhao, Emanuele Sansone, Marta Kryven, and Bonan Zhao, “Online library learning in human visual puzzle solving,” arXiv:2603.23244v1, 2026.
