Opening — Why this matters now
Large language models are prolific. Unfortunately, they are also boring in a very specific way.
Give an LLM a constrained task—generate a programming problem, write a quiz, design an exercise—and it will reliably produce something correct, polite, and eerily similar to everything it has produced before. Change the temperature, swap the model, even rotate personas, and the output still clusters around the same conceptual center.
This is not a bug. It is a structural outcome of how we prompt models to satisfy constraints immediately. The paper behind this article names the phenomenon bluntly: the Artificial Hivemind effect—a tendency toward premature convergence that quietly kills creativity.
The contribution here is not another decoding trick. It is a forcing function.
Background — Creativity is a process, not a vibe
Human creativity is not random. It is staged.
Classic theories—from Wallas’ early 20th‑century model to Guilford’s divergent–convergent framework—agree on one thing: exploration must precede evaluation. Humans first generate ideas freely, then narrow them down under constraints.
LLMs, by contrast, are usually asked to do both at once.
When we prompt “generate a creative problem that satisfies X, Y, Z”, the model immediately optimizes for constraint satisfaction. Novel directions are pruned before they even surface. The result is syntactic variation without conceptual expansion.
Prior attempts to fix this problem—higher temperature, persona simulation, multi‑agent debate—help at the margins but leave the core reasoning loop untouched.
This paper intervenes precisely there.
Analysis — What the paper actually does
The proposed method, CREATIVEDC, explicitly decomposes generation into two phases:
- Divergent thinking: explore ideas related to the theme only. No constraints. No feasibility checks. Actively push for unusual, emotionally odd, or unconventional scenarios.
- Convergent thinking: select one idea and retrofit it to satisfy all task constraints. If it fails, discard it and try another.
This sounds trivial. It is not.
By separating ideation from validation, the model is prevented from collapsing into the nearest high‑probability solution too early. Constraint satisfaction becomes a filter, not a generator.
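To make the schedule concrete, here is a minimal sketch of a diverge-then-converge loop. The prompts, the `call_llm` helper, and the `REJECT` convention are illustrative assumptions about how such a pipeline could be wired up, not the paper's exact implementation.

```python
# Sketch of a two-phase "diverge, then converge" generation schedule.

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return the text reply."""
    raise NotImplementedError("wire this to your chat-completion client of choice")

def divergent_phase(theme: str, n_ideas: int = 10) -> list[str]:
    # Phase 1: explore the theme only. No constraints, no feasibility checks.
    prompt = (
        f"Brainstorm {n_ideas} unusual, unconventional scenario ideas about "
        f"'{theme}'. Ignore feasibility and task requirements. One idea per line."
    )
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

def convergent_phase(idea: str, constraints: list[str]) -> str | None:
    # Phase 2: retrofit one idea so that it satisfies every task constraint.
    prompt = (
        f"Turn this idea into a complete programming problem:\n{idea}\n\n"
        "It must satisfy ALL of the following constraints:\n"
        + "\n".join(f"- {c}" for c in constraints)
        + "\nIf the idea cannot satisfy them, reply with exactly: REJECT"
    )
    draft = call_llm(prompt)
    return None if draft.strip() == "REJECT" else draft

def creative_generate(theme: str, constraints: list[str]) -> str | None:
    # Constraint satisfaction acts as a filter over ideas, not as the generator.
    for idea in divergent_phase(theme):
        problem = convergent_phase(idea, constraints)
        if problem is not None:
            return problem
    return None
```

The point of the structure is that no constraint is visible during phase 1, so unusual ideas survive long enough to be tested at all.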
The authors instantiate this method in a demanding setting: creative programming problem generation, where outputs must include:
- a natural language problem description,
- a correct and comprehensive test suite,
- and a working reference solution.
Creativity is therefore not allowed to break utility. That tension is intentional.
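What those three artifacts look like, and how utility can be filtered mechanically, is easy to sketch. The class, field names, and pytest-based check below are assumptions chosen for illustration; the paper's own validation pipeline may differ.

```python
import pathlib
import subprocess
import tempfile
from dataclasses import dataclass

@dataclass
class GeneratedProblem:
    description: str         # natural-language problem statement
    reference_solution: str  # source code for a known-correct solver
    test_suite: str          # executable tests that exercise the solution

def passes_utility_check(problem: GeneratedProblem, timeout_s: int = 30) -> bool:
    """Crude utility filter: the reference solution must pass its own test suite."""
    with tempfile.TemporaryDirectory() as workdir:
        root = pathlib.Path(workdir)
        (root / "solution.py").write_text(problem.reference_solution)
        (root / "test_solution.py").write_text(problem.test_suite)
        try:
            result = subprocess.run(
                ["python", "-m", "pytest", "-q", "test_solution.py"],
                cwd=workdir, capture_output=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0
```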
Findings — Measured creativity, not vibes
The paper evaluates creativity along three dimensions: diversity, novelty, and utility. Importantly, novelty is measured against other LLM‑generated problems under the same constraints, not against the open web. This makes the benchmark adversarial rather than forgiving.
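One common way to operationalize novelty against a pool of other generated problems is embedding distance: score each candidate by how far it sits from its nearest neighbor in the pool. The sketch below assumes precomputed sentence embeddings and uses one minus the maximum cosine similarity; the paper's exact formula may differ.

```python
import numpy as np

def _normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def semantic_novelty(candidate_embs: np.ndarray, pool_embs: np.ndarray) -> np.ndarray:
    """Novelty of each candidate = 1 - max cosine similarity to the generated pool."""
    sims = _normalize(candidate_embs) @ _normalize(pool_embs).T
    return 1.0 - sims.max(axis=1)
```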
Summary of results (K = 100 problems per context)
| Method | Lexical Diversity | Semantic Diversity | Semantic Novelty | Utility (%) |
|---|---|---|---|---|
| BASE | Low–Medium | Low | Very Low | ~93 |
| CoT | Medium | Low | Very Low | ~91 |
| CREATIVEDC | High | High | Significantly Higher | ~91 |
The key result is not a one-off uplift; it is how the gap scales as more samples are drawn.
Using the Vendi Score (an effective count of distinct problems), CREATIVEDC produces:
- ~24% more distinct problems at small sample sizes,
- ~72% more distinct problems at K = 100,
with the gap widening as more samples are drawn.
In other words: CREATIVEDC does not just diversify—it keeps diversifying.
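The Vendi Score itself (Friedman & Dieng) is the exponential of the Shannon entropy of the eigenvalues of the normalized pairwise similarity matrix, which is why it reads as an effective count of distinct items. A minimal sketch, assuming cosine similarity over precomputed problem embeddings:

```python
import numpy as np

def vendi_score(embeddings: np.ndarray) -> float:
    """Effective number of distinct items under a cosine-similarity kernel."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    k = x @ x.T                       # similarity matrix with k(x, x) = 1
    n = k.shape[0]
    lam = np.linalg.eigvalsh(k / n)   # eigenvalues of the normalized kernel
    lam = lam[lam > 1e-12]            # drop numerical zeros before taking the log
    return float(np.exp(-np.sum(lam * np.log(lam))))
```

A batch of near-duplicate problems scores close to 1, while K mutually distinct problems score close to K, so the gap reported here is a gap in effective problem count, not in raw token variety.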
Implications — Why this matters beyond education
Although the method is evaluated on programming exercises, its implications generalize cleanly:
- Agentic systems: planning agents that jump directly to execution will converge prematurely. Explicit divergence phases should be first‑class citizens in agent design.
- Synthetic data generation: diversity collapse is a known failure mode. Structured ideation dramatically increases effective sample size without retraining.
- Enterprise AI workflows: when outputs feel “samey,” the issue is rarely the model—it is the reasoning schedule.
Most importantly, this work reframes creativity as an inference‑time control problem, not a training‑time one. That makes it cheap, portable, and immediately deployable.
Conclusion — Creativity is procedural
LLMs are not uncreative because they lack imagination. They are uncreative because we rush them.
CREATIVEDC shows that a minimal, theory‑grounded scaffold—first explore, then commit—is enough to break the hivemind effect in a measurable, scalable way.
Creativity, it turns out, is less about randomness and more about patience.
Cognaptus: Automate the Present, Incubate the Future.