Opening — Why This Matters Now

Large Language Models are increasingly tasked with generating culture. Not summarizing it. Not translating it. Generating it.

From marketing copy to brand storytelling, from music to visual art—LLMs are being positioned as creative collaborators. But creativity without grounding is just noise with confidence.

A recent study titled “Can LLMs Cook Jamaican Couscous?” asks a deceptively simple question: can LLMs adapt a culturally rooted artifact, such as a Moroccan dish, into a Jamaican variant in a way that reflects meaningful cultural distance?

The answer, in short: they generate novelty. But not culture.

And that distinction matters more than most AI deployment roadmaps currently admit.


Background — Culture, Distance, and Culinary Creativity

Culture is not just content. It is structure. Tradition. Constraint.

In social science, cultural distance is measured across geography, language, religion, and value systems (e.g., Inglehart–Welzel maps). In creative domains, novelty is not random deviation—it is transformation within boundaries.

Cuisine is an ideal test case:

  • It encodes identity.
  • It evolves through adaptation.
  • It preserves core ingredients even when innovating.

The original GlobalFusion dataset paired 500 dishes with human-made cultural adaptations across 130 countries. Human adaptations exhibited a clear pattern:

The greater the cultural distance between two countries, the greater the divergence in the adapted recipe.

Novelty correlated with distance.

The new paper extends this setup by generating LLM-based cultural adaptations for the same dish-country pairs and comparing divergence patterns.

In other words: do models understand that adapting Moroccan couscous into a Jamaican variant requires different degrees of transformation than adapting it into a Tunisian one?


Analysis — What the Paper Actually Measures

The authors introduce five information-theoretic divergence metrics based on Jensen–Shannon divergence:

| Metric | What It Captures | Cultural Meaning |
|---|---|---|
| Newness | Appearance/disappearance of words | Surface novelty |
| Uniqueness | Divergence from prototype | Distinctiveness |
| Difference | Distance from community examples | Cultural boundary crossing |
| New Surprise | Unexpected word combinations | Creative recombination |
| Divergent Surprise | PMI distribution shifts | Deep structural shift |
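
To make these metrics concrete, here is a minimal sketch of the core computation, assuming unigram distributions over recipe tokens. The whitespace tokenizer and example strings are illustrative; the paper's exact preprocessing is not reproduced here.

```python
from collections import Counter
import math

def word_dist(text):
    """Unigram probability distribution over whitespace-separated tokens."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, bounded in [0, 1])."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# A Newness-style comparison: original recipe text vs. adapted recipe text
original = "couscous semolina lamb chickpeas ras el hanout"
adapted = "couscous allspice scotch bonnet coconut milk thyme"
print(js_divergence(word_dist(original), word_dist(adapted)))  # 0 = identical, 1 = disjoint
```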

They then correlate these metrics with four forms of cultural distance:

  • Inglehart–Welzel value distance
  • Geographic distance
  • Linguistic distance
  • Religious distance

For humans, two metrics stood out:

  • Difference
  • New Surprise

These were strongly correlated with cultural distance.
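
The correlation step itself is straightforward to sketch, assuming you already have per-adaptation divergence scores and cultural distances for each country pair. The numbers below are invented purely for illustration:

```python
from scipy.stats import spearmanr

# Hypothetical (distance, divergence) records per adaptation. In the paper's
# setup, distances would come from e.g. Inglehart-Welzel value maps and
# divergences from the Difference or New Surprise metrics.
records = [
    (0.2, 0.31), (0.5, 0.42), (0.9, 0.58),
    (1.4, 0.71), (2.1, 0.84),
]
distances, divergences = zip(*records)
rho, pval = spearmanr(distances, divergences)
print(f"Spearman rho = {rho:.2f} (p = {pval:.3f})")
# For humans, rho is strongly positive; for LLMs it flattens toward zero.
```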

For LLMs? That relationship largely collapses.


Findings — Where Models Diverge From Humans

1. LLMs Overproduce Divergence

Across nearly all models tested (LLaMA-3-70B, Gemma-3, Qwen-3, etc.), divergence scores were inflated relative to human baselines.

| Behavior | Humans | LLMs |
|---|---|---|
| Same-country variation | Low divergence | Elevated divergence |
| Cross-cultural variation | Distance-sensitive increase | Almost flat response |

LLMs generate novelty regardless of cultural distance.

Humans increase novelty proportionally to distance.

That difference is not cosmetic—it reveals a structural gap.


2. “Creative” Prompts Do Almost Nothing

The study varied prompts using keywords like:

  • “novel”
  • “authentic”
  • “traditional”
  • “creative”
  • “original”

Result: minimal metric sensitivity.

Models do not meaningfully distinguish between tradition and creativity. Instead, novelty appears to be implemented via token-level deviation—injecting atypical or low-frequency terms.

Creativity becomes lexical mutation.

Culture becomes optional garnish.
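
The probe itself is simple to sketch. The `generate` and `score` callables below are placeholders for an LLM call and any of the divergence metrics above, not the paper's code:

```python
KEYWORDS = ["novel", "authentic", "traditional", "creative", "original"]
TEMPLATE = "Adapt this Moroccan couscous recipe into a {kw} Jamaican version:\n{recipe}"

def keyword_sensitivity(recipe, generate, score):
    """Generate one adaptation per style keyword and score each one."""
    results = {}
    for kw in KEYWORDS:
        adapted = generate(TEMPLATE.format(kw=kw, recipe=recipe))
        results[kw] = score(recipe, adapted)
    # Near-identical scores across keywords is exactly what the paper reports:
    # the model does not modulate novelty in response to the instruction.
    return results
```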


3. Cultural Encoding Is Lost in Middle Layers

Using a Logit Lens layer analysis, the authors examined how divergence evolves inside the model.

Findings:

  • Early layers: cultural divergence compressed
  • Middle layers: flattened further
  • Final layers: partial superficial reconstruction

The pattern mirrors multilingual compression effects observed in other LLM studies.

Cultural specificity appears weakly encoded internally.

By the time generation occurs, the signal is thin.
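
The logit-lens technique itself is easy to sketch with Hugging Face transformers: decode each layer's hidden state through the model's own output head. The model choice and single-prompt probe below are illustrative, not the paper's setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # small stand-in; the paper studies larger open models
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)

prompt = "A traditional Jamaican couscous uses"
with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"))

# Project each layer's last-token state through the final norm + unembedding
# to see what the model "believes" the next token is at that depth.
for i, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    print(f"layer {i:2d}: {tok.decode(logits.argmax())!r}")
```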


4. Ingredient Grounding Reveals Substitution Bias

The most striking business-relevant insight may lie here.

Humans adapting a dish:

  • Preserve core ingredients
  • Introduce regional variation thoughtfully

LLMs:

  • Substitute heavily
  • Default to global ingredients
  • Replace specificity with placeholders

Top LLM ingredients across 1.3M generated recipes included:

  • Salt (to taste)
  • Onion
  • Garlic
  • Oil
  • Sugar

Culturally distinctive ingredients—cassava flour, semolina variants, Sichuan pepper—were often replaced with generic analogs.

This leads to what we might call culinary homogenization.
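
This kind of homogenization is cheap to detect. A sketch of the frequency check, with a toy corpus standing in for the 1.3M generated recipes:

```python
from collections import Counter

# Toy stand-in for parsed ingredient lists from generated recipes.
recipes = [
    ["salt", "onion", "garlic", "oil", "scotch bonnet"],
    ["salt", "onion", "garlic", "sugar", "allspice"],
    ["salt", "oil", "onion", "garlic", "thyme"],
]
counts = Counter(ing for recipe in recipes for ing in recipe)
print(counts.most_common(5))
# A head dominated by generic staples (salt, onion, garlic, oil, sugar)
# while distinctive ingredients sit in the long tail signals substitution.
```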

And if that sounds trivial, imagine this dynamic in:

  • Brand localization
  • Cross-border marketing
  • Cultural storytelling
  • Policy simulation

5. Country Attribution Errors Reveal Blurred Boundaries

When the origin country was omitted from prompts:

  • 25–50% of generated titles misattributed the cuisine

Even when the origin country was explicitly provided:

  • error rates of 15–40% persisted

Errors clustered regionally:

  • Taiwan → China
  • Spain → Italy
  • Tunisia → Morocco

The model does not treat cultural boundaries as firm constraints.

It treats them as approximate neighbors in embedding space.
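
Measuring this blur is equally mechanical. In the sketch below, `predict_country` is a placeholder for whatever attribution step is used (a classifier, or the model itself):

```python
from collections import Counter

def attribution_errors(titles, intended, predict_country):
    """Error rate plus a confusion count of (intended -> predicted) pairs."""
    confusions = Counter()
    for title, country in zip(titles, intended):
        guess = predict_country(title)
        if guess != country:
            confusions[(country, guess)] += 1
    rate = sum(confusions.values()) / len(titles)
    return rate, confusions

# Confusions clustering on neighbors (Taiwan -> China, Spain -> Italy,
# Tunisia -> Morocco) are the signature of soft cultural boundaries.
```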


Implications — Why This Matters for AI Deployment

This paper is not about recipes.

It is about the limits of generative systems in culturally sensitive domains.

1. Novelty ≠ Cultural Creativity

LLMs maximize token divergence.

Humans optimize novelty under cultural constraint.

If deployed for:

  • Localization
  • Government communications
  • Cross-cultural education
  • Brand positioning

Models may produce outputs that feel fresh—but culturally hollow.


2. Scaling Does Not Fix Cultural Grounding

70B models did not meaningfully outperform smaller ones.

Multilingual training did not guarantee better alignment.

Instruction tuning did not repair structural divergence issues.

Cultural grounding is not an emergent byproduct of scale.

It requires architectural or training-level intervention.


3. Evaluation Must Be Artifact-Based

Traditional alignment tests use surveys and value questions.

This paper demonstrates the importance of artifact-level evaluation:

  • Measure divergence
  • Compare against human baselines
  • Analyze internal representations
  • Inspect grounding elements (ingredients, entities, context)

This is far closer to real-world deployment contexts.
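
As a skeleton, that checklist might look like the harness below. All three callables are placeholders to be filled per deployment, not a prescribed pipeline:

```python
def evaluate_artifact(generated, human_baseline,
                      divergence, probe_layers, audit_grounding):
    """Artifact-level report for one generated cultural adaptation."""
    return {
        # Divergence measured against the human adaptation for the same pair
        "divergence_vs_human": divergence(human_baseline, generated),
        # Layer-wise trace of cultural signal (e.g., a logit-lens profile)
        "layer_profile": probe_layers(generated),
        # Grounding audit: ingredients, entities, cultural context
        "grounding": audit_grounding(generated),
    }
```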

For AI governance, this is the direction that matters.


Conclusion — The Flavor Gap

LLMs can generate Jamaican couscous.

They cannot yet understand why it must still, in some way, remain couscous.

They inject spice. They increase lexical surprise. They amplify deviation.

But culture is not deviation alone.

It is constraint, continuity, and boundary awareness.

Until models preserve these structural anchors, their creativity will remain statistically impressive—and culturally thin.

And that gap will become more visible as generative AI moves from novelty generation to identity formation.

Cognaptus: Automate the Present, Incubate the Future.