Opening — Why This Matters Now
Large Language Models are increasingly tasked with generating culture. Not summarizing it. Not translating it. Generating it.
From marketing copy to brand storytelling, from music to visual art—LLMs are being positioned as creative collaborators. But creativity without grounding is just noise with confidence.
A recent study titled “Can LLMs Cook Jamaican Couscous?” asks a deceptively simple question: can LLMs adapt a culturally rooted artifact—like a Moroccan dish—into a Jamaican variant in a way that reflects meaningful cultural distance?
The answer, in short: they generate novelty. But not culture.
And that distinction matters more than most AI deployment roadmaps currently admit.
Background — Culture, Distance, and Culinary Creativity
Culture is not just content. It is structure. Tradition. Constraint.
In social science, cultural distance is measured across geography, language, religion, and value systems (e.g., Inglehart–Welzel maps). In creative domains, novelty is not random deviation—it is transformation within boundaries.
Cuisine is an ideal test case:
- It encodes identity.
- It evolves through adaptation.
- It preserves core ingredients even when innovating.
The original GlobalFusion dataset paired 500 dishes with human-made cultural adaptations across 130 countries. Human adaptations exhibited a clear pattern:
The greater the cultural distance between two countries, the greater the divergence in the adapted recipe.
Novelty correlated with distance.
The new paper extends this setup by generating LLM-based cultural adaptations for the same dish-country pairs and comparing divergence patterns.
In other words: do models understand that adapting Moroccan couscous into a Jamaican variant requires different degrees of transformation than adapting it into a Tunisian one?
Analysis — What the Paper Actually Measures
The authors introduce five information-theoretic divergence metrics based on Jensen–Shannon divergence (a minimal sketch of the shared JSD building block follows the table):
| Metric | What It Captures | Cultural Meaning |
|---|---|---|
| Newness | Appearance/disappearance of words | Surface novelty |
| Uniqueness | Divergence from prototype | Distinctiveness |
| Difference | Distance from community examples | Cultural boundary crossing |
| New Surprise | Unexpected word combinations | Creative recombination |
| Divergent Surprise | PMI distribution shifts | Deep structural shift |
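To make these metrics concrete, here is a minimal, self-contained sketch of their shared building block: Jensen–Shannon divergence over recipe word distributions. The whitespace tokenization and add-one smoothing here are my own simplifications, not the paper's exact pipeline.

```python
# Toy illustration of Jensen-Shannon divergence over recipe word
# distributions. Tokenization and smoothing choices are assumptions,
# not the paper's implementation.
from collections import Counter
from math import log2

def word_dist(text, vocab):
    """Unigram distribution over a shared vocabulary (add-one smoothed)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + len(vocab)  # smoothing mass
    return [(counts[w] + 1) / total for w in vocab]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical distributions)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    kl = lambda a, b: sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

original = "couscous semolina chickpeas harissa preserved lemon ras el hanout"
adapted  = "couscous allspice scotch bonnet thyme coconut milk callaloo"

vocab = sorted(set(original.split()) | set(adapted.split()))
p, q = word_dist(original, vocab), word_dist(adapted, vocab)
print(f"JSD(original, adapted) = {js_divergence(p, q):.3f}")
```

The five metrics apply this same machinery to different distributions: prototypes for Uniqueness, community examples for Difference, PMI profiles for Divergent Surprise.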
They then correlate these metrics with four forms of cultural distance:
- Inglehart–Welzel value distance
- Geographic distance
- Linguistic distance
- Religious distance
For humans, two metrics stood out:
- Difference
- New Surprise
These were strongly correlated with cultural distance.
For LLMs? That relationship largely collapses.
Findings — Where Models Diverge From Humans
1. LLMs Overproduce Divergence
Across nearly all models tested (LLaMA-3-70B, Gemma-3, Qwen-3, etc.), divergence scores were inflated relative to human baselines.
| Behavior | Humans | LLMs |
|---|---|---|
| Same-country variation | Low divergence | Elevated divergence |
| Cross-cultural variation | Distance-sensitive increase | Almost flat response |
LLMs generate novelty regardless of cultural distance.
Humans increase novelty proportionally to distance.
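A toy correlation check makes that contrast visible. The numbers below are invented placeholders, not the study's data; only the procedure, rank-correlating divergence against cultural distance, is illustrated.

```python
# Do divergence scores rise with cultural distance? All values below are
# invented placeholders that mimic the reported pattern.

def ranks(xs):
    """Simple ranks (no tie handling; enough for a sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman correlation via Pearson on ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

cultural_distance = [0.1, 0.4, 0.7, 0.9]      # e.g. Morocco -> Tunisia ... Jamaica
human_divergence  = [0.15, 0.35, 0.60, 0.85]  # rises with distance
llm_divergence    = [0.70, 0.72, 0.69, 0.73]  # high and flat

print("human:", round(spearman(cultural_distance, human_divergence), 2))  # 1.0
print("llm:  ", round(spearman(cultural_distance, llm_divergence), 2))    # weak
```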
That difference is not cosmetic—it reveals a structural gap.
2. “Creative” Prompts Do Almost Nothing
The study varied prompts using keywords like:
- “novel”
- “authentic”
- “traditional”
- “creative”
- “original”
Result: minimal metric sensitivity.
Models do not meaningfully distinguish between tradition and creativity. Instead, novelty appears to be implemented via token-level deviation—injecting atypical or low-frequency terms.
Creativity becomes lexical mutation.
Culture becomes optional garnish.
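For readers who want to probe this themselves, here is a sketch of the keyword sweep. Both `generate` and `divergence` are hypothetical placeholders: swap in a real model call and a real metric such as the JSD sketch above.

```python
# Sketch of the prompt-keyword sweep. `generate` is a hypothetical
# stand-in for any LLM call; `divergence` is a crude Jaccard stand-in.

KEYWORDS = ["novel", "authentic", "traditional", "creative", "original"]

def generate(prompt: str) -> str:
    """Placeholder: replace with a real model call."""
    return "couscous with onion, garlic and salt to taste"  # dummy output

def divergence(a: str, b: str) -> float:
    """Placeholder metric: Jaccard distance over word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return 1 - len(wa & wb) / len(wa | wb)

base_recipe = "moroccan couscous with semolina, chickpeas and ras el hanout"
for kw in KEYWORDS:
    prompt = f"Write a {kw} Jamaican adaptation of this recipe: {base_recipe}"
    adapted = generate(prompt)
    print(f"{kw:>12}: divergence = {divergence(base_recipe, adapted):.3f}")
# If the paper's finding holds, these scores barely move across keywords.
```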
3. Cultural Encoding Is Lost in Middle Layers
Using a Logit Lens layer analysis, the authors examined how divergence evolves inside the model.
Findings:
- Early layers: cultural divergence compressed
- Middle layers: flattened further
- Final layers: partial superficial reconstruction
The pattern mirrors multilingual compression effects observed in other LLM studies.
Cultural specificity appears weakly encoded internally.
By the time generation occurs, the signal is thin.
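The layer analysis can be reproduced in spirit with a standard logit-lens probe: decode each intermediate hidden state through the final layer norm and unembedding matrix. The sketch below uses GPT-2 as a small, runnable stand-in; the paper analyzes larger models and tracks divergence metrics per layer rather than top tokens.

```python
# Minimal logit-lens probe: project each layer's hidden state through
# the final layer norm and unembedding head, then read the top token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper studies larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "A Jamaican adaptation of Moroccan couscous uses"
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for i, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    top = tok.decode(logits.argmax().item())
    print(f"layer {i:2d}: top next-token = {top!r}")
```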
4. Ingredient Grounding Reveals Substitution Bias
The most striking business-relevant insight may lie here.
Humans adapting a dish:
- Preserve core ingredients
- Introduce regional variation thoughtfully
LLMs:
- Substitute heavily
- Default to global ingredients
- Replace specificity with placeholders
Top LLM ingredients across 1.3M generated recipes included:
- Salt (to taste)
- Onion
- Garlic
- Oil
- Sugar
Culturally distinctive ingredients—cassava flour, semolina variants, Sichuan pepper—were often replaced with generic analogs.
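Surfacing this bias requires nothing fancier than counting, as the sketch below shows. The recipes here are dummy stand-ins for the study's 1.3M generations.

```python
# Grounding check: tally ingredient frequencies across generated recipes
# and see what dominates. Dummy data for illustration only.
from collections import Counter

generated_recipes = [
    ["salt (to taste)", "onion", "garlic", "oil", "couscous"],
    ["salt (to taste)", "onion", "sugar", "oil", "rice"],
    ["garlic", "onion", "salt (to taste)", "scotch bonnet", "oil"],
]

counts = Counter(ing for recipe in generated_recipes for ing in recipe)
for ingredient, n in counts.most_common(5):
    print(f"{ingredient:>16}: {n}")
# A head dominated by generic staples is the homogenization signal.
```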
This substitution pattern leads to what we might call culinary homogenization.
And if that sounds trivial, imagine this dynamic in:
- Brand localization
- Cross-border marketing
- Cultural storytelling
- Policy simulation
5. Country Attribution Errors Reveal Blurred Boundaries
When the origin country was omitted from prompts:
- 25–50% of generated recipe titles misattributed the cuisine
Even when the origin was explicitly provided:
- error rates of 15–40% persisted
Errors clustered regionally:
- Taiwan → China
- Spain → Italy
- Tunisia → Morocco
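Here is a sketch of how such an attribution tally works. The (origin, attributed) pairs are invented examples, not the paper's data.

```python
# Attribution check: compare the country a model names in its output
# against the true origin, then tally the confusions. Invented pairs.
from collections import Counter

predictions = [  # (true origin, country the model attributed)
    ("Taiwan", "China"), ("Taiwan", "Taiwan"), ("Spain", "Italy"),
    ("Tunisia", "Morocco"), ("Spain", "Spain"), ("Tunisia", "Morocco"),
]

errors = [(t, p) for t, p in predictions if t != p]
print(f"error rate: {len(errors) / len(predictions):.0%}")
for (true, pred), n in Counter(errors).most_common():
    print(f"{true} -> {pred}: {n}")
```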
The model does not treat cultural boundaries as firm constraints.
It treats them as approximate neighbors in embedding space.
Implications — Why This Matters for AI Deployment
This paper is not about recipes.
It is about the limits of generative systems in culturally sensitive domains.
1. Novelty ≠ Cultural Creativity
LLMs maximize token divergence.
Humans optimize novelty under cultural constraint.
If deployed for:
- Localization
- Government communications
- Cross-cultural education
- Brand positioning
Models may produce outputs that feel fresh—but culturally hollow.
2. Scaling Does Not Fix Cultural Grounding
70B models did not meaningfully outperform smaller models.
Multilingual training did not guarantee better alignment.
Instruction tuning did not repair structural divergence issues.
Cultural grounding is not an emergent byproduct of scale.
It requires architectural or training-level intervention.
3. Evaluation Must Be Artifact-Based
Traditional alignment tests use surveys and value questions.
This paper demonstrates the importance of artifact-level evaluation (a minimal harness is sketched after the list):
- Measure divergence
- Compare against human baselines
- Analyze internal representations
- Inspect grounding elements (ingredients, entities, context)
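As a minimal, self-contained illustration of that loop, the harness below compares model divergence against a human baseline for one artifact. The Jaccard metric and the example texts are placeholders; a real harness would plug in the paper's five metrics and many dish-country pairs.

```python
# Artifact-level evaluation sketch: score one model adaptation against
# a human baseline. Placeholder metric and texts, for illustration only.

def divergence(a: str, b: str) -> float:
    """Placeholder metric: Jaccard distance over word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return 1 - len(wa & wb) / len(wa | wb)

def evaluate(original: str, model_out: str, human_out: str) -> dict:
    """Compare model divergence to the human baseline for one artifact."""
    m = divergence(original, model_out)
    h = divergence(original, human_out)
    return {"model": m, "human": h, "gap": m - h}

report = evaluate(
    original="moroccan couscous with semolina and ras el hanout",
    model_out="fusion bowl with quinoa and global spice blend",
    human_out="jamaican couscous with allspice and scotch bonnet",
)
print(report)  # a persistently positive gap flags overproduced divergence
```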
Artifact-level evaluation is far closer to real-world deployment contexts.
For AI governance, this is the direction that matters.
Conclusion — The Flavor Gap
LLMs can generate Jamaican couscous.
They cannot yet understand why it must still, in some way, remain couscous.
They inject spice. They increase lexical surprise. They amplify deviation.
But culture is not deviation alone.
It is constraint, continuity, and boundary awareness.
Until models preserve these structural anchors, their creativity will remain statistically impressive—and culturally thin.
And that gap will become more visible as generative AI moves from novelty generation to identity formation.
Cognaptus: Automate the Present, Incubate the Future.