Opening — Why this matters now
Recommendation systems have quietly crossed a threshold. The question is no longer what to recommend, but how many things, in what order, and with what balance. In feeds, short-video apps, and content platforms, users consume slates—lists experienced holistically. Yet most systems still behave as if each item lives alone, blissfully unaware of its neighbors.
This paper tackles that mismatch head-on. It asks a deceptively simple question: what if recommendation models planned the slate first, instead of greedily assembling it item by item?
Background — From ranking items to generating slates
Classic recommendation pipelines score items independently, then stitch them together with reranking heuristics. Efficient? Yes. Slate-aware? Not really. Diversity, coherence, and balance emerge accidentally—if at all.
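To see that status quo concretely, here is a minimal sketch of such a pipeline: pointwise scoring in which no item ever sees its slate neighbors, followed by a greedy rerank heuristic. An MMR-style diversity pass is used purely as an illustration of the kind of heuristic involved; the paper does not prescribe this particular one.

```python
import numpy as np

def pointwise_scores(user_emb, item_embs):
    """Score every item independently: no item knows its slate neighbors."""
    return item_embs @ user_emb

def heuristic_rerank(scores, item_embs, k=5, lam=0.7):
    """Greedy MMR-style rerank: trade relevance against similarity to items
    already selected. Diversity is patched on after scoring, not planned."""
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((item_embs[i] @ item_embs[j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage: 100 candidate items, 16-dim embeddings, a 5-item slate.
rng = np.random.default_rng(0)
items, user = rng.normal(size=(100, 16)), rng.normal(size=16)
slate = heuristic_rerank(pointwise_scores(user, items), items, k=5)
```

Whatever list-level property you care about has to be bolted onto the rerank step after the fact, which is exactly the limitation the paper targets.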
Generative recommendation promised a fix. By treating recommendation as sequence generation, models could, in theory, capture inter-item dependencies. In practice, three problems persist:
- Semantic ID entanglement — item tokenizations blur meaning across prefix levels, so a shared prefix does not reliably signal shared semantics.
- Sequential inefficiency — multi-token items explode decoding steps and latency.
- No global planning — left-to-right generation reacts locally instead of reasoning globally.
The result: elegant theory, awkward deployment.
Analysis — What HiGR actually does
HiGR (Hierarchical Generative Recommendation) reframes slate generation as a coarse-to-fine planning problem.
1. Structured semantic IDs (CRQ-VAE)
Instead of treating semantic IDs as an afterthought, the paper redesigns them. A contrastive residual-quantized VAE enforces prefix-level semantic meaning:
- Early ID prefixes encode high-level similarity.
- Final layers preserve item-level discrimination.
This matters because diversity and relevance can now be controlled during decoding, not patched on afterward.
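To make "coarse-to-fine" concrete, here is a minimal residual-quantization sketch: each level quantizes whatever the previous level left unexplained, so early codes carry broad semantics and the last code pins down the item. Codebook sizes, depth, and the random embeddings are illustrative assumptions; the paper's contrastive objective, which shapes the codebooks so that similar items share prefixes, is not shown here.

```python
import numpy as np

def residual_quantize(item_emb, codebooks):
    """Assign one code per level. Early levels explain coarse structure;
    each later level quantizes the residual the previous level left behind."""
    semantic_id, residual = [], item_emb.copy()
    for codebook in codebooks:                                   # e.g. 3 levels of 256 codes
        idx = int(np.linalg.norm(codebook - residual, axis=1).argmin())
        semantic_id.append(idx)
        residual = residual - codebook[idx]                      # pass the remainder down
    return tuple(semantic_id)

# Illustrative setup: 3-level semantic IDs over 16-dim item embeddings.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 16)) for _ in range(3)]
print(residual_quantize(rng.normal(size=16), codebooks))         # a 3-level ID, e.g. (17, 203, 88)
```

Items that are broadly similar end up sharing early codes, which is what lets the decoder steer relevance and diversity by prefix rather than by post-hoc reranking.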
2. Hierarchical slate decoding
HiGR splits generation into two stages:
| Stage | Granularity | What it decides |
|---|---|---|
| Slate planner | Coarse-grained | Global intent, structure, balance |
| Item generator | Fine-grained | Concrete item identities |
Instead of 30+ token-by-token steps per slate, HiGR plans once, then fills efficiently. The architecture mirrors how humans curate lists: outline first, details later.
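A sketch of what plan-then-fill can look like in code, under assumed interfaces: `SlatePlanner`, `ItemGenerator`, a 10-slot slate, and 3-level semantic IDs are all illustrative stand-ins, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

class SlatePlanner:
    """Coarse stage: one short pass fixes the slate's shape, here as one
    level-1 semantic prefix per slot. Random stand-in for the real model."""
    def decode(self, user_context, slate_size=10, n_coarse=256):
        return rng.integers(0, n_coarse, size=slate_size).tolist()

class ItemGenerator:
    """Fine stage: completes each slot's remaining ID levels given the plan.
    Slots are conditionally independent given the plan, so they can be
    filled in parallel rather than strictly left to right."""
    def complete(self, user_context, plan, slot, id_depth=3, n_fine=256):
        fine = rng.integers(0, n_fine, size=id_depth - 1).tolist()
        return (plan[slot], *fine)

def generate_slate(planner, item_gen, user_context, slate_size=10):
    plan = planner.decode(user_context, slate_size)           # ~slate_size sequential steps
    return [item_gen.complete(user_context, plan, s)          # parallelizable fill
            for s in range(slate_size)]

slate = generate_slate(SlatePlanner(), ItemGenerator(), user_context=None)
```

With 10 slots and 3-level IDs, a flat decoder spends 30 sequential token steps; the hierarchical version spends roughly the planner's 10 and parallelizes the rest, which is consistent with the ~5× speedup reported below.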
3. Listwise preference alignment (ORPO)
Training objectives finally match user behavior. Users don’t judge items; they judge lists.
HiGR adopts a reference-free preference optimization scheme that:
- Uses implicit feedback at the slate level
- Optimizes ranking quality, interest alignment, and diversity jointly
- Avoids the computational overhead of classical RLHF pipelines
In short: less reward hacking, more signal.
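Here is a minimal sketch of what a reference-free, slate-level ORPO-style objective can look like in PyTorch, using length-normalized slate log-probabilities. The λ weighting and the construction of preferred versus dispreferred slates from implicit feedback are assumptions, and the paper's additional interest-alignment and diversity terms are not shown.

```python
import torch
import torch.nn.functional as F

def slate_logprob(model_logits, slate_tokens):
    """Length-normalized log-probability of a whole slate (its sequence of
    semantic-ID tokens), treating the list as the unit of preference."""
    logp = F.log_softmax(model_logits, dim=-1)
    token_logp = logp.gather(-1, slate_tokens.unsqueeze(-1)).squeeze(-1)
    return token_logp.mean(dim=-1)                        # one score per slate

def orpo_listwise_loss(logits_pos, tokens_pos, logits_neg, tokens_neg, lam=0.1):
    """Reference-free ORPO-style objective at the slate level: NLL on the
    preferred slate plus an odds-ratio penalty that pushes the preferred
    slate's odds above the dispreferred slate's."""
    logp_pos = slate_logprob(logits_pos, tokens_pos)
    logp_neg = slate_logprob(logits_neg, tokens_neg)
    nll = -logp_pos.mean()                                 # supervised term
    # log odds(y|x) = log p - log(1 - p), with p = exp(log p), clamped for stability
    log_odds = (logp_pos - torch.log1p(-torch.exp(logp_pos).clamp(max=1 - 1e-6))) \
             - (logp_neg - torch.log1p(-torch.exp(logp_neg).clamp(max=1 - 1e-6)))
    ratio_term = -F.logsigmoid(log_odds).mean()            # odds-ratio preference term
    return nll + lam * ratio_term
```

Because nothing here requires a frozen reference model or a separate reward model, the training loop stays close to ordinary supervised fine-tuning, which is where the savings over classical RLHF pipelines come from.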
Findings — Results that matter
Offline gains
Across industrial-scale datasets, HiGR consistently outperforms strong baselines:
| Model | Recall@5 | NDCG@5 |
|---|---|---|
| OneRec | 0.0577 | 0.0589 |
| HiGR (no alignment) | 0.0721 | 0.0753 |
| HiGR (with ORPO) | 0.0760 | 0.0831 |
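Those absolute numbers translate into large relative gains; a quick check using only the figures in the table:

```python
# Relative improvement of HiGR (with ORPO) over the OneRec baseline,
# computed directly from the table above.
baseline  = {"Recall@5": 0.0577, "NDCG@5": 0.0589}
higr_orpo = {"Recall@5": 0.0760, "NDCG@5": 0.0831}

for metric in baseline:
    gain = (higr_orpo[metric] - baseline[metric]) / baseline[metric]
    print(f"{metric}: +{gain:.1%}")    # Recall@5: +31.7%, NDCG@5: +41.1%
```

The ORPO alignment step alone contributes roughly a 5% relative lift in Recall@5 and 10% in NDCG@5 on top of the unaligned model.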
Efficiency gains
HiGR delivers ~5× inference speedup compared to autoregressive generative baselines—without sacrificing quality. Hierarchy pays rent.
Online validation
In live A/B tests on a large commercial platform, HiGR improved:
- Average Watch Time (+1.22%)
- Video Views (+1.73%)
These are not academic decimals. At scale, they are business events.
Implications — Why this is bigger than recommendation
HiGR hints at a broader pattern emerging across AI systems:
- Planning beats reflex — hierarchical reasoning outperforms token-level greed.
- Alignment must match evaluation units — item-level loss cannot optimize list-level experience.
- Efficiency is architectural, not just hardware — fewer steps matter more than faster GPUs.
This is as relevant to agent systems, workflow automation, and tool orchestration as it is to feeds and videos.
Conclusion — Thinking in lists, not tokens
HiGR succeeds because it respects how users actually experience products: holistically, comparatively, and impatiently. It doesn’t just generate better slates—it generates them the right way.
Generative recommendation is no longer blocked by theory. The bottleneck was planning.
Cognaptus: Automate the Present, Incubate the Future.