Opening — Why this matters now

Recommendation systems have quietly crossed a threshold. The question is no longer what to recommend, but how many things, in what order, and with what balance. In feeds, short-video apps, and content platforms, users consume slates—lists experienced holistically. Yet most systems still behave as if each item lives alone, blissfully unaware of its neighbors.

This paper tackles that mismatch head-on. It asks a deceptively simple question: what if recommendation models planned the slate first, instead of greedily assembling it item by item?

Background — From ranking items to generating slates

Classic recommendation pipelines score items independently, then stitch them together with reranking heuristics. Efficient? Yes. Slate-aware? Not really. Diversity, coherence, and balance emerge accidentally—if at all.

Generative recommendation promised a fix. By treating recommendation as sequence generation, models could, in theory, capture inter-item dependencies. In practice, three problems persist:

  1. Semantic ID entanglement — item tokenizations blur meaning at different prefix levels.
  2. Sequential inefficiency — multi-token items explode decoding steps and latency.
  3. No global planning — left-to-right generation reacts locally instead of reasoning globally.

The result: elegant theory, awkward deployment.

Analysis — What HiGR actually does

HiGR (Hierarchical Generative Recommendation) reframes slate generation as a coarse-to-fine planning problem.

1. Structured semantic IDs (CRQ-VAE)

Instead of treating semantic IDs as an afterthought, the paper redesigns them. A contrastive residual-quantized VAE enforces prefix-level semantic meaning:

  • Early ID prefixes encode high-level similarity.
  • Final layers preserve item-level discrimination.

This matters because diversity and relevance can now be controlled during decoding, not patched on afterward.
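The prefix-level structure comes from residual quantization: each codebook level encodes what the previous levels left unexplained, so early code indices are naturally coarse and later ones fine. Below is a minimal sketch of plain residual quantization (the paper's CRQ-VAE additionally trains the codebooks contrastively; the codebook sizes and dimensions here are made up for illustration):

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Quantize vector x with a stack of codebooks (residual quantization).

    Each level encodes the residual left by the previous level, so early
    code indices capture coarse structure and later ones fine detail.
    """
    ids, residual = [], x.astype(float)
    for cb in codebooks:                       # cb: (K, d) array of centroids
        dists = np.linalg.norm(cb - residual, axis=1)
        k = int(np.argmin(dists))              # nearest centroid at this level
        ids.append(k)
        residual = residual - cb[k]            # pass the residual down
    return ids

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 4)) for _ in range(3)]  # 3 levels, 8 codes each
item_vec = rng.normal(size=4)
sem_id = residual_quantize(item_vec, codebooks)
print(sem_id)  # a 3-token semantic ID, one code index per level
```

Because similar items accumulate similar residuals level by level, they tend to share early ID tokens, which is exactly the property the decoder exploits for controllable diversity.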

2. Hierarchical slate decoding

HiGR splits generation into two stages:

| Stage | Role | What it decides |
| --- | --- | --- |
| Slate planner | Coarse-grained | Global intent, structure, balance |
| Item generator | Fine-grained | Concrete item identities |

Instead of 30+ token-by-token steps per slate, HiGR plans once, then fills efficiently. The architecture mirrors how humans curate lists: outline first, details later.
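The two-stage idea can be sketched in a few lines. This is a toy illustration, not the paper's model: the catalog, the round-robin planner, and the first-fit item filler are all hypothetical stand-ins for learned components, but the control flow (plan coarse prefixes once, then fill each slot) mirrors the architecture:

```python
# Hypothetical catalog: semantic-ID prefix -> candidate items sharing that prefix.
CATALOG = {
    "sports": ["s1", "s2", "s3"],
    "music":  ["m1", "m2"],
    "news":   ["n1", "n2", "n3"],
}

def plan_slate(user_interests, size):
    """Stage 1: coarse plan — one semantic prefix per slate slot."""
    return [user_interests[i % len(user_interests)] for i in range(size)]

def fill_slate(plan, seen):
    """Stage 2: fine generation — a concrete, unseen item per planned prefix."""
    slate = []
    for prefix in plan:
        candidates = [it for it in CATALOG[prefix]
                      if it not in seen and it not in slate]
        if candidates:
            slate.append(candidates[0])
    return slate

plan = plan_slate(["sports", "music", "news"], size=4)
print(plan)                           # ['sports', 'music', 'news', 'sports']
print(fill_slate(plan, seen={"s1"}))  # ['s2', 'm1', 'n1', 's3']
```

Note where the cost goes: the global decisions (balance across interests) are made once in the plan, and the per-slot fill is a cheap local lookup, which is the source of the decoding-step savings.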

3. Listwise preference alignment (ORPO)

Training objectives finally match user behavior. Users don’t judge items; they judge lists.

HiGR adopts a reference-free preference optimization scheme that:

  • Uses implicit feedback at the slate level
  • Optimizes ranking quality, interest alignment, and diversity jointly
  • Avoids the computational overhead of classical RLHF pipelines

In short: less reward hacking, more signal.
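The reference-free objective follows the shape of ORPO: a standard likelihood term on the preferred slate plus an odds-ratio penalty separating it from the dispreferred one, with no frozen reference model. A scalar sketch (slate probabilities and the weight `lam` are illustrative, not the paper's values):

```python
import math

def orpo_loss(p_win, p_lose, lam=0.5):
    """ORPO-style reference-free preference loss.

    p_win / p_lose: model probabilities of the preferred / dispreferred slate.
    Combines NLL on the preferred slate with a log-odds-ratio term that
    pushes the preferred slate's odds above the dispreferred one's.
    """
    odds = lambda p: p / (1.0 - p)
    nll = -math.log(p_win)
    ratio = math.log(odds(p_win)) - math.log(odds(p_lose))
    penalty = -math.log(1.0 / (1.0 + math.exp(-ratio)))  # -log sigmoid(ratio)
    return nll + lam * penalty

print(round(orpo_loss(0.7, 0.2), 3))  # preferred slate more likely -> low loss
print(round(orpo_loss(0.2, 0.7), 3))  # preference swapped -> higher loss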

Findings — Results that matter

Offline gains

Across industrial-scale datasets, HiGR consistently outperforms strong baselines:

Model Recall@5 NDCG@5
OneRec 0.0577 0.0589
HiGR (no alignment) 0.0721 0.0753
HiGR (with ORPO) 0.0760 0.0831

Efficiency gains

HiGR delivers ~5× inference speedup compared to autoregressive generative baselines—without sacrificing quality. Hierarchy pays rent.

Online validation

In live A/B tests on a large commercial platform, HiGR improved:

  • Average Watch Time (+1.22%)
  • Video Views (+1.73%)

These are not academic decimals. At scale, they are business events.

Implications — Why this is bigger than recommendation

HiGR hints at a broader pattern emerging across AI systems:

  • Planning beats reflex — hierarchical reasoning outperforms token-level greed.
  • Alignment must match evaluation units — item-level loss cannot optimize list-level experience.
  • Efficiency is architectural, not just hardware — fewer steps matter more than faster GPUs.

This is as relevant to agent systems, workflow automation, and tool orchestration as it is to feeds and videos.

Conclusion — Thinking in lists, not tokens

HiGR succeeds because it respects how users actually experience products: holistically, comparatively, and impatiently. It doesn’t just generate better slates—it generates them the right way.

Generative recommendation is no longer blocked by theory. The bottleneck was planning.

Cognaptus: Automate the Present, Incubate the Future.