Opening — Why this matters now

Topic modeling has matured into infrastructure. It quietly powers search, document clustering, policy analysis, and exploratory research pipelines across industries. Yet one deceptively simple question still wastes disproportionate time and compute:

How many topics should my LDA model have?

Most practitioners answer this the same way they did a decade ago: grid search, intuition, or vague heuristics (“try 50, see if it looks okay”). The paper behind this article takes a colder view. Selecting the number of topics, T, is not an art problem — it is a budget‑constrained black‑box optimization problem. Once framed that way, some uncomfortable truths emerge.

Background — Context and prior art

Latent Dirichlet Allocation (LDA) decomposes a document–word matrix into document–topic and topic–word matrices. Its statistical behavior is well understood. Its operational behavior is not.

The number of topics, T, strongly affects:

  • Statistical fit (measured by perplexity)
  • Stability across random initializations
  • Human interpretability of topics
  • Downstream task performance

Classic approaches include:

  • Exhaustive grid search over T
  • Heuristics based on perplexity curvature
  • Composite metrics combining coherence and stability

All of them share a flaw: they assume evaluation is cheap. In reality, each evaluation requires training and validating a full LDA model — expensive, noisy, and opaque.

This paper strips the problem down to its essence: treat LDA as a function

$$f(T) = \text{Perplexity}(\text{LDA trained with } T)$$

with no gradients, no analytic form, and a strict evaluation budget. That is the textbook definition of black‑box optimization.
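
To make the framing concrete, here is a minimal sketch of f(T) in Python, assuming scikit-learn's LatentDirichletAllocation. The corpus, vocabulary size, and train/validation split are illustrative choices rather than the paper's exact pipeline; the fixed priors α = β = 1/T do match the paper's stated setup. The last two lines show the classic answer: an exhaustive grid, one full LDA fit per candidate.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Illustrative corpus and split; the paper's preprocessing may differ.
docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data
X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)
X_train, X_val = train_test_split(X, test_size=0.2, random_state=0)

def f(T: int) -> float:
    """Black-box objective: train LDA with T topics, return held-out perplexity."""
    lda = LatentDirichletAllocation(
        n_components=T,
        doc_topic_prior=1.0 / T,   # alpha = 1/T, matching the paper's fixed priors
        topic_word_prior=1.0 / T,  # beta  = 1/T
        random_state=0,
    )
    lda.fit(X_train)
    return lda.perplexity(X_val)   # lower is better; no gradients, no analytic form

# The classic answer: an exhaustive grid, one full (expensive) LDA fit per candidate.
grid = [10, 20, 50, 100, 200]
best_T = min(grid, key=f)
```

Every call to f is a complete training run, which is exactly why the evaluation budget, not the optimizer's elegance, becomes the binding constraint.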

Analysis — What the paper actually does

The authors compare four optimizers, drawn from two families, under identical budgets:

| Method | Class | Core Idea |
| --- | --- | --- |
| GA | Evolutionary | Population, crossover, mutation |
| ES | Evolutionary | Parent–offspring mutation and selection |
| PABBO | Learned / Amortized | Preference-based RL with Transformer policy |
| SABBO | Learned / Amortized | Sharpness-aware distributional optimization |

Key design choices matter:

  • Only T is optimized (priors fixed as α = β = 1/T)
  • Perplexity is the sole objective
  • 20 evaluations per run — deliberately tight
  • Identical initialization across methods

Four datasets are used: 20 Newsgroups, AG News, Yelp Reviews, and a mixed out-of-distribution validation corpus.
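
In outline, the protocol is easy to reproduce. The sketch below is a stand-in, not the authors' code: it runs a 20-evaluation budget from a shared starting point, uses a simple (1+1)-style integer mutation in place of GA, ES, PABBO, or SABBO, and substitutes a cheap synthetic objective so the snippet runs on its own. In practice you would plug in an expensive objective such as the f(T) sketched above; the search range, starting T, and mutation steps here are illustrative assumptions.

```python
import random

BUDGET = 20            # evaluations per run, as in the paper's setup
T_MIN, T_MAX = 2, 200  # hypothetical search range for the topic count

def cheap_objective(T: int) -> float:
    """Synthetic stand-in; replace with the LDA perplexity objective f(T)."""
    return (T - 60) ** 2 + random.gauss(0, 50)

def budgeted_search(objective, seed: int = 0):
    rng = random.Random(seed)
    best_T = 50                      # shared initialization (value chosen arbitrarily)
    best_val = objective(best_T)
    evals = 1
    while evals < BUDGET:
        # Propose a nearby integer T; keep it only if the objective improves.
        step = rng.choice([-20, -5, 5, 20])
        candidate = min(T_MAX, max(T_MIN, best_T + step))
        val = objective(candidate)
        evals += 1
        if val < best_val:
            best_T, best_val = candidate, val
    return best_T, best_val

print(budgeted_search(cheap_objective))
```

Swapping the proposal rule is where the four methods differ; the strict budget accounting and the shared starting point are what make the comparison fair.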

This is not about squeezing the last decimal point. It is about who gets to “good enough” fastest.

Findings — Results that change how you think

1. Final performance (after full budget)

After 20 evaluations, most methods converge to roughly similar perplexity bands — except ES, which lags.

| Dataset | GA | ES | PABBO | SABBO |
| --- | --- | --- | --- | --- |
| 20NEWS | 1776 | 2057 | 1810 | 1680 |
| AGNEWS | 2155 | 3800 | 2185 | 2151 |
| VAL‑OUT | 1653 | 2449 | 1566 | 1558 |
| YELP | 1379 | 1823 | 1357 | 1351 |

SABBO wins — but that is not the interesting part.

2. Sample efficiency (the real story)

Look instead at when good solutions first appear:

  • GA: slow, steady, consumes almost the entire budget
  • ES: slow and often wrong early
  • PABBO: volatile but frequently lucky early
  • SABBO: near‑optimal after the first evaluation

Yes — in most runs, one SABBO query identifies a topic count competitive with what GA or ES find after all 20 evaluations.

3. Wall‑clock reality

Time matters more than iterations:

  • GA is consistently the slowest
  • ES finishes earlier but underperforms
  • PABBO exploits cheap iterations
  • SABBO’s first step is expensive — and decisive

The implication is brutal: most traditional hyperparameter tuning is wasting compute after the first few tries.

Implications — What this means beyond LDA

This paper is not really about topic modeling. It is about how we should tune models in 2026 and beyond.

Three takeaways matter for practitioners:

  1. Hyperparameters are black‑box problems by default. If gradients are unavailable, pretending otherwise is cargo‑cult optimization.

  2. Learned optimizers dominate tight budgets. When evaluations are expensive, amortized optimizers are not “fancy” — they are rational.

  3. Sharpness awareness transfers surprisingly well. SABBO was not trained on text data, yet it generalizes effectively. That should unsettle anyone still grid‑searching.

The future directions outlined by the authors are even more telling: supervised prediction of optimal T, or reinforcement‑learning agents that choose topic counts directly from corpus features.

At that point, “choosing T” stops being a modeling decision and becomes an inference problem.

Conclusion — Stop counting, start optimizing

The uncomfortable conclusion is this: manual topic selection is already obsolete.

If a single sharpness‑aware query can outperform hours of evolutionary tinkering, the question is no longer whether to adopt learned black‑box optimization — but why we waited so long.

LDA was never the hard part. Decision‑making under uncertainty was.

Cognaptus: Automate the Present, Incubate the Future.