Opening — Why this matters now
Topic modeling has matured into infrastructure. It quietly powers search, document clustering, policy analysis, and exploratory research pipelines across industries. Yet one deceptively simple question still wastes disproportionate time and compute:
How many topics should my LDA model have?
Most practitioners answer this the same way they did a decade ago: grid search, intuition, or vague heuristics (“try 50, see if it looks okay”). The paper behind this article takes a colder view. Selecting the number of topics, T, is not an art problem — it is a budget‑constrained black‑box optimization problem. Once framed that way, some uncomfortable truths emerge.
Background — Context and prior art
Latent Dirichlet Allocation (LDA) decomposes a document–word matrix into document–topic and topic–word matrices. Its statistical behavior is well understood. Its operational behavior is not.
The number of topics, T, strongly affects:
- Statistical fit (measured by perplexity)
- Stability across random initializations
- Human interpretability of topics
- Downstream task performance
Classic approaches include:
- Exhaustive grid search over T
- Heuristics based on perplexity curvature
- Composite metrics combining coherence and stability
All of them share a flaw: they assume evaluation is cheap. In reality, each evaluation requires training and validating a full LDA model — expensive, noisy, and opaque.
This paper strips the problem down to its essence: treat LDA as a function
$$f(T) = \text{Perplexity}(\text{LDA trained with } T)$$
with no gradients, no analytic form, and a strict evaluation budget. That is the textbook definition of black‑box optimization.
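To make that framing concrete, here is a minimal sketch of the objective using scikit-learn. The 20 Newsgroups loader, vectorizer settings, and train/validation split are illustrative choices, not the paper's exact pipeline; the symmetric 1/T priors follow the setup described in the next section.

```python
# Minimal sketch: the topic count T as a black-box objective.
# Assumptions: scikit-learn's LDA and perplexity, a simple bag-of-words
# pipeline, and 20 Newsgroups as an example corpus.
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data
X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)
X_train, X_val = train_test_split(X, test_size=0.2, random_state=0)

def f(T: int) -> float:
    """Expensive, noisy, gradient-free: train an LDA with T topics and
    return its perplexity on the held-out split (lower is better)."""
    lda = LatentDirichletAllocation(
        n_components=T,
        doc_topic_prior=1.0 / T,    # symmetric priors alpha = beta = 1/T
        topic_word_prior=1.0 / T,
        learning_method="batch",
        random_state=0,
    ).fit(X_train)
    return lda.perplexity(X_val)
```

Each call to `f` trains a full model, which is exactly why the evaluation budget dominates the problem.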
Analysis — What the paper actually does
The authors compare four families of optimizers under identical budgets:
| Method | Class | Core Idea |
|---|---|---|
| GA | Evolutionary | Population, crossover, mutation |
| ES | Evolutionary | Parent–offspring mutation and selection |
| PABBO | Learned / Amortized | Preference-based RL with Transformer policy |
| SABBO | Learned / Amortized | Sharpness-aware distributional optimization |
Key design choices matter:
- Only T is optimized (priors fixed as α = β = 1/T)
- Perplexity is the sole objective
- 20 evaluations per run — deliberately tight
- Identical initialization across methods
Four datasets are used: 20 Newsgroups, AG News, Yelp Reviews, and a mixed out-of-distribution validation corpus.
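To make these constraints tangible, here is an illustrative budget-constrained GA over T in the spirit of the evolutionary baselines, reusing the `f(T)` objective sketched above. The population size, mutation width, and search range are assumptions for illustration, not the paper's settings.

```python
# Illustrative sketch only: a tiny GA over the topic count T under a hard
# evaluation budget. Hyperparameters here are assumptions, not the paper's.
import random

def ga_search(f, t_min=2, t_max=200, pop_size=4, budget=20, seed=0):
    """Minimize f(T) over integer topic counts using at most `budget` calls to f."""
    rng = random.Random(seed)
    pop = [rng.randint(t_min, t_max) for _ in range(pop_size)]
    scores = [f(t) for t in pop]   # initial evaluations count toward the budget
    evals = pop_size

    while evals < budget:
        # Pick two parents, preferring the one with lower perplexity.
        i, j = rng.sample(range(pop_size), 2)
        if scores[j] < scores[i]:
            i, j = j, i
        # Arithmetic crossover plus a small integer mutation.
        child = (pop[i] + pop[j]) // 2 + rng.randint(-5, 5)
        child = max(t_min, min(t_max, child))
        child_score = f(child)
        evals += 1
        # Replace the current worst individual if the child improves on it.
        worst = max(range(pop_size), key=lambda k: scores[k])
        if child_score < scores[worst]:
            pop[worst], scores[worst] = child, child_score

    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]

# Usage: best_T, best_perplexity = ga_search(f, budget=20)
```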
This is not about squeezing the last decimal point. It is about who gets to “good enough” fastest.
Findings — Results that change how you think
1. Final performance (after full budget)
After 20 evaluations, most methods converge to roughly similar perplexity bands; the exception is ES, which lags. The table below reports final perplexity (lower is better).
| Dataset | GA | ES | PABBO | SABBO |
|---|---|---|---|---|
| 20NEWS | 1776 | 2057 | 1810 | 1680 |
| AGNEWS | 2155 | 3800 | 2185 | 2151 |
| VAL‑OUT | 1653 | 2449 | 1566 | 1558 |
| YELP | 1379 | 1823 | 1357 | 1351 |
SABBO wins — but that is not the interesting part.
2. Sample efficiency (the real story)
Look instead at when good solutions first appear:
- GA: slow, steady, consumes almost the entire budget
- ES: slow and often wrong early
- PABBO: volatile but frequently lucky early
- SABBO: near‑optimal after the first evaluation
Yes — in most runs, one SABBO query identifies a topic count competitive with what GA or ES find after 20.
3. Wall‑clock reality
Time matters more than iterations:
- GA is consistently the slowest
- ES finishes earlier but underperforms
- PABBO exploits cheap iterations
- SABBO’s first step is expensive — and decisive
The implication is brutal: most traditional hyperparameter tuning is wasting compute after the first few tries.
Implications — What this means beyond LDA
This paper is not really about topic modeling. It is about how we should tune models in 2026 and beyond.
Three takeaways matter for practitioners:
- Hyperparameters are black‑box problems by default. If gradients are unavailable, pretending otherwise is cargo cult optimization.
- Learned optimizers dominate tight budgets. When evaluations are expensive, amortized optimizers are not “fancy”; they are rational.
- Sharpness awareness transfers surprisingly well. SABBO was not trained on text data, yet it generalizes effectively. That should unsettle anyone still grid‑searching.
The future directions outlined by the authors are even more telling: supervised prediction of optimal T, or reinforcement‑learning agents that choose topic counts directly from corpus features.
At that point, “choosing T” stops being a modeling decision and becomes an inference problem.
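To illustrate what that inference problem could look like, here is a purely hypothetical sketch: regress a topic count from cheap corpus statistics. The feature set, the regressor, and the training corpora are all assumptions for illustration; nothing here comes from the paper's experiments.

```python
# Hypothetical sketch of the proposed future direction: predicting a good
# topic count directly from corpus features instead of searching for it.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def corpus_features(X):
    """Cheap summary statistics of a document-term matrix (scipy sparse)."""
    n_docs, vocab = X.shape
    doc_lengths = np.asarray(X.sum(axis=1)).ravel()
    return [n_docs, vocab, doc_lengths.mean(), doc_lengths.std(),
            X.nnz / (n_docs * vocab)]   # density of the matrix

# Suppose we had a library of corpora with tuned topic counts (hypothetical):
# features = np.array([corpus_features(Xc) for Xc in corpora])
# best_T   = np.array([...])            # found earlier by black-box optimization
# model = GradientBoostingRegressor().fit(features, best_T)
# T_hat = int(model.predict([corpus_features(X_new)])[0])
```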
Conclusion — Stop counting, start optimizing
The uncomfortable conclusion is this: manual topic selection is already obsolete.
If a single sharpness‑aware query can outperform hours of evolutionary tinkering, the question is no longer whether to adopt learned black‑box optimization — but why we waited so long.
LDA was never the hard part. Decision‑making under uncertainty was.
Cognaptus: Automate the Present, Incubate the Future.