Opening — Why this matters now
Topic modeling has matured into infrastructure. It quietly powers search, document clustering, policy analysis, and exploratory research pipelines across industries. Yet one deceptively simple question still wastes disproportionate time and compute:
How many topics should my LDA model have?
Most practitioners answer this the same way they did a decade ago: grid search, intuition, or vague heuristics (“try 50, see if it looks okay”). The paper behind this article takes a colder view. Selecting the number of topics, T, is not an art problem — it is a budget‑constrained black‑box optimization problem. Once framed that way, some uncomfortable truths emerge.
Background — Context and prior art
Latent Dirichlet Allocation (LDA) decomposes a document–word matrix into document–topic and topic–word matrices. Its statistical behavior is well understood. Its operational behavior is not.
The number of topics, T, strongly affects:
- Statistical fit (measured by perplexity)
- Stability across random initializations
- Human interpretability of topics
- Downstream task performance
Classic approaches include:
- Exhaustive grid search over T
- Heuristics based on perplexity curvature
- Composite metrics combining coherence and stability
All of them share a flaw: they assume evaluation is cheap. In reality, each evaluation requires training and validating a full LDA model — expensive, noisy, and opaque.
This paper strips the problem down to its essence: treat LDA as a function
$$f(T) = \text{Perplexity}(\text{LDA trained with } T)$$
with no gradients, no analytic form, and a strict evaluation budget. That is the textbook definition of black‑box optimization.
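To make that framing concrete, here is a minimal sketch of the objective using scikit-learn. The 20 Newsgroups loader, vectorizer settings, and train/validation split are illustrative choices, not the paper's exact pipeline; the symmetric 1/T priors follow the setup described in the next section.

```python
# Minimal sketch: the topic count T as a black-box objective.
# Assumptions: scikit-learn's LDA and perplexity, a simple bag-of-words
# pipeline, and 20 Newsgroups as an example corpus.
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data
X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)
X_train, X_val = train_test_split(X, test_size=0.2, random_state=0)

def f(T: int) -> float:
    """Expensive, noisy, gradient-free: train an LDA with T topics and
    return its perplexity on the held-out split (lower is better)."""
    lda = LatentDirichletAllocation(
        n_components=T,
        doc_topic_prior=1.0 / T,    # symmetric priors alpha = beta = 1/T
        topic_word_prior=1.0 / T,
        learning_method="batch",
        random_state=0,
    ).fit(X_train)
    return lda.perplexity(X_val)
```

Each call to `f` trains a full model, which is exactly why the evaluation budget dominates the problem.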
Analysis — What the paper actually does
The authors compare four families of optimizers under identical budgets:
| Method | Class | Core Idea |
|---|---|---|
| GA | Evolutionary | Population, crossover, mutation |
| ES | Evolutionary | Parent–offspring mutation and selection |
| PABBO | Learned / Amortized | Preference-based RL with Transformer policy |
| SABBO | Learned / Amortized | Sharpness-aware distributional optimization |
Key design choices matter:
- Only T is optimized (priors fixed as α = β = 1/T)
- Perplexity is the sole objective
- 20 evaluations per run — deliberately tight
- Identical initialization across methods
Four datasets are used: 20 Newsgroups, AG News, Yelp Reviews, and a mixed out-of-distribution validation corpus.
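To make these constraints tangible, here is an illustrative budget-constrained GA over T in the spirit of the evolutionary baselines, reusing the `f(T)` objective sketched above. The population size, mutation width, and search range are assumptions for illustration, not the paper's settings.

```python
# Illustrative sketch only: a tiny GA over the topic count T under a hard
# evaluation budget. Hyperparameters here are assumptions, not the paper's.
import random

def ga_search(f, t_min=2, t_max=200, pop_size=4, budget=20, seed=0):
    """Minimize f(T) over integer topic counts using at most `budget` calls to f."""
    rng = random.Random(seed)
    pop = [rng.randint(t_min, t_max) for _ in range(pop_size)]
    scores = [f(t) for t in pop]   # initial evaluations count toward the budget
    evals = pop_size

    while evals < budget:
        # Pick two parents, preferring the one with lower perplexity.
        i, j = rng.sample(range(pop_size), 2)
        if scores[j] < scores[i]:
            i, j = j, i
        # Arithmetic crossover plus a small integer mutation.
        child = (pop[i] + pop[j]) // 2 + rng.randint(-5, 5)
        child = max(t_min, min(t_max, child))
        child_score = f(child)
        evals += 1
        # Replace the current worst individual if the child improves on it.
        worst = max(range(pop_size), key=lambda k: scores[k])
        if child_score < scores[worst]:
            pop[worst], scores[worst] = child, child_score

    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]

# Usage: best_T, best_perplexity = ga_search(f, budget=20)
```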
This is not about squeezing the last decimal point. It is about who gets to “good enough” fastest.
Findings — Results that change how you think
1. Final performance (after full budget)
After 20 evaluations, most methods converge to roughly similar perplexity bands; the exception is ES, which lags. The table below reports final perplexity (lower is better).
| Dataset | GA | ES | PABBO | SABBO |
|---|---|---|---|---|
| 20NEWS | 1776 | 2057 | 1810 | 1680 |
| AGNEWS | 2155 | 3800 | 2185 | 2151 |
| VAL‑OUT | 1653 | 2449 | 1566 | 1558 |
| YELP | 1379 | 1823 | 1357 | 1351 |
SABBO wins — but that is not the interesting part.
2. Sample efficiency (the real story)
Look instead at when good solutions first appear:
- GA: slow, steady, consumes almost the entire budget
- ES: slow and often wrong early
- PABBO: volatile but frequently lucky early
- SABBO: near‑optimal after the first evaluation
Yes — in most runs, one SABBO query identifies a topic count competitive with what GA or ES find after 20.
3. Wall‑clock reality
Time matters more than iterations:
- GA is consistently the slowest
- ES finishes earlier but underperforms
- PABBO exploits cheap iterations
- SABBO’s first step is expensive — and decisive
The implication is brutal: most traditional hyperparameter tuning is wasting compute after the first few tries.
Implications — What this means beyond LDA
This paper is not really about topic modeling. It is about how we should tune models in 2026 and beyond.
Three takeaways matter for practitioners:
- Hyperparameters are black‑box problems by default. If gradients are unavailable, pretending otherwise is cargo cult optimization.
- Learned optimizers dominate tight budgets. When evaluations are expensive, amortized optimizers are not “fancy”; they are rational.
- Sharpness awareness transfers surprisingly well. SABBO was not trained on text data, yet it generalizes effectively. That should unsettle anyone still grid‑searching.
The future directions outlined by the authors are even more telling: supervised prediction of optimal T, or reinforcement‑learning agents that choose topic counts directly from corpus features.
At that point, “choosing T” stops being a modeling decision and becomes an inference problem.
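To illustrate what that inference problem could look like, here is a purely hypothetical sketch: regress a topic count from cheap corpus statistics. The feature set, the regressor, and the training corpora are all assumptions for illustration; nothing here comes from the paper's experiments.

```python
# Hypothetical sketch of the proposed future direction: predicting a good
# topic count directly from corpus features instead of searching for it.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def corpus_features(X):
    """Cheap summary statistics of a document-term matrix (scipy sparse)."""
    n_docs, vocab = X.shape
    doc_lengths = np.asarray(X.sum(axis=1)).ravel()
    return [n_docs, vocab, doc_lengths.mean(), doc_lengths.std(),
            X.nnz / (n_docs * vocab)]   # density of the matrix

# Suppose we had a library of corpora with tuned topic counts (hypothetical):
# features = np.array([corpus_features(Xc) for Xc in corpora])
# best_T   = np.array([...])            # found earlier by black-box optimization
# model = GradientBoostingRegressor().fit(features, best_T)
# T_hat = int(model.predict([corpus_features(X_new)])[0])
```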
Conclusion — Stop counting, start optimizing
The uncomfortable conclusion is this: manual topic selection is already obsolete.
If a single sharpness‑aware query can outperform hours of evolutionary tinkering, the question is no longer whether to adopt learned black‑box optimization — but why we waited so long.
LDA was never the hard part. Decision‑making under uncertainty was.
Cognaptus: Automate the Present, Incubate the Future.