A dataset rarely hides everything equally.

In most organisations, the visible structure is already over-managed. Product images are labelled by category. Medical scans are labelled by diagnosis. Satellite imagery is indexed by region and timestamp. Customer records are sliced into the usual demographic trays. Scientific images come with whatever measurements the field has already agreed are worth writing down.

The trouble is not that modern models fail to compress this information. They compress it quite enthusiastically. Sometimes too enthusiastically. A latent space can become a filing cabinet for the obvious: class, colour, shape, acquisition artefact, instrument setting, annotation convention. Then, when researchers or analysts ask what else is inside the data, the model politely shows them the same dominant structure again. Very helpful. Very circular.

The paper “What We Don’t C: Manifold Disentanglement for Structured Discovery” proposes a more interesting move [1]. Instead of asking a representation to reveal what it knows, the method asks: what becomes visible after we suppress what we already know?

That is the mechanism worth understanding. Not because it produces a new leaderboard trophy, and not because “latent space” has suddenly become a magical cave of hidden business value. The paper is useful because it reframes latent representations as something that can be iteratively interrogated. Known features become controls. The base distribution stops being “just noise”. And residual structure becomes the thing to inspect.

The point is not full disentanglement; it is targeted removal

Classical disentanglement has often been sold with a tidy mental picture: one latent dimension for one factor, another latent dimension for another factor, and so on until reality conveniently behaves like a spreadsheet. Reality, inconsiderately, does not.

WWDC, short for What We Don’t C, takes a more modest and more practical route. The authors define the task as manifold disentanglement: given an existing representation manifold, remove or suppress a known signal from it so that other structure becomes easier to access. This is not the same as learning a perfectly factorised world model. It is closer to controlled subtraction.

That distinction matters.

The method assumes you already have a representation model, typically a VAE-style encoder-decoder system. It does not require retraining the original VAE every time a new conditioning variable appears. Instead, WWDC trains a latent flow-matching model on the frozen latent representations. The flow model learns a path between the VAE latent manifold and a Gaussian-like base distribution. During training, classifier-free guidance is used so the flow can estimate both conditional and unconditional behaviour.
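
As a rough illustration of that training setup, the sketch below builds a single training example for a conditional flow-matching model on a frozen latent. It assumes the common straight-line path between a Gaussian sample and the latent, and it drops the condition at random so that one network can learn both conditional and unconditional behaviour. The function name, the `p_drop` value, and the toy latent are hypothetical; the paper's exact path and dropout schedule may differ.

```python
import random

def make_training_example(z, cond, rng, p_drop=0.1):
    """Build one conditional flow-matching example from a frozen latent.

    z      : latent vector from the frozen encoder (list of floats)
    cond   : known label or feature for that sample
    p_drop : probability of hiding the condition (assumed value)
    """
    noise = [rng.gauss(0.0, 1.0) for _ in z]      # base-side sample x0
    t = rng.random()                              # flow time in [0, 1)
    # Straight-line path from noise (t = 0) to the latent (t = 1).
    x_t = [(1.0 - t) * n + t * zi for n, zi in zip(noise, z)]
    # Regression target for the velocity network along that path.
    target = [zi - n for n, zi in zip(noise, z)]
    # Hiding the condition sometimes is what enables classifier-free guidance.
    seen_cond = None if rng.random() < p_drop else cond
    return x_t, t, seen_cond, target

rng = random.Random(0)
x_t, t, seen_cond, target = make_training_example([0.5, -1.2, 2.0], cond=3, rng=rng)
```

A velocity network would then be trained to predict `target` from `(x_t, t, seen_cond)`. The encoder that produced `z` is never updated, which is the point of the design.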

Then comes the useful trick. At inference, the method runs the flow backward from a VAE latent representation toward the base distribution, using known information as guidance. If the flow is conditioned on a feature such as class label, digit identity, red/green colour value, or galaxy morphology, that guided reverse flow tends to make the conditioned feature less accessible in the base representation.
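
The guided reverse pass can be sketched on a one-dimensional toy where the velocity fields are known in closed form, so no training is needed. Everything here is an assumption made for illustration: two hypothetical classes `a` and `b` with Gaussian latents, straight-line flow paths, plain Euler integration, and the usual classifier-free guidance mix of conditional and unconditional velocities. It is not the authors' implementation.

```python
import math

MU = {"a": -3.0, "b": 3.0}   # hypothetical class centres in a 1D latent
S = 0.5                      # hypothetical within-class standard deviation

def cond_velocity(x, t, mu, s=S):
    """Exact velocity for straight paths from N(0, 1) at t=0 to N(mu, s^2) at t=1."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    return mu + (t * s * s - (1.0 - t)) / var_t * (x - t * mu)

def uncond_velocity(x, t, s=S):
    """Velocity for the equal-weight mixture: posterior-weighted class velocities."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    log_odds = ((x - t * MU["b"]) ** 2 - (x - t * MU["a"]) ** 2) / (2.0 * var_t)
    p_a = 0.5 * (1.0 + math.tanh(0.5 * log_odds))   # P(class a | x at time t)
    return p_a * cond_velocity(x, t, MU["a"]) + (1.0 - p_a) * cond_velocity(x, t, MU["b"])

def reverse_flow(x1, label, w=1.0, steps=1000):
    """Euler-integrate from t = 1 (latent) back to t = 0 (base)."""
    x, h = x1, 1.0 / steps
    for k in range(steps):
        t = 1.0 - k * h
        v_u = uncond_velocity(x, t)
        v_c = cond_velocity(x, t, MU[label])
        x -= h * (v_u + w * (v_c - v_u))   # guidance mix: w=0 unguided, w=1 conditional
    return x

offset = 0.4   # the same within-class offset for both samples
for label in ("a", "b"):
    x1 = MU[label] + offset
    print(label, reverse_flow(x1, label, w=0.0), reverse_flow(x1, label, w=1.0))
```

With `w=0.0` the two samples should land on opposite sides of zero, so class is still readable from the base. With `w=1.0` both should land close to `offset / S = 0.8`: the class centre is gone and only the within-class position survives.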

The replacement belief is simple:

| Common reader belief | Correction | Why it matters |
| --- | --- | --- |
| The base distribution is meaningless noise. | In WWDC, the reverse-flowed base distribution can preserve useful structure from the original manifold. | The “noise” side of a generative flow becomes inspectable, not disposable. |
| Conditioning simply deletes information. | Guidance suppresses accessibility of the conditioned signal while other latent structure may remain. | The method is better understood as controlled residualisation, not amnesia with a GPU. |
| Disentanglement means separating every factor cleanly. | WWDC targets known signals one at a time or in groups. | This makes the approach more plausible for messy scientific and industrial datasets. |

The paper’s title pun is doing real work here. “What We Don’t C” means what we do not capture, consider, or catalogue. The method is not asking the model to become omniscient. It is asking the model to stop shouting the answer we already wrote on the label.

Why the base distribution is not just mathematical landfill

The paper’s mechanism depends on a subtle point: a flow model is not merely throwing latent samples into a random bin. Flow matching learns a continuous transformation between a source distribution and a target distribution. In this setup, the target is the VAE latent representation, and the source is a Gaussian base distribution.

Because the flow is trained along optimal-transport-style paths, the authors argue that structure from the original latent manifold can be preserved when mapped back toward the base. The VAE side already has some pressure toward Gaussianity through its KL regularisation. The flow’s base distribution is also Gaussian. That shared geometry is part of the reason the base side can still contain meaningful organisation.
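
A one-dimensional worked case makes the structure-preservation claim concrete. The derivation below is an illustration under stated assumptions (a standard normal base, Gaussian class-conditional latents with mean \mu_c and standard deviation s, and straight-line paths); it is not taken from the paper.

```latex
% Assumptions: x_0 \sim \mathcal{N}(0,1), \quad x_1 \mid c \sim \mathcal{N}(\mu_c, s^2),
% \quad x_t = (1-t)\,x_0 + t\,x_1 with x_0 and x_1 independent given c.
\begin{aligned}
\sigma_t^2 &= (1-t)^2 + t^2 s^2, \\
v_c(x,t) &= \mathbb{E}\left[x_1 - x_0 \mid x_t = x,\ c\right]
          = \mu_c + \frac{t s^2 - (1-t)}{\sigma_t^2}\,(x - t\mu_c), \\
\frac{d}{dt}\left(x_t - t\mu_c\right)
         &= \frac{1}{2}\,\frac{d \ln \sigma_t^2}{dt}\,\left(x_t - t\mu_c\right)
  \quad\Longrightarrow\quad
  x_t - t\mu_c = \frac{\sigma_t}{s}\,(x_1 - \mu_c), \\
x_0 &= \frac{x_1 - \mu_c}{s}.
\end{aligned}
```

In this toy case the class-conditional reverse flow removes the class mean and rescales what is left. The base-side value is a standardised copy of the sample's position within its class, which is the opposite of empty noise.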

This is the misconception worth killing early: “base” does not mean “empty”. In a standard generative workflow, the base distribution is often treated as a sampling convenience. Start with noise, flow forward, generate a sample. WWDC reverses the practical emphasis. It asks what happens when we take a real sample, flow it backward, and inspect the resulting base-side representation after guidance has suppressed known features.

That is the paper’s central business-relevant mechanism. Many organisations already have embeddings, autoencoders, feature stores, and labelled datasets. The expensive thing is not always training another giant model. The expensive thing is often figuring out which residual patterns are worth a human expert’s attention.

WWDC suggests a workflow:

  1. Start with an existing representation.
  2. Choose a known signal to condition on.
  3. Reverse-flow the representation using that guidance.
  4. Inspect the residual manifold.
  5. Identify candidate features that were previously obscured.
  6. Add new annotations or probes.
  7. Repeat.

This is not “AI discovers everything”. It is closer to a structured audit loop for representations. Less glamorous, more useful. An increasingly rare combination.

The synthetic Gaussian test proves the mechanism, not the product

The first experiment is deliberately simple: four 2D Gaussian clusters. The authors condition the flow on class labels and examine what happens to class information and a secondary distance feature.

This is the main evidence, but only for mechanism validation. It is not meant to prove usefulness on real data. It answers a narrower question: if the conditioning feature is cleanly separable, does guided reverse flow suppress that feature while surfacing another one?

The answer is yes in the controlled setting. In the unguided flow, class structure remains visible in the base distribution. In the class-guided flow, class structure disappears from the base representation. Meanwhile, a secondary feature—the distance of each point from the centre of its Gaussian—becomes easier to recover with a simple linear model.

That last detail is important. The value is not just that one signal is suppressed. The value is that another signal becomes simpler to access. In the authors’ quantitative evaluation, class mutual information falls under full guidance at the base end of the flow, while the secondary distance feature becomes linearly recoverable. The paper uses guidance weight and flow time to show how this changes across the trajectory.
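
To make the mutual-information reading tangible, here is a minimal sketch using a histogram plug-in estimator on a 1D two-class toy. The estimator, the bin settings, and the closed-form stand-in for the class-guided base (subtract the class centre, divide by the class spread) are all assumptions for illustration; the paper's own estimator and data are not reproduced here.

```python
import math
import random
from collections import Counter

def mutual_information_bits(labels, values, lo=-6.0, hi=6.0, bins=12):
    """Plug-in estimate of I(label; binned value) in bits."""
    n = len(labels)

    def bin_of(v):
        # Clip to the histogram range so outliers fall in the edge bins.
        return min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))

    joint = Counter((c, bin_of(v)) for c, v in zip(labels, values))
    p_c = Counter(labels)
    p_b = Counter(bin_of(v) for v in values)
    return sum(
        (k / n) * math.log2(k * n / (p_c[c] * p_b[b]))
        for (c, b), k in joint.items()
    )

rng = random.Random(0)
centres, s = {0: -3.0, 1: 3.0}, 0.5          # hypothetical toy clusters
labels = [i % 2 for i in range(4000)]
latents = [rng.gauss(centres[c], s) for c in labels]
# Idealised endpoint of a class-guided reverse flow in this Gaussian toy.
base = [(x - centres[c]) / s for c, x in zip(labels, latents)]

mi_latent = mutual_information_bits(labels, latents)
mi_base = mutual_information_bits(labels, base)
print(mi_latent, mi_base)
```

On this toy, `mi_latent` should sit near one bit (two well-separated, equally likely classes) and `mi_base` near zero. Real latents are high-dimensional, so a practical version would need a different estimator or a probe-based proxy.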

This experiment should be read as a mechanism sanity check:

| Test | Likely purpose | What it supports | What it does not prove |
| --- | --- | --- | --- |
| Four 2D Gaussians | Main mechanism evidence | Guided reverse flows can suppress a clean class signal and expose another structured feature. | That the method works automatically on messy, high-dimensional enterprise data. |
| Mutual information over flow time | Mechanism measurement | Conditioning reduces accessibility of class information as the representation moves toward the base. | That all conditioned signals can be perfectly removed. |
| Linear recovery of distance | Interpretability probe | Residual structure can become simpler for downstream models. | That the residual feature is always semantically meaningful. |

This is good experimental design because the authors know the ground truth. They can say what should disappear and what should remain. The toy case earns the reader’s trust in the mechanism before the paper moves to messier settings.

Colored MNIST shows controlled residual discovery

The colored MNIST experiment is the paper’s middle bridge: still controlled, but no longer a two-dimensional toy.

The authors generate coloured digit images by multiplying MNIST digits by random RGB values. They train a VAE on this dataset, then train a flow model on the VAE latents. The flow is conditioned on digit class plus red and green values. Blue is deliberately withheld.
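
The data construction is simple enough to sketch directly. The snippet below colours a grayscale image by multiplying it by one RGB triple and builds a conditioning vector that contains digit, red, and green but not blue. The one-hot encoding and the function names are assumptions for illustration; the paper may encode its conditions differently.

```python
import random

def colourise(gray, rgb):
    """Multiply a grayscale image (rows of floats in [0, 1]) by one RGB triple."""
    return [[[px * ch for ch in rgb] for px in row] for row in gray]

def make_condition(digit, rgb):
    """One-hot digit plus red and green; blue is deliberately left out."""
    one_hot = [1.0 if k == digit else 0.0 for k in range(10)]
    return one_hot + [rgb[0], rgb[1]]

rng = random.Random(0)
gray = [[0.0, 1.0], [0.5, 0.0]]             # stand-in for a 28x28 MNIST digit
rgb = [rng.random() for _ in range(3)]      # random colour for this sample
image = colourise(gray, rgb)                # height x width x 3
cond = make_condition(7, rgb)               # 12 numbers, none of them blue
```

Blue still shapes every pixel of `image`, so the VAE sees it. It is only the guidance signal that never mentions it.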

This design is neat because blue becomes the “what we don’t C” variable. The method is told about digit identity, red, and green. It is not told about blue. If the mechanism works, the guided representation should suppress the conditioned variables while allowing the unconditioned blue structure to remain accessible.

The visual evidence uses t-SNE projections. The original VAE space is visibly structured by digit class. After guided reverse flow, that class structure is largely suppressed. At the same time, the blue feature—hard to see clearly in the original VAE projection—appears as a clearer gradient in the guided base representation.

The linear probes add a more operational reading. Digit classification accuracy drops under guidance compared with unguided and VAE representations. Red and green recovery is also suppressed, especially with fewer labelled examples. Blue, however, remains recoverable. This is the core result: the method does not merely damage the representation. It changes which information is easy to access.
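
A linear probe with a small labelled budget can be sketched in a few lines. The example fits a one-feature least-squares line on a handful of labelled points and scores it on the rest. The "before" and "after" features are synthetic stand-ins invented for illustration (a dominant digit term swamping blue, then the same feature with that term gone); they are not the paper's representations or results.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single input feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def held_out_r2(xs, ys, n_train):
    """Fit on the first n_train labelled points, score R^2 on the remainder."""
    a, b = fit_line(xs[:n_train], ys[:n_train])
    test_x, test_y = xs[n_train:], ys[n_train:]
    mean_y = sum(test_y) / len(test_y)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(test_x, test_y))
    ss_tot = sum((y - mean_y) ** 2 for y in test_y)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
blue = [rng.random() for _ in range(500)]
digit = [rng.randrange(10) for _ in range(500)]
# Toy "before": a dominant digit signal swamps blue in a single feature.
before = [3.0 * d + b for d, b in zip(digit, blue)]
# Toy "after": the digit term is gone, leaving blue plus a little noise.
after = [b + rng.gauss(0.0, 0.05) for b in blue]

r2_before = held_out_r2(before, blue, n_train=50)
r2_after = held_out_r2(after, blue, n_train=50)
print(r2_before, r2_after)
```

On this toy, `r2_before` should hover near zero and `r2_after` close to one. The useful habit is the comparison itself: same probe, same label budget, different representation.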

There are two caveats, and both are informative rather than fatal.

First, class information does not vanish perfectly. The authors note that class is entangled with non-class-specific visual features: straightness, loop count, stroke geometry. A “1” is not only a label; it is also a shape. Suppressing the label does not remove every feature correlated with that label. This matters in business settings because labels are often proxies, not pure factors. “High-risk customer”, “premium user”, “defective part”, or “suspicious transaction” may each contain multiple behavioural or measurement substructures.

Second, the VAE was trained with reconstruction quality in mind, using a relatively large latent space. That helps interpretability and sample quality, but it also leaves room for guided properties to be recovered when enough training samples are available. Translation: representation quality and latent capacity are not boring implementation details. They decide whether the residual manifold is genuinely useful or merely rearranged clutter.

The paper also includes a style-transfer-style demonstration on cMNIST. The model reverse-flows a digit using one conditioning signal, then flows forward with another digit condition, producing stylistically related digits. This is best read as an exploratory extension, not the main thesis. It shows that guided base representations preserve some sample-specific style information. It does not by itself prove discovery. It does, however, support the idea that the base-side representation is not empty noise wearing a lab coat.
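
The reverse-then-forward round trip can be illustrated on a 1D Gaussian toy with closed-form velocities. The class names, centres, spread, and Euler integrator below are hypothetical choices for illustration, not the paper's cMNIST pipeline.

```python
def velocity(x, t, mu, s):
    """Exact velocity for straight paths from N(0, 1) at t=0 to N(mu, s^2) at t=1."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    return mu + (t * s * s - (1.0 - t)) / var_t * (x - t * mu)

def integrate(x, mu, s, t_start, t_end, steps=1000):
    """Euler steps along dx/dt = v; a negative step size runs the flow backward."""
    h = (t_end - t_start) / steps
    for k in range(steps):
        x += h * velocity(x, t_start + k * h, mu, s)
    return x

MU = {"three": -3.0, "eight": 3.0}   # hypothetical class centres
S = 0.5                              # hypothetical within-class spread

source = MU["three"] + 0.3                              # a "three" with offset 0.3
base = integrate(source, MU["three"], S, 1.0, 0.0)      # reverse under the source label
restyled = integrate(base, MU["eight"], S, 0.0, 1.0)    # forward under the target label
print(base, restyled)
```

Here `base` should come out near 0.6 and `restyled` near 3.3: the sample moves to the other class but keeps its 0.3 offset. That surviving offset is the toy analogue of sample-specific style.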

Galaxy10 moves from controlled proof to scientific plausibility

The Galaxy10 experiment is where the paper becomes interesting for scientific and industrial use. The authors apply WWDC to Galaxy10 DECaLS: 17,736 colour galaxy images, each 256 by 256 pixels, across ten broad morphology classes. These include categories such as disturbed, merging, round smooth, barred spiral, unbarred spiral, and edge-on galaxies.

Here the evidence is more qualitative. The authors train a VAE on galaxy images and a class-conditional flow model using Galaxy10 morphology labels. They then select galaxies, project them to the base distribution, and flow forward using “round” as the guiding class. The choice of “round” is deliberate: it is semantically simpler than many alternatives, so changes in the generated image can be interpreted as the removal or modification of galaxy morphology rather than the introduction of some complex new structure.

The result is visual feature isolation. The original galaxy, the guidance-generated “round” version, and the residual between them show which features the class guidance changed. Background features remain stable. In one example, the paper notes that a yellow lower-half artefact is preserved even when the galaxy structure changes, suggesting that the method has separated class-related galaxy morphology from incidental imaging artefacts.
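
A residual of this kind can be computed as a per-pixel difference between the original image and its guided regeneration. The sketch below assumes an absolute difference summed over colour channels and a hypothetical threshold for "changed"; the paper's exact residual definition is not specified here, and the tiny images are made-up placeholders.

```python
def residual_map(original, generated):
    """Per-pixel absolute difference, summed over colour channels."""
    return [
        [sum(abs(o - g) for o, g in zip(px_o, px_g)) for px_o, px_g in zip(row_o, row_g)]
        for row_o, row_g in zip(original, generated)
    ]

def changed_fraction(residual, threshold=0.1):
    """Share of pixels whose residual exceeds the (assumed) threshold."""
    flat = [v for row in residual for v in row]
    return sum(v > threshold for v in flat) / len(flat)

# Placeholder 2x2 RGB images: only the top-left pixel differs.
original = [[[0.9, 0.8, 0.1], [0.2, 0.2, 0.2]], [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]]]
generated = [[[0.4, 0.4, 0.1], [0.2, 0.2, 0.2]], [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]]]

residual = residual_map(original, generated)
print(changed_fraction(residual))
```

Pixels with near-zero residual are the ones the guidance left alone, which is how a stable background or artefact would show up.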

This is not a benchmark victory. It is a plausibility demonstration for a discovery workflow.

For astronomy, that matters because modern surveys produce more visual structure than humans can cheaply inspect. For business, the analogy is broader: any organisation with large image, sensor, document, or embedding datasets faces the same problem. The known label can dominate the representation. WWDC offers a way to ask, “After we control for that label, what structure is still there?”

A manufacturing team might condition on known defect type and inspect residual material patterns. A medical imaging group might condition on an established diagnosis and inspect residual acquisition artefacts or secondary morphology. A fraud team might condition on known fraud typologies and search for residual behavioural clusters. A content platform might condition on topic or language and inspect residual style, safety, or quality signals.

Those are Cognaptus inferences, not claims proven by the paper. The paper directly shows synthetic Gaussians, coloured MNIST, and galaxy morphology. The business pathway is an extrapolation from representation inspection to operational diagnosis. Sensible, but not yet production-certified. We are not selling moon rockets from a t-SNE plot. Restraint remains legal.

The practical value is cheaper diagnosis, not magical discovery

The most commercially relevant part of WWDC is not that it can generate samples. It is that it reuses existing representations.

Many organisations already have embeddings or latent models trained for search, classification, anomaly detection, recommendation, or image reconstruction. The usual response to a new discovery question is to retrain, relabel, or build a new supervised model. That is expensive, slow, and often wasteful.

WWDC points toward a different operating model: keep the base representation fixed, then train lighter guided flow models around specific known variables. In the paper’s cMNIST setup, for example, the VAE is much larger than the flow model. The exact economics will vary by domain, but the design pattern is clear: use the expensive representation as infrastructure; use guided flows as diagnostic adapters.

That creates four practical pathways.

| Technical contribution | Operational consequence | ROI relevance |
| --- | --- | --- |
| Reuses frozen VAE-style representations | Teams can test new conditioning variables without rebuilding the base model. | Lower experimentation cost. |
| Suppresses known signals through guided reverse flow | Analysts can inspect residual structure rather than rediscovering dominant labels. | Faster annotation and triage. |
| Preserves some non-conditioned structure | Hidden or secondary factors may become easier to probe. | Better discovery workflows and QA. |
| Supports sample generation from guided base representations | Experts can inspect counterfactual-style variants. | More interpretable review loops. |

The phrase “hidden feature discovery” should still be handled carefully. A residual cluster is not automatically a scientific fact, a customer segment, or a safety issue. It is a candidate. WWDC can help produce better candidates for inspection. Domain experts still have to decide whether those candidates mean anything.

This is where the method fits naturally into model assurance. Instead of asking only whether a model performs well on known labels, an organisation can ask what the representation still contains after known labels are suppressed. That is a more serious question. It moves evaluation from accuracy theatre toward representation due diligence.

The boundaries are not decorative; they affect deployment

The paper is explicit that WWDC is early-stage. The limitations are not generic “more work is needed” confetti. They define where practical use may fail.

First, the method depends on ODE solutions during flow training and inference. Numerical errors can affect the representation chain. The paper does not yet quantify how these errors influence information loss or preservation. In a production setting, this means teams would need stability checks before treating residual structure as meaningful.
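
One inexpensive stability check is step halving: solve the reverse flow at a given step count, solve it again with twice as many steps, and see how far the answers move. The sketch below does this on a 1D Gaussian toy with a closed-form velocity and plain Euler integration. The toy parameters are hypothetical, and a real pipeline would run the same comparison on its trained velocity network and chosen solver.

```python
def velocity(x, t, mu=3.0, s=0.5):
    """Exact velocity for straight paths from N(0, 1) at t=0 to N(mu, s^2) at t=1."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    return mu + (t * s * s - (1.0 - t)) / var_t * (x - t * mu)

def reverse_flow(x1, steps):
    """Euler-integrate from t = 1 back to t = 0 with a fixed step count."""
    x, h = x1, 1.0 / steps
    for k in range(steps):
        x -= h * velocity(x, 1.0 - k * h)
    return x

def step_halving_gap(x1, steps):
    """Distance between a coarse solve and one with twice as many steps."""
    return abs(reverse_flow(x1, steps) - reverse_flow(x1, 2 * steps))

for steps in (20, 200, 2000):
    print(steps, step_halving_gap(3.4, steps))
```

If the gap does not shrink as the step count grows, or stays comparable in size to the residual structure being inspected, that structure may be a solver artefact rather than a property of the data.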

Second, hyperparameter coverage is limited. The authors did not conduct a full investigation into flow training settings, VAE latent size, dropout frequency, or related configuration choices. These are not minor knobs. In WWDC, they influence how well conditioned information is suppressed and how cleanly residual information survives.

Third, the guidance mechanism itself remains open. The paper uses classifier-free guidance, but it does not establish the best conditioning architecture across domains. cMNIST uses an MLP with conditioning modulation; Galaxy10 uses a class-conditional U-Net-style flow model. That architectural flexibility is useful, but it also means users cannot assume one universal recipe.

Fourth, the current work is limited to Euclidean state spaces. The authors specifically note that extensions to other spaces, such as discrete tokens and quantised VAEs, remain future work. This matters for language, code, and many enterprise document systems. The idea may transfer, but the paper does not prove it.

Finally, the Galaxy10 evidence is compelling as a qualitative scientific demonstration, but it is not a controlled deployment study. It does not yet answer how often WWDC discovers genuinely novel expert-validated features, how many false candidate patterns it produces, or how it compares against simpler residualisation baselines in a real annotation pipeline.
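
For a sense of what a simpler residualisation baseline might look like, the sketch below subtracts each label's mean vector from the embeddings that carry that label. This is one plausible candidate chosen for illustration; the paper does not name it, and the toy vectors are placeholders.

```python
from collections import defaultdict

def residualise_by_label(vectors, labels):
    """Subtract each label's mean vector: a simple linear residualisation baseline."""
    sums, counts = {}, defaultdict(int)
    for v, c in zip(vectors, labels):
        if c not in sums:
            sums[c] = [0.0] * len(v)
        sums[c] = [a + b for a, b in zip(sums[c], v)]
        counts[c] += 1
    means = {c: [a / counts[c] for a in total] for c, total in sums.items()}
    return [[a - m for a, m in zip(v, means[c])] for v, c in zip(vectors, labels)]

vectors = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [12.0, 2.0]]   # placeholder embeddings
labels = ["a", "a", "b", "b"]
residuals = residualise_by_label(vectors, labels)
print(residuals)
```

Mean subtraction removes only the label's average shift. If a label also changes spread or shape, a baseline like this leaves that behind, which is exactly why a head-to-head comparison with a guided flow would be informative.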

Those boundaries should not reduce interest in the method. They should prevent the usual interpretability overreach. A tool that helps experts ask better questions is already valuable. It does not need to pretend it has abolished epistemology.

The deeper shift: from representation learning to representation interrogation

The useful mental model for WWDC is not “better latent space”. It is “interactive latent subtraction”.

A normal representation model compresses data into features. A supervised model learns to predict known labels. A clustering tool groups whatever geometry dominates the embedding. WWDC adds a more surgical question: what remains structurally organised after a known signal is suppressed?

That question is valuable because modern AI systems are increasingly deployed in domains where the known taxonomy is incomplete. In science, the labels lag behind the data. In industrial QA, known defect classes lag behind new process failures. In finance and cyber risk, known typologies lag behind adversarial behaviour. In customer analytics, known segments lag behind changing user intent. The gap is not merely between labelled and unlabelled data. It is between catalogued structure and uncatalogued structure.

WWDC does not solve that gap outright. It gives teams a mechanism for making the gap inspectable.

The article-level lesson is therefore quite specific: do not treat latent spaces as passive by-products of model training. Treat them as diagnostic terrain. Condition on what you know. Suppress it. Inspect what remains. Then decide whether the remainder is signal, artefact, or noise with better manners.

That is where the business relevance sits. Not in replacing expert judgement, but in focusing it. Not in discovering truth automatically, but in reducing the cost of asking the next intelligent question.

What We Don’t C is really about what we keep overlooking

The paper’s contribution is not a grand unified theory of disentanglement. Good. The field has enough grand unified theories quietly expiring in appendix tables.

WWDC is more modest and more useful: a guided latent flow-matching method that reuses existing VAE-style representations, suppresses known conditioning signals, and exposes residual structure for inspection. Its evidence moves sensibly from a controlled Gaussian mechanism test, to coloured MNIST with withheld blue information, to a real galaxy morphology demonstration. The experiments do not prove universal discovery. They do show that the mechanism deserves attention.

For businesses, research teams, and model-assurance groups, the practical question is now sharper. The issue is not only whether a representation captures the labels you care about. It is whether those labels are hiding everything else.

A model that only tells you what you already know is not useless. It is just management consulting with matrix multiplication.

The more interesting system helps you subtract the obvious and examine the residue. That is where discovery often begins: not in seeing more, but in temporarily refusing to look at what is already too visible.

Cognaptus: Automate the Present, Incubate the Future.


  1. Brian Rogers, Micah Bowles, Chris J. Lintott, Steve Croft, Oliver N. F. King, and James Kostas Ray, “What We Don’t C: Manifold Disentanglement for Structured Discovery,” arXiv:2511.09433v2, 2026. https://arxiv.org/abs/2511.09433