When multimodal large language models (MLLMs) like Gemini or Janus are asked to generate an image and then assess whether that image matches a prompt, you’d expect agreement. But a new study shows this harmony is often missing: the model’s own understanding branch disagrees with what its generation branch creates. This phenomenon—called self-contradiction—isn’t just an embarrassing quirk. As it turns out, it may be the most valuable feedback signal MLLMs have.

The Paradox: Fluent but Fragmented

A typical MLLM has two key capabilities:

  • Generation: producing outputs (like images) from text prompts.
  • Understanding: interpreting whether a given input (e.g. an image) matches a prompt.

These capabilities are often trained jointly, giving the illusion of a unified system. But as shown in Self-Contradiction as Self-Improvement (Han et al., 2025), real-world performance is far from unified. The paper introduces a “Nonunified score” to quantify the contradiction rate: how often the understanding branch judges the model’s own generated output to be misaligned with the prompt that produced it.

In some tasks, over 40% of outputs failed this internal consistency check. Worse, most of the blame (85%) fell on generation, not understanding. The model knew it was wrong—but generated the error anyway.
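To make the metric concrete, here is a minimal sketch of such a contradiction check. The `model.generate` and `model.judges_aligned` methods are hypothetical stand-ins for the generation and understanding branches, not the paper's actual interface:

```python
# Minimal sketch of a self-contradiction rate (a "Nonunified"-style score).
# `model.generate` and `model.judges_aligned` are hypothetical stand-ins for
# the generation and understanding branches of a unified MLLM.
def nonunified_score(model, prompts):
    contradictions = 0
    for prompt in prompts:
        image = model.generate(prompt)               # generation branch produces an image
        if not model.judges_aligned(image, prompt):  # understanding branch vets it
            contradictions += 1                      # the model disagrees with itself
    return contradictions / len(prompts)             # fraction of self-contradictions
```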


The Proposal: Let the Model Teach Itself

This internal inconsistency leads to a fascinating possibility: what if the understanding branch acts as an internal supervisor? That is:

Treat the stronger understanding branch as a reward model to guide the weaker generation branch.

Using classic post-training techniques like Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), the researchers re-trained the generation branch based on the understanding branch’s feedback. The result:

Metric                               Change
Generation Quality (UniDet)          +5%
Unification (1 - Nonunified score)   +10%

That’s without any human feedback or external reward model—just internal contradiction.
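As a rough illustration of how that internal feedback can be packaged for DPO-style training, here is a sketch in which the understanding branch ranks several samples per prompt. `model.generate` and `model.alignment_score` are assumed, illustrative methods rather than the paper's implementation:

```python
# Sketch: turn the understanding branch into a reward signal for preference tuning.
# `model.generate` and `model.alignment_score` are assumed, illustrative methods.
def build_preference_pairs(model, prompts, samples_per_prompt=4):
    pairs = []
    for prompt in prompts:
        images = [model.generate(prompt) for _ in range(samples_per_prompt)]
        # The understanding branch scores how well each sample matches the prompt.
        ranked = sorted(images, key=lambda img: model.alignment_score(img, prompt))
        pairs.append({
            "prompt": prompt,
            "chosen": ranked[-1],   # best-aligned sample
            "rejected": ranked[0],  # worst-aligned sample
        })
    return pairs  # ready for a standard DPO trainer; keep only "chosen" for SFT
```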


Discovery: Co-Improvement Without Explicit Coordination

Here’s where things get wild: improving generation doesn’t just help generation, it also improves understanding, even though the understanding branch is never explicitly retrained.

The authors formalize this with a learning dynamics framework, showing that updates to the generation branch often propagate beneficially to the understanding branch because the two share parameters and internal structure. In short:

Better generations reduce false-positive interpretations. The model stops fooling itself.

This synergy echoes co-training in earlier multi-task systems but emerges naturally here, thanks to the shared architecture.
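The mechanism is easiest to see in a toy model: one shared trunk feeding two heads. The sketch below is plain PyTorch, not the paper's architecture; it backpropagates only a generation loss, yet the shared trunk still receives gradients, so the features the understanding head reads are reshaped as a side effect:

```python
# Toy illustration (not the paper's model): one shared trunk, two heads.
# Training only the generation loss still updates the shared parameters that
# the understanding head depends on.
import torch
import torch.nn as nn

torch.manual_seed(0)
trunk = nn.Linear(16, 16)      # shared representation
gen_head = nn.Linear(16, 32)   # stand-in for the generation branch
und_head = nn.Linear(16, 2)    # stand-in for the understanding branch

x = torch.randn(4, 16)         # fake prompt embeddings
target = torch.randn(4, 32)    # fake generation targets

features = torch.relu(trunk(x))
gen_loss = nn.functional.mse_loss(gen_head(features), target)
gen_loss.backward()            # gradients flow through gen_head and the trunk only

print("trunk grad norm:", trunk.weight.grad.norm().item())        # non-zero: shared weights move
print("gen head grad norm:", gen_head.weight.grad.norm().item())  # non-zero
print("und head grad:", und_head.weight.grad)                     # None: head untouched, yet its
                                                                   # input features have shifted
```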


But Beware: The Illusion of Progress

There’s a dark side to this symmetry. If you fine-tune on corrupted data, both branches can degrade together—generating worse outputs and becoming more confident in them.

And yet, the Nonunified score still improves: the measured contradiction rate keeps falling.

Why? Because both branches are agreeing—just agreeing on the wrong answer. This is the AI version of a group project gone wrong: internal harmony masking mutual misunderstanding.

The takeaway: Internal metrics like unification scores can’t tell the difference between co-improvement and co-degradation. External validation remains critical.
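A toy calculation (numbers invented purely for illustration) shows why the two cannot be distinguished from the inside: a model that confidently agrees with itself can score perfect unification while an external judge rejects most of its outputs.

```python
# Invented numbers: internal agreement vs. external correctness on six prompts.
internal_ok = [True, True, True, True, True, True]      # understanding branch accepts every output
external_ok = [True, False, False, True, False, False]  # an outside judge accepts only two

nonunified = 1 - sum(internal_ok) / len(internal_ok)  # 0.00 -> "perfectly unified"
external_acc = sum(external_ok) / len(external_ok)    # 0.33 -> actually poor

print(f"Nonunified score:  {nonunified:.2f}")    # internal metric looks flawless
print(f"External accuracy: {external_acc:.2f}")  # the real picture
```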


Solution: Curriculum Learning for Safer Self-Tuning

To avoid this collapse trap, the authors propose a Curriculum-based Online (CLO) post-training method:

  • Start with prompts the model handles well.
  • Gradually introduce harder prompts that it used to misunderstand or misgenerate.
  • Expand the training set dynamically as both branches improve (a minimal loop is sketched below).
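Under loose assumptions, that loop might look like this; `generate`, `understands`, and `post_train` are illustrative stand-ins for the two branches and the SFT/DPO update step, not the paper's API:

```python
# Sketch of a curriculum-style online self-tuning loop. All model methods here
# (`generate`, `understands`, `post_train`) are illustrative assumptions.
def curriculum_online_tuning(model, prompts, rounds=3):
    # Start from prompts the model already generates and verifies consistently.
    active = [p for p in prompts if model.understands(model.generate(p), p)]
    remaining = [p for p in prompts if p not in active]

    for _ in range(rounds):
        # Self-train on the current pool using only internal feedback
        # (e.g. SFT on accepted samples, or DPO pairs as sketched earlier).
        samples = [(p, model.generate(p)) for p in active]
        model.post_train(samples)

        # Promote previously failed prompts that the model can now verify.
        promoted = [p for p in remaining if model.understands(model.generate(p), p)]
        active += promoted
        remaining = [p for p in remaining if p not in promoted]

    return model
```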

This mirrors how humans learn—from easy problems to harder ones—and it works. In experiments on 3D spatial reasoning and other out-of-distribution tasks, the CLO method led to consistent improvements across:

  • Generation
  • Understanding
  • Unification

Compared to prior reinforcement-based approaches like T2I-R1, which depend on external reward models, this internal self-tuning mechanism is more lightweight, cost-efficient, and potentially more robust.


What This Means for AI Design

For anyone building agentic AI systems—like customer-facing bots, workflow managers, or AI copilots—this paper flips the script:

  • Contradictions are not bugs—they’re feedback.
  • The best supervisor might be inside the model.
  • Self-improvement is possible without labels—if you respect internal asymmetries.

But this also comes with a warning: internal agreement does not equal correctness. Co-degradation is subtle and dangerous.

At Cognaptus, we believe the future of automation lies not just in smarter outputs, but in systems that reflect on their actions—and learn from their internal missteps.

Cognaptus: Automate the Present, Incubate the Future.