Divide and Conquer: How LLMs Learn to Teach
Designing effective lessons for training online tutors is no small feat. It demands pedagogical nuance, clarity, scenario realism, and learner empathy. A recent paper by Lin et al., presented at ECTEL 2025, offers a compelling answer to this challenge: use LLMs, but don’t ask too much of them at once. Their research reveals that breaking lesson generation into smaller, well-defined parts significantly improves quality, up to a point, suggesting a new collaborative model for scalable education design.
Teaching Tutors with AI
The study targets an often-overlooked segment of educational AI: training novice tutors for online middle school math instruction. Instead of using AI to replace tutors, the authors used LLMs to generate interactive, scenario-based lessons that help new human tutors improve. Each lesson follows a standardized five-part structure: a title page, two realistic tutoring scenarios (each with questions), a research-grounded instructional section, and a conclusion.
To generate these, the team used GPT-4o with Retrieval-Augmented Generation (RAG), drawing on pedagogical papers to anchor responses in real-world instructional strategy.
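To make that pipeline concrete, here is a minimal sketch of RAG-grounded generation. Everything named in it (`call_llm`, the toy corpus, the keyword-overlap retriever) is an assumption invented for illustration; the authors' actual system retrieves from real pedagogical papers and calls GPT-4o.

```python
# Minimal RAG sketch, assuming a toy corpus and a placeholder LLM call.
# The paper's real retriever and prompts are not published in this form;
# call_llm, CORPUS, and retrieve are all hypothetical.

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion API call (e.g., GPT-4o).
    return f"[LLM response to a {len(prompt)}-char prompt]"

# Toy excerpts standing in for an indexed library of research papers.
CORPUS = [
    "Effective tutors praise effort and strategy, not innate ability.",
    "Open-ended questions help diagnose student misconceptions.",
    "Novice tutors should let students do most of the cognitive work.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Naive keyword-overlap scoring; a real system would use embeddings.
    q = set(query.lower().split())
    return sorted(CORPUS, key=lambda d: -len(q & set(d.lower().split())))[:k]

def instructional_section(topic: str) -> str:
    excerpts = "\n".join(retrieve(topic))
    prompt = (
        "Using only the research excerpts below, write an instructional "
        f"section for novice tutors on: {topic}\n\nExcerpts:\n{excerpts}"
    )
    return call_llm(prompt)

print(instructional_section("how to praise student effort"))
```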
The Power of Decomposition
A key innovation in this study was its experimental framework: they tested five different prompt strategies, ranging from generating the entire lesson in one go (“one-segment”) to generating each section separately (“five-segment”). The sweet spot? Three-segment prompts, which split the task into digestible yet coherent chunks.
| Segments | Avg. Quality Score |
|---|---|
| One | 10.67 |
| Two | 12.00 |
| Three | 14.67 |
| Four | 14.00 |
| Five | 13.33 |
Interestingly, breaking down the task too much led to diminishing returns. The five-segment strategy introduced clarity issues and disjointed reasoning. Conversely, the three-segment approach allowed for logical continuity while maintaining instructional focus.
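As a rough illustration, a three-segment pipeline might look like the sketch below. The segment boundaries and prompt wording are my own assumptions; the paper does not publish its exact prompts, and `call_llm` again stands in for a real GPT-4o call.

```python
# Sketch of segmented lesson generation (illustrative; prompts invented).

def call_llm(prompt: str) -> str:
    return f"[generated text for: {prompt[:40]}...]"  # placeholder

# One plausible three-segment split of the five-part lesson structure.
SEGMENTS = [
    "Write the title page and the first tutoring scenario with questions.",
    "Write the second tutoring scenario with questions.",
    "Write the research-grounded instruction section and the conclusion.",
]

def generate_lesson(topic: str) -> str:
    lesson_so_far = ""
    for task in SEGMENTS:
        # Feeding prior segments back in preserves logical continuity,
        # which the one-shot and five-segment extremes both struggled with.
        prompt = (
            f"Lesson topic: {topic}\n"
            f"Lesson so far:\n{lesson_so_far or '(nothing yet)'}\n\n"
            f"Task: {task}"
        )
        lesson_so_far += "\n\n" + call_llm(prompt)
    return lesson_so_far.strip()

print(generate_lesson("giving effective praise in middle school math"))
```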
Human + AI: Better Together
Two human lesson designers evaluated the outputs, comparing them to their own carefully crafted lessons. The verdict was clear:
- Time-saving: AI drastically sped up scenario drafting and MCQ creation.
- Scenario richness: LLMs produced believable, diverse student-tutor interactions.
- No bias detected: A crucial win for educational equity.
But there were flaws:
- Feedback was too generic
- Inconsistent terminology (e.g., confusing tutors with learners)
- Over-citation without explanation
In short, LLMs are brilliant assistants but still need human editors to ensure clarity, cohesion, and pedagogical soundness.
Lessons for LLM Deployment
The findings offer both theoretical and practical insights. Pedagogically, the success of modular generation supports theories of cognitive load and structured scaffolding. Practically, it points to a human-in-the-loop AI design model, where educators prompt, curate, and refine LLM outputs rather than start from scratch.
Future work could benefit from multi-agent approaches, where different LLM agents specialize in tasks like instruction, feedback, and visual aids. This could further enhance quality while maintaining coherence.
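A toy version of that idea, with invented role prompts and the same placeholder LLM call, might look like this sketch; none of it comes from the paper itself.

```python
# Hypothetical multi-agent split of lesson generation (not from the paper).

def call_llm(prompt: str) -> str:
    return f"[output of: {prompt[:40]}...]"  # placeholder for a real API call

AGENT_ROLES = {
    "scenario_writer": "Draft a realistic student-tutor dialogue about {topic}.",
    "instruction_writer": "Write research-grounded guidance for tutors on {topic}.",
    "reviewer": ("Check this lesson for consistent terminology and "
                 "specific, non-generic feedback:\n{draft}"),
}

def run_pipeline(topic: str) -> str:
    draft = "\n\n".join(
        call_llm(AGENT_ROLES[role].format(topic=topic))
        for role in ("scenario_writer", "instruction_writer")
    )
    # A dedicated reviewer agent targets the flaws the human editors caught:
    # generic feedback and tutor/learner confusion.
    return call_llm(AGENT_ROLES["reviewer"].format(draft=draft))

print(run_pipeline("diagnosing math misconceptions"))
```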
In a world where quality education needs to scale fast, this study shows that LLMs can help us teach the teachers—as long as we help the LLMs learn how to teach step by step.
Cognaptus: Automate the Present, Incubate the Future