From Copilot to Colleague: The APCP Ladder for Agentic Learning

TL;DR for operators

The useful part of the APCP framework is not that it gives AI another grand title. We already have enough of those. Its value is that it separates four very different product promises that are often mashed together under “AI learning assistant”: an AI that executes commands, an AI that nudges, an AI that shares cognitive work, and an AI that behaves like a peer collaborator.¹

For education and corporate learning teams, the framework is best read as a design ladder. Each step up the ladder increases the AI’s agency, but also increases the burden on product design, pedagogy, governance, and trust. A Level 1 system needs to be reliable. A Level 2 system needs interruption discipline. A Level 3 system needs a shared task model and explainable reasoning. A Level 4 system needs socio-cognitive role design, transparency, ethical guardrails, and a very firm refusal to pretend that simulated partnership equals consciousness.

The paper does not prove that Level 4 AI peers are ready for mass deployment. It is not a benchmark paper, and it does not run its own controlled experiment. Instead, it builds a conceptual vocabulary using sociocultural learning theory, CSCL, human-centred AI, and emerging empirical studies as supporting examples. That makes it less useful for procurement scorecards and more useful for product strategy. Annoying for vendors. Helpful for everyone else.

The central business takeaway is simple: do not buy or build “agentic learning AI” as one category. Decide which agency level the learning task actually needs. Many workflows need a good instrument. Some need a proactive assistant. Fewer need a co-learner. Very few should pretend to need a peer collaborator.

The philosophical takeaway is equally important: the paper draws a line between functional collaboration and authentic phenomenological partnership. AI can perform many behaviours of a good collaborator. It can take turns, challenge assumptions, summarise, critique, and adopt roles. That does not mean it shares intentions, understands meaning as a human does, or becomes a genuine “other.” The distinction is not academic hair-splitting. It is the difference between designing a useful learning system and accidentally founding a small theatre company for anthropomorphic software.

The real problem is not whether AI can teach, but what role it is allowed to play

Most organisations still talk about AI in learning as if the only serious question is whether it improves outcomes. Can it tutor? Can it answer questions? Can it generate quiz items? Can it personalise content? Useful questions, but not sufficient ones.

The more expensive question is role design.

A calculator and a coach can both improve performance, but confusing one for the other leads to bad design. A search engine, a tutor, a teammate, and a debate partner all interact with a learner differently. They shift control differently. They create different risks. They produce different kinds of dependency. They also demand different evidence before anyone should be comfortable deploying them at scale.

That is where the APCP framework earns its keep. The paper proposes four levels of AI agency in human-AI collaborative learning:

APCP level	AI role	Human role	Product promise	Main risk
Level 1	Adaptive Instrument	Operator	Faster execution and reduced cognitive load	Mistaking task support for collaboration
Level 2	Proactive Assistant	Strategist and reviewer	Timely nudges, scaffolding, and error detection	Intrusion, over-nudging, false confidence
Level 3	Co-Learner	Collaborator and occasional mentor	Shared problem-solving and reciprocal explanation	Shallow “co-construction” theatre
Level 4	Peer Collaborator	Team member among team members	Practice field for collaboration, dissent, negotiation, and metacognition	Anthropomorphism, opacity, dependency, and social manipulation

This ladder is not a model-size ladder. It is an agency ladder. A larger model is not automatically a better collaborator. A slicker chatbot persona is not automatically a higher pedagogical role. The question is what the system can initiate, what it can negotiate, what it can explain, and how much responsibility it is allowed to absorb.
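For teams that want the ladder in a product spec rather than a slide, the four levels can be written down as data. What follows is a minimal sketch, not something the paper provides: the names `AgencyLevel`, `LevelProfile`, `PROFILES`, and `describe` are hypothetical, and the `ai_may_initiate` flag is our reading of the framework's description of who initiates at each level.

```python
from dataclasses import dataclass
from enum import IntEnum


class AgencyLevel(IntEnum):
    """The four APCP levels, ordered by AI agency (not by model size)."""
    ADAPTIVE_INSTRUMENT = 1
    PROACTIVE_ASSISTANT = 2
    CO_LEARNER = 3
    PEER_COLLABORATOR = 4


@dataclass(frozen=True)
class LevelProfile:
    ai_role: str
    human_role: str
    ai_may_initiate: bool  # assumption: can the AI act without an explicit command?
    main_risk: str


# Roles and risks are copied from the table above; the boolean is illustrative.
PROFILES = {
    AgencyLevel.ADAPTIVE_INSTRUMENT: LevelProfile(
        "Adaptive Instrument", "Operator", False,
        "Mistaking task support for collaboration"),
    AgencyLevel.PROACTIVE_ASSISTANT: LevelProfile(
        "Proactive Assistant", "Strategist and reviewer", True,
        "Intrusion, over-nudging, false confidence"),
    AgencyLevel.CO_LEARNER: LevelProfile(
        "Co-Learner", "Collaborator and occasional mentor", True,
        "Shallow co-construction theatre"),
    AgencyLevel.PEER_COLLABORATOR: LevelProfile(
        "Peer Collaborator", "Team member among team members", True,
        "Anthropomorphism, opacity, dependency, and social manipulation"),
}


def describe(level: AgencyLevel) -> str:
    """Return a one-line summary of a level for specs or review documents."""
    profile = PROFILES[level]
    initiative = "may initiate" if profile.ai_may_initiate else "acts only on command"
    return (f"Level {int(level)}: {profile.ai_role} ({initiative}); "
            f"human as {profile.human_role.lower()}")


if __name__ == "__main__":
    for level in AgencyLevel:
        print(describe(level))
```

The point of the enum is modest: a feature request has to name its level before anyone argues about the persona.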

That distinction matters because education is not just content delivery. Collaborative learning depends on shared goals, mutual engagement, negotiation, accountability, and the construction of a joint problem space. Those concepts come from human social learning, not from software UX. Once an AI is inserted into that space, the design question changes from “Can the tool answer?” to “What kind of participant has the tool become?”

The APCP paper is useful because it refuses the lazy binary of “tool versus partner.” It offers a more precise continuum. And precision, in this corner of AI, is not decorative. It is how we stop a tooltip with delusions of grandeur from being sold as a learning companion.

Level 1: Adaptive Instruments reduce friction, but they do not manage collaboration

At Level 1, the AI is an adaptive instrument. It reacts to explicit human commands. The human sets the goal, chooses the task, interprets the result, and owns the strategy. The AI executes.

This is the familiar copilot pattern. A student asks for a chart, a summary, a translation, a code snippet, a list of references, or a cleaned dataset. The AI may adapt the output to the user’s apparent level, but it does not initiate, challenge, or negotiate. It is a very capable instrument, not a participant in the collaborative process.

The paper’s point is not that Level 1 is primitive or unimportant. In many learning contexts, Level 1 is exactly what is needed. If learners are spending too much energy formatting data, retrieving information, or wrestling with syntax, a reliable instrument can reduce extraneous cognitive load. That frees attention for analysis, argument, explanation, and reflection.

The supporting evidence discussed in the paper fits this role. A quasi-experimental programming education study is presented as evidence that AI-agent-supported collaborative learning, even when the AI acts only on explicit prompts, can improve achievement, self-efficacy, and interest, with more favourable mental-effort ratings, relative to a traditional CSCL control. A digital storytelling project is used similarly: students used generative tools for idea generation and drafting, and reported benefits for collaborative problem-solving and creativity.

These are best read as illustrative support for Level 1’s plausibility, not as proof of the APCP ladder itself. The likely purpose is to show that even low-agency AI can improve collaborative learning conditions by removing friction. What they support is modest but useful: reactive tools can create more room for higher-order human work. What they do not prove is that AI has become a collaborator.

For operators, Level 1 is the easiest category to justify. The requirements are straightforward: accuracy, speed, interface clarity, privacy, and content boundaries. The business case is usually productivity and access. The product should not need a persona. It should not pretend to have opinions. It should not interrupt. It should perform.

The failure mode is also straightforward: vendors sell Level 1 automation as if it were Level 3 collaboration. A student asks for a summary, receives a summary, and suddenly the brochure says “AI-powered co-learning ecosystem.” Please. The stapler also helps with coursework. We do not call it a peer.

Level 2: Proactive Assistants create value only when interruption is designed, not sprayed around

Level 2 is where the AI starts to act before being asked. It monitors the learning context, detects a possible need, and offers a suggestion. The learner still retains strategic control and veto power, but the interaction is no longer purely command-response.

The paper frames this as a proactive assistant: a bounded decision-support system. It might notice a logical fallacy in an essay, flag a missing counterargument, suggest a source, detect confusion in a group discussion, or ask a scaffolding question before students settle too quickly on a weak answer.

This sounds obviously helpful until one remembers that interruption is a tax. A proactive AI that interrupts at the wrong time does not feel intelligent. It feels like a junior consultant with calendar permissions.

The paper’s cited evidence is therefore especially important at this level. A randomised controlled trial involving 117 higher education students compared passive agents, proactive agents using scaffolding questions, and standalone scaffolding for comprehension of complex visual learning analytics. The paper reports that proactive GenAI agents significantly improved comprehension relative to both passive agents and standalone scaffolding, with benefits persisting beyond the intervention. That study functions as the main supporting evidence for Level 2: timely, context-aware scaffolding can outperform both passivity and generic scaffolding.

The paper also cites Codellaborator, a proactive AI programming assistant evaluated in a within-subject study with 18 participants. It improved programming efficiency compared with a prompt-only condition, but poorly timed interventions could disrupt workflow. Interface variants with presence indicators and richer context mitigated some of that disruption. This is not a side note. It is the Level 2 design problem in miniature.

Evidence cited in the paper	Likely purpose in the APCP argument	What it supports	What it does not prove
Passive vs proactive GenAI agents for learning analytics comprehension, N = 117	Main support for proactive scaffolding	Timely AI questions can improve comprehension beyond passive support	That all proactive agents help, or that more interruption is better
Codellaborator, within-subject programming support, N = 18	Design trade-off example	Proactive coding help can improve efficiency, but timing and interface matter	That proactive AI generalises cleanly across domains
Level 1 programming and storytelling studies	Plausibility support for reactive AI	Low-agency tools can reduce load and support productivity	That reactive tools are collaborators
CLAIS and AI Peer examples	Exploratory support for peer-like roles	Persona, fallibility, and peer framing may support richer collaboration	That AI is an authentic peer or that Level 4 is deployment-ready

For edtech and L&D teams, Level 2 is where governance begins to matter more visibly. A prompt-response tool can be governed mostly through content policy, logging, and usage rules. A proactive assistant needs intervention policy. When may it interrupt? What signals justify a nudge? How does it know whether the learner is confused, exploring, or simply thinking quietly? Can the learner silence it? Does the teacher or manager see its interventions? Are the prompts pedagogically aligned or merely engagement bait wearing a cardigan?

The business interpretation is clear: Level 2 can create measurable learning value, but only if proactivity is treated as a scarce resource. The product metric should not be “number of helpful suggestions generated.” It should be something closer to “right intervention, right time, recoverable by user control.” If the system cannot explain why it interrupted, the interruption is not scaffolding. It is ambient noise with a model behind it.
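An intervention policy of this kind can be made concrete as a gate that every proposed nudge has to pass. The sketch below is an assumption about how a team might implement one; it does not come from the paper or from any of the cited systems. `NudgePolicy`, `Nudge`, `decide_nudge`, and the default thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NudgePolicy:
    min_confidence: float = 0.8      # assumed threshold for the "learner needs help" signal
    cooldown_seconds: float = 300.0  # assumed minimum gap between two interruptions


@dataclass(frozen=True)
class Nudge:
    message: str
    rationale: str  # surfaced to the learner: why the system interrupted


def decide_nudge(
    policy: NudgePolicy,
    *,
    muted: bool,
    seconds_since_last_nudge: float,
    signal_confidence: float,
    message: str,
    rationale: str,
) -> Optional[Nudge]:
    """Return a Nudge only if every policy check passes; otherwise stay silent."""
    if muted:
        return None  # learner control always wins
    if seconds_since_last_nudge < policy.cooldown_seconds:
        return None  # treat proactivity as a scarce resource
    if signal_confidence < policy.min_confidence:
        return None  # quiet thinking is not the same as confusion
    if not rationale.strip():
        return None  # no explanation, no interruption
    return Nudge(message=message, rationale=rationale)


if __name__ == "__main__":
    nudge = decide_nudge(
        NudgePolicy(),
        muted=False,
        seconds_since_last_nudge=600.0,
        signal_confidence=0.9,
        message="Have you considered a counterargument to your second claim?",
        rationale="The draft asserts a causal link without addressing alternatives.",
    )
    print(nudge)
```

Note the design choice: the mute check comes first and the rationale check is mandatory, so an interruption that cannot explain itself never reaches the learner.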

Level 3: Co-Learners shift the task from assistance to shared work

Level 3 is the first point where the word “collaboration” starts to feel less like marketing and more like a design burden. The AI is not merely supporting the human. It takes on substantive parts of the task. It can contribute ideas, expose uncertainty, negotiate division of labour, and participate in co-construction.

The paper calls this role the co-learner. The important move is reciprocity. The human can learn from the AI, but the AI is also positioned as something the human can teach, correct, and refine. This borrows from the tradition of teachable agents, but extends it into a more reciprocal structure: explaining to the AI becomes part of the learner’s own reflection, while the AI’s responses become prompts for further reasoning.

The paper cites a participatory design study in which teachers taught instructional gestures to an AI “mentee” called Novobo. The value was not that the AI magically learned embodied pedagogy like a human apprentice. The value was that teaching the AI forced teachers to externalise tacit knowledge. Skills that are usually implicit had to be named, sequenced, corrected, and justified. In another cited study, high school students interacted with AI-generated characters as peers and mentors in a scenario-based science investigation, reporting increased trust, perceived social presence, and collaborative effectiveness.

These examples are best treated as exploratory support for the co-learner concept. They suggest that when AI is framed as something learners can teach or work alongside, the interaction can change the learner’s reflection and sense of collaboration. They do not establish that the AI has a human-like grasp of the domain. Nor do they prove that trust and social presence are always desirable. A learner trusting an AI more is not automatically a pedagogical win. Sometimes it is just the UI smiling too convincingly.

Operationally, Level 3 has a different build profile from Level 2. The system needs more than good suggestions. It needs a shared task model: what are we working on, what has been decided, what remains open, who is responsible for which part, and where are disagreements located? It also needs explainability that is designed for learning, not just compliance. A co-learner must be able to show why it proposed a division of labour, why it changed its view, what it is uncertain about, and what it needs from the human.
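The shared task model described above maps naturally onto a small data structure. This is a minimal sketch under stated assumptions, not a design taken from the paper: `SharedTaskModel`, `Disagreement`, and their fields are hypothetical names chosen to mirror the questions in the paragraph (what is the goal, what is decided, what is open, who owns what, where do we disagree).

```python
from dataclasses import dataclass, field


@dataclass
class Disagreement:
    topic: str
    human_position: str
    ai_position: str


@dataclass
class SharedTaskModel:
    goal: str                                                  # what are we working on?
    decided: list[str] = field(default_factory=list)           # what has been decided?
    open_items: list[str] = field(default_factory=list)        # what remains open?
    owners: dict[str, str] = field(default_factory=dict)       # open item -> "human" or "ai"
    disagreements: list[Disagreement] = field(default_factory=list)

    def unowned_items(self) -> list[str]:
        """Open items nobody has taken responsibility for yet."""
        return [item for item in self.open_items if item not in self.owners]

    def decide(self, item: str) -> None:
        """Move an open item to the decided list and release its owner."""
        self.open_items.remove(item)  # raises ValueError if the item is not open
        self.owners.pop(item, None)
        self.decided.append(item)


if __name__ == "__main__":
    model = SharedTaskModel(
        goal="Explain why the experiment failed to replicate",
        open_items=["check sample size", "compare protocols"],
        owners={"check sample size": "ai"},
    )
    model.disagreements.append(
        Disagreement("root cause", "protocol drift", "underpowered sample"))
    print(model.unowned_items())
```

Keeping disagreements as first-class records, rather than burying them in chat history, is what lets the learner contest the AI's reasoning instead of scrolling for it.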

This is where many enterprise learning products will quietly overclaim. They will create a chatbot that says “Let’s work on this together,” then proceed to generate polished answers while the human nods along. That is not co-learning. That is outsourcing with friendly punctuation.

A real Level 3 learning design should preserve friction. The learner should still need to explain, challenge, correct, and integrate. The AI should make its reasoning contestable. It should not simply fill gaps; it should create occasions for the learner to inspect the gap.

For corporate learning, Level 3 is especially relevant in domains where expertise is partly tacit: sales coaching, management judgement, compliance reasoning, consulting problem-framing, incident reviews, design critique, and technical architecture discussions. The AI can act as a structured counterpart that asks for rationale, offers alternative interpretations, and forces the learner to articulate assumptions.

The ROI case is not just speed. It is cheaper practice. Human coaching time is scarce. Peer learning is uneven. Simulations are expensive to facilitate. A Level 3 co-learner can provide repeatable, low-stakes practice in explanation and justification. But the product must be designed so the human remains cognitively active. Otherwise the system trains presentation dependency: the learner becomes skilled at accepting fluent output, which is not quite the leadership competency anyone ordered.

Level 4: Peer Collaborators are practice fields, not proof that AI has become a person

Level 4 is the most seductive and the most dangerous category. Here the AI becomes a peer collaborator in a fuller socio-cognitive sense. It has a persistent persona, a defined epistemic stance, and the ability to occupy roles such as skeptic, innovator, summariser, or devil’s advocate. Agency is distributed and dynamically negotiated, at least at the level of observable behaviour.

The paper’s strongest practical insight is that the Level 4 AI’s contribution may not be its knowledge. Its contribution may be the collaborative situation it creates.

A peer-like AI can introduce productive friction. It can disagree without social cost. It can take a minority position. It can force justification. It can withhold leadership so human learners must organise the work. It can simulate a difficult but safe team dynamic. In other words, it can become a practice field for collaboration.

That framing is much stronger than the usual “AI tutor” pitch. It shifts the product purpose from answer quality to interaction quality. The AI is not valuable because it knows everything. It is valuable because it can reliably create the conditions under which humans practise negotiation, leadership, conflict resolution, evidence-based argument, and metacognitive reflection.

The paper cites two examples to support this upper level. In CLAIS, pre-service elementary science teachers worked with an AI speaker in jigsaw-style learning groups; quantitative results showed increased pedagogical content knowledge, and qualitative feedback suggested the AI was perceived as a peer participant. In another controlled physics education study, students worked with AI Peers that could make errors up to 40% of the time; the paper reports a 10.5 percentage point improvement in test scores. The fallibility matters because it made the AI less like an oracle and more like a collaborator whose contribution had to be evaluated.

These findings are promising, but they sit closer to exploratory extension than full validation of Level 4 as a general deployment model. They show that peer framing, persona, epistemic stance, and fallibility can support richer learning interactions in specific settings. They do not show that students should routinely be placed with AI peers across all subjects, ages, and contexts. They certainly do not show that AI has become a genuine collaborator in the human philosophical sense.

This is where the paper’s misconception control is vital. Calling AI a peer collaborator does not require believing that the AI has consciousness, shared intentionality, or human-like understanding. The paper explicitly separates functional collaboration from phenomenological partnership.

That distinction should be printed on the wall of every edtech roadmap meeting.

Functional collaboration is about observable behaviour: taking turns, challenging claims, adopting roles, tracking goals, providing reasons, helping the group make progress. Phenomenological partnership is about being a conscious other with subjective experience, genuine understanding, and shared mental states. The former can be engineered and evaluated. The latter should not be casually assumed because a chatbot has a name and remembers your favourite colour.

For business users, this distinction changes procurement questions. Do not ask, “Can the AI act like a teammate?” Ask:

Procurement question	Better version
Is this an AI peer?	What collaborative behaviours can it reliably perform?
Does it understand learners?	What learner model does it use, and how can its assumptions be inspected?
Can it challenge students?	What rules govern dissent, timing, tone, escalation, and opt-out?
Does it improve engagement?	Does engagement translate into learning, transfer, or better independent performance?
Is it safe for young learners?	What transparency, dependency, privacy, and emotional-boundary controls exist?

Level 4 is not automatically the future of learning. It is a high-risk design pattern for specific pedagogical purposes. Used well, it can provide repeatable practice in collaboration. Used badly, it can produce anthropomorphic confusion at scale. One is instructional design. The other is theatre with data collection.

The paper’s evidence is supportive, not decisive

Because this is a conceptual paper, its evidence should be read carefully. It does not present one large experiment comparing all four APCP levels. It does not report an ablation showing that persona, proactivity, shared agency, and epistemic stance each contribute separately to learning outcomes. It does not give a deployment benchmark for institutions.

Instead, the cited studies function as anchoring examples. They make the ladder plausible by showing that pieces of the ladder already exist: reactive AI can reduce load, proactive AI can scaffold comprehension, co-learner framing can encourage reflection, and peer-like AI can support richer dialogue and sometimes measurable learning gains.

That is a reasonable evidentiary strategy for a framework paper. It becomes unreasonable only if readers treat the framework as if it has already been empirically settled.

The most important interpretation is this: APCP is a vocabulary for designing and testing human-AI learning interactions. It is not yet a universal maturity model with validated progression rules. Higher is not always better. A Level 4 peer collaborator is not the “advanced” version of every learning product. In some contexts, it would be worse than Level 1 because it adds social complexity where the learner simply needs clean execution.

A sensible operator should therefore treat the ladder as a decision tool, not a status hierarchy.

What this means for edtech and corporate learning teams

The business relevance of APCP is product scoping. It helps teams avoid building the wrong kind of agency.

A learning platform that helps analysts practise SQL probably needs strong Level 1 support and selective Level 2 nudges. It should generate queries, explain errors, suggest alternative joins, and flag conceptual misunderstandings. It probably does not need a persistent AI persona named “Data Dave” who has opinions about epistemology. Data Dave can stay in the vendor demo where he belongs.

A leadership simulation, by contrast, may benefit from Level 3 or Level 4 patterns. The learner needs to practise negotiation, ambiguity, stakeholder conflict, and justification. Here, an AI that adopts roles, disagrees, remembers prior commitments, and forces trade-off reasoning may be valuable. The point is not that the AI knows leadership. The point is that it can create structured occasions for leadership behaviour to become visible.

A compliance training system might need Level 2 more than Level 4. It should detect risky reasoning, ask clarifying questions, and force the learner to connect actions to policy. But making the AI a “peer” may blur accountability. In regulated domains, simulated collaboration can create real attribution problems. If a human-AI team produces a flawed answer, who is responsible? The learner? The system? The training provider? The manager who bought the thing after a steak dinner?

The paper’s future research agenda points directly at these operational concerns: comparative efficacy across agency levels, longitudinal transfer, dependency effects, bias auditing, responsibility attribution, and the line between healthy rapport and emotional dependency. Those are not academic afterthoughts. They are implementation requirements wearing citations.

For product teams, the APCP ladder can be turned into a design checklist:

Design decision	Level 1	Level 2	Level 3	Level 4
Who initiates?	Human	Mostly human, AI may nudge	Both	Distributed
What must be transparent?	Output source and limits	Trigger logic and timing	Reasoning, uncertainty, task model	Persona, role, stance, intent, limits
Main UX control	Clear commands	Interruption settings	Shared workspace	Role and disclosure controls
Success metric	Accuracy and time saved	Better reflection and fewer overlooked issues	Quality of explanation, synthesis, and transfer	Collaboration skill practice and productive friction
Governance priority	Content safety and privacy	Nudge policy	Responsibility sharing	Transparency, dependency, emotional safety

The table is deliberately unglamorous. That is the point. Agentic learning AI should be governed at the level of interaction design, not just model capability.

The boundary: collaboration without consciousness is still useful, but it is not magic

The paper’s most important philosophical move is also its most practical one: stop trying to decide whether AI is a “real” collaborator in the full human sense before designing useful systems.

AI lacks the subjective experience and shared intentionality that underpin authentic human collaboration. It can simulate understanding, but simulation is not the same as mutual recognition between conscious subjects. For education, this matters because collaborative learning theory is built around human concepts: shared understanding, intersubjectivity, mutual engagement, and meaning-making.

But the paper does not conclude that AI collaboration is therefore useless. It proposes a pragmatic resolution: functional collaboration. Build systems that perform the behaviours of effective collaborators, then evaluate whether those behaviours improve learning.

This is the right compromise. It avoids two bad extremes. One extreme says AI is just a tool, so any talk of collaboration is nonsense. That misses the real behavioural changes introduced by proactive, role-taking, goal-directed systems. The other extreme says AI can be a genuine partner because it sounds like one. That mistakes fluent role-play for ontology. Never a good look.

Functional collaboration gives designers a measurable target. Can the AI help maintain a shared task model? Can it challenge groupthink? Can it preserve learner agency? Can it make reasoning visible? Can it improve transfer to independent performance? Can it avoid creating dependency? Can learners understand when and why it intervenes?

Those questions are testable. Consciousness is not a product requirement. Thankfully.
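To show what "testable" can look like for two of those questions (transfer to independent performance and dependency), here is a deliberately crude heuristic. It is an illustration only, not a validated measure and not something the paper proposes. It assumes three scores on the same scale from comparable assessments; `LearnerScores`, `transfer_gain`, `dependency_gap`, and `flags_dependency` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnerScores:
    unassisted_before: float  # score without the AI, before the programme
    unassisted_after: float   # score without the AI, after the programme
    assisted_after: float     # score with the AI available, after the programme


def transfer_gain(scores: LearnerScores) -> float:
    """Improvement in what the learner can do when the AI is absent."""
    return scores.unassisted_after - scores.unassisted_before


def dependency_gap(scores: LearnerScores) -> float:
    """How much better the learner performs with the AI than without it."""
    return scores.assisted_after - scores.unassisted_after


def flags_dependency(scores: LearnerScores, min_gain: float = 0.0) -> bool:
    """Flag cases where assisted performance is higher but unassisted
    performance has not improved beyond `min_gain` (an assumed threshold)."""
    return transfer_gain(scores) <= min_gain and dependency_gap(scores) > 0


if __name__ == "__main__":
    stalled = LearnerScores(unassisted_before=60, unassisted_after=60, assisted_after=85)
    growing = LearnerScores(unassisted_before=60, unassisted_after=75, assisted_after=85)
    print(flags_dependency(stalled), flags_dependency(growing))
```

A real evaluation would need matched assessments, a comparison group, and a longer horizon. The sketch only shows that the question can be asked in numbers.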

The conclusion for builders: climb the ladder only when the learning task demands it

The APCP framework is valuable because it converts a vague product category into a set of role choices. It helps teams ask not “How agentic can we make this?” but “How much agency does this learning interaction actually need?”

That is the mature question. It is also the commercially inconvenient one.

Many AI products will want to climb the ladder because higher agency sounds premium. But the APCP paper implies the opposite discipline: climb only when the pedagogical case is strong enough to justify the extra design risk. Level 1 is not inferior when the learner needs execution. Level 2 is not timid when the learner needs metacognitive nudges. Level 3 is powerful when reciprocal explanation matters. Level 4 is appropriate only when the learning objective includes collaboration itself.
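The "climb only when justified" rule can be sketched as a default-low selection function. This is an illustrative assumption about how a team might encode the paragraph above, not a procedure from the paper; `recommend_level` and its three flags are hypothetical.

```python
def recommend_level(
    *,
    needs_metacognitive_nudges: bool = False,
    needs_reciprocal_explanation: bool = False,
    objective_includes_collaboration: bool = False,
) -> int:
    """Return the APCP level (1-4) that the stated needs justify.

    The default is Level 1: every step up must be argued for by an
    explicit pedagogical need, never assumed because it sounds premium.
    """
    if objective_includes_collaboration:
        return 4  # collaboration itself is what the learner is practising
    if needs_reciprocal_explanation:
        return 3  # learner must explain to, correct, and learn from the AI
    if needs_metacognitive_nudges:
        return 2  # learner benefits from timely, bounded prompts
    return 1      # learner needs clean execution and nothing more


if __name__ == "__main__":
    # An SQL practice tool that should flag conceptual misunderstandings.
    print(recommend_level(needs_metacognitive_nudges=True))
    # A leadership simulation built around negotiation practice.
    print(recommend_level(objective_includes_collaboration=True))
```

The useful property is the direction of the burden of proof: with no flags set, the function returns 1.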

The future of AI in learning will not be decided by whether we call systems tutors, copilots, agents, teammates, or colleagues. Labels are cheap. Interaction design is not.

The better question is whether the AI’s role makes the learner more capable when the AI is absent. If it does, the system may be educationally useful. If it only makes the learner more comfortable while the AI is present, then we have built a dependency machine with nice onboarding.

APCP gives us a ladder. It does not tell us to climb blindly.

Cognaptus: Automate the Present, Incubate the Future.

¹ Lixiang Yan, “From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning,” arXiv:2508.14825, 2025, https://arxiv.org/abs/2508.14825.
