When the AI revolution hits your job, will it help or replace you?
Microsoft’s new study, analyzing 200,000 real-world conversations between users and Bing Copilot, offers the most grounded answer to date. Rather than speculating about what LLMs could do, this research asks what users are actually doing with them, and how often those interactions overlap with real occupational tasks.
The key innovation? The authors distinguish between user goals (what users ask AI to help with) and AI actions (what the AI does in response). This split allows them to track when Copilot acts as a coach, co-pilot, or full-on doer of tasks — a nuance missing from many economic forecasts.
Not All Work Is Equal (To AI)
Using O*NET’s occupational database and an LLM-based classification pipeline, the researchers mapped each conversation to a set of “Intermediate Work Activities” (IWAs) and scored three signals:
- Frequency of use (how often Copilot is involved in the IWA)
- Completion rate (how often tasks are finished)
- Impact scope (how much of the activity the AI seems able to cover)
They then aggregated these to form an AI applicability score per occupation.
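The aggregation step can be sketched in a few lines. This is a minimal illustration, not the paper's exact formula: it assumes a frequency-weighted combination of completion and scope, and the `IWAScore` class and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IWAScore:
    """Per-activity signals described in the study (all in [0, 1])."""
    frequency: float   # how often Copilot is involved in the IWA
    completion: float  # how often tasks on this IWA are finished
    scope: float       # fraction of the activity the AI seems to cover

def applicability(iwa_scores: list[IWAScore]) -> float:
    """Illustrative aggregation: frequency-weighted product of
    completion and scope, averaged over an occupation's IWAs.
    (The paper's actual formula may differ.)"""
    if not iwa_scores:
        return 0.0
    weighted = sum(s.frequency * s.completion * s.scope for s in iwa_scores)
    total_freq = sum(s.frequency for s in iwa_scores)
    return weighted / total_freq if total_freq else 0.0

# Hypothetical occupation with two work activities
translator = [IWAScore(0.6, 0.88, 0.57), IWAScore(0.4, 0.80, 0.50)]
print(round(applicability(translator), 3))
```

The design choice here mirrors the article's point: an occupation scores high only when the AI is used often *and* finishes tasks *and* covers much of the activity, so any weak signal drags the composite down.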
Table 1: Top 5 Most AI-Affected Occupations (by applicability score)
| Rank | Occupation | Completion | Scope | Notes |
|---|---|---|---|---|
| 1 | Interpreters & Translators | 88% | 57% | High overlap with language-based tasks |
| 2 | Historians | 85% | 56% | Info retrieval, synthesis, writing |
| 3 | Passenger Attendants | 88% | 62% | Communication-heavy service role |
| 4 | Writers & Authors | 84% | 60% | Creative and commercial content writing |
| 5 | Customer Service Reps | 90% | 59% | Info provision and client interaction |
These aren’t just “office jobs.” The AI’s strength in information gathering, writing, and communication cuts across domains, affecting everyone from PR specialists to CNC tool programmers.
Meanwhile, occupations like dishwashers, cement masons, and pile driver operators scored near zero, not because their work isn’t valuable — but because it’s far removed from language-mediated tasks.
Coaching, Not Replacing (Yet)
Surprisingly, 40% of Copilot conversations showed a mismatch between the user’s goal and what the AI actually did. Often, the human is trying to perform a real-world task, while the AI merely advises, trains, or explains.
In other words, Copilot often acts like a knowledgeable coworker, not a replacement. For example:
- User goal: Design a resume → AI action: Provide resume structure tips and wording
- User goal: Troubleshoot a software error → AI action: Explain potential causes and solutions
This distinction matters. Augmentation ≠ automation. The same activity might support more hiring (by boosting worker productivity) or less (if firms treat it as substitution).
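The coach / co-pilot / doer distinction can be sketched as a simple tagging rule over the goal and action activity sets. The category names come from the article, but the rule itself and the example activity labels are my hypothetical simplification, not the paper's classifier.

```python
def assistance_mode(goal_iwas: set[str], action_iwas: set[str]) -> str:
    """Illustrative tagging of one conversation by comparing the
    activities the user wanted done (goal) with the activities the
    AI actually performed (action). Rule is a sketch, not the
    paper's taxonomy."""
    overlap = goal_iwas & action_iwas
    if not overlap:
        return "coach"     # AI only advises on a task it does not perform
    if action_iwas >= goal_iwas:
        return "doer"      # AI performs every activity the user wanted
    return "co-pilot"      # partial overlap: AI does some, user does the rest

# Hypothetical conversations
print(assistance_mode({"prepare resume"}, {"advise on writing"}))   # coach
print(assistance_mode({"edit text"}, {"edit text"}))                # doer
```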
What AI Does Best (and Worst)
Best-rated activities:
- Editing and writing content
- Researching health, law, or cultural issues
- Advising or summarizing product info
Worst-rated activities:
- Visual design and display creation
- Financial and scientific data analysis
- Interpersonal client coordination
This challenges a popular belief that LLMs are good at logic-heavy, analytical tasks. In practice, they shine more in expressive and synthesizing roles than in quantitative analysis or precision design work.
Augmentation Is Unevenly Distributed
Contrary to some hype, high-wage jobs aren’t always the most affected. When weighted by employment:
- Sales and office support roles showed the highest AI applicability scores
- Bachelor’s-level jobs had higher scores than those with less education, but not dramatically so
- Wage correlation was weak (r = 0.07 overall), meaning AI doesn’t just “go after the top”
Table 2: AI Applicability by Major Occupation Group
| Major Group | AI Score | Employment (M) |
|---|---|---|
| Sales and Related | 0.32 | 13.3 |
| Computer and Mathematical | 0.30 | 5.2 |
| Office and Admin Support | 0.29 | 18.2 |
| Educational Instruction & Library | 0.23 | 8.3 |
| Healthcare Support | 0.05 | 7.1 |
AI doesn’t discriminate by status. It follows task structure, not title. If your job involves explaining, summarizing, or retrieving information, the LLM is already peeking over your shoulder.
Real Data vs. Theoretical Hype
Compared to Eloundou et al.’s 2024 predictions of AI labor exposure, this study finds high agreement at the broad level (r = 0.91), but divergence in specifics. For instance:
- Usage-based scores ran higher than the theoretical predictions for roles like Passenger Attendants
- They ran lower than predicted for CNC Tool Programmers and Market Research Analysts
Why? Because real usage reveals what people actually try with AI, while theoretical ratings imagine what AI could do. The gap reflects behavioral inertia, interface design, and trust.
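A comparison like the r = 0.91 figure is just a Pearson correlation between two per-occupation score vectors. A minimal sketch, using a hand-rolled correlation and hypothetical scores (the real study compares hundreds of occupations):

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-occupation scores: usage-based vs. theoretical exposure
usage       = [0.32, 0.30, 0.29, 0.23, 0.05]
theoretical = [0.35, 0.33, 0.27, 0.20, 0.08]
print(round(pearson(usage, theoretical), 2))
```

High aggregate correlation with per-occupation divergence is exactly what the bullets above describe: the rankings broadly agree, while individual residuals reveal where behavior departs from theory.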
The Hidden Frontier: Scope
Perhaps the most novel insight is the importance of impact scope: how deeply AI touches the sub-tasks within a work activity.
For example:
- AI can “edit a document” completely, but only “research biological phenomena” minimally
While completion and satisfaction are high-level signals, scope tells us where future substitution pressure may accumulate. It’s a better predictor of where routine turns into reliance.
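Scope can be made concrete as the fraction of an activity's sub-tasks the AI appears able to perform. A minimal sketch; the sub-task breakdowns below are hypothetical, not from the paper:

```python
def impact_scope(subtasks: dict[str, bool]) -> float:
    """Illustrative scope metric: share of an activity's sub-tasks
    the AI appears able to perform (True = AI-performable)."""
    return sum(subtasks.values()) / len(subtasks)

# "Edit a document": nearly every sub-task is language work
edit_document = {
    "fix grammar": True, "restructure sections": True,
    "adjust tone": True, "tighten wording": True,
}
# "Research biological phenomena": mostly physical-world sub-tasks
research_biology = {
    "search literature": True, "run experiments": False,
    "collect field samples": False, "interpret lab results": False,
}
print(impact_scope(edit_document))     # full coverage
print(impact_scope(research_biology))  # minimal coverage
```

This is why the two activities in the bullet above land at opposite ends: scope measures depth of coverage within an activity, not whether the AI touches it at all.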
Final Thoughts: Don’t Just Ask What AI Can Do. Ask What People Use It For.
This paper offers a much-needed empirical pivot in the AI-labor debate. The fantasy of fully automated occupations is giving way to a messier reality:
“AI does part of your job. It helps with the rest. And it changes which parts matter most.”
Instead of fearing a sudden displacement event, firms and workers must monitor which micro-activities are being gradually offloaded to AI, and which new ones emerge in response.
Occupational strategy in the AI age isn’t about building moats around entire jobs. It’s about staying upstream of the task flow.
Cognaptus: Automate the Present, Incubate the Future