Control is a comforting word. It suggests a hand on the wheel, a dashboard of indicators, and a human being somewhere nearby who can still say no.
Machine learning makes that picture look increasingly theatrical. In AI-assisted science, researchers often do not know exactly which internal representations a model has learned, why a high-dimensional classifier separates one tumor subtype from another, or whether a model’s “useful pattern” corresponds to anything a scientist would recognize as a meaningful mechanism. The black box does not merely sit inside the laboratory. It starts to participate in deciding what the laboratory can see.
The easy reaction is pessimism: if scientists cannot interpret the model, then science has lost control. Emanuele Ratti’s paper, “Epistemic Control and the Normativity of Machine Learning-Based Science,” is useful because it refuses that convenient panic.[1] The paper does not say black boxes are harmless. It says the control problem has been described too crudely.
The right question is not “Can humans fully interpret the model?” It is: which part of the scientific process needs to be controlled, by which standards, for which goal?
That small change matters. It turns the discussion from a foggy debate about opacity into a practical governance problem. Slightly less dramatic. Much more dangerous, naturally.
Epistemic control is not the same as having a human nearby
Ratti begins by adapting the idea of “meaningful human control” from debates about autonomous systems into the context of science. In weapons, credit scoring, or public-sector algorithms, meaningful control is often discussed in moral and legal terms: who is responsible, who understood the system, who could have intervened?
Science has a different version of the same problem. The issue is not only blame. It is knowledge.
Ratti calls this epistemic control: the degree to which human scientists can keep scientific tools, procedures, and outputs aligned with accepted epistemic and methodological standards. It has two conditions.
| Condition | What it asks | In plain terms |
|---|---|---|
| Tracking | Is the scientific item responsive to accepted disciplinary standards? | Does the system answer to the right criteria? |
| Tracing | Can scientists reconstruct how the item works or how the outcome was produced using discipline-specific understanding? | Can humans explain the route, not just inspect the destination? |
A PCR procedure, for example, can be tracked if its steps align with accepted biological standards. It can be traced if scientists can explain how the final result emerged from denaturation, annealing, elongation, and detection. The procedure may still fail. Control is not success. Control is the ability to scrutinize failure intelligently.
That distinction is central. In business language, epistemic control is not the KPI. It is the auditability of the KPI’s production chain.
Machine learning complicates both tracking and tracing. A deep neural network may classify images correctly while relying on features that have nothing to do with the target phenomenon. A model may be vulnerable to adversarial perturbations that humans cannot anticipate. A system may perform well while its learned representation remains implicit, distributed, and impossible to express in ordinary domain concepts.
This is where Paul Humphreys’ pessimistic view enters Ratti’s argument. Humphreys’ concern, as reconstructed by Ratti, is that ML introduces a non-human epistemic perspective into science: a way of representing phenomena that is constructed by the machine and cannot always be translated back into the “grammar” of human scientific understanding. If that translation fails, scientists lose the ability to track and trace. They remain in the lab, perhaps even wearing the white coat, but the epistemic center has moved elsewhere.
This worry is serious. It is just not the whole story.
Opacity breaks some goals, not all goals
The first mechanism in Ratti’s argument is goal-dependence. ML opacity does not damage epistemic control equally across all scientific aims.
Consider cancer genomics. If the scientific goal is mechanistic explanation, then an opaque ML model is a poor citizen. Mechanistic explanation requires identifying relevant entities, activities, and their organization. A high-dimensional model may produce useful outputs without offering anything that can be translated into a clean account of causal organization. In this case, tracking fails because scientists cannot determine whether the model’s internal relations correspond to biologically meaningful relations.
But mechanistic explanation is not the only scientific goal.
If the goal is molecular stratification or classification, the control question changes. Scientists do not need to convert the model into a mechanistic story. They need to evaluate whether its classifications are stable, clinically coherent, and aligned with accepted laboratory or clinical measures. Tracking can still be achieved externally, through validation against domain-relevant standards.
So the same opacity can be fatal for one purpose and manageable for another.
This is the paper’s first useful correction to black-box fatalism. It does not deny opacity. It denies that opacity has one universal meaning.
| Scientific goal | What opacity damages | What may still be controllable |
|---|---|---|
| Mechanistic explanation | The ability to translate learned representations into causal organization | Often very little, especially with high-dimensional models |
| Classification | Internal interpretability, which may remain weak | Output validation against accepted clinical or laboratory standards |
| Prediction | Explanatory content, which may be secondary to the goal | Generalization, calibration, robustness, and performance standards |
| Discovery support | Explanation of the patterns the model suggests | Follow-up testing, domain review, and pipeline-level scrutiny |
This is not a license to deploy black boxes carelessly. It is a demand to stop asking one generic question, “is it interpretable?”, and start asking the operational one: interpretable for what?
In AI-enabled R&D, that difference is not academic. A model used to generate candidate molecular subtypes should be governed differently from a model used to explain disease mechanisms. A hospital triage model, a drug-target prediction model, and a laboratory discovery assistant may all be “opaque,” but the relevant standards for control differ. One model must support clinical safety, another experimental prioritization, another biological explanation. Same word, different governance burden.
The annoying part is that procurement checklists often compress all of this into one checkbox called “explainability.” Elegant. Also useless.
The black box is wrapped in a very human pipeline
Ratti’s second mechanism is pipeline-dependence. Humphreys’ pessimistic view treats ML’s internal representations as if they were the whole epistemic perspective of ML-based science. Ratti argues that this is too narrow.
A machine learning system is not only its learned representation. It is also a pipeline: problem formulation, data acquisition, data preparation, model development, validation, interpretation, deployment, monitoring, and impact assessment. Ratti’s Figure 1 is not a result in the experimental sense; it is a conceptual implementation map. Its purpose is to show that the opaque representation occupies only one part of the wider scientific process.
That pipeline is full of human choices.
A research team decides what problem should be translated into an ML task. It decides what data are acceptable. It decides what counts as representative. It chooses performance metrics. It decides whether false positives are more costly than false negatives. It chooses how to validate, when to deploy, and when to stop trusting the system.
These decisions are not merely technical. They encode cognitive values.
Ratti emphasizes two kinds of deliberation:
| Deliberation type | What it means | Example |
|---|---|---|
| Value specification | Defining what a valued property actually means in a concrete system | “Representativeness” may be specified by demographics, geography, disease subtype, scanner type, or data quality |
| Value choice | Deciding which value dominates when values conflict | A team may prioritize data quality over representativeness, or recall over precision, depending on the consequences of error |
This is where the paper becomes directly relevant to business governance. Many organizations treat model governance as if the main question were whether the model can explain its output. Ratti’s argument points elsewhere: the most governable parts of the system may be upstream and downstream of the model.
A model’s internal embedding may be hard to trace. But the organization can still govern the task definition, dataset inclusion rules, metric choice, validation procedure, deployment threshold, escalation policy, and monitoring loop.
That does not solve opacity. It surrounds opacity with accountable structure.
For business leaders, especially in healthcare AI, scientific automation, and R&D platforms, the practical implication is simple: do not locate control only inside the model. Locate it across the pipeline.
A useful governance review should ask:
| Pipeline layer | Control question | Business consequence |
|---|---|---|
| Problem formulation | What scientific or operational goal is being optimized? | Prevents goal drift disguised as model performance |
| Data acquisition | Which populations, instruments, or contexts are included? | Reveals hidden coverage and generalization risks |
| Data preparation | Which signals are cleaned, excluded, transformed, or privileged? | Exposes where domain assumptions enter the system |
| Model development | Which architecture and training choices are justified? | Connects technical design to task requirements |
| Metrics | What does “good performance” actually mean? | Forces explicit handling of precision, recall, cost, safety, and resources |
| Validation | Which external standards test the system? | Distinguishes internal benchmark success from domain relevance |
| Deployment | Under what conditions is the system allowed to act? | Converts epistemic uncertainty into operating rules |
| Monitoring | What would count as loss of control after launch? | Keeps control from ending at deployment, where many governance documents go to die |
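One way to make the review table above operational is to record each pipeline decision with its rationale and owner, then check which layers have no documented decision at all. The following is a minimal sketch under stated assumptions: the class names, field names, system name, and example entries are hypothetical illustrations, not a structure proposed in Ratti’s paper.

```python
from dataclasses import dataclass, field

# Layer names mirror the review table; the rest is an illustrative assumption.
PIPELINE_LAYERS = (
    "problem_formulation",
    "data_acquisition",
    "data_preparation",
    "model_development",
    "metrics",
    "validation",
    "deployment",
    "monitoring",
)


@dataclass
class LayerDecision:
    layer: str       # which pipeline layer the decision belongs to
    decision: str    # what was decided
    rationale: str   # why, stated in domain terms
    owner: str       # who is accountable for the decision


@dataclass
class PipelineReview:
    system_name: str
    decisions: list = field(default_factory=list)

    def record(self, layer, decision, rationale, owner):
        """Store one documented decision; reject unknown layer names."""
        if layer not in PIPELINE_LAYERS:
            raise ValueError(f"unknown pipeline layer: {layer}")
        self.decisions.append(LayerDecision(layer, decision, rationale, owner))

    def undocumented_layers(self):
        """Return layers with no recorded decision, in pipeline order."""
        covered = {d.layer for d in self.decisions}
        return [layer for layer in PIPELINE_LAYERS if layer not in covered]


# Hypothetical example entries for an imagined tumor-subtype classifier.
review = PipelineReview("subtype-classifier")
review.record(
    "metrics",
    "weight recall above precision",
    "missed tumors judged costlier than false alarms",
    "clinical lead",
)
review.record(
    "validation",
    "validate against accepted laboratory labels",
    "external domain standard rather than internal benchmark",
    "lab director",
)

print(review.undocumented_layers())
```

The sketch does nothing to open the model itself. It only makes visible which of the surrounding choices have an owner and a stated rationale, and which do not.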
The quiet importance of Ratti’s argument is that it gives managers a better control surface. If the model cannot be fully opened, then governance must become more explicit around the choices that shape the model’s role.
That is not a consolation prize. In many real systems, it is the main prize.
Metrics are not neutral; they are compressed judgments
The most business-friendly part of the paper is also the easiest to underestimate: metric choice.
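Ratti discusses performance measurement through the example of precision, recall, and the $F_\beta$ measure. Precision is the share of positive predictions that are correct, so it penalizes false positives; recall is the share of actual positives that are found, so it penalizes false negatives. The $F_\beta$ measure combines the two, with $\beta$ setting how much recall counts relative to precision. The balance between them is not a purely mathematical issue. It depends on the consequences of being wrong.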
A tumor classifier that misses a real tumor creates one kind of harm. A classifier that incorrectly flags a non-tumor as tumor creates another. The correct tradeoff cannot be read from the formula alone. It requires a judgment about risk, domain priorities, and acceptable error.
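A small sketch can show how the choice of $\beta$ changes which classifier looks better. It uses the standard definition $F_\beta = (1+\beta^2)\,\frac{P \cdot R}{\beta^2 P + R}$, where $P$ is precision and $R$ is recall. The two classifiers and their confusion counts are invented for illustration and do not come from Ratti’s paper.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta: beta > 1 favors recall, beta < 1 favors precision."""
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denom


# Hypothetical confusion counts for two tumor classifiers (illustrative only).
classifiers = {
    "cautious": {"tp": 60, "fp": 5, "fn": 40},   # few false alarms, many misses
    "sensitive": {"tp": 95, "fp": 60, "fn": 5},  # few misses, many false alarms
}


def preferred(beta: float) -> str:
    """Return the name of the classifier with the higher F-beta score."""
    scores = {}
    for name, counts in classifiers.items():
        p, r = precision_recall(counts["tp"], counts["fp"], counts["fn"])
        scores[name] = f_beta(p, r, beta)
    return max(scores, key=scores.get)


for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: preferred classifier = {preferred(beta)}")
```

With these invented counts, $\beta = 0.5$ prefers the cautious classifier and $\beta = 2$ prefers the sensitive one. Nothing about the models changed between those two rankings. Only the stated cost of each kind of error did.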
This is where “technical” decisions become epistemic governance.
The same logic applies beyond healthcare. In fraud detection, maximizing recall may overload investigation teams with false alarms. In credit risk, maximizing precision may exclude customers in ways that look efficient in the model and corrosive in the market. In scientific discovery, optimizing predictive power may deprioritize mechanistic understanding. The dashboard looks objective because the number is precise. The judgment was made earlier, when someone decided what the number should value.
Ratti’s language of value specification and value choice helps make that hidden layer visible.
A company does not merely choose a model. It chooses a theory of error.
Machine learning does not just obey values; it pushes back
So far, the argument sounds reassuring: humans still shape ML systems through pipeline decisions and cognitive values. The black box is not sovereign. Good.
Then Ratti adds the complication that makes the paper worth reading.
Epistemic control is not one-way. Humans shape ML systems, but ML systems also reshape human scientific aims. This is what Ratti calls the normativity of ML-based science.
Normativity here does not mean the system has moral agency. The model is not sitting in the server rack contemplating virtue. The point is more structural: technologies impose conditions for their successful use. To practice ML-based science effectively, researchers must accept certain norms about what counts as a valuable scientific output.
In cancer genomics, if ML systems are strong at classification and weak at mechanistic explanation, then the field may gradually prioritize classification and prediction over causal understanding. Not because anyone held a meeting and voted against explanation. Because the tool makes some goals easier to pursue, scale, fund, publish, and measure.
This is the two-way mechanism:
| Direction | Mechanism | Result |
|---|---|---|
| Human → ML | Scientists specify and choose values through problem framing, data, metrics, and validation | ML systems reflect human cognitive and methodological commitments |
| ML → Human | ML systems make some aims easier to pursue than others | Scientific communities may shift standards toward prediction, classification, and scale |
This is not a conspiracy. It is infrastructure doing what infrastructure does: making some paths smooth and others expensive.
For business, the same pattern is everywhere. Once a company installs an AI system optimized for ticket resolution speed, customer support starts seeing speed as the dominant quality. Once a sales analytics platform optimizes conversion probability, the sales organization begins to value leads that are legible to the model. Once a research platform ranks projects by predicted success, the portfolio slowly moves toward what the model can recognize as promising.
The organization remains “in control” in a formal sense. But the system has changed the cost of pursuing alternative values. That is how control becomes partial without ever disappearing.
The real risk is methodological drift
Ratti’s conclusion is disciplined: ML-based science is not post-human, but human control is partial and historically contingent. Scientists can shape ML systems through value-laden design choices, but ML systems also constrain which values become practical.
The risk, then, is not simply that black boxes produce wrong answers. Wrong answers can sometimes be detected. The deeper risk is methodological drift: the gradual narrowing of scientific aims around what ML systems can do well.
In a company, the analogue is operational drift. A tool introduced for assistance becomes a standard-setting device. A metric introduced for evaluation becomes a target. A model introduced to speed up research begins to redefine what “good research” means.
This is why generic AI governance is too weak. Governance frameworks often ask whether the model is accurate, fair, explainable, secure, and compliant. Those are necessary questions. But Ratti’s argument suggests another category: epistemic alignment.
Does the AI system preserve the organization’s intended knowledge goals, or does it quietly replace them with goals that are easier to optimize?
That question applies especially to AI-enabled R&D, healthcare AI, automated scientific discovery, and enterprise decision systems. It is not enough to ask whether the system works. The system may work beautifully while teaching the organization to care about the wrong thing.
What the paper directly supports, and what business readers should infer carefully
This is a conceptual philosophy paper, not an empirical benchmark. It does not report experiments, ablations, robustness tests, or quantitative performance results. There are no tables showing that one governance mechanism outperforms another. The evidence is argumentative: conceptual reconstruction, examples from ML-based biology and medicine, and analysis of how scientific standards interact with ML pipelines.
That boundary matters.
| Claim type | What the paper supports | What it does not prove |
|---|---|---|
| Direct contribution | Epistemic control can be defined through tracking and tracing | It does not provide an operational scoring system for control |
| Direct contribution | ML opacity affects control differently depending on scientific goals | It does not measure how often this happens in practice |
| Direct contribution | ML systems are shaped by human cognitive values across the pipeline | It does not test a specific governance framework |
| Direct contribution | ML systems can normatively reshape scientific aims | It does not quantify the business cost of such drift |
| Cognaptus inference | AI governance should focus on goals, standards, metrics, and pipeline choices, not only explainability | The ROI of such governance must still be established case by case |
This distinction keeps the business interpretation honest. The paper does not prove that better epistemic governance will reduce litigation, improve clinical outcomes, or increase R&D productivity. It gives a framework for seeing where control actually lives. The operational value comes when that framework is converted into review processes, documentation standards, validation protocols, and escalation rules.
In other words, Ratti gives us the map. The enterprise still has to build the roads. Annoying, but traditional.
A practical governance lens for AI-based science and R&D
For organizations using AI in scientific or knowledge-intensive work, the paper suggests a practical shift: move from model-centered explainability to pipeline-centered epistemic control.
A minimal governance template would ask five questions.
| Question | Why it matters |
|---|---|
| What is the system’s epistemic goal? | Explanation, prediction, classification, discovery support, and operational triage require different control standards |
| What standards should the system track? | Control depends on accepted domain criteria, not generic AI quality language |
| Which parts of the system can be traced? | Some internal representations may remain opaque, but pipeline decisions can still be documented |
| Which values were specified and chosen? | Representativeness, accuracy, parsimony, robustness, cost, and safety often conflict |
| What scientific or operational aims might the system gradually privilege? | Prevents ML normativity from becoming invisible strategy |
This is not paperwork for paperwork’s sake. It is a way to prevent AI adoption from becoming an unacknowledged redefinition of what the organization knows, values, and rewards.
The key move is to stop treating opacity as a single defect. Opacity is a condition that interacts with goals. If the goal is mechanistic explanation, opacity may be devastating. If the goal is classification, opacity may be manageable under strong external validation. If the goal is long-term research direction, opacity may matter less than the system’s tendency to shift incentives toward measurable prediction.
Different goal, different control surface.
The black box is not in charge, but neither are you by default
The most useful lesson from Ratti’s paper is not that humans still matter. Of course they do. Humans define the task, gather the data, choose the metrics, approve the deployment, and write the governance policy. We are very good at leaving fingerprints on systems and then pretending the system acted alone.
The sharper lesson is that control must be actively maintained at the level of goals and standards. Without that, ML systems do not need to rebel. They only need to work as designed.
A model optimized for prediction will privilege prediction. A pipeline built around available data will privilege what is measurable. A validation regime based on benchmark performance will privilege benchmark performance. Over time, these choices become institutional common sense.
That is the real post-black-box problem. The danger is not an alien intelligence replacing scientific judgment. The danger is a very human pipeline, full of reasonable choices, gradually allowing machine-friendly values to harden into scientific or organizational norms.
So who is really in charge?
Not the model alone. Not the scientist alone. Not the manager who approved the budget and moved on.
Control lives in the negotiated space between goals, standards, metrics, pipelines, and the technical constraints of the system. That space can be governed. But only if we stop mistaking “human-in-the-loop” for human control.
The loop is easy. The control is the work.
Cognaptus: Automate the Present, Incubate the Future.
[1] Emanuele Ratti, “Epistemic Control and the Normativity of Machine Learning-Based Science,” arXiv:2601.11202. PDF version accessed because the arXiv HTML page was unavailable: https://arxiv.org/pdf/2601.11202