Frame Before You Aim: Why AI Needs the Right Reference Point

Business AI has acquired a slightly dangerous reflex: when a system underperforms, reach for a stronger model, a faster pipeline, or a more elaborate scoring function. Very enterprise. Very expensive. Occasionally useful.

The more interesting failure mode is quieter. A system may have enough intelligence, enough data, and enough compute, yet still be solving the wrong version of the problem because it inherited the wrong reference frame. It reads a wearable signal as if it were clinical instrumentation. It schedules network traffic as if packets only matter after they announce themselves. It ranks alternatives as if the best and worst items in the current dataset were the same thing as business aspiration and business refusal.

Three recent papers, from three very different technical neighbourhoods, make the same point from different directions: CogAdapt, which adapts clinical ECG foundation models for wearable cognitive-load assessment; AUGUSTE, which embeds online learning inside a 5G uplink scheduler; and TOPSIS-RAD, which modifies TOPSIS rankings by replacing dataset-derived extremes with decision-maker-defined desired and veto levels.¹ ² ³

The shared lesson is not “AI is coming for healthcare, telecoms, and procurement.” We have suffered enough keynote slides. The sharper lesson is this:

Useful AI needs an explicit adaptation layer between inherited intelligence and operational reality.

That layer may be learned, as in CogAdapt. It may be online and control-theoretic, as in AUGUSTE. It may be declared by a human decision maker, as in TOPSIS-RAD. But the job is the same: correct the reference frame before automation starts pretending to be wisdom.

The Reference Frame Problem

A reference frame is the implicit “world” in which a system’s inputs, outputs, and success conditions make sense.

For a model, it may be the sensor geometry and task distribution used during pre-training. For a network protocol, it may be the assumption that uplink data should wait for a scheduling request. For a ranking method, it may be the assumption that the observed best and worst alternatives define what “ideal” and “anti-ideal” mean.

Those assumptions are not always wrong. They are just often smuggled into deployment as if they were facts about the world rather than design choices. That is where systems become brittle.

The three papers sit on different layers of the enterprise stack:

Layer	Paper’s domain	Inherited reference frame	Adaptation move
Perception	Wearable ECG cognitive-load sensing	Clinical 12-lead ECG foundation model trained for cardiac diagnosis	Learn a 3-to-12 lead adapter and progressively fine-tune the encoder
Control	5G URLLC uplink scheduling	Reactive scheduling request procedure or rigid configured grants	Learn traffic timing online and proactively schedule under explicit latency-overhead knobs
Decision	Multi-criteria ranking	Dataset-derived positive and negative ideals	Replace extremes with desired and veto performance levels

The temptation is to treat these as separate papers. One is digital health. One is network automation. One is multi-criteria decision analysis. That would be a tidy literature-review mistake. Their business relevance comes from the common pattern: performance improves when the system stops treating inherited defaults as the operating truth.

Adaptation at the Perception Layer: The Wearable Is Not the Clinic

CogAdapt starts with a common problem in applied AI: a large foundation model exists, but the deployment environment refuses to resemble the world that made the foundation model successful. How rude of reality.

The source model is ECG-FM, a clinical ECG foundation model trained on large-scale 12-lead clinical recordings. The target task is cognitive-load assessment from wearable 3-lead ECG. That is not a trivial transfer. The sensor configuration is different, the signal quality is different, and the task has moved from cardiac pathology to subtle autonomic changes associated with mental load.

CogAdapt handles this mismatch with two adaptation mechanisms.

First, LeadBridge transforms wearable 3-lead inputs into 12-lead representations compatible with the clinical foundation model. The paper pre-trains this adapter on PTB-XL normal sinus rhythm recordings, then uses it downstream for cognitive-load classification. The important detail is not merely “3 becomes 12.” The adapter is trying to preserve enough anatomical consistency that the foundation model’s learned ECG representations remain useful.
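The adapter idea can be made concrete with a deliberately simplified stand-in. Under the classic dipole approximation, surface ECG leads are roughly linear combinations of a three-dimensional heart vector, which is why linear lead-reconstruction methods exist. The sketch below is not LeadBridge. It assumes a plain linear least-squares map and shows one natural way, assumed here, to get training pairs "for free" from 12-lead recordings: treat a chosen subset of three leads as the input and all twelve as the target. The names `fit_lead_adapter`, `apply_lead_adapter`, and `input_leads` are hypothetical.

```python
import random  # used in the usage example below


def solve(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_lead_adapter(records, input_leads):
    """Fit one linear map per output lead via the normal equations.

    `records` are 12-lead samples (one value per lead). The inputs are a subset of
    those leads, so every 12-lead recording yields an (input, target) pair.
    """
    xs = [[rec[i] for i in input_leads] for rec in records]
    d = len(input_leads)
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    weights = []
    for k in range(len(records[0])):
        xty = [sum(x[i] * rec[k] for x, rec in zip(xs, records)) for i in range(d)]
        weights.append(solve(xtx, xty))
    return weights


def apply_lead_adapter(weights, reduced):
    """Expand a reduced-lead sample into an estimated full-lead sample."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, reduced)) for w in weights]
```

As a usage example, generate synthetic "12-lead" samples in which every lead is a fixed random mix of three latent sources, fit with `input_leads=[0, 1, 7]` (an arbitrary, hypothetical choice), and the reconstruction of an unseen sample is essentially exact. Real wearable signals are noisy and not perfectly linear, which is exactly why the paper learns a richer adapter.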

Second, ProFine progressively fine-tunes the encoder. The paper tests a frozen-encoder scenario, a partial-unfreeze scenario, and a full fine-tuning scenario. This matters because the model needs to adapt to cognitive-load signals without simply bulldozing the clinical pre-training. Foundation models are not sacred artefacts. Nor are they disposable. The trick is deciding how much of the inherited representation should move.
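The three scenarios amount to choosing how much of the encoder is allowed to move. A minimal sketch, assuming an encoder made of `num_blocks` stacked blocks and assuming that "partial unfreeze" means releasing the `top_k` blocks nearest the output (a common convention, not confirmed as ProFine's exact rule); `trainable_mask` is a hypothetical helper:

```python
from __future__ import annotations


def trainable_mask(num_blocks: int, scenario: str, top_k: int = 2) -> list[bool]:
    """Which encoder blocks receive gradient updates (index 0 = closest to the input).

    The task-specific classification head is assumed trainable in every scenario.
    """
    if scenario == "frozen":
        # Keep the clinical representation intact; only the head learns.
        return [False] * num_blocks
    if scenario == "partial":
        # Release only the last top_k blocks, where features are most task-specific.
        k = min(top_k, num_blocks)
        return [False] * (num_blocks - k) + [True] * k
    if scenario == "full":
        # Let the whole encoder adapt to the cognitive-load task.
        return [True] * num_blocks
    raise ValueError(f"unknown scenario: {scenario!r}")
```

In a PyTorch-style trainer, such a mask would typically be applied by setting `requires_grad` on each block's parameters before building the optimiser.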

The results support the adaptation thesis. Under leave-one-subject-out (LOSO) evaluation, and compared with the strongest training-from-scratch baseline reported in the paper, CogAdapt’s full fine-tuning scenario raises macro-F1 from 0.514 to 0.626 on CLARE and from 0.607 to 0.768 on CL-Drive. Under K-fold validation, the same scenario reaches macro-F1 of 0.785 on CLARE and 0.860 on CL-Drive. The paper stresses that leave-one-subject-out is the realistic setting, because a wearable cognitive-load system is useful only if it survives contact with users it has never seen.
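The gap between the two protocols comes from how data is split. In leave-one-subject-out evaluation, every fold holds out all recordings from one person, so the model is always scored on a stranger; random K-fold splits can put the same person in both training and test sets. A minimal sketch of the split and of macro-F1 (the unweighted mean of per-class F1), with hypothetical helper names:

```python
from collections import defaultdict


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for held_out in sorted(by_subject):
        # No recording from the held-out subject ever reaches the training set.
        train = [i for i, sid in enumerate(subject_ids) if sid != held_out]
        yield held_out, train, by_subject[held_out]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro-F1 weights each cognitive-load class equally, so a model cannot score well by favouring whichever class is most common.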

That is the business lesson hiding inside the signal-processing detail. A model trained in a rich, standardised, clinical environment cannot be assumed to work on sparse, noisy, consumer-grade signals just because both contain ECG. The reference frame has changed.

For digital health, workplace safety, driver monitoring, adaptive training, and human-machine interface products, this is a warning against “foundation model laundering”: taking a powerful pretrained model, attaching a small classifier, and declaring the domain solved. The adapter is not plumbing. It is the product boundary where clinical knowledge becomes operational sensing.

Adaptation at the Control Layer: The Packet Should Not Always Have to Ask Permission

AUGUSTE makes the same argument in a very different language. Here the system is not reading physiology; it is scheduling uplink resources in 5G URLLC systems.

The inherited reference frame is the scheduling request (SR) procedure. A user equipment (UE) device has uplink data, sends a scheduling request, waits for a grant, and then transmits. Reasonable enough, until every millisecond counts. The paper notes that scheduling requests add delay and jitter, with SR opportunities typically configured in the 10–40 ms range and potentially much longer. On the authors’ OpenAirInterface testbed, reactive scheduling shows a median round-trip time (RTT) of around 20 ms; always-on scheduling can reduce RTT into the 7–12 ms range, but wastes resources.
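Why periodic SR opportunities produce both delay and jitter can be seen with a toy model. Assume, for illustration only, that data arrives at a uniformly random moment and that SR opportunities recur every `period` with no other processing delay. The wait until the next opportunity then ranges from zero to a full period and averages half a period, before the grant and transmission even begin. Function names are hypothetical:

```python
def sr_wait_us(arrival_us: int, period_us: int, offset_us: int = 0) -> int:
    """Microseconds from data arrival until the next SR opportunity (toy model)."""
    since_last = (arrival_us - offset_us) % period_us
    # Arriving exactly on an opportunity means no wait; otherwise wait for the next one.
    return 0 if since_last == 0 else period_us - since_last


def sr_wait_stats_ms(period_ms: int):
    """Mean and worst-case SR wait in ms, over arrivals spread evenly across one period."""
    period_us = period_ms * 1000
    waits = [sr_wait_us(t, period_us) for t in range(period_us)]
    return sum(waits) / len(waits) / 1000, max(waits) / 1000
```

For a 10 ms period, `sr_wait_stats_ms(10)` gives a mean of about 5 ms and a worst case just under 10 ms; at 40 ms, about 20 ms and just under 40 ms. The mean is the added delay; the spread is the jitter.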

So the old choices are unattractive:

Strategy	Latency	Resource efficiency	Why it fails operationally
Reactive SR-based scheduling	Higher	Efficient	Waits for the device to request resources, adding delay and jitter
Always-on scheduling	Low	Poor	Burns uplink resources whether traffic exists or not
Configured grants	Low for the right traffic	Rigid	Works best for strictly periodic traffic and can drift when reality is not so polite
AUGUSTE	Low when prediction is accurate	Tunable	Learns arrival timing online and schedules proactively only when needed

AUGUSTE inserts an online learning state machine into the MAC scheduler. During a learning phase, it schedules proactively to collect unbiased arrival statistics. During a confident phase, it uses learned predictions to issue proactive grants only when uplink traffic is expected. Two knobs — a tolerance window, $TW$, and a slot restriction, $N$ — expose the latency-overhead trade-off directly.
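The learn-then-predict pattern can be sketched for a single, roughly periodic UE. This is a toy illustration, not AUGUSTE's code, and several details are assumptions: time is counted in slots; the learning phase grants every slot and estimates the period as the median inter-arrival gap; `tw` is read as a window of slots either side of the predicted arrival; and `n_max` stands in for the slot restriction $N$, read here as a cap on proactive grants per predicted arrival. The SR fallback for missed packets is left out.

```python
from __future__ import annotations

from statistics import median


class ToyProactiveScheduler:
    """Hypothetical learn-then-predict uplink grant policy for one periodic UE."""

    def __init__(self, tw: int, n_max: int, learn_slots: int = 100, min_arrivals: int = 5):
        if min_arrivals < 2:
            raise ValueError("need at least two arrivals to estimate a period")
        self.tw = tw                  # tolerance window: slots either side of the prediction
        self.n_max = n_max            # assumed meaning of N: max proactive grants per prediction
        self.learn_slots = learn_slots
        self.min_arrivals = min_arrivals
        self.phase = "learning"
        self.learn_start = 0
        self.arrivals: list[int] = []
        self.period: int | None = None
        self.grants_used = 0

    def should_grant(self, slot: int) -> bool:
        """Decide, before the slot happens, whether to give the UE an uplink grant."""
        if self.phase == "learning":
            return True  # grant every slot so arrival times are observed without SR delay
        predicted = self.arrivals[-1] + self.period
        if abs(slot - predicted) <= self.tw and self.grants_used < self.n_max:
            self.grants_used += 1
            return True
        return False  # outside the window or over budget: a real system would rely on SR

    def observe(self, slot: int, arrived: bool) -> None:
        """Record whether uplink data arrived in this slot and update the phase."""
        if arrived:
            self.arrivals.append(slot)
            self.grants_used = 0  # a new arrival starts a new prediction window
        if self.phase == "learning" and slot - self.learn_start + 1 >= self.learn_slots:
            if len(self.arrivals) >= self.min_arrivals:
                gaps = [b - a for a, b in zip(self.arrivals, self.arrivals[1:])]
                self.period = round(median(gaps))
                self.phase = "confident"
            else:
                self.learn_start = slot + 1  # too little evidence: run another learning window
```

Feeding it a packet every 10 slots for 300 slots, `tw=1, n_max=3` spends two grants per packet after learning and every packet lands in a granted slot. With `n_max=1`, the single grant lands one slot early and every packet misses it: the latency-overhead trade-off in miniature.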

This is a useful design pattern: do not merely predict; give operators the levers that turn prediction into a controlled operating point.

In the request-response scenario, AUGUSTE reduces median RTT from roughly 20 ms to 10 ms and achieves that at around 7% scheduled uplink slot overhead, compared with 4% for the reactive baseline. In the broader conclusion, the authors report roughly 50% median RTT reduction at 7–10% radio overhead, plus one-way latency reduction from about 13 ms to 7–10 ms across tested scenarios.
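Written out, the request-response trade-off from the reported figures is simple arithmetic:

$$ \frac{20\ \text{ms} - 10\ \text{ms}}{20\ \text{ms}} = 50\%\ \text{lower median RTT}, \qquad 7\% - 4\% = 3\ \text{percentage points more scheduled uplink slots} $$

Roughly half the median round-trip time is bought with a few extra percentage points of radio capacity, which is the kind of exchange the $TW$ and $N$ knobs make explicit.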

The subtle point is that AUGUSTE is not just “ML for scheduling.” It is ML correcting the scheduler’s reference frame. The scheduler no longer treats packet arrival as something discovered only after a request. It treats arrival timing as a learnable operational process.

That matters beyond telecoms. Many enterprise workflows still run on reactive control logic: wait for the exception, wait for the ticket, wait for the sensor event, wait for the customer escalation. Predictive systems often fail because they bolt forecasts onto static workflows instead of changing the action policy around those forecasts. AUGUSTE’s value is not that it predicts. It changes when the system is allowed to act.

Adaptation at the Decision Layer: The Dataset Is Not the Decision Maker

TOPSIS-RAD moves the same argument into ranking and governance. Traditional TOPSIS ranks alternatives by distance from a positive ideal solution and a negative ideal solution. In the classical formulation, those reference points are derived from the observed alternatives. The best observed value helps define the “ideal”; the worst observed value helps define the “anti-ideal.”

This is mathematically convenient. It is also often a managerial nuisance wearing a nice suit.

If an outlier enters the dataset, the scale shifts. If a dominated alternative is added or removed, rankings may change. If the best observed alternative is far beyond what the decision maker actually needs, it can distort the ranking. In business terms, the method may reward extremity rather than suitability.
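The sensitivity is easy to reproduce. Below is a minimal classical TOPSIS sketch, assuming vector normalisation and treating every criterion as a benefit criterion (higher is better). Because both the normalisation and the ideal and anti-ideal points are computed from whatever alternatives are present, adding one weak alternative shifts the scores of all the others; with the right numbers, such shifts can also reorder them.

```python
import math


def topsis(matrix, weights):
    """Classical TOPSIS closeness scores (benefit criteria assumed), one per row."""
    n_crit = len(weights)
    # Vector normalisation: each column is divided by its Euclidean norm over the dataset.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points come from the observed best and worst values.
    best = [max(col) for col in zip(*v)]
    worst = [min(col) for col in zip(*v)]
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, best), math.dist(row, worst)
        total = d_plus + d_minus
        scores.append(d_minus / total if total else 0.0)  # 0.0 if all rows are identical
    return scores
```

Running `topsis` on three alternatives and then again after appending a clearly weak fourth alternative changes the first three scores, even though nothing about them has changed.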

TOPSIS-RAD changes the construction of the problem before the distance-based ranking stage. It asks the decision maker to define:

Vetoed Performance Levels, or $VPL$: minimum acceptable levels below which an alternative should not remain in the feasible set.
Desired Performance Levels, or $DPL$: aspiration or saturation levels above which additional performance should not keep dominating the ranking.

The method then filters non-viable alternatives, caps performances above the desired level, normalises against the fixed $[VPL, DPL]$ span, and computes the familiar TOPSIS-like distance score over the surviving alternatives.

The simplified logic is:

$$ r_{ij} = \frac{g^d_{ij} - VPL_j}{DPL_j - VPL_j} $$

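where $g^d_{ij} = \min(g_{ij}, DPL_j)$ is the performance of alternative $i$ on criterion $j$ after capping at the desired level (for a benefit criterion, where more is better). Because vetoed alternatives have already been removed, every surviving $r_{ij}$ lies in $[0, 1]$.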
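The veto, cap, normalise, and score steps can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: it assumes benefit criteria, $VPL_j < DPL_j$, and fixed weighted reference points at $r = 1$ (desired) and $r = 0$ (veto) on every criterion; the paper's exact distance construction may differ. `topsis_rad` is a hypothetical name.

```python
import math


def topsis_rad(matrix, weights, vpl, dpl):
    """Closeness scores keyed by row index; vetoed rows are simply absent."""
    if any(lo >= hi for lo, hi in zip(vpl, dpl)):
        raise ValueError("each VPL must be strictly below its DPL")
    ideal = list(weights)              # weighted r = 1 on every criterion (desired level)
    anti_ideal = [0.0] * len(weights)  # weighted r = 0 on every criterion (veto level)
    scores = {}
    for i, row in enumerate(matrix):
        # Veto: an alternative below any VPL leaves the feasible set.
        if any(g < lo for g, lo in zip(row, vpl)):
            continue
        # Cap at DPL, then normalise on the fixed [VPL, DPL] span, so r lies in [0, 1].
        r = [(min(g, hi) - lo) / (hi - lo) for g, lo, hi in zip(row, vpl, dpl)]
        v = [w * x for w, x in zip(weights, r)]
        # TOPSIS-style closeness against reference points that no dataset can move.
        d_plus, d_minus = math.dist(v, ideal), math.dist(v, anti_ideal)
        total = d_plus + d_minus
        scores[i] = d_minus / total if total else 0.0
    return scores
```

Because each score depends only on that alternative's own row and the declared levels, adding a weak alternative cannot move the others, and a vetoed one simply disappears. With $VPL = 5$ and $DPL = 8$ on two equally weighted criteria, an uneven profile such as (30, 5.2) is capped to (8, 5.2) and scores below a balanced (7.5, 7.5).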

The paper’s toy examples are not broad empirical validation, and the authors say so. Their purpose is narrower: demonstrate how vetoes and desired caps change rankings by anchoring the evaluation in declared decision boundaries rather than dataset extremes. In one example, adding a weak alternative changes rankings under traditional TOPSIS, while TOPSIS-RAD keeps the scores of existing alternatives invariant because the normalisation bounds are fixed by $VPL$ and $DPL$. In another, a strong but uneven alternative loses its dominance once above-threshold scores are capped, allowing a more balanced profile to rise.

This is exactly the kind of issue businesses encounter in supplier selection, project prioritisation, personnel assessment, credit triage, site selection, and procurement scoring. The raw top performer is not always the best decision. Sometimes “good enough on this criterion” really means good enough. Sometimes “below this threshold” should mean disqualified, not merely penalised by a weighted average and then quietly rescued by an impressive number elsewhere.

TOPSIS-RAD is not an AI model, but it belongs in the same article because it clarifies the governance side of the same problem. If a company lets the dataset define its ideals, the ranking system will optimise around whatever happened to appear in the comparison set. That is not strategy. That is spreadsheet astrology with Euclidean distances.

The Shared Insight: Adaptation Is the Missing Middle

The three papers differ in mechanism, evidence, and maturity. That difference is useful.

CogAdapt uses learned representation adaptation. AUGUSTE uses online control adaptation. TOPSIS-RAD uses declared decision-reference adaptation. Together, they outline a spectrum:

Adaptation type	Who or what defines the reference frame?	Where it fits	Failure avoided
Learned adapter	Data and model training procedure	Sensors, signals, embeddings, perception	Treating operational inputs as if they matched pre-training conditions
Online controller	Observed behaviour plus operator knobs	Scheduling, routing, automation, resource allocation	Choosing static policies for dynamic traffic or workflow patterns
Declared reference levels	Decision maker, regulator, or policy owner	Ranking, governance, prioritisation, procurement	Letting dataset extremes masquerade as business goals

The pattern is more important than any single method. The papers show that adaptation is not a cosmetic layer added after the system is “basically working.” It is the layer that decides what working means.

This matters now because enterprise AI is moving from isolated prediction to operational integration. The relevant question is no longer only, “Can the model classify, predict, or rank?” It is:

Classify against which sensor reality? Predict into which action policy? Rank against whose thresholds?

A model output becomes valuable only when it is translated into the context where consequences happen.

What the Papers Show, and What They Do Not

It is worth separating the evidence from the extrapolation.

The papers show the following:

Paper	What is demonstrated	Important limit
CogAdapt	A clinical ECG foundation model can be adapted to wearable cognitive-load classification using LeadBridge and progressive fine-tuning, improving reported performance on two public datasets	Generalisation remains hard; LOSO performance is lower than K-fold, and future work includes personalised adaptation
AUGUSTE	Online traffic-aware proactive scheduling can reduce latency while keeping radio overhead near the reactive baseline on a real 5G testbed	Experiments are limited in scale; future work includes multi-UE congestion, adaptive knob selection, and richer learners
TOPSIS-RAD	Declared veto and desired levels can stabilise and reshape TOPSIS rankings in toy examples by fixing the reference frame	It needs meaningful decision-maker thresholds, sensitivity analysis, and broader empirical validation

The business interpretation is broader:

Do not deploy inherited models without checking whether the deployment inputs match the pre-training frame.
Do not attach predictions to workflows that still behave as if nothing has been predicted.
Do not let ranking systems define “best” from whatever alternatives happen to be in the spreadsheet this quarter.
Do not confuse a benchmark with a boundary condition.
Do not allow impressive automation to hide unexamined assumptions. That trick has been overfunded already.

A Practical Framework for Managers

For a business team evaluating an AI-enabled system, the lesson can be turned into a simple diagnostic.

1. Identify the inherited frame

Ask what assumptions the system brings with it:

What data shape was the model trained on?
What environment did the protocol assume?
What does the scoring method treat as ideal, unacceptable, or sufficient?
Which thresholds are learned, and which are declared?
Which defaults are being treated as neutral?

Most system failures are not caused by a lack of dashboards. They are caused by invisible defaults with excellent posture.

2. Locate the operational mismatch

Look for gaps between inherited assumptions and deployment conditions:

Mismatch type	Example	Business symptom
Sensor mismatch	Clinical ECG model applied to wearable ECG	Weak field performance despite strong lab model
Timing mismatch	Scheduler waits for requests in low-latency traffic	Jitter, missed service-level targets, poor real-time control
Reference mismatch	Ranking uses observed extremes as ideals	Outliers distort decisions; new alternatives reshuffle priorities
Objective mismatch	Model trained for one task used for another	Accuracy drops or outputs become hard to interpret
Governance mismatch	No explicit veto or aspiration levels	Bad alternatives survive because they compensate elsewhere

3. Design the adaptation layer

The layer may be technical or procedural:

A sensor adapter that maps field signals into the model’s expected representation.
A progressive fine-tuning strategy that lets a foundation model move without forgetting everything useful.
An online controller that learns when action should happen, not merely what is likely.
A threshold system that declares what is unacceptable and what is sufficient before ranking begins.
A monitoring loop that detects when the reference frame itself has drifted.
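The monitoring loop can start very simply. One widely used drift statistic is the population stability index (PSI), which compares how a feature's values fall into fixed bins in a reference sample versus a current sample. The sketch below is a generic illustration, not taken from the papers; the bin edges and the alert threshold are assumptions to tune (a commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift).

```python
import bisect
import math


def population_stability_index(reference, current, edges):
    """PSI between two samples binned on shared, sorted edges; higher means more drift."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for x in values:
            counts[bisect.bisect_right(edges, x)] += 1
        # A small floor avoids log(0) when a bin is empty in one sample.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    # Each term is non-negative; identical distributions give exactly zero.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

In practice the reference sample would come from the conditions the model or policy was validated on, and the check would run per feature on a schedule, with an alert rather than an automatic retrain as the first response.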

This is where companies should spend more design energy. The adaptation layer is often less glamorous than the model, but glamour has a poor record as an operating principle.

4. Expose the trade-off

AUGUSTE is especially useful here because it gives operators explicit knobs. The system is not simply “optimised”; it is moved along a latency-overhead frontier.
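Exposing the frontier can be as plain as keeping only the non-dominated settings and letting the operator pick against a budget. A minimal sketch, assuming each candidate knob setting has already been measured for latency and overhead; `OperatingPoint` and the helper names are hypothetical, and any numbers fed in would come from the operator's own experiments.

```python
from typing import NamedTuple, Optional


class OperatingPoint(NamedTuple):
    label: str
    latency_ms: float
    overhead_pct: float


def dominates(q: OperatingPoint, p: OperatingPoint) -> bool:
    """q dominates p if it is no worse on both axes and strictly better on one."""
    return (q.latency_ms <= p.latency_ms and q.overhead_pct <= p.overhead_pct
            and (q.latency_ms < p.latency_ms or q.overhead_pct < p.overhead_pct))


def pareto_frontier(points):
    """Non-dominated operating points, sorted from cheapest to most expensive."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.overhead_pct)


def pick_operating_point(points, max_overhead_pct) -> Optional[OperatingPoint]:
    """Lowest-latency frontier point that fits the overhead budget, or None."""
    feasible = [p for p in pareto_frontier(points) if p.overhead_pct <= max_overhead_pct]
    return min(feasible, key=lambda p: p.latency_ms) if feasible else None
```

Returning `None` when nothing fits the budget is deliberate: it forces the trade-off back to a human instead of silently picking the least-bad setting.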

That same thinking should appear elsewhere:

In cognitive-load systems: accuracy versus calibration burden versus privacy risk.
In ranking systems: strict vetoes versus candidate availability versus decision stability.
In procurement: cost minimisation versus quality thresholds versus supplier resilience.
In edge AI: inference latency versus network usage versus device energy.

A system that hides its trade-offs is not autonomous. It is merely opinionated.

5. Test against deployment reality

The more operational the system, the less useful it is to rely only on random splits, offline benchmarks, or elegant toy examples.

CogAdapt’s leave-one-subject-out evaluation is a good example of a stricter test because new users are exactly where wearable systems struggle. AUGUSTE’s real 5G testbed is valuable because scheduling lives or dies in timing, not in simulation poetry. TOPSIS-RAD’s limitation is also instructive: toy examples clarify mechanism, but enterprise adoption would need sensitivity analysis and real decision cases.

The relevant question is not “Does the method work?” It is “Does it work under the deployment mismatch that originally made the problem difficult?”

Why This Matters for AI Product Strategy

AI product teams often talk about “model-market fit.” The phrase is useful, though usually too broad. These papers suggest a more precise version: reference-frame fit.

A product has reference-frame fit when the system’s assumptions match the environment, action policy, and decision boundaries of the customer’s actual workflow.

Without it, companies get familiar disappointments:

A pretrained model that performs well in demos but degrades in the field.
A prediction engine that cannot change operational timing because the workflow remains reactive.
A ranking dashboard that produces unstable priorities whenever the candidate pool changes.
A governance process that measures everything except the minimum conditions that actually matter.
A system that is “data-driven” mainly because nobody wanted to admit the decision criteria.

For managers, the implication is refreshingly unsentimental: do not begin with the most advanced model. Begin with the reference frame.

Ask:

What reality generated the model’s competence?
What reality will the product face?
What must be translated between the two?
Which boundaries must be declared rather than inferred?
Which trade-offs must remain visible to operators?

The best AI systems are rarely those that pretend the world is clean. They are the ones that know exactly where it is not.

The Quiet Discipline of Reframing

CogAdapt reframes clinical ECG knowledge for wearable sensing. AUGUSTE reframes uplink scheduling from reactive permission to learned anticipation. TOPSIS-RAD reframes ranking from dataset-relative extremes to decision-maker-defined sufficiency and refusal.

None of these moves is conceptually flashy. That is precisely why they matter. Enterprise systems are not improved only by making the intelligence bigger. They are improved by making the reference frame explicit.

A system that inherits the wrong frame will optimise confidently in the wrong direction. A system with the right adaptation layer may look more modest, but it will fail less theatrically. In business AI, that counts as progress.

The useful question is not whether an AI system is accurate, fast, or optimal in the abstract. The useful question is whether it is accurate for the signal it will actually see, fast for the timing that actually matters, and optimal against the thresholds the business actually believes.

Everything else is just a very polished misunderstanding.

Cognaptus: Automate the Present, Incubate the Future.

¹ Amir Mousavi et al., “CogAdapt: Transferring Clinical ECG Foundation Models to Wearable Cognitive Load Assessment via Lead Adaptation,” arXiv:2605.22774, 2026. https://arxiv.org/pdf/2605.22774
² Maxime Elkael et al., “AUGUSTE: Online-Learning dApp for Predictive URLLC Scheduling,” arXiv:2606.03664, 2026. https://arxiv.org/pdf/2606.03664
³ Leonardo Fernandes Costa et al., “TOPSIS-RAD: Ranking According to Desires,” arXiv:2606.07253, 2026. https://arxiv.org/pdf/2606.07253