A robot helping a patient stand is not solving a benchmark. It is sharing weight, sensing resistance, absorbing surprise, and deciding how much force is too much. That last phrase is where most AI language starts to get suspiciously cloudy. “Deciding” sounds like a software problem. In physical systems, it is also a stiffness problem, a damping problem, an energy problem, and occasionally a liability problem wearing hospital slippers.
That is the useful provocation in Vahid Salehi’s Fundamentals of Physical AI [1]. The paper does not treat Physical AI as a large language model with wheels, arms, or a mildly dramatic demo video. It defines Physical AI as a closed embodied control loop: body, perception, action, learning, autonomy, and context continuously shaping one another under physical laws. Intelligence is not something the machine merely computes and then executes. It is something the system negotiates through force, motion, resistance, and feedback.
For business readers, that distinction matters. A chatbot can be wrong and irritating. A robot can be wrong and heavy. The difference is not philosophical. It changes product design, procurement, safety assurance, insurance, regulatory exposure, and the economics of deployment.
The paper’s central claim is therefore not “robots need better AI”. That is the kind of sentence that keeps strategy decks alive and engineering teams quietly despairing. The sharper claim is this: once AI acts in the world, intelligence must be evaluated as a physical relationship, not just a predictive output.
The body is not a container for intelligence; it is part of the computation
The familiar mental model of AI is still embarrassingly disembodied. Data enters. A model processes. An answer emerges. If the answer is wrong, one adds data, parameters, tools, retrieval, guardrails, or a more expensive invoice.
Physical AI breaks that neat stack. In Salehi’s framework, the body is not the delivery vehicle for cognition. It determines what can be sensed, what can be done, what counts as feedback, and what forms of learning are even available.
The paper builds its framework around six fundamentals:
| Fundamental | What it means in the loop | Business translation |
|---|---|---|
| Embodiment | The physical structure through which the system experiences force, movement, resistance, and energy | Hardware architecture is not downstream of AI strategy; it constrains intelligence from the start |
| Sensory perception | The conversion of physical signals into meaningful state awareness | Sensor strategy is not just data capture; it defines what the system can responsibly notice |
| Motor action competence | The ability to coordinate movement with environment and goal | Performance is measured through stable interaction, not merely task completion |
| Learning ability | Adaptation through experience and feedback | Improvement comes from controlled physical learning, not only offline model updates |
| Autonomy | Self-regulation within safe limits | Autonomy should be tested as recovery and stability, not declared as a feature label |
| Context sensitivity | Adjustment to spatial, social, bodily, and situational conditions | Human-facing systems need situational appropriateness, not generic automation |
The important move is that these are not six modules waiting politely in an architecture diagram. The paper argues that they form a circular mechanism: embodiment enables perception; perception structures action; action produces experience; experience supports learning; learning stabilizes autonomy; autonomy allows context-sensitive behaviour; context then reshapes what action should mean.
That loop is the article’s real subject. Not the list. Lists are cheap. Loops are where systems either become robust or quietly become incident reports.
Physical AI is not “AI plus robot”
The most likely misconception is also the most commercially dangerous: treating Physical AI as a software layer added to mechanical equipment. Under that view, the robot body is the appliance, the AI is the brain, and the deployment question is how to connect the two without too many procurement meetings.
Salehi’s paper rejects that split. Intelligence arises from the coupling between agent and environment. The rehabilitation robot used throughout the paper is not intelligent because it calculates an abstract plan and then moves. It is intelligent, in the paper’s terms, because its soft, sensor-supported body adjusts support through resistance, balance, motion, and feedback.
That changes what “learning” means. In ordinary machine learning, learning usually means parameter adjustment over data. Here, learning is framed as a change in structural coupling between system and environment. The robot does not simply know more. It becomes better aligned with the physical relationship it is participating in.
This is why Physical AI should make executives nervous in a productive way. It implies that many AI deployment questions cannot be answered by asking which model is most capable. For embodied systems, the better questions are less glamorous:
- What physical states can the system actually sense?
- Which forces can it modulate?
- How quickly does it recover after disturbance?
- What does it treat as a successful interaction?
- Can its adaptation be explained in physical variables?
- Does it become gentler, safer, and more stable over repeated contact?
This is not anti-software. It is anti-fantasy. A useful position, admittedly unfashionable.
The virtual rehabilitation arm is a demonstration, not a clinical benchmark
The paper operationalizes the framework using a virtual rehabilitation assistant in NVIDIA Isaac Sim with the PhysX solver. The setup models a three-part rehabilitation arm interacting with a simulated human arm. The simulation uses an effective end-effector mass of $m_{\text{eff}} = 1\,\text{kg}$, a timestep of $\Delta t = \tfrac{1}{120}\,\text{s}$, stiffness values $k$ in the range of 2,000 to 10,000 N/m, and damping values $c$ in the range of 10 to 40 N·s/m. The simulated patient arm applies forces between 5 and 15 N, with stochastic modulation and phase disturbance. Measurement noise of $\pm 5\%$ is injected, and reported averages are based on at least 50 cycles.
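One way to get a feel for these numbers is to treat the end-effector as a single mass on a spring-damper. That simplification is an assumption of this sketch, not something the paper is quoted as doing, and the function names are illustrative.

```python
import math

M_EFF = 1.0                     # effective end-effector mass, kg (from the setup)
DT = 1.0 / 120.0                # simulation timestep, s
K_RANGE = (2000.0, 10000.0)     # stiffness range, N/m
C_RANGE = (10.0, 40.0)          # damping range, N*s/m


def natural_frequency_hz(k: float, m: float = M_EFF) -> float:
    """Undamped natural frequency of a mass-spring system, in Hz."""
    return math.sqrt(k / m) / (2.0 * math.pi)


def damping_ratio(k: float, c: float, m: float = M_EFF) -> float:
    """Dimensionless damping ratio; values below 1 mean underdamped."""
    return c / (2.0 * math.sqrt(k * m))


f_low = natural_frequency_hz(K_RANGE[0])
f_high = natural_frequency_hz(K_RANGE[1])
zeta_min = damping_ratio(K_RANGE[1], C_RANGE[0])   # stiffest spring, lightest damping
zeta_max = damping_ratio(K_RANGE[0], C_RANGE[1])   # softest spring, heaviest damping
steps_per_period = 1.0 / (f_high * DT)             # timesteps per oscillation at f_high

print(f"natural frequency: {f_low:.1f} to {f_high:.1f} Hz")
print(f"damping ratio: {zeta_min:.2f} to {zeta_max:.2f}")
print(f"timesteps per period at the stiff end: {steps_per_period:.1f}")
```

Under this reading the stiffness range maps to natural frequencies of roughly 7 to 16 Hz, the whole parameter range is underdamped, and the stiffest setting is resolved with only about seven and a half timesteps per oscillation.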
That detail matters because the experiment is easy to overread. It is not a deployed rehabilitation device. It is not a clinical trial. It does not prove patient outcomes, regulatory readiness, durability, manufacturability, or commercial ROI. It is a stylized virtual experiment designed to make a theoretical loop measurable.
That is still useful. A simulation can be weak evidence for market claims and strong evidence for mechanism clarity. The trick is not confusing the two, an industry habit roughly as persistent as calling every workflow “agentic”.
| Experimental element | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Rigid, soft, and adaptive impedance configurations | Main comparison | Adaptive physical coupling can outperform fixed mechanical settings in this simulated interaction | That one hardware architecture is clinically superior |
| Force, IMU, and depth sensing with injected noise | Implementation detail and realism check | Perception is modelled as multimodal physical signal integration under uncertainty | That real-world sensors will behave identically |
| Phase randomization and frozen parameter tracking | Control / ablation-style checks | The observed gains depend on coupling and adaptation, not only fixed trajectories | Full robustness across all patient types and hardware platforms |
| Disturbance recovery under impulse forces | Robustness/sensitivity test | Autonomy can be operationalized as return to low-energy stability | General safety certification |
| Fatigue and instability patient states | Exploratory context extension | Context can be inferred from energetic signatures in the simulation | Clinical diagnosis or validated patient-state recognition |
The correct business reading is therefore: the paper gives a design grammar and evaluation vocabulary for embodied AI systems. It does not give a purchase order.
Adaptive impedance is where the mechanism becomes visible
The first experimental lesson is almost disappointingly practical: neither rigid nor soft is enough.
In the simulation, the rigid configuration uses high stiffness. It is precise, but brittle under phase shifts and disturbance. The soft configuration cushions peaks, but can accumulate work over cycles. The adaptive configuration varies stiffness and damping within specified ranges, adjusting to the interaction state. This is where the paper reports the key pattern: adaptive impedance halves average energy loss per cycle from roughly 1.0 J to 0.48 J and shortens recovery after disturbance to about 0.35 seconds.
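The three configurations can be pictured with a plain spring-damper law. The sketch below is a minimal illustration, assuming a linear impedance model; the `Impedance` class, the damping values paired with the rigid and soft presets, and the example error values are all assumptions made for the example, not the paper's implementation.

```python
from dataclasses import dataclass

K_MIN, K_MAX = 2000.0, 10000.0   # stiffness limits from the setup, N/m
C_MIN, C_MAX = 10.0, 40.0        # damping limits from the setup, N*s/m


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


@dataclass(frozen=True)
class Impedance:
    k: float  # stiffness, N/m
    c: float  # damping, N*s/m

    def force(self, pos_error: float, vel_error: float) -> float:
        """Spring-damper force for a tracking error (m) and error rate (m/s)."""
        return self.k * pos_error + self.c * vel_error

    def adjusted(self, dk: float, dc: float) -> "Impedance":
        """Return a new setting, kept inside the allowed ranges."""
        return Impedance(clamp(self.k + dk, K_MIN, K_MAX),
                         clamp(self.c + dc, C_MIN, C_MAX))


# Fixed presets; the damping values chosen for them are illustrative assumptions.
RIGID = Impedance(k=K_MAX, c=C_MIN)
SOFT = Impedance(k=K_MIN, c=C_MAX)

error, error_rate = 0.002, 0.05  # 2 mm tracking error, 5 cm/s error rate
for name, setting in (("rigid", RIGID), ("soft", SOFT)):
    print(name, round(setting.force(error, error_rate), 2), "N")
```

The adaptive case is whatever policy calls `adjusted` between cycles. The clamp is the important part: adaptation is only allowed to move inside limits someone has already signed off on.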
There is a small numerical wrinkle. One bullet in the paper phrases the moment variance and energy-loss direction inconsistently, while the surrounding explanation frames the adaptive case as reducing energy loss and improving stability. The prudent interpretation is to trust the mechanism and the repeated energy-loss direction, while treating the exact variance phrasing as less clean than the thesis would prefer. Physics may be elegant; manuscripts remain human.
The underlying mechanism is straightforward. A body with stiffness $k$ deflected by a distance $x$ stores spring energy approximately as:
$$E \approx \tfrac{1}{2} k x^2$$
At $k = 5{,}000$ N/m, an energy scale of about 1 J corresponds to a deflection of roughly 2 cm. That is plausible for cooperative human-robot physical interaction. The point is not the charm of the equation. The point is that “intelligence” here becomes measurable as energy coherence: the system preserves structure, avoids excessive force, and remains responsive.
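The 2 cm figure can be checked by inverting the ideal spring-energy relation. This is a small sketch assuming a linear spring with no damping losses; the function name is illustrative.

```python
import math


def deflection_for_energy(energy_j: float, stiffness_n_per_m: float) -> float:
    """Invert E = 0.5 * k * x**2 to get the deflection x in metres."""
    return math.sqrt(2.0 * energy_j / stiffness_n_per_m)


# Deflection needed to store 1 J at the low, middle, and high stiffness settings.
for k in (2000.0, 5000.0, 10000.0):
    x_cm = 100.0 * deflection_for_energy(1.0, k)
    print(f"k = {k:>7.0f} N/m -> {x_cm:.2f} cm for 1 J")
```

Across the stated stiffness range, 1 J corresponds to deflections of roughly 1.4 to 3.2 cm, so the same energy budget feels quite different at the rigid and soft ends.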
For businesses building or buying robots, this suggests a testable procurement principle: ask less whether the system is “AI-powered” and more whether it can adapt its physical impedance under changing load without becoming harsh, sluggish, or unstable.
Perception is not more data; it is better physical resonance
The paper’s second move is to redefine perception. In many AI systems, perception is treated as input capture. Cameras, microphones, lidar, force sensors — all become data feeds into a model. The hidden assumption is that more data means better understanding.
In the rehabilitation setup, perception is narrower and more physical. It is the system’s ability to sense how its own forces relate to the environment’s forces. The virtual robot uses force sensing at 1 kHz, IMU signals at 500 Hz, and synthetic depth images at 60 Hz. The bandwidths are tied to physical phenomena: short-term contact peaks, system natural modes, and broader spatial context.
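The important reported measure is the correlation between contact force $F(t)$ and end-effector velocity $\dot{x}(t)$, which in the standard Pearson form reads:
$$\rho_{F\dot{x}} = \frac{\operatorname{cov}(F, \dot{x})}{\sigma_F \, \sigma_{\dot{x}}}$$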
In the experiment, this correlation rises from about 0.55 to 0.86 over repeated cycles. The interpretation is not that the robot “sees” better in the human sense. It is that force absorption and movement become more phase-aligned. The system stops fighting itself.
The paper also describes phase randomization as a control: when the force-motion relationship is artificially decorrelated, the energetic benefits disappear and energy curves return to a less efficient level. That is important because it separates the claim from decorative sensor fusion. The gain depends on relational structure, not merely on adding sensors to the invoice.
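To see why a force-velocity correlation tracks phase alignment, consider an idealized case. The sketch below assumes the correlation is the ordinary Pearson coefficient and that force and velocity are clean sinusoids at one shared frequency; both are simplifications for illustration, and the 1 Hz signal frequency is arbitrary.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def force_velocity_correlation(phase_lag_rad, freq_hz=1.0, rate_hz=1000, cycles=5):
    """Correlation of synthetic sinusoidal force and velocity with a phase lag."""
    n = int(rate_hz * cycles / freq_hz)          # whole number of cycles
    t = [i / rate_hz for i in range(n)]
    force = [math.sin(2 * math.pi * freq_hz * ti) for ti in t]
    velocity = [math.sin(2 * math.pi * freq_hz * ti - phase_lag_rad) for ti in t]
    return pearson(force, velocity)


for lag_deg in (0, 30, 57, 90):
    rho = force_velocity_correlation(math.radians(lag_deg))
    print(f"phase lag {lag_deg:>2} deg -> correlation {rho:.2f}")
```

In this idealized setting the correlation equals the cosine of the phase lag, so a rise from 0.55 to 0.86 would correspond to the lag shrinking from roughly 57 degrees to roughly 31 degrees. Real contact signals are noisy and multi-frequency, so the mapping is looser in practice, but the direction of the effect is the same.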
For business practice, the lesson is blunt. Sensor strategy is not about hoarding signal. It is about identifying which physical relationships matter for stable action. In eldercare, manufacturing, logistics, or surgical support, a system that sees everything but understands no force relationship is not perceptive. It is just expensive.
Motor competence emerges when the system stops worshipping the path
Classical automation loves trajectories. Define the path, track the path, penalize deviation from the path. This works beautifully until the world becomes soft, tired, slippery, variable, human, or otherwise rude.
The paper’s motor competence section shifts the emphasis from path dominance to impedance coordination. The robot begins with a smooth reference trajectory, but adjusts stiffness and damping according to measured forces and cycle energy. If average cycle energy rises over a moving horizon, the current parameterization is weakened; if energy falls, it is reinforced.
The reported result is organized improvement rather than perfection. Trajectory deviation falls from about 4.1 mm to 1.4 mm, while force fluctuations fall by roughly 35%. The paper says these figures are robust against patient load variation between 5 and 15 N and moderate phase disturbance. When parameter tracking is frozen, trajectory errors and energy losses rise again; when adaptation restarts, the system normalizes over several dozen cycles.
That makes the result more interesting. It suggests that motor competence is not simply embedded in a precomputed motion plan. It is maintained by ongoing adjustment in the impedance space. In other words, competence is less “follow the line” and more “keep the interaction livable”.
This distinction is operationally useful. In a warehouse cobot, a hotel service robot, or a rehabilitation device, the goal is rarely just to minimize geometric deviation. The goal is to complete the task without creating unsafe force peaks, excessive wear, user discomfort, or repeated human override. The path matters. The relationship matters more.
Learning becomes explainable when the objective is physical
The paper defines learning as stabilization of energetically favourable body configurations. After each time block, the system evaluates the average cycle energy, $E_{\text{cycle}}$.
If $E_{\text{cycle}}$ decreases relative to a moving reference, the current parameters are reinforced. If it increases, they are attenuated. The paper describes this as a physically expressed analogue of policy-gradient learning in the parameter space of stiffness and damping.
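The reinforce-or-attenuate rule can be sketched as a simple sign-based hill climb over stiffness and damping. This is a stand-in chosen for readability, not the paper's update rule: the `EnergyGuidedTuner` class, the step sizes, the two-cycle reference window, and especially `toy_cycle_energy` (a made-up bowl-shaped function replacing the simulator) are all assumptions.

```python
from collections import deque

K_RANGE = (2000.0, 10000.0)   # stiffness limits, N/m
C_RANGE = (10.0, 40.0)        # damping limits, N*s/m


def clamp(value, low, high):
    return max(low, min(high, value))


class EnergyGuidedTuner:
    """Sign-based hill climb on (k, c) driven by average cycle energy."""

    def __init__(self, k, c, step_k=200.0, step_c=1.0, horizon=2):
        self.k, self.c = k, c
        self.step_k, self.step_c = step_k, step_c
        self.recent = deque(maxlen=horizon)   # moving reference window

    def update(self, cycle_energy):
        if self.recent:
            reference = sum(self.recent) / len(self.recent)
            if cycle_energy > reference:
                # Attenuate: energy rose, so reverse direction and halve the step.
                self.step_k *= -0.5
                self.step_c *= -0.5
            # Otherwise reinforce: keep moving the same way.
        self.recent.append(cycle_energy)
        self.k = clamp(self.k + self.step_k, *K_RANGE)
        self.c = clamp(self.c + self.step_c, *C_RANGE)
        return self.k, self.c


def toy_cycle_energy(k, c):
    """Hypothetical stand-in for the simulator: a bowl with a minimum of 0.8 J."""
    return 0.8 + ((k - 6000.0) / 4000.0) ** 2 + ((c - 25.0) / 15.0) ** 2


tuner = EnergyGuidedTuner(k=3000.0, c=12.0)
energies = []
for _ in range(100):
    energy = toy_cycle_energy(tuner.k, tuner.c)
    energies.append(energy)
    tuner.update(energy)

print(f"first cycle: {energies[0]:.2f} J, last cycle: {energies[-1]:.2f} J")
```

Every change to `k` and `c` is traceable to one comparison between a measured energy and a recent average, which is the kind of audit trail that makes a physically grounded learning rule attractive.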
Over 100 cycles, average cycle energy falls from about 1.6 J to 0.82 J, nearly a 50% reduction. The curve is described as rapid early adjustment over the first 10 to 15 cycles, then slower optimization, then asymptotic stabilization from roughly cycle 25 onward.
This is the rare kind of learning story that should please engineers and governance teams at the same time. The system’s adaptation is traceable. One can inspect which parameters changed, when they changed, and why. The explanation is not “the neural network found a latent representation, please clap”. It is “energy flows improved under these physical settings”.
That does not make the approach magically safe. But it does make safety more inspectable. In physical AI, explainability is not only about explaining a decision. It is also about explaining a movement.
For business users, this could become a practical advantage. In safety-critical sectors, opaque learning is expensive. It increases validation burdens, slows certification, and frightens everyone who has ever signed a liability document. A physical learning loop that exposes its adaptation through force, energy, damping, and recovery metrics may be easier to audit than a black-box controller optimising an abstract reward.
Autonomy is recovery, not vibes
Autonomy is one of those words the technology sector has tried very hard to ruin. It can mean anything from a thermostat to a philosophical hostage situation. The paper wisely defines it physically: the ability to maintain coherence under changing conditions and return to a low-energy stable state after disturbance.
In the simulation, disturbance is introduced through unpredictable patient-arm impulses of $\pm 8$ N on top of the base 5 to 15 N force range. The adaptive system has only its stiffness and damping instruments. No hard reset. No external rescue controller. The question is whether it can absorb disturbance and recover.
The reported return time is about 0.35 seconds, corresponding to two to four natural periods in the stated frequency range of 7 to 16 Hz. The paper defines a stability rate as the proportion of cycles where energy deviation from local baseline remains below 10%; under that definition, the adaptive system reaches about 91%. The result is described as robust to $\pm 5\%$ sensor noise and an 80 ms patient-response delay.
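A stability rate of this kind is easy to compute once "local baseline" is pinned down. The sketch below assumes the baseline is the mean of the preceding ten cycles, which is a guess at a reasonable definition rather than the paper's; the energy series is invented for illustration.

```python
from statistics import mean


def stability_rate(cycle_energies, baseline_window=10, tolerance=0.10):
    """Share of cycles whose energy stays within `tolerance` of a local baseline.

    The local baseline is assumed to be the mean of the preceding
    `baseline_window` cycles; cycles without a full window are skipped.
    """
    stable = 0
    scored = 0
    for i in range(baseline_window, len(cycle_energies)):
        baseline = mean(cycle_energies[i - baseline_window:i])
        deviation = abs(cycle_energies[i] - baseline) / baseline
        scored += 1
        if deviation < tolerance:
            stable += 1
    return stable / scored if scored else float("nan")


# Illustrative data: a flat 0.8 J baseline with two disturbed cycles.
energies = [0.8] * 30
energies[15] = 1.2
energies[22] = 1.1
print(f"stability rate: {stability_rate(energies):.2f}")
```

The choice of window matters: a short baseline forgives disturbances quickly, a long one holds a grudge. Anyone comparing stability rates across vendors should ask which one was used.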
This is a more useful definition of autonomy than most boardroom language. It makes autonomy measurable. Can the system return to stability after the world pushes back? How often? How fast? At what energy cost? With what force burden on the human?
Those questions travel well beyond rehabilitation. They apply to industrial cobots, autonomous forklifts, robotic cleaning systems, exoskeletons, mobility devices, drones, and inspection robots. Autonomy without recovery is not autonomy. It is unsupervised optimism.
Context sensitivity is not a label classifier hiding in a robot
The sixth fundamental is context sensitivity. The paper models three patient states: stable, fatigued, and unstable. Fatigue is represented through reduced stiffness and increased reaction latency; instability adds greater variance in force phase. The system does not directly “know” these labels. It reads sensor data and adjusts stiffness and damping to smooth energy flows.
The paper reports condition detection at around 93% accuracy, with a 95% confidence interval of roughly $\pm 3\%$ over at least 300 time windows. During fatigued phases, the system reduces contact force by about 1.2 N, slightly changes phase behaviour, lowers energy per cycle by roughly 10 to 20%, and keeps trajectory error near 1.4 mm.
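The quoted interval is about what a textbook calculation gives. The sketch below assumes independent time windows and a normal-approximation (Wald) interval; whether the paper computed it this way is not stated here, so treat it as a plausibility check only.

```python
import math


def wald_half_width(accuracy: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)


half = wald_half_width(0.93, 300)
print(f"93% +/- {100 * half:.1f} percentage points")
```

With 300 windows the half-width comes out near 2.9 percentage points. If the windows overlap or are otherwise correlated, the effective sample size is smaller and the honest interval is wider.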
This is a useful result precisely because it avoids the shallow version of context awareness. The shallow version says: classify the user, then select the corresponding behaviour. The physical version says: infer the relational state from energetic signatures and adjust the interaction so it remains viable.
That distinction matters in human-facing automation. A hotel service robot does not need a metaphysical understanding of annoyance, but it does need to recognise crowding, hesitation, obstruction, and unsafe proximity. A rehabilitation robot does not need to “feel empathy”, thankfully, but it does need to reduce force when the patient’s body signals fatigue. A factory cobot does not need common sense as a TED Talk concept; it needs to distinguish routine resistance from a jammed fixture or a human hand where a human hand should not be.
Context sensitivity in Physical AI is therefore less about semantic cleverness and more about appropriate modulation.
Ethics moves into the control surface
The paper’s ethics section is ambitious, sometimes sweeping, but it lands on one commercially important point: when AI touches the world, ethics cannot remain a policy PDF attached after deployment.
In disembodied AI, governance often focuses on data provenance, bias, transparency, privacy, and output control. Those remain relevant. But embodied systems introduce additional ethical variables: force, speed, pressure, energy, proximity, fatigue, recovery, and intervention. A robot that grips too tightly has not merely made a classification error. It has turned model behaviour into physical risk.
Salehi frames ethics as embedded in the six fundamentals. Embodiment creates responsibility because physical action leaves consequences. Perception creates responsibility because sensing can become surveillance. Motor action creates responsibility because intervention changes the environment. Learning creates responsibility because the system should correct harmful patterns. Autonomy creates responsibility because self-regulation must remain bounded. Context sensitivity creates responsibility because the same action can be supportive in one situation and intrusive in another.
This is where Physical AI becomes uncomfortable for organisations that want “AI governance” to be mainly a compliance function. In embodied systems, governance must reach into engineering specifications. Acceptable force ranges, recovery thresholds, sensor boundaries, fallback behaviour, parameter limits, human override design, maintenance drift, and simulation-to-real validation become governance artefacts.
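One way to picture a governance artefact is as a machine-checkable envelope that sits next to the controller. The sketch below is hypothetical throughout: the `SafetyEnvelope` class and its thresholds are illustrative values loosely borrowed from the simulation's ranges, not clinical or regulatory limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyEnvelope:
    max_contact_force_n: float
    max_recovery_time_s: float
    stiffness_range: tuple
    damping_range: tuple

    def violations(self, force_n, recovery_s, k, c):
        """Return a list of human-readable limit breaches for one observation."""
        found = []
        if abs(force_n) > self.max_contact_force_n:
            found.append("contact force above limit")
        if recovery_s > self.max_recovery_time_s:
            found.append("recovery slower than limit")
        if not self.stiffness_range[0] <= k <= self.stiffness_range[1]:
            found.append("stiffness outside allowed range")
        if not self.damping_range[0] <= c <= self.damping_range[1]:
            found.append("damping outside allowed range")
        return found


ENVELOPE = SafetyEnvelope(
    max_contact_force_n=23.0,  # illustrative: 15 N base plus an 8 N impulse
    max_recovery_time_s=0.5,   # illustrative threshold, not a clinical limit
    stiffness_range=(2000.0, 10000.0),
    damping_range=(10.0, 40.0),
)
print(ENVELOPE.violations(force_n=12.0, recovery_s=0.35, k=5000.0, c=20.0))
print(ENVELOPE.violations(force_n=30.0, recovery_s=0.35, k=12000.0, c=20.0))
```

The design point is that the envelope is data, not prose. It can be versioned, reviewed, and tested like any other engineering specification, which is exactly where the paper suggests governance has to live.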
The ethics are in the control surface. Very inconvenient. Also true.
The business value is better diligence, not immediate ROI
The paper does not provide a business case in the narrow financial sense. There is no deployment cost model, no comparison against commercial rehabilitation robotics, no patient recovery study, and no hardware bill of materials. Good. Not every paper needs to cosplay as a McKinsey slide.
Its business value lies elsewhere: it gives leaders a better diligence framework for embodied AI.
| Business question | Physical AI version |
|---|---|
| Does the system use AI? | Which physical feedback loop does the AI close? |
| How accurate is it? | Accurate under what force, delay, noise, and disturbance conditions? |
| Is it autonomous? | How quickly and reliably does it recover after perturbation? |
| Does it learn? | What physical variables change, and can those changes be audited? |
| Is it safe? | What force, energy, proximity, and failure thresholds define safety? |
| Is it context-aware? | How does the same action change under different human or environmental states? |
| Can it scale? | How much behaviour transfers from simulation to real hardware, and what must be recalibrated? |
This reframing is useful across several markets.
In rehabilitation and eldercare, it shifts attention from “robot assistant” as a service concept to measurable assistance quality: force modulation, fatigue response, recovery stability, and patient-specific adaptation.
In manufacturing, it reframes cobot value around tolerance handling, safe contact, material variability, and recovery from unexpected physical states — not merely task automation.
In logistics and mobility, it points toward embodied robustness: the system’s ability to operate under friction, weight shifts, obstruction, weather, surface changes, and sensor uncertainty.
In digital-twin strategy, it suggests that simulators are not just testing environments. They can become pre-deployment laboratories for physical intelligence, provided that simulation evidence is not lazily promoted into field evidence. A virtual world can discipline a design. It cannot certify reality by vibes.
The limits are not footnotes; they define how to use the paper
The main limitation is evidence level. The paper combines a conceptual framework with a stylized Isaac Sim demonstration. That is enough to clarify mechanisms and propose metrics. It is not enough to establish clinical efficacy, hardware reliability, regulatory acceptability, or production economics.
Several boundaries matter.
First, the rehabilitation scenario is virtual. Simulated gravity, friction, elasticity, and energy flows are useful, but real hardware adds actuator limits, sensor drift, latency quirks, calibration issues, wear, human unpredictability, and institutional workflow constraints. Reality has a talent for refusing to be a well-behaved solver.
Second, the setup is deliberately simple. A three-part arm and simulated patient-force model make the six fundamentals visible. They do not cover the full complexity of rehabilitation therapy, human trust, pain response, clinician supervision, or heterogeneous patient populations.
Third, some results should be treated as demonstration metrics rather than benchmark claims. Energy reduction, trajectory improvement, stability rate, and detection accuracy are informative within the simulation’s parameter range. They should not be generalized without hardware trials and independent replication.
Fourth, the ethical argument is stronger as an engineering orientation than as a complete governance framework. Saying that ethics is embodied does not eliminate the need for external oversight, audit, regulation, consent, data governance, and accountability. The body may do some thinking. The lawyers, regrettably, will still attend meetings.
Physical AI changes the intelligence game because the game stops being digital
The paper’s strongest contribution is not that it adds another term to the AI glossary. We have enough of those. Some should be composted.
Its stronger contribution is a change in evaluation logic. When AI systems act physically, intelligence is no longer adequately measured by prediction accuracy, language fluency, or task completion alone. It must be measured through coupling: whether the system can sense relevant physical states, act without excessive force, learn from interaction, recover from disturbance, and adjust behaviour to context.
That is why the mechanism-first reading matters. The six fundamentals are not a checklist for a robotics brochure. They are a causal loop. Body enables perception. Perception shapes action. Action creates experience. Experience supports learning. Learning stabilizes autonomy. Autonomy makes context-sensitive behaviour possible. Context then changes what the next action should be.
The business implication is equally direct. Companies deploying embodied AI should stop asking whether intelligence can be installed into machines as a software upgrade. The better question is whether the machine’s body, sensors, actuators, learning loop, and safety boundaries are designed as one coherent physical system.
Physical AI does not make machines more human. That is the wrong ambition, and frankly humans are an inconsistent benchmark. It makes machines more accountable to the world they touch. For robotics, healthcare, mobility, manufacturing, and human-assist systems, that is the difference between automation that performs in a demo and automation that survives contact with reality.
The future of AI will not only be written in tokens. Some of it will be written in stiffness, damping, force, recovery time, and the quiet discipline of not knocking people over.
Cognaptus: Automate the Present, Incubate the Future.
[1] Vahid Salehi, “Fundamentals of Physical AI,” arXiv:2511.09497, 2025, https://arxiv.org/pdf/2511.09497.