Car settings are usually pretending to know you.

Sport mode assumes you are impatient. Eco mode assumes you have discovered moral superiority through fuel efficiency. Comfort mode assumes everyone in the vehicle prefers to be gently transported like a bowl of soup. These modes are not useless. They are just blunt. They adjust a handful of parameters and call the result personalization, which is a bit like calling a restaurant “personalized” because it offers small, medium, and large.

The paper Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving takes a more ambitious view.[1] It does not treat personalized driving as a switch between aggressive, neutral, and conservative modes. It treats personalization as an alignment problem: how can an autonomous driving policy learn a driver’s long-term habits, interpret short-term natural-language instructions, and still preserve the safety and comfort constraints that make driving different from ordinary recommendation systems?

That distinction matters. A music app can personalize itself by giving you too much soft jazz. An autonomous vehicle personalizing itself badly can become a two-ton behavioral experiment with liability attached.

The paper’s contribution is therefore not simply “cars can follow commands.” The more interesting claim is architectural: personalization needs a mechanism that connects three different kinds of preference signal.

First, there is the driver’s persistent style: how they accelerate, brake, follow, yield, merge, and overtake over time. Second, there is situational intent: “I’m late,” “I feel carsick,” “please be careful here.” Third, there is the operating envelope: safety, efficiency, and comfort cannot be optimized independently, because in driving they constantly trade against one another.

Drive My Way, or DMW, is an attempt to put those three pieces into the same Vision-Language-Action driving framework.

The useful framing is not “personalized driving mode”; it is “preference alignment under control”

The obvious summary of the paper would say: DMW learns user embeddings, accepts language instructions, and performs better than several baselines. Accurate, but not very enlightening.

The useful reading is more mechanical. DMW is trying to solve a translation problem:

convert messy human preference into small, controlled changes in driving behavior, without destroying the base policy’s ability to drive safely.

That is why the paper uses a mechanism-first design. It does not simply prompt a language model to decide whether the vehicle should be cautious or assertive. It builds a pipeline in which a pretrained VLA driving backbone produces a base action, then a personalization component adds residual adjustments. In plain language: the model first learns to drive; then it learns how this kind of person, in this kind of situation, might drive differently.

The authors use SimLingo as the VLA backbone. SimLingo processes front-view camera images, instructions, and route targets, then predicts motion waypoints that can be converted into speed and steering commands. DMW adds a long-term preference encoder, a residual decoder, and reinforcement fine-tuning around that base.

That ordering is important. If personalization directly replaced the core driving policy, the system would risk treating user preference as a license to destabilize the planner. DMW instead applies personalization as a residual adjustment on top of the base action:

$$ a_t = a_t^{base} + a_t^{\Delta} $$

The base action comes from the motion predictor. The residual adjustment changes speed and steering in discrete ways. This is a reasonable design choice: preferences often live in the margin between multiple feasible actions, not in the decision to obey physics today and ignore it tomorrow.
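To make the residual idea concrete, here is a minimal sketch. It assumes the residual decoder picks one discrete increment for speed and one for steering from small fixed bins, and that the composed action is clipped to a feasible range. The bin values, the clipping limits, and the names `SPEED_BINS`, `STEER_BINS`, and `compose_action` are hypothetical illustrations, not parameters taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical discrete residual bins (illustrative values, not from the paper).
SPEED_BINS = (-1.0, -0.5, 0.0, 0.5, 1.0)   # target-speed increments, m/s
STEER_BINS = (-0.05, 0.0, 0.05)            # normalized steering increments


@dataclass(frozen=True)
class Action:
    speed: float  # target speed, m/s
    steer: float  # normalized steering in [-1, 1]


def compose_action(base: Action, speed_idx: int, steer_idx: int,
                   max_speed: float = 15.0) -> Action:
    """a_t = a_t^base + a_t^Delta, with the residual chosen from discrete bins.

    Clipping keeps the result inside a (hypothetical) feasible range, so the
    residual can only nudge the base action, not replace it.
    """
    speed = min(max(base.speed + SPEED_BINS[speed_idx], 0.0), max_speed)
    steer = min(max(base.steer + STEER_BINS[steer_idx], -1.0), 1.0)
    return Action(speed=speed, steer=steer)


base = Action(speed=8.0, steer=0.10)
hurried = compose_action(base, speed_idx=4, steer_idx=1)  # +1.0 m/s, steering unchanged
careful = compose_action(base, speed_idx=1, steer_idx=1)  # -0.5 m/s, steering unchanged
```

The design point is that the personalization head only chooses among small offsets. The base motion predictor still does the driving.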

A driver who is late might take a slightly earlier gap. A cautious driver might wait longer before overtaking. A carsick passenger might prefer smoother acceleration. None of those require inventing a new traffic rule. They require selecting among safe-enough alternatives with different comfort and efficiency profiles.

That is the product-relevant part of the paper. Personalization is not a personality sticker on the dashboard. It is a controlled policy adjustment layer.

The dataset matters because “style” is behavioral, not just descriptive

DMW depends on a new Personalized Driving Dataset, or PDD. The dataset includes 30 real drivers, each driving 20 CARLA routes across scenario types such as overtaking, merging, intersections, pedestrian crossings, and vehicle cut-ins. Participants provide profile information before driving, including demographic information, driving history, and typical driving purposes. The driving data then records camera input, ego-vehicle motion, surrounding agents, traffic context, route geometry, human control actions, and an expert target speed from PDM-Lite.
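The gap between describing a driver and observing one is easier to see as a schema. The sketch below shows one hypothetical way to organize a PDD-style record. It keeps the profile collected before driving separate from per-timestep behavior and the PDM-Lite expert target speed. All class and field names are assumptions for illustration, not the dataset's actual format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DriverProfile:
    # Collected before driving; field names are hypothetical.
    driver_id: str
    demographics: dict
    driving_history: str
    typical_purposes: List[str]


@dataclass
class DrivingStep:
    # One timestep of observed behavior.
    front_camera: bytes            # image payload (placeholder in this sketch)
    ego_speed: float               # m/s
    ego_accel: float               # m/s^2
    throttle: float                # human control actions
    brake: float
    steer: float
    expert_target_speed: float     # PDM-Lite reference, m/s
    surrounding_agents: list = field(default_factory=list)
    traffic_context: dict = field(default_factory=dict)


@dataclass
class RouteLog:
    driver_id: str
    route_id: int                  # one of the 20 CARLA routes
    scenario_type: str             # e.g. "merging", "cut_in"
    route_geometry: list = field(default_factory=list)  # waypoints
    steps: List[DrivingStep] = field(default_factory=list)

    def speed_ratio_to_expert(self) -> float:
        """Mean human speed divided by the expert target speed, over valid steps."""
        ratios = [s.ego_speed / s.expert_target_speed
                  for s in self.steps if s.expert_target_speed > 0]
        return sum(ratios) / len(ratios) if ratios else float("nan")


profile = DriverProfile("d07", {"age_band": "30-39"}, "12 years licensed",
                        ["school commuting"])
log = RouteLog("d07", 3, "merging")
log.steps.append(DrivingStep(b"", 6.0, 0.4, 0.3, 0.0, 0.0, 8.0))
log.steps.append(DrivingStep(b"", 9.0, 0.1, 0.4, 0.0, 0.02, 10.0))
```

A derived feature such as `speed_ratio_to_expert` is the kind of behavioral signal a profile alone cannot supply.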

This is not just administrative detail. It is the difference between describing a driver and observing a driver.

A profile might say that someone is calm, experienced, and usually drives for school commuting. But the model needs to connect that semantic description to behavior: following distance, acceleration, lane changes, reaction to cut-ins, willingness to overtake, and speed relative to an expert target. The paper’s dataset is designed to support that mapping.

The authors’ key data move is to align profile information with actual route behavior. The profile is not used as a decorative biography. It becomes an input to a long-term preference encoder. The driving history becomes input to a route processor. The model then learns a shared latent space where a profile embedding and the corresponding behavior embedding from the same driver are close, while embeddings from different drivers are pushed apart.

The contrastive objective is the quiet center of the method. It tries to answer: can the model infer from a driver profile enough about the driver’s behavioral tendency to condition future driving?

| Component | What it learns | Operational role | Business boundary |
|---|---|---|---|
| Personalized Driving Dataset | Links driver profiles to observed driving behavior | Supplies the preference signal | 30 drivers in CARLA is useful research evidence, not a production population sample |
| Long-term preference encoder | Converts profile text into a user embedding | Gives the policy a persistent driver prior | Profile fields raise consent, privacy, and fairness questions in real deployment |
| Route processor | Converts past trajectories into behavior embeddings | Provides the behavioral counterpart for contrastive learning | Simulator routes may not capture real-world regional driving norms |
| Residual decoder | Adjusts base speed and steering | Expresses personalization without replacing the whole planner | Residual changes still require strict safety validation |
| Style-aware reward mapping | Converts language intent into safety-efficiency-comfort weights | Handles short-term instructions | LLM-generated reward parameters need governance, review, and auditability |

This table is where the misconception should die a peaceful death. DMW is not “choose conservative mode.” It is closer to “learn a driver-conditioned latent prior, then adjust the action distribution using both long-term profile evidence and short-term language intent.” Less catchy, admittedly. Also less misleading.

Long-term preference is encoded before the car hears today’s instruction

DMW separates long-term and short-term preference.

The long-term part comes from user embeddings. The paper uses a DeBERTaV3-based text processor and projection head to encode the driver profile. Separately, a temporal route processor handles windows of past trajectory data: camera images, ego states, and actions. These two encoders are trained with an InfoNCE-style contrastive loss so that the profile embedding for a driver aligns with that driver’s behavior embedding.
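For readers who have not seen an InfoNCE-style objective, here is a minimal NumPy sketch. Row *i* of the profile batch and row *i* of the behavior batch come from the same driver and form the positive pair. Every other row in the batch serves as a negative. The symmetric two-direction form and the temperature of 0.07 are common choices assumed here. The paper's exact loss configuration is not given in this summary.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(profile_emb, behavior_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of N drivers (row i <-> row i is positive)."""
    p = l2_normalize(profile_emb)
    b = l2_normalize(behavior_emb)
    logits = p @ b.T / temperature           # (N, N) scaled cosine similarities
    labels = np.arange(len(p))

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)          # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()         # diagonal = positives

    # Average profile->behavior and behavior->profile directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


rng = np.random.default_rng(0)
profiles = rng.normal(size=(4, 16))
behaviors = profiles + 0.05 * rng.normal(size=(4, 16))   # same drivers, slight noise
loss_matched = info_nce(profiles, behaviors)
loss_shuffled = info_nce(profiles, behaviors[::-1])       # pairs deliberately mismatched
```

Correctly paired profiles and behaviors yield a much lower loss than mismatched pairs. Minimizing this loss is what pulls a driver's profile embedding toward that driver's behavior embedding.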

That design gives the policy a persistent preference prior. The vehicle is not supposed to behave as a blank slate every time the passenger says something. If two users both say “take it easy,” the system should eventually learn that “easy” means different things for someone who usually leaves a 50-meter headway and someone who treats lane changes as a competitive sport.

The paper then uses reinforcement fine-tuning to align the policy with these embeddings. To increase behavioral diversity, the authors also construct augmented trajectories by pairing a target driver with a dissimilar driver embedding. The reward is based on similarity between the model’s sampled action and either the original driver action or an augmented action derived from route-level action statistics.
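The reward described here scores how close a sampled action is to a target action. A minimal sketch follows, assuming a Gaussian-style similarity on an action vector such as `[speed, steer]`. The augmentation helper shifts the original action by the gap between two drivers' route-level mean actions. That helper is a hypothetical construction for illustration only. The paper derives augmented actions from route-level statistics, but this summary does not say how.

```python
import numpy as np


def similarity_reward(sampled, target, scale=1.0):
    """Reward in (0, 1]: exactly 1.0 when the sampled action equals the target."""
    diff = np.asarray(sampled, dtype=float) - np.asarray(target, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / scale))


def augmented_target(original_action, source_route_mean, other_route_mean):
    """Hypothetical augmentation: move the original action by the difference
    between a dissimilar driver's and the source driver's route-level means."""
    return (np.asarray(original_action, dtype=float)
            + np.asarray(other_route_mean, dtype=float)
            - np.asarray(source_route_mean, dtype=float))


original = [8.0, 0.0]                                 # [speed m/s, steer]
aug = augmented_target(original, [7.0, 0.0], [5.0, 0.0])
r_same = similarity_reward([8.0, 0.0], original)      # perfect match
r_off = similarity_reward([8.0, 0.0], aug)            # original behavior vs augmented target
```

The augmented target rewards the policy for behaving differently when conditioned on a different embedding. This is the diversity the augmentation is meant to add.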

The goal is not merely to cluster drivers into coarse categories. It is to give the VLA policy a way to express individual differences through action choices.

That is why the paper’s comparison with MORL-PD is relevant. MORL-PD conditions policy behavior on preference vectors such as speed and comfort. DMW instead uses a learned user embedding from profile and behavior alignment. In the reported long-term preference alignment results, DMW achieves Alignment Scores of 0.92 and 0.92 for two in-distribution drivers, and 0.83 and 0.83 for two out-of-distribution drivers. MORL-PD reports 0.42, 0.58, 0.25, and 0.33 on the same four driver cases. Human ratings also favor DMW: 8.7, 8.3, 7.8, and 8.0 versus MORL-PD’s 5.1, 6.2, 3.9, and 3.5.

Those numbers should be read carefully. The Alignment Score is based on whether generated rollouts are classified into the target driver’s cluster, and the ratings come from evaluators comparing model rollouts to driver logs. This is meaningful evidence that DMW expresses recognizable driver-specific behavior. It is not proof that a real driver would accept the behavior on a real road after three months of commuting in rain, traffic, and human impatience. Small distinction. Rather important.
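The Alignment Score is a classification rate, so it can be illustrated as follows. Assign each generated rollout to its nearest driver cluster, then count how often that cluster is the target driver's. Nearest-centroid assignment and the two behavior features used here (mean speed and mean headway) are assumptions. This summary does not state which clustering method or features the paper uses.

```python
import numpy as np


def alignment_score(rollout_features, centroids, target_idx):
    """Fraction of rollouts whose nearest driver centroid is the target driver.

    rollout_features: (R, F) behavior features, one row per rollout
    centroids:        (K, F) one behavior centroid per driver cluster
    """
    x = np.asarray(rollout_features, dtype=float)
    c = np.asarray(centroids, dtype=float)
    dists = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)  # (R, K)
    assigned = dists.argmin(axis=1)
    return float((assigned == target_idx).mean())


# Hypothetical features: [mean speed (m/s), mean headway (m)]
centroids = [[8.5, 20.0], [5.5, 35.0]]
rollouts = [[8.2, 22.0], [8.9, 19.0], [6.0, 33.0], [8.0, 21.0]]
score = alignment_score(rollouts, centroids, target_idx=0)   # 3 of 4 rollouts land in cluster 0
```

In practice features would be standardized first, since raw headway in meters would otherwise dominate speed in the distance computation.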

Short-term language changes the reward, not merely the label

The second half of personalization is instruction following.

DMW constructs style instructions across 20 driving scenarios. Each scenario includes nine instructions covering three styles—aggressive, neutral, and conservative—at three levels of directness. That means the model is not only trained on commands like “drive aggressively.” It also sees more natural expressions of intent, such as being in a rush, feeling carsick, or wanting to be patient.
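The instruction set is a simple grid: 20 scenarios, times 3 styles, times 3 directness levels, gives 180 instruction slots. A small sketch of that enumeration follows. The three directness level names are hypothetical placeholders, because the text only says there are three levels.

```python
from itertools import product

STYLES = ("aggressive", "neutral", "conservative")
DIRECTNESS = ("explicit", "implicit", "indirect")   # hypothetical level names


def instruction_grid(scenarios):
    """One slot per (scenario, style, directness) combination."""
    return [
        {"scenario": s, "style": st, "directness": d}
        for s, st, d in product(scenarios, STYLES, DIRECTNESS)
    ]


grid = instruction_grid([f"scenario_{i:02d}" for i in range(20)])
```

Each slot would then be filled with a natural-language instruction, such as a direct "drive aggressively" or an indirect "I'm in a rush."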

The interesting mechanism is style-aware reward adaptation. DMW formulates the driving reward as a weighted combination of safety, efficiency, and comfort:

$$ R(s_t, a_t) = w_s R_{safety} + w_e R_{efficiency} + w_c R_{comfort} $$

Safety is tied to Time-to-Collision thresholds. Efficiency is tied to preferred speed. Comfort is tied to smoothness constraints on steering and acceleration. The key is that these weights and thresholds are instruction-dependent and scenario-dependent. Aggressive instructions receive higher efficiency weight and preferred speed; conservative instructions produce larger safety margins and smoother behavior.
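The sketch below makes the weighted reward concrete. It follows the structure described above: a TTC threshold for safety, a preferred speed for efficiency, and smoothness limits on acceleration and steering for comfort. The exact functional forms, the weights, and the two style presets are hypothetical values chosen for illustration. They are not the parameters generated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleParams:
    w_safety: float
    w_efficiency: float
    w_comfort: float
    ttc_threshold: float     # s
    preferred_speed: float   # m/s
    max_accel: float         # m/s^2 smoothness limit
    max_steer_rate: float    # normalized units per second


def style_reward(ttc, speed, accel, steer_rate, p: StyleParams) -> float:
    # Safety: full credit above the TTC threshold, linear penalty below it.
    r_safety = 1.0 if ttc >= p.ttc_threshold else ttc / p.ttc_threshold
    # Efficiency: credit for tracking the style's preferred speed.
    r_eff = max(0.0, 1.0 - abs(speed - p.preferred_speed) / p.preferred_speed)
    # Comfort: penalize acceleration and steering rate beyond the limits.
    over_accel = max(0.0, abs(accel) - p.max_accel) / p.max_accel
    over_steer = max(0.0, abs(steer_rate) - p.max_steer_rate) / p.max_steer_rate
    r_comfort = max(0.0, 1.0 - over_accel - over_steer)
    return (p.w_safety * r_safety + p.w_efficiency * r_eff
            + p.w_comfort * r_comfort)


AGGRESSIVE = StyleParams(0.3, 0.5, 0.2, ttc_threshold=2.0, preferred_speed=12.0,
                         max_accel=3.0, max_steer_rate=0.6)
CONSERVATIVE = StyleParams(0.5, 0.2, 0.3, ttc_threshold=4.0, preferred_speed=8.0,
                           max_accel=1.5, max_steer_rate=0.3)

# Same hurried driving state, scored under two instruction styles.
state = dict(ttc=3.0, speed=11.0, accel=2.0, steer_rate=0.2)
r_aggressive = style_reward(**state, p=AGGRESSIVE)
r_conservative = style_reward(**state, p=CONSERVATIVE)
```

The same state earns about 0.96 under the aggressive preset and 0.70 under the conservative one. This is how instruction-dependent weights and thresholds reshape what the fine-tuned policy learns to prefer.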

The paper uses advanced LLM reasoning—explicitly giving GPT-5 as an example—to infer reward weights and thresholds from the scenario description and instruction style, then refines the generated parameters through expert review. This is not a fully autonomous reward-design machine. It is a semi-automated reward specification workflow.

That distinction is not a footnote; it is an operating model. For a deployed vehicle system, reward translation would need audit trails, bounded parameter ranges, human validation, and regulatory review. “The LLM decided I should tailgate because the passenger was late” will not be a popular sentence in court.
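Bounded parameter ranges and audit trails are not part of the paper's described pipeline. The sketch below shows one hypothetical shape such a check could take. It clamps LLM-proposed reward parameters into reviewed ranges, drops unknown keys, and emits a JSON audit record. The bounds, parameter names, and `review_llm_params` function are all assumptions.

```python
import json
import time
from dataclasses import dataclass, field

# Hypothetical reviewed ranges; a real system would load these from a
# versioned, approved configuration instead of hard-coding them.
BOUNDS = {
    "w_safety": (0.3, 0.8),
    "w_efficiency": (0.0, 0.5),
    "w_comfort": (0.1, 0.5),
    "ttc_threshold": (2.0, 6.0),      # s
    "preferred_speed": (0.0, 13.9),   # m/s
}


@dataclass
class ReviewResult:
    params: dict
    violations: list = field(default_factory=list)
    audit_record: str = ""


def review_llm_params(proposed: dict, source: str) -> ReviewResult:
    """Clamp proposed reward parameters into bounds and log every change."""
    accepted, violations = {}, []
    for name, (lo, hi) in BOUNDS.items():
        if name not in proposed:
            violations.append(f"{name}: missing")
            continue
        value = float(proposed[name])
        clamped = min(max(value, lo), hi)
        if clamped != value:
            violations.append(f"{name}: {value} clamped to {clamped}")
        accepted[name] = clamped
    # Unknown keys are dropped rather than passed through.
    for name in proposed.keys() - BOUNDS.keys():
        violations.append(f"{name}: unknown parameter dropped")
    record = json.dumps({"time": time.time(), "source": source,
                         "proposed": proposed, "accepted": accepted,
                         "violations": violations}, sort_keys=True)
    return ReviewResult(accepted, violations, record)


proposal = {"w_safety": 0.1, "w_efficiency": 0.7, "w_comfort": 0.2,
            "ttc_threshold": 1.0, "preferred_speed": 12.0, "tailgate": True}
result = review_llm_params(proposal, source="llm:scenario_07/aggressive")
```

Clamping alone is not enough. A real pipeline would also decide whether to renormalize weights, and it would route any proposal with violations to human review instead of applying it silently.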

Still, the architecture is useful because it locates the role of language. Language does not directly become throttle. Language changes the optimization context under constraints.

The main results show a trade-off: style sensitivity improves, generic score does not always dominate

The first major experiment evaluates closed-loop driving on Bench2Drive under aggressive, neutral, and conservative style instructions. The paper compares SimLingo, a StyleDrive-like baseline, DMW-Vanilla with fixed reward weights, and full DMW with style-aware reward weights.

The pattern is not “DMW wins every cell.” It does not. What stands out is that DMW creates larger and more interpretable behavioral separation across styles.

Under full DMW, aggressive driving has higher efficiency: 281.56 versus 244.98 for neutral and 237.06 for conservative. Conservative driving has higher comfort: 34.62 versus 28.67 for neutral and 21.62 for aggressive. Conservative driving also has larger headway: 30.05 versus 27.60 and 26.37. Lane changes fall from 0.70 under aggressive instructions to 0.60 under conservative instructions.

That is what one would expect if the instruction is not just a label but a reward-shaping signal.

The authors highlight that aggressive DMW produces an 18.77% efficiency gain relative to conservative DMW, while Driving Score drops by only 3.89% relative to the conservative condition. The conservative condition has the highest Driving Score and Success Rate among the DMW style variants: DS 82.72 and SR 71.56. Aggressive DMW has DS 79.50 and SR 67.36.
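Both percentages are relative changes that use conservative DMW as the reference. A quick check reproduces them from the values quoted in the text.

```python
# Full-DMW values as quoted above.
efficiency = {"aggressive": 281.56, "conservative": 237.06}
driving_score = {"aggressive": 79.50, "conservative": 82.72}


def relative_change_pct(value, reference):
    return 100.0 * (value - reference) / reference


eff_gain = relative_change_pct(efficiency["aggressive"], efficiency["conservative"])
ds_drop = -relative_change_pct(driving_score["aggressive"], driving_score["conservative"])
print(f"efficiency gain {eff_gain:.2f}%, driving score drop {ds_drop:.2f}%")
# prints: efficiency gain 18.77%, driving score drop 3.89%
```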

So the practical reading is this: DMW can push the vehicle toward more efficient behavior when instructed, but that style shift is not free. It changes the performance profile. That is exactly why style-aware driving is a governance problem, not just a user-experience feature.

DMW-Vanilla is also revealing. With fixed reward weights, it reports strong generic DS and SR—82.19 and 70.97 under aggressive, 81.96 and 70.63 under neutral, and 81.48 and 71.05 under conservative. But the behavioral differences across styles are weaker. The paper interprets this as evidence that fixed weights optimize general driving objectives, while style-aware weights better separate aggressive, neutral, and conservative behavior.

In business language: generic reliability and expressive personalization are not the same product metric. A system can be safer-looking because it behaves similarly under every instruction. It can also be more personal because it actually changes behavior. The useful product question is how much behavioral elasticity is allowed before safety, comfort, and regulatory constraints say “thank you, that is enough personality for today.”

The ablations test whether personalization is really coming from the preference mechanism

The ablation studies are easy to misread as extra performance tables. They are more useful as mechanism checks.

The Adaptive Average Pooling ablation asks whether the preference encoder is preserving the right parts of the profile representation. Without the masked Adaptive Average Pooling module, the model compresses textual profile features using an unmasked global mean. The paper reports that this weakens user-embedding expressiveness.
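The difference between masked and unmasked pooling is easy to show in miniature. The sketch assumes that the unmasked global mean averages over padding positions as well as real tokens. Under that assumption, a short profile's features get diluted toward the padding value. This simplified stand-in does not reproduce the paper's masked Adaptive Average Pooling module. It only illustrates why masking matters.

```python
import numpy as np


def unmasked_mean(token_feats):
    """Global mean over every position, padding included."""
    return token_feats.mean(axis=0)


def masked_mean(token_feats, mask):
    """Mean over real tokens only; mask is 1 for real tokens, 0 for padding."""
    m = mask[:, None].astype(float)
    return (token_feats * m).sum(axis=0) / max(m.sum(), 1.0)


# A short profile: 2 real tokens padded to length 8 with zero vectors.
feats = np.zeros((8, 2))
feats[0] = [1.0, 0.0]
feats[1] = [0.0, 1.0]
mask = np.array([1, 1, 0, 0, 0, 0, 0, 0])

pooled_unmasked = unmasked_mean(feats)      # shrunk by a factor of 8/2
pooled_masked = masked_mean(feats, mask)
```

The unmasked result depends on how much padding a profile happens to need. This is one plausible way profile semantics could get washed out.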

The effect shows clearly in the metrics. Without AAP, speeds for D1 to D4 cluster tightly between 4.21 and 4.52, and ratings range from 4.6 to 6.1. With AAP, the same drivers show more distinct speeds (8.54, 5.57, 6.40, and 8.77) and higher ratings of 8.7, 8.3, 7.8, and 8.0. Alignment Scores also improve: for example, D3 rises from 0.25 without AAP to 0.83 with AAP.

That supports the claim that profile representation quality matters. It does not prove that the chosen profile fields are optimal, fair, or stable across cultures. But it does support the paper’s narrower argument: if the encoder washes out profile semantics, the model loses behavioral individuality.

The second ablation compares fixed reward weights with style-aware reward adaptation. This is where Table 1 becomes more than a leaderboard. DMW-Vanilla’s strong DS and SR show that fixed reward fine-tuning can improve generic driving. Full DMW’s larger differences in speed, comfort, headway, and efficiency show that dynamic reward weights are needed to make instruction styles distinct.

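The ablations and supporting studies therefore serve two purposes: each tests whether a specific mechanism is doing the work, and each marks the limit of what that result proves.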

| Test | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Adaptive Average Pooling ablation | Check whether the preference encoder preserves driver-specific semantics | Better user embeddings produce more distinct driver-conditioned behavior | That these profile variables are sufficient for production personalization |
| Fixed vs style-aware reward weights | Check whether instruction sensitivity comes from dynamic reward adaptation | Style-aware reward mapping creates clearer aggressive/conservative differences | That all style shifts remain safe in real traffic |
| Long-term user study | Check whether generated behavior is recognizable as driver-specific | DMW rollouts align better with driver clusters and evaluator ratings than MORL-PD | That drivers themselves would prefer the generated behavior over time |
| Appendix style user study | Check whether human evaluators perceive instruction alignment | DMW receives higher style-match ratings than SimLingo and StyleDrive-like baselines | That perception ratings equal real-world safety or comfort |
| Qualitative scenarios | Show behavioral contrast in critical scenes | Aggressive and conservative prompts produce visibly different route/speed choices | That the policy is robust across all rare edge cases |

This is good experimental structure. The main evidence shows closed-loop metrics and alignment. The ablations probe whether the proposed mechanisms are doing the work. The appendix extends the behavioral interpretation through additional driver metrics, user ratings, and qualitative examples. None of it should be inflated into real-road validation. Fortunately, the paper itself states that the current validation is in CARLA and identifies sim-to-real deployment as future work.

The business value is not “cars with personalities”; it is configurable trust

The title of this article says autonomous cars start having personalities. That is partly a joke. The less cute phrase is probably more useful: vehicles may need adaptive behavioral contracts.

A passenger does not only want the car to complete the route. They want the car to complete the route in a way that feels acceptable. “Acceptable” varies by person, trip purpose, weather, traffic density, physical comfort, and urgency. A commuter late for a meeting, an elderly passenger, a parent with a sleeping child, and a motion-sensitive passenger might all want different behavior from the same vehicle in the same city.

Current mode-based systems compress that diversity into a few presets. DMW points toward a richer product layer:

| Business setting | Possible use | What the paper directly supports | What remains uncertain |
|---|---|---|---|
| Private autonomous vehicles | Learned driver/passenger profiles that influence acceleration, following distance, and overtaking behavior | Driver-conditioned rollouts show recognizable long-term preference patterns in simulation | Real-world acceptance, safety certification, and long-term adaptation |
| Robotaxi fleets | Passenger-selected or learned ride styles, such as calmer rides or time-sensitive rides | Language instructions can shift behavior along efficiency-comfort-safety dimensions | Liability when user preference conflicts with fleet safety policy |
| ADAS suppliers | Personalization adapter layered on top of a base driving stack | Residual adjustment design suggests personalization can be modular | Integration with proprietary stacks and validation pipelines |
| In-car AI assistants | Natural-language ride preference interface | DMW maps situational language into reward parameters | Whether casual language is reliably interpreted under stress |
| Insurance and compliance | Auditable preference boundaries and behavior logs | The framework separates base policy, user embedding, and style reward | Regulatory standards for personalized driving behavior are not established |

The most attractive commercial idea is not giving the car a quirky mood. Please no. The attractive idea is reducing the gap between objective safety and perceived safety.

Many passengers feel uncomfortable not because the vehicle is objectively unsafe, but because its choices are hard to predict. It brakes too late for their taste. It merges too assertively. It waits too long. It accelerates smoothly but feels timid. A personalized policy could make behavior more legible and therefore more trusted, as long as the personalization remains inside a validated safety envelope.

There is also a data-product implication. If automakers and fleet operators want personalization, they need longitudinal, consented behavior data—not merely one-time preference surveys. DMW’s dataset design hints at the required data architecture: profile data, behavior logs, scenario context, expert references, and evaluation metrics that can separate efficiency, comfort, safety margin, and style recognition.

That is expensive. But for premium mobility, robotaxi retention, and differentiated in-car AI, it may be more valuable than another glowing dashboard animation. The bar is low; dashboards have been trying very hard lately.

The uncomfortable part: preference data in cars is intimate data

Personalized driving sounds friendly until one asks what the system needs to know.

DMW uses driver profiles containing background, habits, driving experience, and driving purposes. In a research setting, this is reasonable. In a commercial setting, it becomes sensitive behavioral data. A vehicle preference profile can reveal commuting patterns, risk tolerance, family routines, medical discomfort, work urgency, and perhaps emotional state if the system is allowed to interpret phrases like “I’m tired” or “I feel anxious.”

That creates several design obligations.

First, personalization should be opt-in and inspectable. Users need to know what is being learned and how it affects vehicle behavior. Second, profile-derived behavior should be editable or erasable. A user should not be trapped forever by the car’s interpretation of their “aggressive driver era.” Growth is possible. So is litigation.

Third, fleet systems need hard safety overrides. A passenger saying “I’m late” should not directly reduce safety margins beyond certified thresholds. DMW’s reward-weight design can be interpreted as a step in this direction because it still keeps safety, efficiency, and comfort inside an explicit reward formulation. But production systems would need stronger constraints than a research reward function.
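A reward function expresses preference, but it does not enforce a limit. The sketch below shows one hypothetical way to make a certified margin a hard rule at the action level: an action shield that checks each personalized action against a fixed TTC floor. This is not part of DMW as described. The `CERTIFIED_MIN_TTC` value, the `Candidate` type, and the assumption of a short-horizon TTC predictor are all illustrative.

```python
from dataclasses import dataclass, replace

CERTIFIED_MIN_TTC = 3.0  # seconds; hypothetical certified floor


@dataclass(frozen=True)
class Candidate:
    speed: float          # m/s
    steer: float          # normalized steering
    predicted_ttc: float  # s, from a (hypothetical) short-horizon predictor


def shield(personalized: Candidate, base: Candidate,
           min_ttc: float = CERTIFIED_MIN_TTC) -> Candidate:
    """Personalization may act only inside the certified envelope.

    If the personalized action violates the TTC floor, fall back to the base
    action; if that also violates it, command a stop as a placeholder for a
    real minimal-risk maneuver (its TTC would then need to be re-predicted).
    """
    if personalized.predicted_ttc >= min_ttc:
        return personalized
    if base.predicted_ttc >= min_ttc:
        return base
    return replace(base, speed=0.0)


base = Candidate(speed=8.0, steer=0.0, predicted_ttc=4.0)
im_late = Candidate(speed=10.0, steer=0.0, predicted_ttc=2.0)  # "I'm late" pushes too close
chosen = shield(im_late, base)                                  # falls back to base
```

No instruction can lower the floor, because the shield never reads the instruction.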

Fourth, personalization must avoid encoding unfair or irrelevant assumptions. If profile fields correlate with demographics, geography, or occupation, the model might learn behavior patterns that are statistically useful but socially or legally problematic. The paper does not solve this. It opens the door to it.

That does not weaken the paper’s research contribution. It clarifies the next layer of work.

The real boundary is sim-to-real, sample size, and evaluation depth

The paper is careful about one major limitation: DMW is validated in CARLA. That matters because autonomous driving is full of sim-to-real gaps. Perception noise, local driving customs, road surface variation, unusual human behavior, weather, sensor failures, and ambiguous interactions do not politely match a simulator’s distribution.

There are other boundaries as well.

The dataset uses 30 drivers. That is enough to demonstrate a research mechanism, not enough to claim broad population coverage. The paper evaluates 25 in-distribution and 5 out-of-distribution drivers for preference alignment. That is a useful zero-shot test, but still a small slice of possible driver diversity.

The user studies are also limited. The long-term alignment study uses ten evaluators rating similarity between driver logs and model rollouts. The appendix style-adaptation study uses five evaluators rating videos across representative scenarios. These are valid supporting evaluations, but they measure recognizability and perceived instruction alignment, not longitudinal trust, motion sickness outcomes, or actual passenger preference after repeated rides.

The reward-generation workflow uses LLM-inferred parameters refined by experts. That is practical for research, but production would need version control, validation rules, scenario coverage analysis, and a process for handling conflicting instructions. “Drive faster, but also be safer, but also my coffee is open” is not a rare edge case. It is Tuesday.

Finally, the paper’s strongest evidence is comparative within its experimental setup. DMW outperforms SimLingo, StyleDrive-like, and MORL-PD baselines on the personalization dimensions the authors define. It does not prove that personalized VLA driving is deployable today.

That boundary is not disappointing. Research papers are allowed to be research papers. The useful question is whether they move the design conversation forward. This one does.

What Cognaptus infers for AI product strategy

The broader lesson is not limited to autonomous driving. DMW is an example of a pattern that will likely appear in many physical AI systems:

  1. keep a competent base policy;
  2. learn a persistent user or operator embedding;
  3. allow natural-language situational intent;
  4. translate that intent into bounded reward or control parameters;
  5. evaluate both task success and preference alignment.
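The five steps above can be read as an interface. The sketch below is a hypothetical skeleton in which toy lambdas stand in for learned components. It is not the paper's code. The point is only the ordering: a base action plus a bounded, embedding-conditioned residual, with evaluation reporting task and alignment metrics side by side.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Params = Dict[str, float]


@dataclass
class PersonalizedPolicy:
    base_policy: Callable[[Sequence[float]], float]                        # 1. competent base policy
    user_embedding: Sequence[float]                                        # 2. persistent user embedding
    parse_intent: Callable[[str], Params]                                  # 3. situational language
    bounds: Dict[str, Tuple[float, float]]                                 # 4. allowed parameter ranges
    residual: Callable[[Sequence[float], Sequence[float], Params], float]

    def bounded_params(self, instruction: str) -> Params:
        raw = self.parse_intent(instruction)
        # Missing parameters default to 0.0 before clamping (toy choice).
        return {k: min(max(raw.get(k, 0.0), lo), hi)
                for k, (lo, hi) in self.bounds.items()}

    def act(self, obs: Sequence[float], instruction: str) -> float:
        params = self.bounded_params(instruction)
        return self.base_policy(obs) + self.residual(obs, self.user_embedding, params)


def evaluate(policy, episodes, task_metric, alignment_metric):  # 5. evaluate both
    actions = [policy.act(obs, instr) for obs, instr in episodes]
    return {"task": task_metric(actions), "alignment": alignment_metric(actions)}


toy = PersonalizedPolicy(
    base_policy=lambda obs: 8.0,                                   # target speed, m/s
    user_embedding=[0.2],
    parse_intent=lambda text: {"speed_offset": 3.0 if "late" in text else 0.0},
    bounds={"speed_offset": (-2.0, 1.5)},                          # request of +3.0 is capped
    residual=lambda obs, emb, p: p["speed_offset"] + 0.5 * emb[0],
)
hurried = toy.act([0.0], "I'm late")   # 8.0 + 1.5 + 0.1
calm = toy.act([0.0], "no rush")       # 8.0 + 0.0 + 0.1
report = evaluate(toy, [([0.0], "I'm late"), ([0.0], "no rush")],
                  task_metric=max, alignment_metric=lambda a: sum(a) / len(a))
```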

For software products, personalization often means ranking content or adjusting interface defaults. For embodied AI, personalization means changing actions in the physical world. The second is much less forgiving.

That is why DMW’s residual design is strategically interesting. It suggests a way to personalize behavior without letting preference overwrite competence. The base driving system remains responsible for route-following and basic action generation. The personalization layer adjusts within the feasible action space. In product architecture terms, this is a safer story than “the AI assistant drives however you ask.”

It also creates a clearer market map. The defensible asset is not just the VLA model. It is the preference-alignment stack around it: data collection, user embedding, scenario-conditioned reward design, evaluation metrics, and governance. Automakers may eventually compete less on whether the car can follow a lane and more on whether it can make its behavior feel naturally aligned with the passenger without becoming unsafe, annoying, or legally radioactive.

That is a narrow path. But it is a real one.

Conclusion: personality is the interface; alignment is the engineering problem

Drive My Way is easy to describe in consumer language: an autonomous car that adapts to how you like to drive. That description is useful, but incomplete.

The paper’s deeper contribution is showing how personalized driving can be formulated as preference alignment for a VLA policy. Long-term driver behavior becomes a learned embedding. Short-term language changes reward weights and thresholds. The base policy remains grounded in driving competence, while residual actions express preference. The evaluation then checks not only route performance but whether the resulting behavior is recognizably different across drivers and instructions.

For business readers, the implication is straightforward: personalization in autonomous systems will not be a UI feature pasted onto the product at the end. It will require data infrastructure, model architecture, reward governance, and safety validation from the beginning.

Cars may indeed start having personalities. The serious question is who gets to define those personalities, how tightly they are bounded, and whether the vehicle can explain the difference between “driving my way” and “driving badly with confidence.”

That final distinction, as usual, is where the money and the lawsuits live.

Cognaptus: Automate the Present, Incubate the Future.


  1. Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, and Jiachen Li, “Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving,” arXiv:2603.25740, 2026. https://arxiv.org/abs/2603.25740