When humans stop shopping and AI takes over, the cart becomes a new battleground. A recent study titled “What Is Your AI Agent Buying?” introduces a benchmark framework called ACES to simulate AI-mediated e-commerce environments, and the results are far more consequential than a simple switch from user clicks to agent decisions.

The ACES Sandbox: Agentic E-Commerce Under the Microscope

ACES (Agentic e-Commerce Simulator) offers a controlled environment that pairs state-of-the-art vision-language-model (VLM) agents with a mock shopping website. This setup enables causal measurement of how different product attributes (price, rating, reviews) and platform levers (position, tags, sponsorship) influence agentic decision-making.

The methodology is clever: a simplified “Veni, Vidi, Emi” cycle where the AI agent comes, sees the product grid, and buys. Each product page displays a fixed 2x4 grid of listings. There are no details pages or scrolling—everything is visible at a glance. This constraint helps isolate choice behavior from navigation failures.
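The fixed grid can be pictured as eight listings assigned to eight slots. The sketch below is an illustration only, not the ACES code: the `Listing` fields, the `random_grid` helper, and the idea of shuffling listings across slots between trials are all assumptions introduced here.

```python
import random
from dataclasses import dataclass

ROWS, COLS = 2, 4  # the fixed 2x4 grid described above


@dataclass(frozen=True)
class Listing:
    # Hypothetical fields, loosely mirroring the attributes named in the article.
    title: str
    price: float
    rating: float
    reviews: int


def random_grid(listings, rng):
    """Place exactly ROWS * COLS listings into grid slots in random order."""
    if len(listings) != ROWS * COLS:
        raise ValueError("expected exactly 8 listings")
    shuffled = list(listings)
    rng.shuffle(shuffled)
    # Slice the shuffled list into rows of COLS items each.
    return [shuffled[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


# Made-up catalogue of eight items.
listings = [Listing(f"Item {i}", 10.0 + i, 4.0, 100) for i in range(8)]
grid = random_grid(listings, random.Random(0))
```

Shuffling positions independently of product attributes is one common way to separate "where it sits" from "what it is" when reading the results.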

Irrationality in the Age of Rational Agents

Do AI agents always make rational choices? Surprisingly, no.

| Test Type | GPT-4.1 Fail Rate | Claude 4.0 Fail Rate | Gemini 2.5 Flash Fail Rate |
|---|---|---|---|
| Chooses Cheapest Item | 9.3% | 0.5% | 0% |
| Chooses Highest Rating | 15.1% | 28.7% | 0% |

When a single product was 10% cheaper or rated 0.1 stars higher, some agents still chose an inferior option. GPT-4.1 missed the cheapest item 9.3% of the time and the highest-rated item 15.1% of the time, while Claude 4.0 missed the highest-rated item 28.7% of the time. Worse, when agents made mistakes, they often rationalized them, blaming “display errors” or dismissing meaningful differences as trivial.

This finding casts doubt on the viability of fully delegating shopping to AI agents without oversight. Your AI may be intelligent, but it’s not always smart.
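To make the fail-rate idea concrete, here is a minimal sketch of a price test of the kind described above. It assumes every item is identical except one that is 10% cheaper. The helper names, the base price, and the recorded choices are all made up for illustration and are not taken from the study.

```python
def make_price_test(base_price=20.0, discount=0.10, n_items=8, cheap_index=3):
    """Return n_items prices that are identical except one discounted item."""
    prices = [base_price] * n_items
    prices[cheap_index] = round(base_price * (1 - discount), 2)
    return prices


def fail_rate(choices, correct_index):
    """Fraction of trials in which the agent did not pick the dominant item."""
    if not choices:
        raise ValueError("no trials recorded")
    failures = sum(1 for c in choices if c != correct_index)
    return failures / len(choices)


prices = make_price_test()
# Made-up choices from ten hypothetical trials (index of the item bought).
choices = [3, 3, 0, 3, 3, 3, 3, 3, 3, 3]
rate = fail_rate(choices, correct_index=3)
print(f"fail rate: {rate:.1%}")
```

The same function would work for a rating test by pointing `correct_index` at the item with the higher rating.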

Bias in the Browser: The Layout Effect

One of the most surprising findings? Position biases are not just strong—they’re model-specific.

| Model | Preferred Row | Preferred Column |
|---|---|---|
| GPT-4.1 | Top | Strongly favors Column 1 |
| Claude 4 | Top | Columns 2 and 3 |
| Gemini 2.5 Flash | Top | Column 3 |

Placement alone can raise a product’s selection rate by up to 5x. If AI shoppers come to dominate demand, platforms must rethink layout, and sellers must obsess not only over what their listing says but also over where it appears.
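A position effect like this can be read off a simple tally of which slot held the purchased item. The sketch below assumes products were assigned to slots at random, so that slot and product quality are unrelated. The trial data and the `selection_rates` helper are hypothetical.

```python
from collections import Counter

ROWS, COLS = 2, 4


def selection_rates(chosen_positions):
    """Share of trials in which each (row, col) slot held the purchased item."""
    if not chosen_positions:
        raise ValueError("no trials recorded")
    counts = Counter(chosen_positions)
    total = len(chosen_positions)
    return {(r, c): counts[(r, c)] / total
            for r in range(ROWS) for c in range(COLS)}


# Made-up picks from ten hypothetical trials, as (row, col) with 0-based indices.
chosen = [(0, 0)] * 5 + [(0, 1)] * 2 + [(0, 2)] * 2 + [(1, 3)]
rates = selection_rates(chosen)

best_slot = max(rates, key=rates.get)
lowest_nonzero = min(rate for rate in rates.values() if rate > 0)
ratio = rates[best_slot] / lowest_nonzero
print(best_slot, f"{ratio:.1f}x the least-chosen nonempty slot")
```

Running the same tally per model is what would surface model-specific preferences such as the column differences in the table above.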

When Tags Talk: Sponsored vs Endorsed

Badges matter. But not in the way platforms might hope:

  • Sponsored tag hurts selection probability (agents seem to penalize ads)
  • Overall Pick tag boosts selection significantly (agents trust platform endorsement)
  • Scarcity tag has minimal impact (perhaps agents don’t fear missing out)

This reveals a credibility gap: while paid promotions backfire, endorsements like “Overall Pick” act as trust signals. For platforms, this suggests a strategic pivot: from monetizing attention via ads to monetizing credibility via trusted curation.

Price vs Power: How Much Can You Charge?

By computing trade-offs using a conditional logit model, the authors show how much price premium a seller can charge if they gain certain advantages:

| Advantage Gained | GPT-4.1 | Claude 4.0 | Gemini 2.5 |
|---|---|---|---|
| Top Row Position | +91% | +113% | +17% |
| +0.1 Star Rating | +67% | +35% | +28% |
| “Overall Pick” Tag | +65% | +92% | +138% |
| Double Review Count | +37% | +19% | +17% |

In other words, an “Overall Pick” endorsement is worth a price premium of roughly 65% to 138%, depending on the model, without scaring off AI shoppers; for Gemini 2.5 that is more than doubling the price. Quantifying agent preferences this way opens up a new domain: algorithmic SEO for AI buyers.
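The trade-off arithmetic can be sketched as follows, under stated assumptions. Suppose a conditional logit in which an item's utility is a negative coefficient times log price plus a coefficient for each advantage. The premium that exactly cancels an advantage then solves beta_log_price * ln(1 + d) + beta_attr = 0. The log-price specification and the coefficient values below are illustrative assumptions, not the authors' estimates.

```python
import math


def choice_probabilities(utilities):
    """Conditional logit (softmax) choice probabilities over one choice set."""
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def equivalent_price_premium(beta_attr, beta_log_price):
    """Fractional price increase that exactly offsets an attribute's utility gain.

    Solves beta_log_price * ln(1 + d) + beta_attr = 0 for d.
    """
    if beta_log_price >= 0:
        raise ValueError("price coefficient is expected to be negative")
    return math.exp(-beta_attr / beta_log_price) - 1.0


# Hypothetical coefficients, chosen only to illustrate the arithmetic.
beta_log_price = -2.0
beta_overall_pick = 0.8

premium = equivalent_price_premium(beta_overall_pick, beta_log_price)
print(f"equivalent premium: {premium:.0%}")

# Check: a tagged item priced (1 + premium) higher ties with an untagged one.
base_price = 20.0
u_plain = beta_log_price * math.log(base_price)
u_tagged = beta_log_price * math.log(base_price * (1 + premium)) + beta_overall_pick
probs = choice_probabilities([u_plain, u_tagged])
```

With these made-up coefficients the two items end up equally likely to be chosen, which is what "the premium a seller can charge" means in this framing: the price increase at which the advantage is exactly used up.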

Can Sellers Fight Back? Yes, with Their Own Agents

In a simulation where seller-side AI modified only the product description, 25% of cases saw dramatic gains in market share—sometimes over 20 percentage points. That’s with no change in price, placement, or rating. Just wording.

Think of it as SEO meets behavioral targeting—except now your target is an AI.

Notably, the effectiveness varied across agents. What seduced GPT-4.1 didn’t necessarily impress Gemini. This fragmented landscape suggests the emergence of agent-specific optimization arms races.

Implications: E-Commerce Is Becoming a Meta Game

This study isn’t just about online shopping. It’s about a paradigm shift in delegated decision-making, and the economic rules are being rewritten. Here’s how:

  • For Platforms: Ranking algorithms must now account for machine perception, not human UX. Sponsored listings might need redesign, and credibility tokens like “Overall Pick” may become prime real estate.

  • For Sellers: Listing optimization becomes an AI-native discipline. Expect A/B testing not just for humans, but for bots. A new niche for SaaS tooling is wide open.

  • For Regulators: AI shoppers are agents with biases. Disclosures, auditing, and possibly even fairness mandates may become necessary—especially if market share starts concentrating due to model quirks.

  • For Consumers: Delegation saves time, but at a cost. As with financial advisors, we may soon need a diversified portfolio of AI agents—or risk monoculture consumption.


Cognaptus: Automate the Present, Incubate the Future