Your Shopify Store Is Invisible to AI Agents

AI shopping agents can already buy from your Shopify store. The infrastructure exists. Shopify has shipped a live catalog endpoint that lets any buyer's assistant, on behalf of any customer, query your products, inspect your descriptions, check your pricing, and route toward a purchase. The pipe is open.

What travels through that pipe is your problem now.

We ran a field audit of 2,483 top US Shopify merchants to map the structural gaps in what stores actually expose. Then we pulled a deeper readout: identical signed catalog requests sent to six apparel merchant endpoints, same query, same context, so we could compare what each one actually returns. The picture is consistent. Most stores are technically reachable and practically useless to a shopping assistant trying to make a real recommendation.

The protocol is already on

The Universal Commerce Protocol gives every Shopify merchant a standardized interface to AI agent surfaces. When an agent wants to shop your store, it sends a structured request to your UCP endpoint asking for products matching a query, along with context like country, currency, and pagination. Your store responds with a defined set of product fields. The agent uses those fields to rank, filter, compare, and recommend.

Every merchant on a current Shopify plan has this endpoint active. The schema every merchant returns is identical: product ID, title, handle, description, URL, price range, compare-at price range, options, variants, images, tags, and a category taxonomy field. Same shape, every store.

The competitive variable is what each merchant chooses to fill in.
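To make "what each merchant chooses to fill in" concrete, here is a minimal Python sketch of a fill check over a single product record. The key names are illustrative stand-ins for the fields listed above, not the protocol's actual wire names, and the sample record is invented.

```python
# Illustrative field names only; the real response keys may differ.
SCHEMA_FIELDS = [
    "id", "title", "handle", "description", "url", "price_range",
    "compare_at_price_range", "options", "variants", "images",
    "tags", "category",
]

def empty_fields(product: dict) -> list:
    """Return schema fields that are missing, None, or empty in a product record."""
    return [
        name for name in SCHEMA_FIELDS
        if product.get(name) in (None, "", [], {})
    ]

# Hypothetical record: structurally valid, but thin where it matters.
sample = {
    "id": "p1", "title": "Field Jacket", "handle": "field-jacket",
    "description": "A jacket.", "url": "https://example.com/p1",
    "price_range": {"min": 198, "max": 198},
    "compare_at_price_range": None,
    "options": [{"name": "Size", "values": ["S", "M"]}],
    "variants": [{"id": "v1"}],
    "images": [{"url": "a.jpg", "alt": ""}],
    "tags": [],
    "category": None,
}
print(empty_fields(sample))
```

A check like this only catches fields that are absent. It says nothing about whether a populated field is useful, which is where most of the differences below show up.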

What we measured

For the deep audit, we sent the same request to six apparel merchant endpoints: a signed catalog query for "jacket," US context, top three products. We measured description length, checked whether real compare-at pricing came through, catalogued tag conventions, and looked for any review or rating data anywhere in the response payload.

We grepped every response for the words review, rating, star, and score. Across all six apparel merchants and every product returned, we got zero hits. Every one of those brands loses its social proof signal at the agent boundary.
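The keyword scan can be approximated in a few lines of Python. This is a sketch of the idea rather than the exact tooling we used, and the sample payload is invented. It serializes the whole response and counts case-insensitive substring hits, so it errs on the side of false positives ("star" would also match "starter"). That makes a zero count a strong result.

```python
import json

KEYWORDS = ("review", "rating", "star", "score")

def keyword_hits(payload, keywords=KEYWORDS):
    """Count case-insensitive substring hits for each keyword in a JSON payload."""
    text = json.dumps(payload).lower()
    return {word: text.count(word) for word in keywords}

# Hypothetical response fragment with no social proof anywhere in it.
response = {
    "products": [
        {
            "title": "Coach Jacket",
            "description": "Water-resistant shell.",
            "tags": ["fit: relaxed"],
        }
    ]
}
print(keyword_hits(response))
```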

Gymshark renders star ratings on every product page; Taylor Stitch carries them; UNRL has them too. A customer shopping through an AI-assisted experience sees none of that accumulated social proof. The system has no basis to say "this jacket has 4.8 stars across 2,000 reviews" because the data simply does not arrive at the agent layer.

This is the largest shared gap in the audit, and it affects every merchant on the protocol today.

Same schema, very different substance

Set reviews aside for a moment and look at what merchants actually ship inside the fields that do exist.

Description length ranged from 146 characters at Everlane to 828 at Oliver Sweeney, a spread of roughly 5.7x. A 146-character description is a tagline. Oliver Sweeney's 828-character descriptions cover fit, fabric, occasion, and construction in enough detail that a shopping agent can actually reason over them. A query like "is this jacket suitable for a business casual setting?" finds traction in one of those descriptions and none in the other.

Tag counts ranged from 0 to 55 per SKU. Half the merchants in our broader audit left image alt text completely blank, which means every image those stores return to a multimodal agent arrives with no image-to-language bridge at all.

Same protocol fields. Wildly different data density.
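Data density is straightforward to measure once you have the products in hand. The sketch below assumes each product is a dict with description, tags, and images keys, and that each image carries an alt key. Those names are our shorthand for illustration, not confirmed protocol keys, and the two sample products are invented.

```python
def density_report(products):
    """Summarize description length, tag count, and alt-text coverage."""
    desc_lengths = [len(p.get("description") or "") for p in products]
    tag_counts = [len(p.get("tags") or []) for p in products]
    images = [img for p in products for img in (p.get("images") or [])]
    with_alt = sum(1 for img in images if (img.get("alt") or "").strip())
    return {
        "avg_description_chars": sum(desc_lengths) / len(products) if products else 0.0,
        "min_tags": min(tag_counts, default=0),
        "max_tags": max(tag_counts, default=0),
        "alt_text_coverage": with_alt / len(images) if images else 0.0,
    }

# Two hypothetical products at opposite ends of the range.
products = [
    {"description": "Lightweight shell.", "tags": [], "images": [{"alt": ""}]},
    {
        "description": "Corduroy chore coat with a relaxed fit.",
        "tags": ["Material: Corduroy", "Fit: Relaxed"],
        "images": [{"alt": "Tan corduroy chore coat, front view"}],
    },
]
print(density_report(products))
```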

Six brands, one honest comparison

Cuyana is a useful case study in the floor. Their titles are clean, color and size options are properly structured, and the taxonomy is correct. But the data behind the title is starved. Descriptions run 161 characters. One SKU came back with a tags array that was literally empty; another carried a single tag, "clothing." A buyer's assistant trying to rank Cuyana against competitors on anything except price and image has almost nothing to work with.

Everlane sits in the middle on most dimensions. Their descriptions are the shortest of the six at around 146 characters, but their tags use a parseable colon-namespaced convention: "fabric: cotton," "category: outerwear." They are one of four merchants in the set where the product-level compare-at field carries an actual value, which means the "was $248, now $X" framing survives into the shopping session intact. Their weakness is that color lives in the title ("Field Jacket | Black") rather than as a structured variant option, making it invisible to any faceted query for a specific shade.

Gymshark has the richest tag taxonomy of the group by a wide margin: 55 tags per product, with feature, activity, fit, garment length, and season prefixes that function like a structured product graph. A query for "breathable, lightweight jackets for running" can actually be parsed against those tags. Two things undercut it: no category taxonomy (Gymshark is the only merchant in the audit missing it), and no alt text on images. More importantly, Gymshark's on-site star ratings vanish entirely at the UCP layer.

Oliver Sweeney earns the prize for description depth. 828 characters of actual copy, structured Size and Colour options, working compare-at pricing. Their tag conventions use a Key_Value format that is machine-readable. Every description is a genuine reasoning surface with enough substance to support a purchase decision. What that still leaves out of the agent feed is the subject of a worked example built around one of their jackets.

Taylor Stitch runs the cleanest tag schema of the six. Title-cased semantic prefixes: "Material: Corduroy," "Construction: Woven," "Primary Color: Tan." Mid-length descriptions, working compare-at, alt text present. Their weakness is that shade selection appears in the title and in tags but never as a structured variant option, so swatch-level queries require string matching rather than a direct field lookup. Of the six, they come closest to "structured attributes by convention."

UNRL has working compare-at, proper color options, and proper variant media. Their tags, though, are almost entirely merchandising flags: BFCM-25-ALL, LD25-20, MDW-SALE, RPT_Athleisure. These are notes-to-self. An agent cannot use "BFCM-25-ALL" to answer any customer question about the product. Notably, UNRL even has a tag pointing at their external review widget, but ships no actual review data alongside it. The pointer exists; the payload does not.
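The difference between an attribute tag and a merchandising flag shows up as soon as you try to parse them. The sketch below accepts the three conventions seen above ("Key: Value," "key:value," and "Key_Value") and checks the key against an allowlist. The allowlist is our own assumption for illustration, not part of any protocol. It is there because a flag like "RPT_Athleisure" is shaped like a Key_Value tag but describes nothing about the product.

```python
import re

# Hypothetical allowlist of attribute keys a shopping assistant could use.
ATTRIBUTE_KEYS = {
    "material", "fabric", "feature", "fit", "season", "activity",
    "construction", "primary color", "category",
}

# Accepts "Key: Value", "key:value", and "Key_Value".
TAG_PATTERN = re.compile(r"^\s*([A-Za-z ]+?)\s*[:_]\s*(.+?)\s*$")

def parse_tag(tag: str):
    """Return (key, value) for a recognized attribute tag, else None."""
    match = TAG_PATTERN.match(tag)
    if not match:
        return None
    key = match.group(1).lower()
    if key not in ATTRIBUTE_KEYS:
        return None
    return key, match.group(2)

for tag in ["Material: Corduroy", "feature:breathable", "BFCM-25-ALL", "RPT_Athleisure", "clothing"]:
    print(tag, "->", parse_tag(tag))
```

The three merchandising-style tags come back as None: two have no key at all, and the third has a key that is not a product attribute.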

The bar, set by a hat store

The closest thing to agent-ready in our audit is DapperFam, a specialty hat and accessories retailer. Their feed does things that none of the six apparel brands do.

On a single SKU, they expose three structured option axes: Color, Accessory size, and Fabric. A query for a Panama hat in Grade 3 fabric, size medium, resolves directly against a structured field, with no string parsing required. Every apparel merchant in the audit maxes out at two axes, typically just sizing and occasionally a shade option, so a three-axis feed is a meaningful gap.

More consequentially, DapperFam promotes material to a structured option. For every other merchant in the audit, fabric lives in description prose or, at best, in a free-form tag. Making it a selectable, queryable dimension at the variant level is a structural choice, and it is the right one for a catalog aimed at agent surfaces.
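Here is what "resolves directly against a structured field" looks like in practice. This is a minimal sketch that assumes each variant carries its selected options as a name-to-value mapping. The data shape and the sample hat are invented for illustration.

```python
def match_variants(product: dict, wanted: dict) -> list:
    """Return variants whose selected options equal every requested value."""
    def norm(text: str) -> str:
        return text.strip().lower()

    target = {norm(k): norm(v) for k, v in wanted.items()}
    matches = []
    for variant in product.get("variants", []):
        selected = {norm(k): norm(v) for k, v in variant.get("options", {}).items()}
        if all(selected.get(k) == v for k, v in target.items()):
            matches.append(variant)
    return matches

# Hypothetical three-axis product.
hat = {
    "title": "Panama Hat",
    "variants": [
        {"id": "v1", "options": {"Color": "Natural", "Accessory size": "M", "Fabric": "Grade 3"}},
        {"id": "v2", "options": {"Color": "Natural", "Accessory size": "L", "Fabric": "Grade 3"}},
        {"id": "v3", "options": {"Color": "Natural", "Accessory size": "M", "Fabric": "Grade 8"}},
    ],
}
hits = match_variants(hat, {"Fabric": "grade 3", "Accessory size": "m"})
print([v["id"] for v in hits])
```

The lookup is an equality test per axis. When fabric lives only in description prose, the same question turns into text extraction, with a different failure mode for every merchant.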

Their variant-level compare-at pricing is also populated, which makes them the only merchant in the apparel and accessories set where an agent can render accurate strike-through prices broken out by size and color. Every apparel merchant returns null on that dimension, even when the product is visibly on sale on their own site.
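A consuming agent's strike-through logic might look something like the sketch below. The function and parameter names are hypothetical. It prefers the variant-level compare-at value and falls back to the product-level one, which is a design choice of this sketch: the fallback keeps the "was/now" framing alive but can be wrong for an individual size or color.

```python
from decimal import Decimal

def strike_through(price, variant_compare_at=None, product_compare_at=None):
    """Return (was, now) if a compare-at price above the current price exists, else None."""
    now = Decimal(str(price))
    for candidate in (variant_compare_at, product_compare_at):
        if candidate is None:
            continue
        was = Decimal(str(candidate))
        if was > now:
            return was, now
    return None

# Variant-level value is null, so the product-level value is used.
print(strike_through("198.00", None, "248.00"))
# Nothing populated at either level: no sale framing can be rendered.
print(strike_through("198.00"))
```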

DapperFam also surfaces a harder truth about agentic readiness. Their own house-brand catalog carries structured tags with consistent prefixes: brim_size, gender, material, occasion, shape. Their Shopify Collective listings, sourced through the dropship marketplace, carry none of that. The structured tagging applies to what the merchant authors directly; Collective SKUs pass through as-is, with vendor names and price bands in place of product attributes. Sourcing decisions shape feed quality just as much as authoring decisions do. Any merchant onboarding through Collective inherits a data quality cliff at exactly the point where that dropship catalog enters the agent response.

When the merchant's own intent leaks out

Our audit of Lover's Tempo, a jewelry brand, surfaces a different category of problem. Their UCP endpoint is fully wired up. The catalog search works. But when we ran an empty-query request, the top five products returned were wholesale B2B prepacks priced between $22 and $2,316, every one carrying explicit "hidden" and "b2b" tags.

The merchant has done the data work. They have tagged these products correctly for their own admin purposes. The UCP projection simply ignores those tags. An AI agent doing exploratory discovery against Lover's Tempo lands on wholesale bundles before it finds any consumer product.

The same leak persists on real queries. Category searches for rings and necklaces surface products the merchant has explicitly flagged as not-for-retail-customers in the top five results. Readiness involves the projection layer, not just what the merchant authors into their product records.
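The filter that is missing here is small. The sketch below shows the kind of rule a projection layer, or a defensive agent consuming the feed, could apply. It is an illustration under our own assumptions, not a description of how the UCP projection works, and the sample products are invented.

```python
EXCLUDED_TAGS = {"hidden", "b2b"}

def consumer_visible(products, excluded=EXCLUDED_TAGS):
    """Drop products carrying any excluded tag (compared case-insensitively)."""
    blocked = {tag.lower() for tag in excluded}
    return [
        p for p in products
        if not blocked & {t.strip().lower() for t in p.get("tags", [])}
    ]

# Hypothetical catalog slice: one wholesale prepack, one consumer product.
catalog = [
    {"title": "Ring Prepack (24 units)", "tags": ["hidden", "B2B"]},
    {"title": "Gold Signet Ring", "tags": ["material: gold"]},
]
print([p["title"] for p in consumer_visible(catalog)])
```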

There is a notable bright spot: Lover's Tempo is the only jewelry brand in the audit where variant-level sale pricing is actually populated. For products on sale, each ring size carries its own before-and-after price. Every apparel merchant in the audit returns null on this field, even for products visibly discounted on their own site. The discipline is there; it just hasn't been applied to the projection layer.

What every brand can fix in a week

None of these require protocol changes. Every field involved already exists in the schema. The work is deciding what you actually want to ship through it, then doing it.

Write descriptions that differentiate. A 146-character description answers no customer questions. Gymshark and Oliver Sweeney give a shopping assistant roughly five times more substance to work with. Copy that addresses fit, fabric, occasion, and construction is more likely to be surfaced than copy that reads like a warehouse label, because an agent can only recommend what it can match to the query.
Carry compare-at pricing at the product level. Four of the six apparel merchants have real values in the product-level compare-at slot. The per-variant equivalent is null across all six, so any integration that reads only variant-level pricing will silently drop the "was X, now Y" framing on every promotional SKU. Set the product-level value, fill in the variant-level value where you can, and verify that both actually populate in the response.
Populate image alt text. Half the merchants in our broader 2,483-merchant scan leave it blank, which means every image in those catalogs arrives at the agent layer with no language attached. Alt text is the most direct image-to-language pairing available for multimodal reasoning, and the cheapest fix on this list by far.
Promote shade to a structured option. Everlane, Gymshark, and Taylor Stitch all encode color in the title or in tags rather than as a structured option. The consequence is that any query for a specific colorway has to fall back to string matching, which is fragile and breaks whenever title formatting varies. Structured options are directly queryable and require no per-merchant parsing logic.
Pick one tag namespace and enforce it. Cuyana's "clothing" and UNRL's "BFCM-25-ALL" are internal signals that carry no meaning for a buyer's assistant. Taylor Stitch's "Material: Corduroy" and Gymshark's "feature:breathable" do useful work for any system trying to match a natural-language query against the catalog. Decide whether you want "Key: Value," "key:value," or "Key_Value," apply it catalog-wide, and strip everything else at the projection boundary.
Tag product attributes, not operational state. Promotional cohorts, fulfillment routing flags, and report buckets belong in your internal systems. What ships over the protocol endpoint should describe the actual item: material, feature, fit, season, activity, the vocabulary a shopping assistant reasons over when deciding which product to surface to a buyer. Promo flags and ERP codes add noise without adding any signal a customer would recognize.
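
Several of the fixes above can be turned into a quick self-check. The sketch below is a starting point under stated assumptions, not a definitive linter: the 400-character threshold is arbitrary, the key names are illustrative, and it treats "key: value" as the chosen tag convention. The sample product is invented.

```python
MIN_DESCRIPTION_CHARS = 400  # assumed threshold, not a protocol rule

def audit_product(product: dict) -> list:
    """Return human-readable findings for one product record."""
    findings = []
    if len(product.get("description") or "") < MIN_DESCRIPTION_CHARS:
        findings.append("description is short")
    images = product.get("images") or []
    if any(not (img.get("alt") or "").strip() for img in images):
        findings.append("image missing alt text")
    option_names = {o.get("name", "").lower() for o in product.get("options") or []}
    if not option_names & {"color", "colour"}:
        findings.append("no structured color option")
    unprefixed = [t for t in product.get("tags") or [] if ":" not in t]
    if unprefixed:
        findings.append(f"{len(unprefixed)} tag(s) without a key: value prefix")
    return findings

# Hypothetical product with color in the title and a promo flag in its tags.
jacket = {
    "title": "Field Jacket | Black",
    "description": "Black field jacket.",
    "images": [{"alt": ""}],
    "options": [{"name": "Size", "values": ["S", "M"]}],
    "tags": ["fabric: cotton", "BFCM-25-ALL"],
}
for finding in audit_product(jacket):
    print(finding)
```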

What the data actually says

A composite best-in-show is still hypothetical. No merchant in our audit carries more than three of the desirable patterns simultaneously. Gymshark's feature taxonomy, Taylor Stitch's prefix discipline, Oliver Sweeney's description depth, Everlane's compare-at fidelity, DapperFam's material option, Lover's Tempo's per-SKU sale pricing: each brand owns one or two of these. None owns all of them, which means the ceiling is open.

The protocol works as designed, giving every Shopify merchant equal access to the emerging layer of shopping surfaces. What separates agent-transactable stores from the technically reachable but commercially invisible ones is the quality of data each merchant chooses to ship. That distinction is becoming more consequential: AI-referred buyers already convert 42% above baseline and spend 14% more per order, so the stores best-positioned in the agent feed are capturing the highest-value traffic in the channel.

Every item on the fix list above is addressable in a week of focused catalog work. That work is genuinely worth doing. Catalog hygiene is table stakes, though, and table stakes do not compound. The brands that build durable advantage in agentic commerce ship more than product attributes through the pipeline. They ship reasoning: fit guidance, substitution logic when a size is gone, return policy framed for a confidence-assessing buyer, brand voice. That layer lives above the protocol schema entirely, and it is what the brand-truth layer is built to carry.

If you want to see what your store actually looks like through a live UCP request, we will run the audit for you. Request early access.

Sumit Jagdale is the founder of Sartorial.