GPTBot robots.txt for Shopify: Practical Examples and Checks

Copy practical Shopify robots.txt examples for GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot — with a checklist for the Shopify paths that should stay crawlable and the ones that should always be blocked.

5 min read

robots.txt is the gate that decides whether AI shopping engines can even read your Shopify store before they cite it. ChatGPT, Perplexity, and Gemini answer commercial queries about Shopify products only when their crawlers (GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot) have actually fetched the page content. Block them in robots.txt and you’re invisible — every other AI-visibility signal (schema, llms.txt, content quality) becomes moot.

This guide ships a practical Shopify robots.txt example calibrated for the AI shopping era: allow the public store content (products, collections, blogs, pages) for the AI crawlers that matter, block the checkout / account / admin paths that don’t, and verify it works using the Robots Analyzer.

What GPTBot access can and can’t do

| Can do | Can’t do |
| --- | --- |
| Crawl public product pages, collections, blogs | Bypass authentication on private pages |
| Index content for ChatGPT shopping answers | Read content that requires login or app permissions |
| Read product schema, llms.txt, FAQ, policy pages | Identify individual buyers or their cart contents |
| Respect Disallow directives on compliant crawlers | Enforce privacy — adversarial scrapers ignore robots.txt |
| Honor noindex meta directives on crawlable pages | Substitute for proper Shopify access controls |

Concretely: allowing GPTBot is necessary for ChatGPT Shopping visibility, but not sufficient. The crawler reaching the page is the floor. Visibility is gated by content quality + structured data on top of that.
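You can see the floor-level check in action with Python’s standard-library `urllib.robotparser`. A minimal sketch, using an excerpt of the GPTBot rules from the example later in this guide (the product path is hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the GPTBot group from the example robots.txt in this guide.
RULES = """\
User-agent: GPTBot
Allow: /products/
Allow: /collections/
Disallow: /checkout
Disallow: /account
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Public catalogue stays reachable for the crawler...
print(parser.can_fetch("GPTBot", "/products/example-tee"))  # True
# ...while the payment flow stays private.
print(parser.can_fetch("GPTBot", "/checkout"))              # False
```

The same two calls, pointed at your live robots.txt, tell you whether the floor condition holds before you worry about schema or content quality.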

Shopify paths that should stay crawlable

| Path | Why |
| --- | --- |
| /products/ | Most-cited URL class for AI shopping queries |
| /collections/ | Category-level browse pages; AI uses them for “store sells X” answers |
| /blogs/ | Long-form content where AI extracts product context |
| /pages/faq | FAQ content — AI cites for “does X support Y” queries |
| /pages/shipping | Shipping policy — AI cites for “ships to X” queries |
| /pages/returns | Return policy — AI cites for “what if it doesn’t fit” queries |
| /pages/size-guide | Sizing context — AI cites for “what size am I” apparel queries |
| /pages/ingredients | Beauty ingredient guide — AI cites for compatibility queries |
| /pages/warranty | Electronics warranty — AI cites for “what’s the warranty” queries |
| /llms.txt | The compact content map — AI reads it on every crawl |
| /sitemap.xml | URL discovery — AI uses it to find new product pages |

Shopify paths that should stay protected

| Path | Why |
| --- | --- |
| /cart | Personalised state — never makes sense to crawl |
| /checkout | Payment flow — must be private |
| /account | Logged-in customer dashboard |
| /admin | Shopify admin (already protected by auth, but explicit is better) |
| /orders/ | Order history per customer |
| /apps/<private> | Third-party app endpoints that expose private data |
| Internal search result pages | Thin or duplicate content; AI engines downweight crawl-heavy sites |
| Preview / staging URLs | Not for public visibility |

The robots.txt example

Drop this into your Shopify theme’s robots.txt.liquid (or robots.txt asset). Review against your specific apps + theme before publishing — this is a starting baseline, not a one-size-fits-all answer.

Shopify AI-crawler robots.txt example:

```txt
# Shopify AI-crawler robots.txt — starter baseline.
# Review against your theme, apps, privacy needs, and policy before
# publishing to production.

User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: ChatGPT-User
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: PerplexityBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: Googlebot
Allow: /
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

Sitemap: https://your-store.myshopify.com/sitemap.xml
```

Review checklist

Before publishing, run through this. Each item maps to a way Shopify stores commonly leak AI visibility or expose private data through robots.txt.

Shopify AI-crawler robots.txt review checklist:

```txt
[ ] Public product pages (/products/) are not blocked.
[ ] Public collection pages (/collections/) are not blocked.
[ ] Public blog (/blogs/) and pages (/pages/) content is not blocked.
[ ] Cart, checkout, account, admin, orders paths stay protected.
[ ] Each AI crawler has its own User-agent block (no shared rules).
[ ] noindex pages remain crawlable (so the crawler can read noindex).
[ ] robots.txt is not the only thing protecting private data —
    authentication handles that.
[ ] Sitemap directive points at the real /sitemap.xml URL.
[ ] Tested in /tools/robots-analyzer after deploying.
[ ] Re-checked after any theme update that touches robots.txt.liquid.
```
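The path checks in this list can be scripted. Here is a sketch using Python’s standard-library `urllib.robotparser` — the crawler names come from this guide, while the sample paths (`/products/widget`, `/orders/123`, etc.) are hypothetical stand-ins for your real URLs:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
MUST_ALLOW = ["/products/widget", "/collections/all", "/blogs/news/post", "/pages/faq"]
MUST_BLOCK = ["/cart", "/checkout", "/account", "/admin", "/orders/123"]

def audit(robots_txt: str) -> list[str]:
    """Return human-readable problems found in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for bot in AI_CRAWLERS:
        for path in MUST_ALLOW:
            if not parser.can_fetch(bot, path):
                problems.append(f"{bot} is blocked from public path {path}")
        for path in MUST_BLOCK:
            if parser.can_fetch(bot, path):
                problems.append(f"{bot} can crawl private path {path}")
    return problems
```

An empty list from `audit(robots_body)` means the baseline’s allow/block split holds for all four crawlers; anything else names the exact crawler-and-path combination to fix.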

How to install in Shopify

  1. In Shopify admin, go to Online Store → Themes → Edit code.
  2. Under Templates, look for robots.txt.liquid. If it doesn’t exist, click “Add a new template” → “robots” → “.liquid”.
  3. Replace the file contents with the example above (adjusted for your actual store URL and any custom paths).
  4. Save the template.
  5. Verify at https://your-store.myshopify.com/robots.txt that the new content is served (browser cache + Shopify edge cache may take a few minutes to clear).
  6. Paste the robots.txt URL into the Robots Analyzer and confirm GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all show as “allowed” for /products/ and /collections/.

You’re done. AI crawlers will pick up the updated robots.txt on their next visit (usually within 24 hours). Pair this guide with the fashion llms.txt template (or the beauty/electronics sibling) so once crawlers can reach your store, they find a content map worth reading.

Validation checklist

  • Public product pages are not blocked

    GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all get `Allow: /products/` (or no explicit Disallow that covers `/products/`).

  • Public collection pages are not blocked

    Same crawlers can reach `/collections/`. Together with product pages, Shopify collection pages are among the most-cited URL classes for shopping queries; blocking them is the #1 self-inflicted AI-visibility wound.

  • Public blog and guide content is not blocked

    Same crawlers can reach `/blogs/` and `/pages/` (where Shopify hosts FAQs, size guides, ingredient pages, and policy pages). AI shopping answers cite blog and guide content heavily.

  • Cart, checkout, account, and admin paths stay protected

    Every AI crawler block has explicit `Disallow: /cart`, `Disallow: /checkout`, `Disallow: /account`, `Disallow: /admin`. These paths leak personalised state when crawled and don't belong in AI shopping answers.

  • robots.txt is not used as a privacy or security mechanism

    Sensitive data (customer info, order details, private app data) is protected by authentication, not by `Disallow`. robots.txt is a crawler hint, not a security boundary.

  • noindex pages stay crawlable

    If a page should be excluded from search, it must remain crawlable (no `Disallow`) so the crawler can read the `<meta name="robots" content="noindex">` directive. Disallowing a noindex page makes it un-checkable and Google may still index it.

  • GPTBot and OAI-SearchBot are handled with separate rule blocks

    Don't share a single `User-agent: GPTBot,OAI-SearchBot` line. The two have different policy semantics (training vs search-time fetch) and merchants may want different rules per crawler.

  • Changes are tested in the Robots Analyzer after publishing

    After deploying the updated robots.txt to the Shopify theme, paste the URL into /tools/robots-analyzer and confirm every AI crawler shows the expected access status.

Run the Robots Analyzer

Prefilled with the Shopify AI-crawler example robots.txt below. Paste your real Shopify robots.txt to compare, or use the prefill to test the recommended baseline against your store.

Frequently asked questions

Does allowing GPTBot guarantee ChatGPT Shopping visibility?

No. Crawler access is the floor, not the ceiling. GPTBot reaching your product page is necessary for it to be indexed by OpenAI's models, but visibility in ChatGPT Shopping also depends on having useful product context (Product schema, llms.txt, accurate descriptions, real reviews). Allow GPTBot, then audit the content it can see — that's the full job.

Should GPTBot and OAI-SearchBot use the same rule block?

Not necessarily. They have different policy semantics: GPTBot is OpenAI's training crawler (its access affects whether your content trains future GPT models), while OAI-SearchBot is the search-time fetcher (its access affects real-time ChatGPT/Bing search answers). Some merchants want to allow search-time access but block training. Treat them as separate policy choices, with separate `User-agent:` blocks.
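For merchants who want that split, a hypothetical robots.txt fragment (adjust paths to your store) might look like this — search-time fetches allowed, training crawls opted out:

```txt
# Hypothetical split: allow ChatGPT search-time fetches,
# opt the store out of training crawls.
User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Disallow: /cart
Disallow: /checkout

User-agent: GPTBot
Disallow: /
```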

Can robots.txt protect private Shopify customer data?

No. robots.txt is a crawler instruction, not a security mechanism. Compliant crawlers (GPTBot, Googlebot, etc.) will respect `Disallow`, but adversarial scrapers ignore it entirely. For private customer data, order details, or app-specific endpoints, use Shopify's built-in authentication + access controls. robots.txt is one layer; auth is the actual line.

Should noindex pages be disallowed in robots.txt?

No — this is the most common robots.txt mistake. If you `Disallow:` a noindex page, the crawler never reads the `<meta name="robots" content="noindex">` tag, and the page may still get indexed (Google sometimes infers existence from external links and indexes the URL without crawling). Keep noindex pages crawlable; only block paths that should be invisible to crawlers entirely (cart, checkout, account, admin).
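The trap is mechanical: a disallowed crawler never downloads the HTML, so it never sees the meta tag. A small sketch with Python’s standard-library `urllib.robotparser` — the page path is hypothetical — that flags a noindex directive the crawler cannot reach:

```python
from urllib.robotparser import RobotFileParser

def noindex_is_readable(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """A noindex meta tag only works if the crawler may fetch the page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Disallowing a noindex page hides the directive itself:
robots = "User-agent: *\nDisallow: /pages/old-promo\n"
print(noindex_is_readable(robots, "/pages/old-promo"))  # False -> noindex is invisible
```

If this returns False for a page carrying noindex, remove the Disallow and let the meta tag do its job.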

Related resources