GPTBot robots.txt for Shopify: Practical Examples and Checks
Copy practical Shopify robots.txt examples for GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot — with a checklist for the Shopify paths that should stay crawlable and the ones that should always be blocked.
robots.txt is the gate that decides whether AI shopping engines can
even read your Shopify store before they cite it. ChatGPT, Perplexity,
and Gemini answer commercial queries about Shopify products only when
their crawlers (GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot)
have actually fetched the page content. Block them in robots.txt and
you’re invisible — every other AI-visibility signal (schema, llms.txt,
content quality) becomes moot.
This guide ships a practical Shopify robots.txt example calibrated for the AI shopping era: allow the public store content (products, collections, blogs, pages) for the AI crawlers that matter, block the checkout / account / admin paths that don’t, and verify it works using the Robots Analyzer.
What GPTBot access can and can’t do
| Can do | Can’t do |
|---|---|
| Crawl public product pages, collections, blogs | Bypass authentication on private pages |
| Index content for ChatGPT shopping answers | Read content that requires login or app permissions |
| Read product schema, llms.txt, FAQ, policy pages | Identify individual buyers or their cart contents |
| Respect Disallow directives on compliant crawlers | Enforce privacy — adversarial scrapers ignore robots.txt |
| Honor noindex meta directives on crawlable pages | Substitute for proper Shopify access controls |
Concretely: allowing GPTBot is necessary for ChatGPT Shopping visibility, but not sufficient. The crawler reaching the page is the floor. Visibility is gated by content quality + structured data on top of that.
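To see how a compliant crawler applies these rules, Python's stdlib `urllib.robotparser` implements the same Allow/Disallow matching. A minimal sketch, using a cut-down policy and a placeholder store URL:

```python
from urllib import robotparser

# Cut-down policy in the same shape as the full example later in this guide.
POLICY = """\
User-agent: GPTBot
Allow: /products/
Disallow: /checkout
"""

parser = robotparser.RobotFileParser()
parser.parse(POLICY.splitlines())

# A compliant crawler runs exactly this check before fetching a URL.
print(parser.can_fetch("GPTBot", "https://your-store.myshopify.com/products/tee"))  # True
print(parser.can_fetch("GPTBot", "https://your-store.myshopify.com/checkout"))      # False
```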
Shopify paths that should stay crawlable
| Path | Why |
|---|---|
| /products/ | Most-cited URL class for AI shopping queries |
| /collections/ | Category-level browse pages; AI uses them for “store sells X” answers |
| /blogs/ | Long-form content where AI extracts product context |
| /pages/faq | FAQ content — AI cites for “does X support Y” queries |
| /pages/shipping | Shipping policy — AI cites for “ships to X” queries |
| /pages/returns | Return policy — AI cites for “what if it doesn’t fit” queries |
| /pages/size-guide | Sizing context — AI cites for “what size am I” apparel queries |
| /pages/ingredients | Beauty ingredient guide — AI cites for compatibility queries |
| /pages/warranty | Electronics warranty — AI cites for “what’s the warranty” queries |
| /llms.txt | The compact content map — AI reads it on every crawl |
| /sitemap.xml | URL discovery — AI uses it to find new product pages |
Shopify paths that should stay protected
| Path | Why |
|---|---|
| /cart | Personalised state — never makes sense to crawl |
| /checkout | Payment flow — must be private |
| /account | Logged-in customer dashboard |
| /admin | Shopify admin (already protected by auth, but explicit is better) |
| /orders/ | Order history per customer |
| /apps/<private> | Third-party app endpoints that expose private data |
| Internal search result pages | Thin or duplicate content; AI engines downweight crawl-heavy sites |
| Preview / staging URLs | Not for public visibility |
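Before touching the theme, you can check a draft policy against both tables above with the same stdlib parser. A sketch, assuming the draft is saved locally as `robots.txt` (the filename, sample paths, and store URL are placeholders):

```python
from urllib import robotparser

# Paths from the two tables above, with expected access for AI crawlers.
EXPECTATIONS = {
    "/products/example": True,
    "/collections/all": True,
    "/blogs/news/post": True,
    "/pages/faq": True,
    "/cart": False,
    "/checkout": False,
    "/account": False,
    "/admin": False,
    "/orders/123": False,
}

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]

parser = robotparser.RobotFileParser()
# "robots.txt" is a hypothetical local copy of your draft policy.
with open("robots.txt") as f:
    parser.parse(f.read().splitlines())

for agent in AI_CRAWLERS:
    for path, expected in EXPECTATIONS.items():
        url = "https://your-store.myshopify.com" + path
        actual = parser.can_fetch(agent, url)
        status = "OK  " if actual == expected else "FAIL"
        print(f"{status} {agent:15} {path:20} allowed={actual}")
```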
The robots.txt example
Drop this into your Shopify theme’s robots.txt.liquid (or robots.txt
asset). Review against your specific apps + theme before publishing —
this is a starting baseline, not a one-size-fits-all answer.
```
# Shopify AI-crawler robots.txt — starter baseline.
# Review against your theme, apps, privacy needs, and policy before
# publishing to production.

User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: ChatGPT-User
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: PerplexityBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: Googlebot
Allow: /
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

Sitemap: https://your-store.myshopify.com/sitemap.xml
```

Review checklist
Before publishing, run through this. Each item maps to a way Shopify stores commonly leak AI visibility or expose private data through robots.txt.
Shopify AI-crawler robots.txt review checklist
[ ] Public product pages (/products/) are not blocked.
[ ] Public collection pages (/collections/) are not blocked.
[ ] Public blog (/blogs/) and pages (/pages/) content is not blocked.
[ ] Cart, checkout, account, admin, orders paths stay protected.
[ ] Each AI crawler has its own User-agent block (no shared rules).
[ ] noindex pages remain crawlable (so the crawler can read noindex).
[ ] robots.txt is not the only thing protecting private data —
authentication handles that.
[ ] Sitemap directive points at the real /sitemap.xml URL.
[ ] Tested in /tools/robots-analyzer after deploying.
[ ] Re-checked after any theme update that touches robots.txt.liquid.
How to install in Shopify
- In Shopify admin, go to Online Store → Themes → Edit code.
- Under Templates, look for robots.txt.liquid. If it doesn’t exist, click “Add a new template” → “robots” → “.liquid”.
- Replace the file contents with the example above (adjusted for your actual store URL and any custom paths).
- Save the template.
- Verify at https://your-store.myshopify.com/robots.txt that the new content is served (browser cache + Shopify edge cache may take a few minutes to clear).
- Paste the robots.txt URL into the Robots Analyzer and confirm GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all show as “allowed” for /products/ and /collections/.
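If you want a scripted complement to the Robots Analyzer check in the last step, Python's stdlib parser can fetch and evaluate the live file. A minimal sketch (the store URL is a placeholder):

```python
from urllib import robotparser

ROBOTS_URL = "https://your-store.myshopify.com/robots.txt"  # replace with your store

parser = robotparser.RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses the live robots.txt

for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"):
    for path in ("/products/", "/collections/"):
        ok = parser.can_fetch(agent, "https://your-store.myshopify.com" + path)
        print(f"{agent:15} {path:14} {'allowed' if ok else 'BLOCKED'}")
```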
You’re done. AI crawlers will pick up the updated robots.txt on their next visit (usually within 24 hours). Pair this guide with the fashion llms.txt template (or the beauty/electronics sibling) so once crawlers can reach your store, they find a content map worth reading.
Validation checklist
Public product pages are not blocked
GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all get `Allow: /products/` (or no explicit Disallow that covers `/products/`).
Public collection pages are not blocked
Same crawlers can reach `/collections/`. Shopify collection pages are among the most-cited URL classes for shopping queries; blocking them is a major self-inflicted AI-visibility wound.
Public blog and guide content is not blocked
Same crawlers can reach `/blogs/` and `/pages/` (where Shopify hosts FAQs, size guides, ingredient pages, and policy pages). AI shopping answers cite blog and guide content heavily.
Cart, checkout, account, and admin paths stay protected
Every AI crawler block has explicit `Disallow: /cart`, `Disallow: /checkout`, `Disallow: /account`, `Disallow: /admin`. These paths leak personalised state when crawled and don't belong in AI shopping answers.
robots.txt is not used as a privacy or security mechanism
Sensitive data (customer info, order details, private app data) is protected by authentication, not by `Disallow`. robots.txt is a crawler hint, not a security boundary.
noindex pages stay crawlable
If a page should be excluded from search, it must remain crawlable (no `Disallow`) so the crawler can read the `<meta name="robots" content="noindex">` directive. Disallowing a noindex page means the directive is never read, and Google may still index the bare URL from external links.
GPTBot and OAI-SearchBot are handled with separate rule blocks
Don't share a single `User-agent: GPTBot,OAI-SearchBot` line. The two have different policy semantics (training vs search-time fetch) and merchants may want different rules per crawler.
Changes are tested in the Robots Analyzer after publishing
After deploying the updated robots.txt to the Shopify theme, paste the URL into /tools/robots-analyzer and confirm every AI crawler shows the expected access status.
Run the Robots Analyzer
Prefilled with the Shopify AI-crawler example robots.txt from this guide. Paste your real Shopify robots.txt to compare, or use the prefill to test the recommended baseline against your store.
Frequently asked questions
Does allowing GPTBot guarantee ChatGPT Shopping visibility?
No. Crawler access is the floor, not the ceiling. GPTBot reaching your product page is necessary for it to be indexed by OpenAI's models, but visibility in ChatGPT Shopping also depends on having useful product context (Product schema, llms.txt, accurate descriptions, real reviews). Allow GPTBot, then audit the content it can see — that's the full job.
Should GPTBot and OAI-SearchBot use the same rule block?
Not necessarily. They have different policy semantics: GPTBot is OpenAI's training crawler (its access affects whether your content trains future GPT models), while OAI-SearchBot is the search-time fetcher (its access affects real-time ChatGPT/Bing search answers). Some merchants want to allow search-time access but block training. Treat them as separate policy choices, with separate `User-agent:` blocks.
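For example, a merchant who wants search-time fetching but no training access could publish separate blocks along these lines (a sketch; verify the crawler names against OpenAI's current documentation before relying on it):

```
# Block the training crawler entirely
User-agent: GPTBot
Disallow: /

# Allow the search-time fetcher on public store content only
User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/
```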
Can robots.txt protect private Shopify customer data?
No. robots.txt is a crawler instruction, not a security mechanism. Compliant crawlers (GPTBot, Googlebot, etc.) will respect `Disallow`, but adversarial scrapers ignore it entirely. For private customer data, order details, or app-specific endpoints, use Shopify's built-in authentication + access controls. robots.txt is one layer; authentication is the actual line of defense.
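Nothing at the protocol level enforces robots.txt; compliance is a choice the client makes before fetching. A minimal illustration, with a placeholder URL:

```python
from urllib.request import urlopen

# This fetch never consults robots.txt. Compliance is entirely the
# client's choice, which is why Disallow is not a security control.
page = urlopen("https://your-store.myshopify.com/cart").read()
print(len(page), "bytes fetched regardless of any Disallow: /cart rule")
```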
Should noindex pages be disallowed in robots.txt?
No — this is the most common robots.txt mistake. If you `Disallow:` a noindex page, the crawler never reads the `<meta name="robots" content="noindex">` tag, and the page may still get indexed (Google sometimes infers existence from external links and indexes the URL without crawling). Keep noindex pages crawlable; only block paths that should be invisible to crawlers entirely (cart, checkout, account, admin).
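A quick way to confirm a crawlable page actually carries the directive is to parse its HTML for the robots meta tag. A sketch using Python's stdlib, with a hypothetical page URL:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaFinder(HTMLParser):
    """Collects the content values of <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.directives.append(a.get("content") or "")

# Hypothetical page you want excluded from search but kept crawlable.
URL = "https://your-store.myshopify.com/pages/thank-you"

html = urlopen(URL).read().decode("utf-8", errors="replace")
finder = RobotsMetaFinder()
finder.feed(html)

# An empty list means crawlers that CAN reach the page see no noindex.
print(finder.directives)  # e.g. ['noindex, nofollow']
```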
Related resources
Fashion Shopify llms.txt template
Sibling resource — robots.txt and llms.txt work together. Allow the crawler in robots.txt, then publish llms.txt so it knows what to read.
Product Schema example for Shopify apparel
Once GPTBot can reach the product page (this guide), the JSON-LD schema is what AI shopping engines actually parse to cite the product.
Robots.txt Analyzer
Paste your robots.txt URL or content to verify every AI crawler has the access status you intend — before and after deploying changes.
Shopify AI Visibility Optimizer
The full AI-visibility stack — crawler policy is one layer alongside schema, content map, and citation monitoring.
llms.txt for Shopify — full guide
Once robots.txt allows AI crawlers to reach your content, llms.txt is the compact navigation map they read to understand the store.
AI Crawler User-Agent List
After you fix robots.txt for GPTBot, this is the full vendor-sourced reference for every other AI crawler — what to allow, what to block, and which ones ignore robots.txt anyway.