GPTBot robots.txt for Shopify: Practical Examples and Checks

Copy practical Shopify robots.txt examples for GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot — with a checklist for the Shopify paths that should stay crawlable and the ones that should always be blocked.

5 min read

robots.txt is the gate that decides whether AI shopping engines can even read your Shopify store before they cite it. ChatGPT, Perplexity, and Gemini answer commercial queries about Shopify products only when their crawlers (GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot) have actually fetched the page content. Block them in robots.txt and you’re invisible — every other AI-visibility signal (schema, llms.txt, content quality) becomes moot.

This guide ships a practical Shopify robots.txt example calibrated for the AI shopping era: allow the public store content (products, collections, blogs, pages) for the AI crawlers that matter, block the checkout / account / admin paths that don’t, and verify it works using the Robots Analyzer.

What GPTBot access can and can’t do

| Can do | Can’t do |
| --- | --- |
| Crawl public product pages, collections, blogs | Bypass authentication on private pages |
| Index content for ChatGPT shopping answers | Read content that requires login or app permissions |
| Read product schema, llms.txt, FAQ, policy pages | Identify individual buyers or their cart contents |
| Respect Disallow directives on compliant crawlers | Enforce privacy — adversarial scrapers ignore robots.txt |
| Honor noindex meta directives on crawlable pages | Substitute for proper Shopify access controls |

Concretely: allowing GPTBot is necessary for ChatGPT Shopping visibility, but not sufficient. The crawler reaching the page is the floor. Visibility is gated by content quality + structured data on top of that.
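You can see the floor-level check in action with Python’s standard-library `urllib.robotparser`. A minimal sketch, using an excerpt of the GPTBot rules from the example later in this guide (the product path is hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the GPTBot group from the example robots.txt in this guide.
RULES = """\
User-agent: GPTBot
Allow: /products/
Allow: /collections/
Disallow: /checkout
Disallow: /account
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Public catalogue stays reachable for the crawler...
print(parser.can_fetch("GPTBot", "/products/example-tee"))  # True
# ...while the payment flow stays private.
print(parser.can_fetch("GPTBot", "/checkout"))              # False
```

The same two calls, pointed at your live robots.txt, tell you whether the floor condition holds before you worry about schema or content quality.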

Shopify paths that should stay crawlable

| Path | Why |
| --- | --- |
| /products/ | Most-cited URL class for AI shopping queries |
| /collections/ | Category-level browse pages; AI uses them for “store sells X” answers |
| /blogs/ | Long-form content where AI extracts product context |
| /pages/faq | FAQ content — AI cites for “does X support Y” queries |
| /pages/shipping | Shipping policy — AI cites for “ships to X” queries |
| /pages/returns | Return policy — AI cites for “what if it doesn’t fit” queries |
| /pages/size-guide | Sizing context — AI cites for “what size am I” apparel queries |
| /pages/ingredients | Beauty ingredient guide — AI cites for compatibility queries |
| /pages/warranty | Electronics warranty — AI cites for “what’s the warranty” queries |
| /llms.txt | The compact content map — AI reads it on every crawl |
| /sitemap.xml | URL discovery — AI uses it to find new product pages |

Shopify paths that should stay protected

| Path | Why |
| --- | --- |
| /cart | Personalised state — never makes sense to crawl |
| /checkout | Payment flow — must be private |
| /account | Logged-in customer dashboard |
| /admin | Shopify admin (already protected by auth, but explicit is better) |
| /orders/ | Order history per customer |
| /apps/<private> | Third-party app endpoints that expose private data |
| Internal search result pages | Thin or duplicate content; AI engines downweight crawl-heavy sites |
| Preview / staging URLs | Not for public visibility |

The robots.txt example

Drop this into your Shopify theme’s robots.txt.liquid (or robots.txt asset). Review against your specific apps + theme before publishing — this is a starting baseline, not a one-size-fits-all answer.

Shopify AI-crawler robots.txt example:

```txt
# Shopify AI-crawler robots.txt — starter baseline.
# Review against your theme, apps, privacy needs, and policy before
# publishing to production.

User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: ChatGPT-User
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: PerplexityBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: Googlebot
Allow: /
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

Sitemap: https://your-store.myshopify.com/sitemap.xml
```

Review checklist

Before publishing, run through this. Each item maps to a way Shopify stores commonly leak AI visibility or expose private data through robots.txt.

Shopify AI-crawler robots.txt review checklist:

```txt
[ ] Public product pages (/products/) are not blocked.
[ ] Public collection pages (/collections/) are not blocked.
[ ] Public blog (/blogs/) and pages (/pages/) content is not blocked.
[ ] Cart, checkout, account, admin, orders paths stay protected.
[ ] Each AI crawler has its own User-agent block (no shared rules).
[ ] noindex pages remain crawlable (so the crawler can read noindex).
[ ] robots.txt is not the only thing protecting private data —
    authentication handles that.
[ ] Sitemap directive points at the real /sitemap.xml URL.
[ ] Tested in /tools/robots-analyzer after deploying.
[ ] Re-checked after any theme update that touches robots.txt.liquid.
```
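The path checks in this list can be scripted. Here is a sketch using Python’s standard-library `urllib.robotparser` — the crawler names come from this guide, while the sample paths (`/products/widget`, `/orders/123`, etc.) are hypothetical stand-ins for your real URLs:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
MUST_ALLOW = ["/products/widget", "/collections/all", "/blogs/news/post", "/pages/faq"]
MUST_BLOCK = ["/cart", "/checkout", "/account", "/admin", "/orders/123"]

def audit(robots_txt: str) -> list[str]:
    """Return human-readable problems found in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for bot in AI_CRAWLERS:
        for path in MUST_ALLOW:
            if not parser.can_fetch(bot, path):
                problems.append(f"{bot} is blocked from public path {path}")
        for path in MUST_BLOCK:
            if parser.can_fetch(bot, path):
                problems.append(f"{bot} can crawl private path {path}")
    return problems
```

An empty list from `audit(robots_body)` means the baseline’s allow/block split holds for all four crawlers; anything else names the exact crawler-and-path combination to fix.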

How to install in Shopify

  1. In Shopify admin, go to Online Store → Themes → Edit code.
  2. Under Templates, look for robots.txt.liquid. If it doesn’t exist, click “Add a new template” → “robots” → “.liquid”.
  3. Replace the file contents with the example above (adjusted for your actual store URL and any custom paths).
  4. Save the template.
  5. Verify at https://your-store.myshopify.com/robots.txt that the new content is served (browser cache + Shopify edge cache may take a few minutes to clear).
  6. Paste the robots.txt URL into the Robots Analyzer and confirm GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all show as “allowed” for /products/ and /collections/.

You’re done. AI crawlers will pick up the updated robots.txt on their next visit (usually within 24 hours). Pair this guide with the fashion llms.txt template (or the beauty/electronics sibling) so once crawlers can reach your store, they find a content map worth reading.

Validation checklist

  • Public product pages are not blocked

    GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all get `Allow: /products/` (or no explicit Disallow that covers `/products/`).

  • Public collection pages are not blocked

    Same crawlers can reach `/collections/`. Together with product pages, Shopify collection pages are among the most-cited URL classes for shopping queries; blocking them is the #1 self-inflicted AI-visibility wound.

  • Public blog and guide content is not blocked

    Same crawlers can reach `/blogs/` and `/pages/` (where Shopify hosts FAQs, size guides, ingredient pages, and policy pages). AI shopping answers cite blog and guide content heavily.

  • Cart, checkout, account, and admin paths stay protected

    Every AI crawler block has explicit `Disallow: /cart`, `Disallow: /checkout`, `Disallow: /account`, `Disallow: /admin`. These paths leak personalised state when crawled and don't belong in AI shopping answers.

  • robots.txt is not used as a privacy or security mechanism

    Sensitive data (customer info, order details, private app data) is protected by authentication, not by `Disallow`. robots.txt is a crawler hint, not a security boundary.

  • noindex pages stay crawlable

    If a page should be excluded from search, it must remain crawlable (no `Disallow`) so the crawler can read the `<meta name="robots" content="noindex">` directive. Disallowing a noindex page makes it un-checkable and Google may still index it.

  • GPTBot and OAI-SearchBot are handled with separate rule blocks

    Don't share a single `User-agent: GPTBot,OAI-SearchBot` line. The two have different policy semantics (training vs search-time fetch) and merchants may want different rules per crawler.

  • Changes are tested in the Robots Analyzer after publishing

    After deploying the updated robots.txt to the Shopify theme, paste the URL into /tools/robots-analyzer and confirm every AI crawler shows the expected access status.

Run the Robots Analyzer

Prefilled with the Shopify AI-crawler example robots.txt below. Paste your real Shopify robots.txt to compare, or use the prefill to test the recommended baseline against your store.

Frequently asked questions

Does allowing GPTBot guarantee ChatGPT Shopping visibility?

No. Crawler access is the floor, not the ceiling. GPTBot reaching your product page is necessary for it to be indexed by OpenAI's models, but visibility in ChatGPT Shopping also depends on having useful product context (Product schema, llms.txt, accurate descriptions, real reviews). Allow GPTBot, then audit the content it can see — that's the full job.

Should GPTBot and OAI-SearchBot use the same rule block?

Not necessarily. They have different policy semantics: GPTBot is OpenAI's training crawler (its access affects whether your content trains future GPT models), while OAI-SearchBot is the search-time fetcher (its access affects real-time ChatGPT/Bing search answers). Some merchants want to allow search-time access but block training. Treat them as separate policy choices, with separate `User-agent:` blocks.
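For merchants who want that split, a hypothetical robots.txt fragment (adjust paths to your store) might look like this — search-time fetches allowed, training crawls opted out:

```txt
# Hypothetical split: allow ChatGPT search-time fetches,
# opt the store out of training crawls.
User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Disallow: /cart
Disallow: /checkout

User-agent: GPTBot
Disallow: /
```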

Can robots.txt protect private Shopify customer data?

No. robots.txt is a crawler instruction, not a security mechanism. Compliant crawlers (GPTBot, Googlebot, etc.) will respect `Disallow`, but adversarial scrapers ignore it entirely. For private customer data, order details, or app-specific endpoints, use Shopify's built-in authentication + access controls. robots.txt is one layer; auth is the actual line.

Should noindex pages be disallowed in robots.txt?

No — this is the most common robots.txt mistake. If you `Disallow:` a noindex page, the crawler never reads the `<meta name="robots" content="noindex">` tag, and the page may still get indexed (Google sometimes infers existence from external links and indexes the URL without crawling). Keep noindex pages crawlable; only block paths that should be invisible to crawlers entirely (cart, checkout, account, admin).
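The trap is mechanical: a disallowed crawler never downloads the HTML, so it never sees the meta tag. A small sketch with Python’s standard-library `urllib.robotparser` — the page path is hypothetical — that flags a noindex directive the crawler cannot reach:

```python
from urllib.robotparser import RobotFileParser

def noindex_is_readable(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """A noindex meta tag only works if the crawler may fetch the page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Disallowing a noindex page hides the directive itself:
robots = "User-agent: *\nDisallow: /pages/old-promo\n"
print(noindex_is_readable(robots, "/pages/old-promo"))  # False -> noindex is invisible
```

If this returns False for a page carrying noindex, remove the Disallow and let the meta tag do its job.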

Related resources