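GPTBot robots.txt for Shopify: practical example and checks

robots.txt is the gate that decides whether AI shopping engines can read your Shopify store before they cite it. ChatGPT, Perplexity, and Gemini can only cite your product pages in answers to commercial queries once their crawlers (GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot; Google's crawlers in Gemini's case) have actually fetched the page content. Block them in robots.txt and you disappear from those answers; every other AI-visibility signal (schema, llms.txt, content quality) becomes moot.

This guide gives a practical Shopify robots.txt example calibrated for AI shopping: allow public storefront content (products, collections, blogs, pages) for the AI crawlers that matter, block the checkout, account, and admin paths they should not touch, and verify the result with the Robots Analyzer.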

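What GPTBot access can and cannot do

Can	Cannot
Crawl public product, collection, and blog pages	Bypass authentication on private pages
Index content for ChatGPT shopping answers	Read content that requires a login or app permissions
Read product schema, llms.txt, FAQ, and policy pages	Identify individual shoppers or their cart contents
Honor `Disallow` directives (as a compliant crawler)	Enforce privacy; adversarial scrapers ignore robots.txt
Honor `noindex` meta directives on pages it can crawl	Replace proper Shopify access controls

In short: letting OpenAI's crawlers in (OAI-SearchBot in particular, since it is the crawler that surfaces sites in ChatGPT search) is necessary for ChatGPT Shopping visibility, but not sufficient. Crawler access is the floor; visibility above that floor is gated by content quality plus structured data.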

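Shopify paths that should stay crawlable

Path	Why
`/products/`	The most-cited URL category in AI shopping queries
`/collections/`	Category-level browse pages; AI uses them to answer "does this store sell X" queries
`/blogs/`	Long-form content that AI mines for product context
`/pages/faq`	FAQ content; cited in "does X support Y" queries
`/pages/shipping`	Shipping policy; cited in "can you ship to X" queries
`/pages/returns`	Returns policy; cited in "what if it doesn't fit" queries
`/pages/size-guide`	Sizing context; cited in "what size should I get" apparel queries
`/pages/ingredients`	Beauty ingredient guide; cited in compatibility queries
`/pages/warranty`	Electronics warranty; cited in "what's the warranty" queries
`/llms.txt`	Compact content map for AI crawlers that support the llms.txt convention
`/sitemap.xml`	URL discovery; how crawlers find new product pages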

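Shopify paths that should stay protected

Path	Why
`/cart`	Per-session state; nothing worth crawling
`/checkout`	Payment flow; must stay private
`/account`	Logged-in customer dashboard
`/admin`	Shopify admin (already behind authentication, but explicit is better)
`/orders/`	Per-customer order history
`/apps/<private>`	Third-party app endpoints that expose private data
Internal search result pages	Thin or duplicate content that spends crawler attention on low-value URLs
Preview / staging URLs	Not meant for public visibility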
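Both tables can be checked mechanically against a robots.txt body. The sketch below assumes Python's standard-library `urllib.robotparser`; the helper name `audit`, the `BASELINE` rebuild of the starter groups, and the sample URLs (such as `/products/example-tee` or `/orders/1001`) are illustrative assumptions, not part of any Shopify API. This parser applies the first matching rule in file order, which is fine here because the AI-crawler groups never have overlapping Allow and Disallow prefixes.

```python
import urllib.robotparser

AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]

# Hypothetical sample URLs, one or more per row of the two tables.
SHOULD_BE_CRAWLABLE = [
    "/products/example-tee",
    "/collections/summer",
    "/blogs/news/care-guide",
    "/pages/faq",
    "/pages/shipping",
    "/llms.txt",
    "/sitemap.xml",
]
SHOULD_BE_PROTECTED = ["/cart", "/checkout", "/account/login", "/admin", "/orders/1001"]

# Rebuild of the starter baseline's AI-crawler groups (one group per agent).
BASELINE = "\n\n".join(
    f"User-agent: {agent}\n"
    "Allow: /products/\nAllow: /collections/\nAllow: /blogs/\nAllow: /pages/\n"
    "Disallow: /cart\nDisallow: /checkout\nDisallow: /account\n"
    "Disallow: /admin\nDisallow: /orders/"
    for agent in AI_AGENTS
)


def audit(robots_text: str) -> dict[str, list[tuple[str, str]]]:
    """Return (agent, url) pairs that violate either path table."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    blocked_public = [
        (agent, url)
        for agent in AI_AGENTS
        for url in SHOULD_BE_CRAWLABLE
        if not parser.can_fetch(agent, url)
    ]
    exposed_private = [
        (agent, url)
        for agent in AI_AGENTS
        for url in SHOULD_BE_PROTECTED
        if parser.can_fetch(agent, url)
    ]
    return {"blocked_public": blocked_public, "exposed_private": exposed_private}


if __name__ == "__main__":
    print(audit(BASELINE))
```

An empty list under both keys means every sample URL landed on the intended side of the tables; any pair that shows up names the crawler and the path to fix.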

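robots.txt example

Put this into your Shopify theme's robots.txt.liquid template. Review it against your specific apps and theme before publishing: it is a starting baseline, not a one-size-fits-all answer. Replacing the template's contents also replaces the default rules Shopify generates, so re-add any of those you still rely on.

Shopify AI-crawler robots.txt example (txt)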

# Shopify AI-crawler robots.txt — starter baseline.
# Review against your theme, apps, privacy needs, and policy before
# publishing to production.

User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: ChatGPT-User
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: PerplexityBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: Googlebot
Allow: /
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

Sitemap: https://your-store.myshopify.com/sitemap.xml
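The four AI-crawler groups above repeat identical rules, so a small generator can keep them from drifting apart while still emitting one `User-agent:` group per crawler. A sketch in Python, assuming you paste its output into robots.txt.liquid yourself; `build_robots` and the prefix lists are hypothetical names, not a Shopify API.

```python
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
PUBLIC_PREFIXES = ["/products/", "/collections/", "/blogs/", "/pages/"]
PRIVATE_PREFIXES = ["/cart", "/checkout", "/account", "/admin", "/orders/"]


def build_robots(sitemap_url: str) -> str:
    """Emit the starter baseline: one group per AI crawler, then Googlebot."""
    lines = ["# Shopify AI-crawler robots.txt - starter baseline (generated).", ""]
    for agent in AI_AGENTS:
        lines.append(f"User-agent: {agent}")
        lines += [f"Allow: {prefix}" for prefix in PUBLIC_PREFIXES]
        lines += [f"Disallow: {prefix}" for prefix in PRIVATE_PREFIXES]
        lines.append("")  # blank line between groups, as in the example
    lines.append("User-agent: Googlebot")
    lines.append("Allow: /")
    lines += [f"Disallow: {prefix}" for prefix in PRIVATE_PREFIXES]
    lines.append("")
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots("https://your-store.myshopify.com/sitemap.xml"), end="")
```

Changing a path then means editing one list instead of five groups, which also keeps the checklist item "each AI crawler has its own User-agent block" true by construction.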

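Review checklist

Run through this checklist before publishing. Each item maps to a common way Shopify stores lose AI visibility or expose private data through robots.txt.

Shopify AI-crawler robots.txt review checklist (txt)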

Shopify AI-crawler robots.txt review checklist

[ ] Public product pages (/products/) are not blocked.
[ ] Public collection pages (/collections/) are not blocked.
[ ] Public blog (/blogs/) and pages (/pages/) content is not blocked.
[ ] Cart, checkout, account, admin, orders paths stay protected.
[ ] Each AI crawler has its own User-agent block (no shared rules).
[ ] noindex pages remain crawlable (so the crawler can read noindex).
[ ] robots.txt is not the only thing protecting private data —
    authentication handles that.
[ ] Sitemap directive points at the real /sitemap.xml URL.
[ ] Tested in /tools/robots-analyzer after deploying.
[ ] Re-checked after any theme update that touches robots.txt.liquid.
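A few checklist items are mechanical enough to lint before you open the analyzer. The sketch below, in Python, covers three of them: no comma-joined `User-agent:` values, each AI crawler in a group of its own, and a `Sitemap:` line pointing at /sitemap.xml. It assumes RFC 9309 grouping (consecutive `User-agent:` lines, blank lines allowed between them, form one group); `lint_robots` is a hypothetical helper and does not replace the deploy-time test.

```python
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot")


def lint_robots(text: str) -> list[str]:
    """Return human-readable problems for the mechanical checklist items."""
    problems: list[str] = []
    groups: list[list[str]] = []  # user-agent names, one list per group
    sitemaps: list[str] = []
    previous_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line; does not end a group
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if "," in value:
                problems.append(f"shared User-agent line: {value!r}")
            if previous_was_agent:
                groups[-1].append(value)  # consecutive lines share one group
            else:
                groups.append([value])
            previous_was_agent = True
            continue
        if field == "sitemap":
            sitemaps.append(value)
        previous_was_agent = False
    known = {agent.lower() for agent in AI_AGENTS}
    for group in groups:
        ai_in_group = [ua for ua in group if ua.lower() in known]
        if len(ai_in_group) > 1:
            problems.append(f"AI crawlers share one group: {', '.join(ai_in_group)}")
    named = {ua.lower() for group in groups for ua in group}
    for agent in AI_AGENTS:
        if agent.lower() not in named:
            problems.append(f"no User-agent group for {agent}")
    if not any(url.endswith("/sitemap.xml") for url in sitemaps):
        problems.append("missing Sitemap directive pointing at /sitemap.xml")
    return problems
```

Run against the starter baseline it should return an empty list; the path-level items (public paths open, private paths closed) still need a parser-based check or the Robots Analyzer.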

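Installing in Shopify

In the Shopify admin, go to Online Store → Themes → Edit code.
Under Templates, find robots.txt.liquid. If it does not exist, click "Add a new template", then choose "robots" and ".liquid".
Replace the file contents with the example above (adjusted for your actual store URL and any custom paths).
Save the template.
Verify that https://your-store.myshopify.com/robots.txt serves the new content (the browser cache plus Shopify's edge cache can take a few minutes to clear).
Paste the robots.txt URL into the Robots Analyzer and confirm that GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all show "allowed" for /products/ and /collections/.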
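The last two steps can also be spot-checked from a script. A minimal sketch, assuming Python's `urllib.robotparser` fetches the live file; `load_live` and `report` are hypothetical helpers. One caveat from how that parser works: if the store's edge answers its default user agent with HTTP 401 or 403, the parser treats the whole site as disallowed, so an all-"blocked" report may mean the fetch was refused rather than that the rules are wrong. In that case fetch the file another way and pass the text to `parse()`.

```python
import urllib.robotparser

AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
CHECK_PATHS = ["/products/", "/collections/"]


def load_live(robots_url: str) -> urllib.robotparser.RobotFileParser:
    """Fetch and parse the deployed robots.txt (requires network access)."""
    parser = urllib.robotparser.RobotFileParser(robots_url)
    parser.read()
    return parser


def report(parser: urllib.robotparser.RobotFileParser) -> dict[str, dict[str, str]]:
    """Map agent -> path -> 'allowed' or 'blocked' for the paths in step 6."""
    return {
        agent: {
            path: "allowed" if parser.can_fetch(agent, path) else "blocked"
            for path in CHECK_PATHS
        }
        for agent in AI_AGENTS
    }
```

Usage: `report(load_live("https://your-store.myshopify.com/robots.txt"))` should show "allowed" in every cell once the new template is live.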

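Done. AI crawlers pick up the updated robots.txt on their next visit (usually within 24 hours). Pair this guide with the fashion llms.txt template (or its beauty and electronics siblings) so that when crawlers reach your store, they find a content map worth reading.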

Verification checklist

Public product pages are not blocked

GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all get `Allow: /products/` (or at least have no explicit Disallow that covers `/products/`).
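Whether a Disallow "covers" /products/ depends on precedence. RFC 9309 says the most specific match (the matching rule path with the most octets) wins, and an Allow should win a tie; Google documents the same longest-match behavior. A minimal sketch of that rule in Python, assuming literal path prefixes only (no `*` or `$` wildcards); `is_allowed` is a hypothetical helper. CPython's `urllib.robotparser` instead returns the first matching rule in file order, so it can disagree with this on groups like the Googlebot one in the example, which begins with `Allow: /`.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide access for one user-agent group using longest-match precedence.

    rules: (directive, prefix) pairs where directive is "allow" or "disallow".
    Wildcards (* and $) are deliberately not supported in this sketch.
    """
    best_length = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty rule value matches nothing
        if path.startswith(prefix):
            length = len(prefix.encode("utf-8"))
            # Longer match wins; on a tie, prefer allow (least restrictive).
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


# The Googlebot group from the starter baseline.
googlebot_group = [
    ("allow", "/"),
    ("disallow", "/cart"),
    ("disallow", "/checkout"),
    ("disallow", "/account"),
    ("disallow", "/admin"),
    ("disallow", "/orders/"),
]

if __name__ == "__main__":
    for path in ["/products/example-tee", "/cart", "/orders/1001"]:
        print(path, is_allowed(googlebot_group, path))
```

Under this rule, `Allow: /` only decides paths that no longer Disallow prefix matches, which is why the Googlebot group still keeps /cart and /orders/ closed.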

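Public collection pages are not blocked

The same crawlers can reach `/collections/`. Collection pages are among the most-cited URL categories in shopping queries; blocking them is one of the most common self-inflicted AI-visibility losses.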

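Public blog and guide content is not blocked

The same crawlers can reach `/blogs/` and `/pages/` (where Shopify hosts FAQs, size charts, ingredient pages, and policy pages). AI shopping answers cite blog and guide content heavily.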

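Cart, checkout, account, and admin paths stay protected

Every AI-crawler group has explicit `Disallow: /cart`, `Disallow: /checkout`, `Disallow: /account`, and `Disallow: /admin` lines. These paths carry session-specific or personalized state, hold nothing worth crawling, and should never surface in AI shopping answers.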

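robots.txt is not used as a privacy or security mechanism

Sensitive data (customer information, order details, private app data) is protected by authentication, not by `Disallow`. robots.txt is a hint to crawlers, not a security boundary.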

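noindex pages stay crawlable

If a page should be kept out of search, it must stay crawlable (no `Disallow`) so the crawler can read its `<meta name="robots" content="noindex">` directive. Disallowing a noindex page means the crawler can never see that directive, and Google may still index the URL.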
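To find pages caught in this trap, cross-check your list of noindex pages against the rules. A minimal sketch, assuming Python's `urllib.robotparser` and that you already know which pages emit noindex; `unreadable_noindex`, `SAMPLE_ROBOTS`, and the `/pages/old-sale` handle are hypothetical. Because this parser uses first-match rather than longest-match, prefer a longest-match evaluator for groups that mix overlapping Allow and Disallow prefixes.

```python
import urllib.robotparser


def unreadable_noindex(robots_text: str, agent: str, noindex_paths: list[str]) -> list[str]:
    """Return noindex pages this agent is told not to fetch.

    The agent can never see the noindex tag on these pages, so these are
    the ones whose Disallow rule should be removed.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [path for path in noindex_paths if not parser.can_fetch(agent, path)]


# Hypothetical inputs: one robots.txt group and pages whose templates emit noindex.
SAMPLE_ROBOTS = "User-agent: Googlebot\nDisallow: /cart\nDisallow: /pages/old-sale\n"
NOINDEX_PAGES = ["/pages/old-sale", "/pages/wholesale-terms"]

if __name__ == "__main__":
    print(unreadable_noindex(SAMPLE_ROBOTS, "Googlebot", NOINDEX_PAGES))
```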

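GPTBot and OAI-SearchBot get separate rule groups

Do not share one `User-agent: GPTBot,OAI-SearchBot` line. The two crawlers have different policy semantics (training vs. search), and a merchant may want different rules for each.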
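A comma-joined line is also not valid grouping syntax: RFC 9309's grammar expects a single product token per `user-agent` line. Python's standard-library parser, for one, treats `GPTBot,OAI-SearchBot` as one unknown name, so the rule silently applies to neither crawler. A small demonstration, assuming `urllib.robotparser` as the parser (other crawlers' parsers may behave differently); `fetchable` is a hypothetical helper.

```python
import urllib.robotparser


def fetchable(robots_text: str, agent: str, path: str) -> bool:
    """Parse robots_text and ask whether agent may fetch path."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)


# Intent in both variants: keep GPTBot and OAI-SearchBot out of /cart.
COMMA_JOINED = "User-agent: GPTBot,OAI-SearchBot\nDisallow: /cart\n"
SEPARATE = (
    "User-agent: GPTBot\nDisallow: /cart\n\n"
    "User-agent: OAI-SearchBot\nDisallow: /cart\n"
)

if __name__ == "__main__":
    for label, text in (("comma-joined", COMMA_JOINED), ("separate", SEPARATE)):
        verdicts = {bot: fetchable(text, bot, "/cart") for bot in ("GPTBot", "OAI-SearchBot")}
        print(label, verdicts)
```

With the comma-joined line, both crawlers come back as allowed to fetch /cart, because neither name matches the token `GPTBot,OAI-SearchBot`; with separate groups, both are blocked as intended.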

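Test in the Robots Analyzer after publishing

After deploying the updated robots.txt to your Shopify theme, paste its URL into /tools/robots-analyzer and confirm that each AI crawler shows the expected access status.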

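FAQ

Does allowing GPTBot guarantee ChatGPT Shopping visibility?

No. Crawler access is the floor, not the ceiling. OpenAI's crawlers reaching your product pages (OAI-SearchBot for ChatGPT search results, GPTBot for model training) is a prerequisite, but ChatGPT Shopping visibility also depends on useful product context (Product schema, llms.txt, accurate descriptions, real reviews). Allow the crawlers, then audit what they can see: that is the complete job.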

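Should GPTBot and OAI-SearchBot share one rule group?

Not necessarily. Their policy semantics differ: GPTBot is OpenAI's training crawler (its access affects whether your content is used to train future GPT models), while OAI-SearchBot is OpenAI's search crawler (its access affects whether your pages can appear in ChatGPT search answers). Some merchants want to allow search access but block training. Treat them as separate policy choices with separate `User-agent:` groups.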
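For the "allow search, block training" choice, the GPTBot and OAI-SearchBot groups simply diverge. A sketch of one possible policy, not a recommendation, reusing the starter baseline's prefixes for OAI-SearchBot and checked with Python's `urllib.robotparser`; the `SPLIT_POLICY` name and the sample product URL are illustrative assumptions.

```python
import urllib.robotparser

SPLIT_POLICY = """\
# Training crawler: opted out entirely.
User-agent: GPTBot
Disallow: /

# Search crawler: public storefront allowed, private paths blocked.
User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(SPLIT_POLICY.splitlines())

if __name__ == "__main__":
    for bot in ("GPTBot", "OAI-SearchBot"):
        print(bot, parser.can_fetch(bot, "/products/example-tee"))
```

Crawlers not named in either group (ChatGPT-User, PerplexityBot) fall back to a `User-agent: *` group if one exists, and are otherwise allowed; give them their own groups if that default is not what you want.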

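Can robots.txt protect private Shopify customer data?

No. robots.txt is a crawler directive, not a security mechanism. Compliant crawlers (GPTBot, Googlebot, and others) honor `Disallow`, but adversarial scrapers ignore it entirely. For private customer data, order details, or app-specific endpoints, rely on Shopify's built-in authentication and access controls. robots.txt is one layer; authentication is the real defense.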

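Should noindex pages be disallowed in robots.txt?

No, and this is one of the most common robots.txt mistakes. If you `Disallow:` a noindex page, the crawler never reads the `<meta name="robots" content="noindex">` tag, and the page may still be indexed (Google can index a URL without crawling it when it discovers links pointing to it). Keep noindex pages crawlable; block only paths that crawlers should never fetch at all (cart, checkout, account, admin).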
