GPTBot robots.txt for Shopify: a practical example and checks

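robots.txt is the gate that decides whether AI shopping engines can read your Shopify store before they cite it. ChatGPT, Perplexity, and Gemini answer commercial queries about Shopify products only after their crawlers (such as GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot) have actually fetched the page content. Block those crawlers in robots.txt and your store drops out of their answers, and every other AI visibility signal (schema, llms.txt, content quality) stops mattering.

This guide provides a practical Shopify robots.txt example tuned for AI shopping: allow the AI crawlers that matter to read public store content (products, collections, blogs, pages), block the checkout, account, and admin paths they should not touch, and verify the result with the Robots Analyzer.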

What GPTBot access can and cannot do

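Can do	Cannot do
Crawl public product pages, collections, and blogs	Bypass authentication on private pages
Index content for ChatGPT shopping answers	Read content that requires a login or app permissions
Read product schema, llms.txt, FAQ, and policy pages	Identify individual buyers or their cart contents
Respect `Disallow` directives (compliant crawlers only)	Enforce privacy: adversarial scrapers ignore robots.txt
Respect `noindex` meta directives on crawlable pages	Replace proper Shopify access controls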

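Concretely: allowing GPTBot is necessary for ChatGPT Shopping visibility, but it is not sufficient. A crawler reaching the page is the floor. Visibility above that floor is gated by content quality and structured data.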

Shopify paths to keep crawlable

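Path	Why
`/products/`	The most-cited URL category in AI shopping queries
`/collections/`	Category-level browsing pages; AI uses them to answer "which store sells X" queries
`/blogs/`	Long-form content that AI mines for product context
`/pages/faq`	FAQ content; AI cites it for "does X support Y" queries
`/pages/shipping`	Shipping policy; AI cites it for "do they ship to X" queries
`/pages/returns`	Returns policy; AI cites it for "what if it doesn't fit" queries
`/pages/size-guide`	Sizing context; AI cites it for "what size should I wear" apparel queries
`/pages/ingredients`	Beauty ingredient guide; AI cites it for compatibility queries
`/pages/warranty`	Electronics warranty; AI cites it for "what is the warranty" queries
`/llms.txt`	Compact content map for crawlers that choose to read it
`/sitemap.xml`	URL discovery; AI uses it to find new product pages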

Shopify paths to keep protected

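Path	Why
`/cart`	Personalized state; crawling it is pointless
`/checkout`	Payment flow; must stay private
`/account`	Logged-in customer dashboard
`/admin`	Shopify admin (already protected by authentication, but being explicit is better)
`/orders/`	Per-customer order history
`/apps/<private>`	Third-party app endpoints that expose private data
Internal search result pages	Thin or duplicate content; AI engines may deprioritize sites that are expensive to crawl
Preview / staging URLs	Not meant to be publicly visible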

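robots.txt example

Place this in your Shopify theme's robots.txt.liquid template. Before publishing, review it against your specific apps and theme; it is a starting baseline, not a one-size-fits-all answer.

Shopify AI-crawler robots.txt example (txt)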

# Shopify AI-crawler robots.txt — starter baseline.
# Review against your theme, apps, privacy needs, and policy before
# publishing to production.

User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: ChatGPT-User
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: PerplexityBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

User-agent: Googlebot
Allow: /
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /orders/

Sitemap: https://your-store.myshopify.com/sitemap.xml
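
The four AI-crawler groups in the example repeat the same rules, so hand-edited copies can drift apart over time. As a rough sketch (not part of Shopify or of the example itself), the file could be generated from a single list of paths. The function names `build_group` and `build_robots` are hypothetical; the crawler names, paths, and placeholder sitemap URL are taken from the example.

```python
# Sketch: generate the per-crawler groups from one source of truth.
# build_group / build_robots are illustrative names, not a Shopify API.

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
ALLOW = ["/products/", "/collections/", "/blogs/", "/pages/"]
DISALLOW = ["/cart", "/checkout", "/account", "/admin", "/orders/"]


def build_group(agent, allow, disallow):
    """Return one robots.txt group (User-agent line plus its rules)."""
    lines = [f"User-agent: {agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    return "\n".join(lines)


def build_robots(sitemap_url):
    """Return the full robots.txt text: one group per crawler, then the sitemap."""
    groups = [build_group(agent, ALLOW, DISALLOW) for agent in AI_CRAWLERS]
    groups.append(build_group("Googlebot", ["/"], DISALLOW))
    groups.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(groups) + "\n"


robots_text = build_robots("https://your-store.myshopify.com/sitemap.xml")
print(robots_text)
```

The printed text can be pasted into the template as plain text. Keeping each crawler in its own group preserves the option of giving one crawler different rules later.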

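Review checklist

Run through this checklist before publishing. Each item corresponds to a common way Shopify stores lose AI visibility or expose private data through robots.txt.

Shopify AI-crawler robots.txt review checklist (txt)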

Shopify AI-crawler robots.txt review checklist

[ ] Public product pages (/products/) are not blocked.
[ ] Public collection pages (/collections/) are not blocked.
[ ] Public blog (/blogs/) and pages (/pages/) content is not blocked.
[ ] Cart, checkout, account, admin, orders paths stay protected.
[ ] Each AI crawler has its own User-agent block (no shared rules).
[ ] noindex pages remain crawlable (so the crawler can read noindex).
[ ] robots.txt is not the only thing protecting private data —
    authentication handles that.
[ ] Sitemap directive points at the real /sitemap.xml URL.
[ ] Tested in /tools/robots-analyzer after deploying.
[ ] Re-checked after any theme update that touches robots.txt.liquid.

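Installing in Shopify

1. In the Shopify admin, go to Online Store → Themes → Edit code.
2. Under Templates, look for robots.txt.liquid. If it does not exist, click "Add a new template" → "robots" → ".liquid".
3. Replace the file contents with the example above (adjusted for your actual store URL and any custom paths).
4. Save the template.
5. Check https://your-store.myshopify.com/robots.txt to verify that the new content is being served (the browser cache and Shopify's edge cache may take a few minutes to clear).
6. Paste the robots.txt URL into the Robots Analyzer and confirm that GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot all show "allowed" for /products/ and /collections/.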
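
The final verification step can also be approximated offline. Below is a minimal sketch, assuming Python's standard `urllib.robotparser`; `audit_robots`, `fetch_robots`, and the sample paths are illustrative names chosen here, not part of any Shopify or Robots Analyzer tooling. One caveat: `urllib.robotparser` applies the first matching rule within a group, whereas Google documents longest-match precedence, so the two can disagree when Allow and Disallow rules overlap (for example in a group that starts with `Allow: /`). The sketch therefore audits only the four AI crawlers, whose rules in the example do not overlap.

```python
# Sketch: audit robots.txt text for the AI crawlers named in this guide.
from urllib import request, robotparser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot"]
PUBLIC_PATHS = ["/products/example", "/collections/all", "/blogs/news", "/pages/faq"]
PRIVATE_PATHS = ["/cart", "/checkout", "/account", "/admin", "/orders/1"]


def fetch_robots(url, timeout=10):
    """Download robots.txt text from a live store (not called automatically)."""
    with request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def audit_robots(robots_text, base="https://your-store.myshopify.com"):
    """Return a list of human-readable problems; an empty list means no findings."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    problems = []
    for agent in AI_CRAWLERS:
        for path in PUBLIC_PATHS:
            if not parser.can_fetch(agent, base + path):
                problems.append(f"{agent} is blocked from {path}")
        for path in PRIVATE_PATHS:
            if parser.can_fetch(agent, base + path):
                problems.append(f"{agent} can fetch {path}")
    return problems


# Usage against a live store (placeholder URL):
# text = fetch_robots("https://your-store.myshopify.com/robots.txt")
# for problem in audit_robots(text):
#     print(problem)
```

A local check like this complements the analyzer rather than replacing it, since it tests the rules as one parser interprets them, not as each crawler does.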

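Done. AI crawlers will pick up the updated robots.txt on their next visit (typically within 24 hours). Pair this guide with the fashion llms.txt template (or its beauty and electronics sibling templates) so that crawlers reaching your store find a content map worth reading.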

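Validation checklist

Public product pages are not blocked

GPTBot, OAI-SearchBot, ChatGPT-User, and PerplexityBot each get `Allow: /products/` (or have no explicit Disallow that covers `/products/`).

Public collection pages are not blocked

The same crawlers can access `/collections/`. Shopify collection pages are among the most-cited URL categories in shopping queries; blocking them is the number one self-inflicted loss of AI visibility.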

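Public blog and guide content is not blocked

The same crawlers can access `/blogs/` and `/pages/` (where Shopify hosts FAQ, size charts, ingredient pages, and policy pages). AI shopping answers cite blog and guide content heavily.

Cart, checkout, account, and admin paths stay protected

Every AI crawler block has explicit `Disallow: /cart`, `Disallow: /checkout`, `Disallow: /account`, and `Disallow: /admin` rules. These paths leak personalized state when crawled and should not enter AI shopping answers.

robots.txt is not used as a privacy or security mechanism

Sensitive data (customer information, order details, private app data) is protected by authentication, not by `Disallow`. robots.txt is a hint to crawlers, not a security boundary.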

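noindex pages stay crawlable

If a page should be excluded from search, it must stay crawlable (no `Disallow`) so that crawlers can read the `<meta name="robots" content="noindex">` directive. Disallowing a noindex page prevents the crawler from inspecting it, and Google may still index the URL.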
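
The noindex and Disallow conflict described here can be detected mechanically. The following is a hedged sketch using only the Python standard library; `has_noindex` and `noindex_hidden_by_disallow` are hypothetical helper names, and the HTML is assumed to be fetched by some other means (for example from a logged-in browser or a theme preview).

```python
# Sketch: flag pages whose noindex tag cannot be seen because of a Disallow rule.
from html.parser import HTMLParser
from urllib import robotparser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        is_robots_meta = attr.get("name", "").lower() == "robots"
        if is_robots_meta and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(html):
    """Return True when the page carries a robots noindex meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def noindex_hidden_by_disallow(robots_text, agent, url, html):
    """Return True when the page is noindex but the agent may not crawl it."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_text.splitlines())
    return has_noindex(html) and not rules.can_fetch(agent, url)
```

A `True` result means the crawler is told not to fetch the page, so it never sees the noindex directive. The fix is to remove the Disallow rule for that path, not the meta tag.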

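GPTBot and OAI-SearchBot are handled in separate rule blocks

Do not share a single `User-agent: GPTBot,OAI-SearchBot` line. The two crawlers have different policy semantics (training vs search-time retrieval), and a store may want different rules for each.

Test in the Robots Analyzer after publishing

After deploying the updated robots.txt to your Shopify theme, paste the URL into /tools/robots-analyzer and confirm that each AI crawler shows the expected access status.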

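Frequently asked questions

Does allowing GPTBot guarantee ChatGPT Shopping visibility?

No. Crawler access is the floor, not the ceiling. GPTBot being able to reach your product pages is a necessary condition for them to be indexed by OpenAI models, but ChatGPT Shopping visibility also depends on useful product context (Product schema, llms.txt, accurate descriptions, real reviews). Allow GPTBot, then audit what it can see; that is the complete job.

Should GPTBot and OAI-SearchBot share one rule block?

Not necessarily. Their policy semantics differ: GPTBot is OpenAI's training crawler (its access affects whether your content is used to train future GPT models), while OAI-SearchBot is the search-time fetcher (its access affects live ChatGPT search answers). Some stores want to allow search-time access but block training. Treat them as independent policy choices, each with its own `User-agent:` block.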
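
To make the "allow search, block training" option concrete, here is a small sketch that embeds one possible split policy as a string and checks it with Python's standard `urllib.robotparser`. The policy text is an illustration, not a recommendation, and the product URL is a placeholder.

```python
# Sketch: one possible split policy (block training crawler, allow search crawler).
from urllib import robotparser

SPLIT_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
"""

parser = robotparser.RobotFileParser()
parser.parse(SPLIT_POLICY.splitlines())

product_url = "https://your-store.myshopify.com/products/example"
training_allowed = parser.can_fetch("GPTBot", product_url)
search_allowed = parser.can_fetch("OAI-SearchBot", product_url)

print(f"GPTBot may fetch product page: {training_allowed}")
print(f"OAI-SearchBot may fetch product page: {search_allowed}")
```

Because each crawler has its own group, changing the training decision later means editing only the GPTBot group.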

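Can robots.txt protect private Shopify customer data?

No. robots.txt is a crawler directive, not a security mechanism. Compliant crawlers (GPTBot, Googlebot, and so on) respect `Disallow`, but adversarial scrapers ignore it entirely. For private customer data, order details, or app-specific endpoints, use Shopify's built-in authentication and access controls. robots.txt is one layer; authentication is the real line of defense.

Should noindex pages be disallowed in robots.txt?

No. This is the most common robots.txt mistake. If you `Disallow:` a noindex page, crawlers never read the `<meta name="robots" content="noindex">` tag, and the page may still be indexed (Google sometimes infers a URL's existence from external links and indexes it without crawling). Keep noindex pages crawlable; block only the paths that should be entirely invisible to crawlers (cart, checkout, account, admin).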
