AI Crawler User-Agent List

A reference list of every major AI crawler and user agent: what each one does, who operates it, and whether it respects robots.txt.


Showing 21 crawlers

User-agent	Vendor	Category	Respects robots.txt
GPTBot	OpenAI	AI training	Yes
OAI-SearchBot	OpenAI	AI search index	Yes
ChatGPT-User	OpenAI	User-initiated fetch	Yes
ClaudeBot	Anthropic	AI training	Yes
Claude-SearchBot	Anthropic	AI search index	Yes
Claude-User	Anthropic	User-initiated fetch	Yes
Google-Extended	Google	AI training	Yes
GoogleOther	Google	AI training	Yes
Googlebot	Google	Search engine	Yes
PerplexityBot	Perplexity	AI search index	Yes
Perplexity-User	Perplexity	User-initiated fetch	No
Applebot	Apple	Search engine	Yes
Applebot-Extended	Apple	AI training	Yes
CCBot	Common Crawl	Shared dataset	Yes
Meta-ExternalAgent	Meta	AI training	Yes
Meta-ExternalFetcher	Meta	User-initiated fetch	Yes
Bytespider	ByteDance	AI training	Partially
Amazonbot	Amazon	AI search index	Yes
DuckAssistBot	DuckDuckGo	AI search index	Yes
MistralAI-User	Mistral	User-initiated fetch	Yes
YouBot	You.com	AI search index	Yes

robots.txt

# AI crawler block list — generated from clickfrom.ai/tools/ai-crawler-user-agent-list
# Remove the Disallow line for any crawler you want to allow.

# OpenAI — Crawls public web pages to improve OpenAI foundation models.
# Source: https://platform.openai.com/docs/bots
User-agent: GPTBot
Disallow: /

# OpenAI — Indexes web pages so ChatGPT search and SearchGPT can cite them.
# Source: https://platform.openai.com/docs/bots
User-agent: OAI-SearchBot
Disallow: /

# OpenAI — Fetches a page on the spot when a ChatGPT user asks the assistant about a specific URL.
# Source: https://platform.openai.com/docs/bots
User-agent: ChatGPT-User
Disallow: /

# Anthropic — Crawls public web pages for Anthropic foundation-model training.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: ClaudeBot
Disallow: /

# Anthropic — Indexes web pages so Claude can cite them in search-like answers.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: Claude-SearchBot
Disallow: /

# Anthropic — Fetches a page on the spot when a Claude user asks the assistant about a specific URL.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: Claude-User
Disallow: /

# Google — Opt-out token (not a real user-agent) controlling whether Gemini and Vertex AI may train on your content.
# Source: https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended
User-agent: Google-Extended
Disallow: /

# Google — Internal R&D and product-team crawls outside of Search and Ads.
# Source: https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#googleother
User-agent: GoogleOther
Disallow: /

# Google — Classical Google Search indexer. Powers AI Overviews via the same index.
# Source: https://developers.google.com/search/docs/crawling-indexing/googlebot
User-agent: Googlebot
Disallow: /

# Perplexity — Indexes web pages so Perplexity can surface them as cited sources in answers.
# Source: https://docs.perplexity.ai/guides/bots
User-agent: PerplexityBot
Disallow: /

# Perplexity — Fetches a page on the spot when a Perplexity user asks the assistant about a specific URL.
# Source: https://docs.perplexity.ai/guides/bots
User-agent: Perplexity-User
Disallow: /

# Apple — Powers Siri, Spotlight, and Safari Suggestions search.
# Source: https://support.apple.com/en-us/119829
User-agent: Applebot
Disallow: /

# Apple — Opt-out token controlling whether Apple Intelligence may train on your content.
# Source: https://support.apple.com/en-us/119829
User-agent: Applebot-Extended
Disallow: /

# Common Crawl — Bulk crawl of the public web. Downstream datasets feed many AI model training pipelines (including some at OpenAI, Anthropic, and academic groups).
# Source: https://commoncrawl.org/ccbot
User-agent: CCBot
Disallow: /

# Meta — Crawls public web pages for Meta AI (Llama family) training and indexing.
# Source: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
User-agent: Meta-ExternalAgent
Disallow: /

# Meta — Fetches a page on the spot when a Meta AI user asks the assistant about a specific URL.
# Source: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
User-agent: Meta-ExternalFetcher
Disallow: /

# ByteDance — Crawls public web pages for ByteDance's foundation-model training (Doubao and related models).
# Source: https://bytespider.bytedance.com/
User-agent: Bytespider
Disallow: /

# Amazon — Powers Alexa and other Amazon answer/AI experiences.
# Source: https://developer.amazon.com/amazonbot
User-agent: Amazonbot
Disallow: /

# DuckDuckGo — Indexes web pages so DuckAssist can summarize them in DuckDuckGo answers.
# Source: https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot/
User-agent: DuckAssistBot
Disallow: /

# Mistral — Fetches a page on the spot when a Mistral Le Chat user asks the assistant about a specific URL.
# Source: https://docs.mistral.ai/robots/
User-agent: MistralAI-User
Disallow: /

# You.com — Indexes web pages for You.com AI search and chat.
# Source: https://about.you.com/youbot/
User-agent: YouBot
Disallow: /
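
A block list like the one above can also be generated from a short table of crawler names, which makes a filtered subset easier to keep in sync. The following is a minimal Python sketch, not the generator behind this page. The `CRAWLERS` subset, the `build_block_list` function, and the choice to emit no group at all for an allowed crawler are assumptions made for illustration.

```python
# Hypothetical helper: builds a robots.txt block list from a small crawler table.
# The names come from the table in this document; the function itself is illustrative.
CRAWLERS = [
    ("GPTBot", "OpenAI"),
    ("Google-Extended", "Google"),
    ("Applebot-Extended", "Apple"),
    ("Bytespider", "ByteDance"),
]


def build_block_list(crawlers, allow=frozenset()):
    """Return robots.txt text that disallows every crawler not named in `allow`."""
    groups = []
    for agent, vendor in crawlers:
        if agent in allow:
            continue  # no group is emitted, so the crawler falls back to any other rules
        groups.append(f"# {vendor}\nUser-agent: {agent}\nDisallow: /\n")
    return "\n".join(groups)


if __name__ == "__main__":
    # Block everything in the table except GPTBot.
    print(build_block_list(CRAWLERS, allow={"GPTBot"}))
```

Skipping the group for an allowed crawler has the same effect as removing its Disallow line by hand: with no matching group, the crawler is governed by whatever other rules the file contains.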

What this list shows

The exact User-agent string of every major AI crawler, taken from the vendor's documentation
Whether each crawler respects robots.txt, and where exceptions exist
What each crawler is for: AI training, AI search index, user-initiated fetch, classic search, or shared dataset

Why a sourced crawler list matters

robots.txt rules only work when you write the User-agent exactly as the crawler identifies itself. A typo ("GPT-Bot" instead of "GPTBot") fails silently: the rule matches no crawler, and nothing reports an error. This list takes every name directly from the vendor's public documentation, so your robots.txt actually does what you intend.
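
The silent failure can be reproduced with Python's standard `urllib.robotparser`. This is a sketch that assumes the standard-library parser is a reasonable stand-in for how a crawler matches its own name. The `is_blocked` helper and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may not fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


correct = "User-agent: GPTBot\nDisallow: /\n"
typo = "User-agent: GPT-Bot\nDisallow: /\n"

if __name__ == "__main__":
    print(is_blocked(correct, "GPTBot"))  # the group names the crawler, so the rule applies
    print(is_blocked(typo, "GPTBot"))     # the misspelled group never applies to GPTBot
```

Neither file is rejected as invalid, which is why the misspelled version is easy to miss.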

How merchants use this list

Paste the filtered "Copy as robots.txt" block into your Shopify robots.txt.liquid override to block unwanted crawlers
Google-Extended and Applebot-Extended are robots.txt tokens, not crawlers; they never appear in access logs
Run /tools/robots-analyzer against your current robots.txt to verify that the right crawlers are allowed or blocked
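
A verification of this kind can be approximated locally. The sketch below is not the robots-analyzer tool. It assumes Python's `urllib.robotparser`, whose name matching is looser than a given crawler's may be (a group for `Applebot` may also match `Applebot-Extended`), so treat the result as an approximation. The `audit` function and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content, not taken from any real store.
SAMPLE_ROBOTS = """\
User-agent: Bytespider
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /cart
"""


def audit(robots_txt, agents, url="https://example.com/"):
    """Map each agent name to 'blocked' or 'allowed' for the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: "allowed" if parser.can_fetch(agent, url) else "blocked"
        for agent in agents
    }


if __name__ == "__main__":
    report = audit(SAMPLE_ROBOTS, ["GPTBot", "Bytespider", "Google-Extended"])
    for agent, status in report.items():
        print(f"{agent}: {status}")
```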

Common mistakes

Blocking Googlebot to opt out of AI Overviews: there is no separate UA for AI Overviews, so blocking Googlebot also removes you from regular Google Search
Assuming that user-initiated fetchers respect robots.txt: Perplexity-User explicitly does not
Copying a UA string from a blog post without checking the vendor's source: names change and blog posts go stale
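
Since a user-initiated fetcher may ignore robots.txt, access logs are where its visits would show up. The following sketch scans User-Agent header values for crawler names from this list. The `find_crawler` helper and the sample header strings are illustrative assumptions, not verbatim vendor strings. Google-Extended and Applebot-Extended are left out on purpose, because they are robots.txt tokens and do not appear in logs.

```python
# Names taken from the table in this document; opt-out tokens are excluded.
KNOWN_CRAWLERS = ["Perplexity-User", "PerplexityBot", "ChatGPT-User", "GPTBot", "ClaudeBot"]


def find_crawler(user_agent_header):
    """Return the first known crawler name found in a User-Agent header, else None."""
    lowered = user_agent_header.lower()
    for name in KNOWN_CRAWLERS:
        if name.lower() in lowered:
            return name
    return None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Perplexity-User/1.0)",  # illustrative, not a verbatim vendor string
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",      # ordinary browser-style header
    ]
    for header in samples:
        print(find_crawler(header))
```

A User-Agent header can be spoofed, so a match here is a hint rather than proof of who sent the request.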

AI crawler list FAQ

Should I block AI crawlers on my Shopify store?

Usually not. Most AI crawlers are how shoppers find you in answers from ChatGPT, Perplexity, Claude, and Gemini. Block only crawlers whose value to your store is unclear (for example, Bytespider), or the opt-out tokens (Google-Extended, Applebot-Extended) if you have decided not to take part in training.

How often is this list updated?

Whenever a vendor publishes a new crawler, deprecates an existing one, or changes its stated robots.txt behavior. Each entry links to the vendor's source for direct verification.

Why are some entries marked "partially" or "unclear"?

Because the behavior a vendor declares and the findings of third-party audits disagree, or because the vendor has not published a clear position. We do not fabricate a clean "yes" when reality is more complicated.

Related AI visibility resources

GPTBot robots.txt for Shopify
Robots analyzer
llms.txt template (fashion)