AI Crawler User-Agent List

Reference list of every major AI crawler and user agent: what each one does, who operates it, and whether it respects robots.txt.


21 crawlers shown

User-agent	Vendor	Category	Respects robots.txt
GPTBot	OpenAI	AI training	Yes
OAI-SearchBot	OpenAI	AI search index	Yes
ChatGPT-User	OpenAI	User-triggered fetch	Yes
ClaudeBot	Anthropic	AI training	Yes
Claude-SearchBot	Anthropic	AI search index	Yes
Claude-User	Anthropic	User-triggered fetch	Yes
Google-Extended	Google	AI training	Yes
GoogleOther	Google	AI training	Yes
Googlebot	Google	Search engine	Yes
PerplexityBot	Perplexity	AI search index	Yes
Perplexity-User	Perplexity	User-triggered fetch	No
Applebot	Apple	Search engine	Yes
Applebot-Extended	Apple	AI training	Yes
CCBot	Common Crawl	Shared dataset	Yes
Meta-ExternalAgent	Meta	AI training	Yes
Meta-ExternalFetcher	Meta	User-triggered fetch	Yes
Bytespider	ByteDance	AI training	Partial
Amazonbot	Amazon	AI search index	Yes
DuckAssistBot	DuckDuckGo	AI search index	Yes
MistralAI-User	Mistral	User-triggered fetch	Yes
YouBot	You.com	AI search index	Yes

robots.txt

# AI crawler block list — generated from clickfrom.ai/tools/ai-crawler-user-agent-list
# Remove the Disallow line for any crawler you want to allow.

# OpenAI — Crawls public web pages to improve OpenAI foundation models.
# Source: https://platform.openai.com/docs/bots
User-agent: GPTBot
Disallow: /

# OpenAI — Indexes web pages so ChatGPT search and SearchGPT can cite them.
# Source: https://platform.openai.com/docs/bots
User-agent: OAI-SearchBot
Disallow: /

# OpenAI — Fetches a page on the spot when a ChatGPT user asks the assistant about a specific URL.
# Source: https://platform.openai.com/docs/bots
User-agent: ChatGPT-User
Disallow: /

# Anthropic — Crawls public web pages for Anthropic foundation-model training.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: ClaudeBot
Disallow: /

# Anthropic — Indexes web pages so Claude can cite them in search-like answers.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: Claude-SearchBot
Disallow: /

# Anthropic — Fetches a page on the spot when a Claude user asks the assistant about a specific URL.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: Claude-User
Disallow: /

# Google — Opt-out token (not a real user-agent) controlling whether Gemini and Vertex AI may train on your content.
# Source: https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended
User-agent: Google-Extended
Disallow: /

# Google — Internal R&D and product-team crawls outside of Search and Ads.
# Source: https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#googleother
User-agent: GoogleOther
Disallow: /

# Google — Classical Google Search indexer. Powers AI Overviews via the same index.
# Source: https://developers.google.com/search/docs/crawling-indexing/googlebot
User-agent: Googlebot
Disallow: /

# Perplexity — Indexes web pages so Perplexity can surface them as cited sources in answers.
# Source: https://docs.perplexity.ai/guides/bots
User-agent: PerplexityBot
Disallow: /

# Perplexity — Fetches a page on the spot when a Perplexity user asks the assistant about a specific URL.
# Source: https://docs.perplexity.ai/guides/bots
User-agent: Perplexity-User
Disallow: /

# Apple — Powers Siri, Spotlight, and Safari Suggestions search.
# Source: https://support.apple.com/en-us/119829
User-agent: Applebot
Disallow: /

# Apple — Opt-out token controlling whether Apple Intelligence may train on your content.
# Source: https://support.apple.com/en-us/119829
User-agent: Applebot-Extended
Disallow: /

# Common Crawl — Bulk crawl of the public web. Downstream datasets feed many AI model training pipelines (including some at OpenAI, Anthropic, and academic groups).
# Source: https://commoncrawl.org/ccbot
User-agent: CCBot
Disallow: /

# Meta — Crawls public web pages for Meta AI (Llama family) training and indexing.
# Source: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
User-agent: Meta-ExternalAgent
Disallow: /

# Meta — Fetches a page on the spot when a Meta AI user asks the assistant about a specific URL.
# Source: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
User-agent: Meta-ExternalFetcher
Disallow: /

# ByteDance — Crawls public web pages for ByteDance's foundation-model training (Doubao and related models).
# Source: https://bytespider.bytedance.com/
User-agent: Bytespider
Disallow: /

# Amazon — Powers Alexa and other Amazon answer/AI experiences.
# Source: https://developer.amazon.com/amazonbot
User-agent: Amazonbot
Disallow: /

# DuckDuckGo — Indexes web pages so DuckAssist can summarize them in DuckDuckGo answers.
# Source: https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot/
User-agent: DuckAssistBot
Disallow: /

# Mistral — Fetches a page on the spot when a Mistral Le Chat user asks the assistant about a specific URL.
# Source: https://docs.mistral.ai/robots/
User-agent: MistralAI-User
Disallow: /

# You.com — Indexes web pages for You.com AI search and chat.
# Source: https://about.you.com/youbot/
User-agent: YouBot
Disallow: /

What this list shows

The exact User-agent string of every major AI crawler, taken from vendor documentation
Whether each crawler respects robots.txt, and where exceptions exist
What each crawler is for: AI training, AI search index, user-triggered fetch, classic search, or shared dataset

Why a well-sourced crawler list matters

Robots.txt rules only work if you spell the User-agent the way the crawler announces itself. A typo ("GPT-Bot" instead of "GPTBot") fails silently. This list takes every name directly from the vendor's public documentation, so your robots.txt actually does what you intend.
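
A minimal sketch of how such a typo could be caught before it fails silently. It assumes the names in the table above are the reference set, and it uses Python's standard `difflib` to suggest near matches. The function name `find_suspect_agents` and the 0.8 similarity cutoff are illustrative choices, not part of any vendor tooling.

```python
import difflib

# Reference names copied from the table in this document.
KNOWN_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "Claude-SearchBot",
    "Claude-User", "Google-Extended", "GoogleOther", "Googlebot", "PerplexityBot",
    "Perplexity-User", "Applebot", "Applebot-Extended", "CCBot",
    "Meta-ExternalAgent", "Meta-ExternalFetcher", "Bytespider", "Amazonbot",
    "DuckAssistBot", "MistralAI-User", "YouBot",
]


def find_suspect_agents(robots_txt: str) -> dict[str, list[str]]:
    """Map each unrecognised User-agent value to the closest known names."""
    # Compare in lowercase: robots.txt user-agent matching is case-insensitive.
    known_lower = {name.lower(): name for name in KNOWN_AGENTS}
    suspects: dict[str, list[str]] = {}
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line.lower().startswith("user-agent:"):
            continue
        value = line.split(":", 1)[1].strip()
        if value == "*" or value.lower() in known_lower:
            continue
        close = difflib.get_close_matches(
            value.lower(), list(known_lower), n=3, cutoff=0.8
        )
        suspects[value] = [known_lower[match] for match in close]
    return suspects


if __name__ == "__main__":
    sample = "User-agent: GPT-Bot\nDisallow: /\n"
    for name, suggestions in find_suspect_agents(sample).items():
        print(f"Unknown user-agent {name!r}; did you mean {suggestions}?")
```

An unrecognised name is not necessarily wrong (the crawler may simply not be in this list), so the result is a prompt to check the vendor source, not a verdict.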

How merchants use this list

Paste the filtered "Copy as robots.txt" block into your Shopify robots.txt.liquid override to block the crawlers you don't want
For Google-Extended and Applebot-Extended: these are robots.txt tokens, so they never appear in your access logs
Run /tools/robots-analyzer against your current robots.txt to verify that the right crawlers are allowed or blocked
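
The verification step can also be approximated locally. The sketch below is not the robots-analyzer tool; it uses Python's standard `urllib.robotparser`, and the helper name `crawler_access` and the `https://example.com/` URL are placeholders. One caveat: Python's parser matches user-agent names by substring, so a group for `Applebot` will also apply to `Applebot-Extended`, which may differ from how a vendor's own crawler resolves groups.

```python
from urllib.robotparser import RobotFileParser


def crawler_access(
    robots_txt: str,
    agents: list[str],
    url: str = "https://example.com/",  # placeholder URL, replace with a real page
) -> dict[str, bool]:
    """Report, per user-agent, whether the given robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    robots = (
        "User-agent: GPTBot\n"
        "Disallow: /\n"
        "\n"
        "User-agent: Googlebot\n"
        "Allow: /\n"
    )
    for agent, allowed in crawler_access(
        robots, ["GPTBot", "Googlebot", "ClaudeBot"]
    ).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Crawlers with no matching group and no `*` group are reported as allowed, which mirrors the default-allow behavior of robots.txt.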

Common mistakes

Blocking Googlebot to opt out of AI Overviews: there is no separate UA for AI Overviews, and blocking Googlebot also removes you from regular Google Search
Assuming user-triggered fetchers respect robots.txt: Perplexity-User explicitly does not
Copying a UA string from a blog post without checking the vendor source: names change and blog posts go stale
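
The first two mistakes can be flagged mechanically. This is a rough sketch that encodes only what the table above states (Perplexity-User: No, Bytespider: Partial, everything else: Yes). The function name `review_blocked_agents` and the warning wording are made up for illustration.

```python
# Entries from the table above whose "Respects robots.txt" value is not "Yes".
NON_COMPLIANT = {
    "Perplexity-User": "No",
    "Bytespider": "Partial",
}


def review_blocked_agents(blocked: list[str]) -> list[str]:
    """Return human-readable warnings for a list of user-agents set to Disallow: /."""
    warnings: list[str] = []
    for agent in blocked:
        if agent == "Googlebot":
            warnings.append(
                "Googlebot: blocking it removes the site from regular Google "
                "Search; there is no separate UA for AI Overviews."
            )
        status = NON_COMPLIANT.get(agent)
        if status is not None:
            warnings.append(
                f"{agent}: listed as respecting robots.txt '{status}', so a "
                "Disallow rule alone may not stop it."
            )
    return warnings


if __name__ == "__main__":
    for warning in review_blocked_agents(["GPTBot", "Googlebot", "Perplexity-User"]):
        print("WARNING:", warning)
```

The third mistake (stale names) is not something code can settle; the vendor source linked for each entry remains the reference.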

AI crawler list FAQ

Should I block AI crawlers on my Shopify store?

Usually not. Most AI crawlers are how shoppers find you in ChatGPT, Perplexity, Claude, and Gemini answers. Block only crawlers whose value to your store is unclear (e.g. Bytespider), or those whose training you have decided to opt out of via their opt-out tokens (Google-Extended, Applebot-Extended).
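
As an illustration of a narrow block list like the one described, the sketch below generates robots.txt groups for a chosen set of user-agents only. The helper name `build_block_list` is hypothetical, and the output would still need to be placed in your own robots.txt (for Shopify, in the robots.txt.liquid override mentioned on this page).

```python
def build_block_list(agents: list[str]) -> str:
    """Return robots.txt groups that disallow the whole site for each agent."""
    if not agents:
        return ""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Separate groups with a blank line and end the file with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Example selection: one crawler of unclear value plus the two opt-out tokens.
    print(build_block_list(["Bytespider", "Google-Extended", "Applebot-Extended"]))
```

Every crawler not named in the output stays allowed by default, which matches the "usually not" advice above.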

How often is this list updated?

Whenever a vendor publishes a new crawler, retires one, or changes its stated robots.txt behavior. Every entry links to the vendor source for direct verification.

Why are some entries marked "partial" or "unclear"?

Because the vendor's stated behavior and third-party audits disagree, or the vendor has not published a clear position. We don't fabricate a clean "yes" when the reality is messier.

Related AI visibility resources

GPTBot robots.txt for Shopify
Robots analyzer
llms.txt template (fashion)