AI Crawler User-Agent List

Reference list of all major AI crawlers and user agents: what they do, who operates them, and whether they respect robots.txt.

21 crawlers shown

User-Agent	Vendor	Category	Respects robots.txt
GPTBot	OpenAI	AI training	Yes
OAI-SearchBot	OpenAI	AI search index	Yes
ChatGPT-User	OpenAI	User-initiated fetch	Yes
ClaudeBot	Anthropic	AI training	Yes
Claude-SearchBot	Anthropic	AI search index	Yes
Claude-User	Anthropic	User-initiated fetch	Yes
Google-Extended	Google	AI training	Yes
GoogleOther	Google	AI training	Yes
Googlebot	Google	Search engine	Yes
PerplexityBot	Perplexity	AI search index	Yes
Perplexity-User	Perplexity	User-initiated fetch	No
Applebot	Apple	Search engine	Yes
Applebot-Extended	Apple	AI training	Yes
CCBot	Common Crawl	Shared dataset	Yes
Meta-ExternalAgent	Meta	AI training	Yes
Meta-ExternalFetcher	Meta	User-initiated fetch	Yes
Bytespider	ByteDance	AI training	Partial
Amazonbot	Amazon	AI search index	Yes
DuckAssistBot	DuckDuckGo	AI search index	Yes
MistralAI-User	Mistral	User-initiated fetch	Yes
YouBot	You.com	AI search index	Yes

robots.txt

# AI crawler block list — generated from clickfrom.ai/tools/ai-crawler-user-agent-list
# Remove the Disallow line for any crawler you want to allow.

# OpenAI — Crawls public web pages to improve OpenAI foundation models.
# Source: https://platform.openai.com/docs/bots
User-agent: GPTBot
Disallow: /

# OpenAI — Indexes web pages so ChatGPT search and SearchGPT can cite them.
# Source: https://platform.openai.com/docs/bots
User-agent: OAI-SearchBot
Disallow: /

# OpenAI — Fetches a page on the spot when a ChatGPT user asks the assistant about a specific URL.
# Source: https://platform.openai.com/docs/bots
User-agent: ChatGPT-User
Disallow: /

# Anthropic — Crawls public web pages for Anthropic foundation-model training.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: ClaudeBot
Disallow: /

# Anthropic — Indexes web pages so Claude can cite them in search-like answers.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: Claude-SearchBot
Disallow: /

# Anthropic — Fetches a page on the spot when a Claude user asks the assistant about a specific URL.
# Source: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
User-agent: Claude-User
Disallow: /

# Google — Opt-out token (not a real user-agent) controlling whether Gemini and Vertex AI may train on your content.
# Source: https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended
User-agent: Google-Extended
Disallow: /

# Google — Internal R&D and product-team crawls outside of Search and Ads.
# Source: https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#googleother
User-agent: GoogleOther
Disallow: /

# Google — Classical Google Search indexer. Powers AI Overviews via the same index.
# Source: https://developers.google.com/search/docs/crawling-indexing/googlebot
User-agent: Googlebot
Disallow: /

# Perplexity — Indexes web pages so Perplexity can surface them as cited sources in answers.
# Source: https://docs.perplexity.ai/guides/bots
User-agent: PerplexityBot
Disallow: /

# Perplexity — Fetches a page on the spot when a Perplexity user asks the assistant about a specific URL.
# Source: https://docs.perplexity.ai/guides/bots
User-agent: Perplexity-User
Disallow: /

# Apple — Powers Siri, Spotlight, and Safari Suggestions search.
# Source: https://support.apple.com/en-us/119829
User-agent: Applebot
Disallow: /

# Apple — Opt-out token controlling whether Apple Intelligence may train on your content.
# Source: https://support.apple.com/en-us/119829
User-agent: Applebot-Extended
Disallow: /

# Common Crawl — Bulk crawl of the public web. Downstream datasets feed many AI model training pipelines (including some at OpenAI, Anthropic, and academic groups).
# Source: https://commoncrawl.org/ccbot
User-agent: CCBot
Disallow: /

# Meta — Crawls public web pages for Meta AI (Llama family) training and indexing.
# Source: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
User-agent: Meta-ExternalAgent
Disallow: /

# Meta — Fetches a page on the spot when a Meta AI user asks the assistant about a specific URL.
# Source: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
User-agent: Meta-ExternalFetcher
Disallow: /

# ByteDance — Crawls public web pages for ByteDance's foundation-model training (Doubao and related models).
# Source: https://bytespider.bytedance.com/
User-agent: Bytespider
Disallow: /

# Amazon — Powers Alexa and other Amazon answer/AI experiences.
# Source: https://developer.amazon.com/amazonbot
User-agent: Amazonbot
Disallow: /

# DuckDuckGo — Indexes web pages so DuckAssist can summarize them in DuckDuckGo answers.
# Source: https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot/
User-agent: DuckAssistBot
Disallow: /

# Mistral — Fetches a page on the spot when a Mistral Le Chat user asks the assistant about a specific URL.
# Source: https://docs.mistral.ai/robots/
User-agent: MistralAI-User
Disallow: /

# You.com — Indexes web pages for You.com AI search and chat.
# Source: https://about.you.com/youbot/
User-agent: YouBot
Disallow: /

What this list shows

The exact user-agent string of every major AI crawler, taken from vendor documentation
Whether each crawler respects robots.txt, and where exceptions exist
What each crawler is for: AI training, AI search index, user-initiated fetch, classic search, or shared dataset

Why a source-backed crawler list matters

Robots.txt rules only work if you write the user-agent exactly as the crawler identifies itself. A typo ("GPT-Bot" instead of "GPTBot") fails silently: the rule simply never matches, and no error is reported. This list takes every name directly from the vendor's public documentation, so that your robots.txt actually does what you intend.
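
The silent failure can be reproduced with a minimal sketch built on Python's standard-library `urllib.robotparser`. The helper name `is_blocked` and the `https://example.com/` URL are illustrative assumptions, and Python's matcher is only a stand-in for whatever matching logic each crawler actually runs.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str = "https://example.com/") -> bool:
    """Return True if the robots.txt text disallows `url` for `user_agent`.

    Hypothetical helper for illustration; it parses the text locally and
    makes no network request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Misspelled name: the group targets a crawler that does not exist.
typo_rules = "User-agent: GPT-Bot\nDisallow: /\n"
# Name written exactly as in the table above.
correct_rules = "User-agent: GPTBot\nDisallow: /\n"

print(is_blocked(typo_rules, "GPTBot"))     # False: the rule never matches
print(is_blocked(correct_rules, "GPTBot"))  # True: the crawler is blocked
```

Neither file produces a warning or an error, which is why a misspelled name can go unnoticed.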

How merchants use this list

Paste the filtered "Copy as robots.txt" block into your Shopify robots.txt.liquid override to block unwanted crawlers
For Google-Extended and Applebot-Extended: remember that these are robots.txt tokens, so they never appear in your access logs
Run /tools/robots-analyzer against your current robots.txt to verify that the right crawlers are allowed or blocked
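
As a rough illustration of the access-log point, the sketch below counts crawler hits in a handful of log lines. The function name `count_crawler_hits`, the log format, and the sample lines are all invented for this example; real log layouts and full user-agent strings differ by server and vendor.

```python
from collections import Counter

# Crawlers that send their own User-Agent header (a subset of the table above).
REAL_CRAWLERS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "CCBot", "Bytespider")
# Opt-out tokens exist only in robots.txt, so no request carries these names.
ROBOTS_ONLY_TOKENS = ("Google-Extended", "Applebot-Extended")


def count_crawler_hits(log_lines):
    """Count log lines per crawler name using a case-insensitive substring match."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for name in REAL_CRAWLERS + ROBOTS_ONLY_TOKENS:
            if name.lower() in lowered:
                hits[name] += 1
                break  # attribute each line to at most one crawler
    return hits


# Invented sample lines in a simplified format (assumption, not real traffic).
sample_log = [
    '203.0.113.7 "GET /products/a HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.8 "GET /products/b HTTP/1.1" 200 "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 "GET /collections HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.2 "GET / HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]

hits = count_crawler_hits(sample_log)
for name in REAL_CRAWLERS + ROBOTS_ONLY_TOKENS:
    print(f"{name}: {hits[name]}")
```

A zero count for Google-Extended or Applebot-Extended is expected and says nothing about whether the opt-out is working.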

Common mistakes

Blocking Googlebot to get out of AI Overviews: there is no separate UA for AI Overviews, and blocking Googlebot also removes you from regular Google Search
Assuming that user-initiated fetchers respect robots.txt: Perplexity-User explicitly does not
Copying a UA string from a blog post without checking the vendor source: names change, and blog posts go stale
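
The first mistake can be sketched with Python's standard-library `urllib.robotparser`. The helper name `allowed` and the example URL are assumptions made for illustration, and Python's matcher only approximates how Google evaluates its own tokens.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str) -> bool:
    """Return True if `user_agent` may fetch a sample product URL (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "https://example.com/products/demo")


# Opting out of training only: the rule names the Google-Extended token.
opt_out_only = "User-agent: Google-Extended\nDisallow: /\n"
# The mistake: blocking the Search crawler itself.
block_googlebot = "User-agent: Googlebot\nDisallow: /\n"

print(allowed(opt_out_only, "Googlebot"))        # True: Search crawling is unaffected
print(allowed(opt_out_only, "Google-Extended"))  # False: the training opt-out applies
print(allowed(block_googlebot, "Googlebot"))     # False: the Search crawler is shut out
```

The opt-out token and the Search crawler are separate names, so a rule for one leaves the other untouched.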

AI crawler list FAQ

Should I block AI crawlers on my Shopify store?

Usually no. Most AI crawlers are how shoppers find you in ChatGPT, Perplexity, Claude, and Gemini answers. Block only crawlers whose value to your store is unclear (e.g. Bytespider), and use the opt-out tokens (Google-Extended, Applebot-Extended) only if you have decided against training.

How often is this list updated?

Whenever a vendor releases a new crawler, retires one, or changes its declared robots.txt behavior. Each entry links to the vendor source, so you can verify it directly.

Why are some entries marked "partial" or "unclear"?

Because the vendor's declared behavior and third-party audits disagree, or the vendor has not published a clear position. We do not invent a clean "Yes" when reality is messier.