The 2026 robots.txt Guide for AI Crawlers (GPTBot, ClaudeBot, PerplexityBot & More)

Should I let AI bots crawl my site?

Yes, if you want to be cited by ChatGPT, Gemini, Claude and Perplexity. Disallowing GPTBot, Google-Extended, ClaudeBot, PerplexityBot, Applebot-Extended, cohere-ai or CCBot in robots.txt tells those crawlers (and, for the Google and Apple tokens, the AI products behind them) to leave your content out. That makes it less likely that those AI assistants will read, learn from or recommend your content. A blanket 'block all bots' robots.txt, like the restrictive defaults some CMS templates ship with, can now actively hurt brand visibility.
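To check whether your current file shuts these crawlers out, you can parse it with Python's standard-library `urllib.robotparser`. The sketch below assumes the seven user-agent tokens named in this guide and a placeholder URL (`https://example.com/`). This parser matches groups by simple substring, so treat it as a quick audit rather than an exact emulation of each vendor's crawler.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens for the AI crawlers discussed in this guide.
AI_CRAWLERS = [
    "GPTBot", "Google-Extended", "ClaudeBot", "PerplexityBot",
    "Applebot-Extended", "cohere-ai", "CCBot",
]

# A blanket "block every bot" file, like the restrictive defaults described above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def blocked_ai_crawlers(robots_txt, url="https://example.com/"):
    """Return the AI crawler tokens that `robots_txt` tells not to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_ai_crawlers(BLOCK_ALL))
```

Run against the block-all file, it reports every listed token. Paste in your live robots.txt instead to see which AI crawlers you are currently turning away.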

The Modern Allow-List

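Explicitly allow: GPTBot (OpenAI/ChatGPT), Google-Extended (Gemini), ClaudeBot (Anthropic/Claude), PerplexityBot (Perplexity), Applebot-Extended (Apple Intelligence), cohere-ai (Cohere) and CCBot (Common Crawl, whose open dataset is used to train many LLMs). Google-Extended and Applebot-Extended are not separate crawlers. They are robots.txt tokens that Google's and Apple's existing crawlers read to decide whether crawled content may be used for their AI models, and Google-Extended does not affect Google Search. Together these tokens control much of whether your content reaches AI assistants. Some vendors also split crawling by purpose: OpenAI, for example, uses GPTBot for model training and OAI-SearchBot for ChatGPT search. Check each vendor's crawler documentation before writing rules.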
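Under the Robots Exclusion Protocol (RFC 9309), a crawler follows the group whose `User-agent` line names it and ignores the `User-agent: *` group. An explicit group for the AI crawlers therefore keeps them allowed even if the catch-all rules are later tightened. It also means any private paths must be repeated inside that group. The example below, checked with `urllib.robotparser`, uses `/admin/` as a placeholder private path.

```python
from urllib.robotparser import RobotFileParser

# One shared group for the AI crawlers named in this guide, plus a catch-all group.
# "/admin/" is a placeholder for whatever private area your site has.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: Google-Extended
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Applebot-Extended
User-agent: cohere-ai
User-agent: CCBot
Disallow: /admin/
Allow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Show which agent/path combinations the file permits.
for agent in ("GPTBot", "CCBot", "Googlebot"):
    for path in ("/blog/post", "/admin/settings"):
        allowed = parser.can_fetch(agent, "https://example.com" + path)
        print(f"{agent:<10} {path:<16} allowed={allowed}")
```

`Disallow: /admin/` comes before `Allow: /` because Python's parser applies the first matching rule, while RFC 9309 picks the longest matching path. In this order both readings agree. If you dropped `Disallow: /admin/` from the AI group, those crawlers could fetch `/admin/` even though the catch-all group forbids it.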

The Block List

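Block aggressive scrapers and bandwidth-wasters that offer no AI-visibility upside. Many agencies maintain block lists of several dozen such user agents, built from their server logs. The rule is simple: if a bot feeds a major AI assistant or search engine, allow it; if it only resells your content or wastes server capacity, block it. Keep in mind that robots.txt is a request, not access control. Well-behaved crawlers honor it, but abusive scrapers often ignore it and have to be blocked at the server, CDN or firewall level.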
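In robots.txt, a block list is just one more group. The sketch below renders a single group that disallows the whole site for a list of user agents. `ExampleScraper` and `BulkMirrorBot` are hypothetical names standing in for whatever scrapers show up in your own logs, and `block_group` is an illustrative helper, not part of any library. The check at the end confirms the block does not spill over onto the AI crawlers you want to keep.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical scraper names; replace with the user agents found in your own logs.
BLOCK_LIST = ["ExampleScraper", "BulkMirrorBot"]


def block_group(agents):
    """Render one robots.txt group that disallows the entire site for `agents`."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


# Block list first, then a permissive catch-all so unlisted bots are unaffected.
ROBOTS_TXT = block_group(BLOCK_LIST) + "\nUser-agent: *\nAllow: /\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(ROBOTS_TXT)
for agent in ("ExampleScraper", "BulkMirrorBot", "GPTBot", "ClaudeBot"):
    print(agent, parser.can_fetch(agent, "https://example.com/pricing"))
```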

What Else to Add

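Consider adding an llms.txt file at your site root. This is a proposed convention (see llmstxt.org) for a Markdown summary of your site aimed at LLMs. It is not an official standard, and adoption by AI crawlers is still uneven. Add Sitemap: lines with the absolute URL of each XML sitemap. Keep robots.txt in version control and document why each rule exists, for example with a # comment above each group. Our LLM SEO playbook covers the full crawling-and-discovery layer.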
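One way to keep the rationale next to each rule is to generate robots.txt from a small table kept in version control. Each reason is written out as a `#` comment and the `Sitemap:` lines are appended. In this sketch, the rule table, the `/admin/` path, `ExampleScraper`, the `render_robots` helper and the `https://example.com/sitemap.xml` URL are all hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule table: (user agents, directives, reason).
# In practice this would live in version control next to the site.
RULES = [
    (("GPTBot", "ClaudeBot", "PerplexityBot"),
     ("Disallow: /admin/", "Allow: /"),
     "AI crawlers: allow public pages so assistants can cite them"),
    (("ExampleScraper",),  # hypothetical scraper name
     ("Disallow: /",),
     "Resells our content, no AI or search upside"),
    (("*",),
     ("Disallow: /admin/",),
     "Everyone else: keep out of the back office"),
]

# Placeholder sitemap URL; the Sitemap directive takes an absolute URL.
SITEMAPS = ["https://example.com/sitemap.xml"]


def render_robots(rules, sitemaps):
    """Build robots.txt text, writing each rule's reason as a comment."""
    lines = []
    for agents, directives, reason in rules:
        lines.append(f"# {reason}")
        lines.extend(f"User-agent: {agent}" for agent in agents)
        lines.extend(directives)
        lines.append("")  # blank line between groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines) + "\n"


robots_txt = render_robots(RULES, SITEMAPS)
print(robots_txt)

# Sanity-check the generated file with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
assert parser.can_fetch("GPTBot", "https://example.com/blog/post")
assert not parser.can_fetch("GPTBot", "https://example.com/admin/")
assert not parser.can_fetch("ExampleScraper", "https://example.com/blog/post")
assert parser.site_maps() == SITEMAPS
```

Running the checks in CI means a rule change that accidentally blocks an AI crawler fails before it reaches production.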

Ready to embark on your journey in Digital Marketing?

Take the first step with SalemGlobal. Call us directly or schedule a free consultation today.