Question 1

What is robots.txt?

Accepted Answer

robots.txt is a plain-text file at the root of your website that tells search engine crawlers which pages or paths they may and may not crawl. It is a voluntary standard: well-behaved bots respect it, but malicious bots may ignore it.

Question 2

Can I block AI training bots with robots.txt?

Accepted Answer

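Yes, for crawlers that choose to comply. OpenAI (GPTBot), Common Crawl (CCBot, whose archives are widely used for training), Anthropic (anthropic-ai), and Google (Google-Extended) state that they honor a Disallow: / rule addressed to their user agent. Google-Extended is a control token rather than a separate crawler: it governs whether content Google has crawled may be used for its AI models (Bard, since renamed Gemini). User-agent tokens change over time (Anthropic, for example, now documents ClaudeBot as its crawler), so check each vendor's current documentation.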
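As a rough sketch of what such rules look like and how a compliant parser reads them, the snippet below feeds a sample robots.txt to Python's standard-library urllib.robotparser. The example.com URL is a placeholder, and the token list simply mirrors the names above; it is not a complete or current list of AI crawlers.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: one group per AI-related user agent, each fully disallowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; any path on the site gives the same answer here.
url = "https://example.com/articles/hello"

for agent in ("GPTBot", "CCBot", "anthropic-ai", "Google-Extended", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

Because the file contains no User-agent: * group, agents that are not named (Googlebot in this sketch) remain free to crawl.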

Question 3

Does Disallow: / block all bots?

Accepted Answer

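Disallow: / under a specific User-agent line blocks that bot from crawling your entire site. To block all compliant bots, use User-agent: * followed by Disallow: /. Be careful: that rule also applies to Googlebot, which stops Google from crawling your pages and will usually keep them out of useful search results (a blocked URL can still be listed without a description if other sites link to it).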
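One way to keep Google crawling while turning away every other compliant bot is to give Googlebot its own group with an empty Disallow. The sketch below checks that behavior with Python's urllib.robotparser; SomeOtherBot and the example.com URL are made-up placeholders.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow means "nothing is disallowed" for that group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/"  # placeholder site

# A bot uses the group that names it; all others fall back to the * group.
googlebot_ok = parser.can_fetch("Googlebot", url)
other_ok = parser.can_fetch("SomeOtherBot", url)  # hypothetical crawler name

print("Googlebot allowed:", googlebot_ok)
print("SomeOtherBot allowed:", other_ok)
```

Dropping the Googlebot group from this file would leave only the catch-all rule, which blocks Googlebot along with everything else.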

Question 4

What is Crawl-delay?

Accepted Answer

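Crawl-delay asks a bot to wait a specified number of seconds between requests, which helps keep aggressive crawlers from overloading your server. It is a nonstandard directive, so support varies from crawler to crawler. Note: Googlebot ignores Crawl-delay. Google retired the Search Console crawl rate limiter in early 2024; its documentation now suggests temporarily returning 500, 503, or 429 status codes if Googlebot is overloading your server.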
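From the crawler's side, honoring Crawl-delay amounts to reading the value and pausing that long between requests. The following is a minimal sketch using Python's urllib.robotparser; the delay_for helper, the one-second fallback, and the ExampleBot name are illustrative assumptions, not part of any standard.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def delay_for(rp: RobotFileParser, agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests; `default` is an assumed fallback."""
    delay = rp.crawl_delay(agent)  # None when no Crawl-delay applies
    return float(delay) if delay is not None else default


# A polite crawler would call time.sleep(wait) between fetches.
wait = delay_for(parser, "ExampleBot")  # hypothetical crawler name
print(f"Waiting {wait} seconds between requests")
```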

Question 5

Should I add my sitemap to robots.txt?

Accepted Answer

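Yes. A Sitemap: directive tells all crawlers (not just Google) where your sitemap lives. It is conventionally placed at the bottom of robots.txt, although it can appear anywhere in the file, and it should give the sitemap's full URL. This is in addition to submitting the sitemap in Google Search Console.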
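For illustration, here is a small robots.txt with a Sitemap line and a check that a parser picks it up. This sketch assumes Python 3.8 or later, where urllib.robotparser provides site_maps(); the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The Sitemap line stands outside any User-agent group and uses a full URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file lists none.
sitemaps = parser.site_maps()
print(sitemaps)
```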

robots.txt Generator

Control what search engines and AI bots can crawl
