robots.txt Generator

Generate a robots.txt file with bot blocking and sitemap URL.

User-agent: *
Disallow: /admin
Disallow: /api
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example.com/sitemap.xml

Place this file at the root of your domain: https://yourdomain.com/robots.txt

How the Robots.txt Generator works

The robots.txt generator builds a syntactically correct robots.txt file for your website. Select user agents, configure Disallow and Allow rules, set a Crawl-delay, and add your sitemap URL — the tool generates the finished file ready to place at your domain root. A built-in validator checks for common syntax errors that would cause crawlers to misinterpret or ignore your directives.
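As a rough illustration of what such a generator does internally, the sketch below renders rule groups and sitemap URLs as robots.txt text. The `Group` class and `build_robots_txt` function are hypothetical names chosen for this example, not the tool's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Group:
    """One robots.txt group: user agents that share the same rules."""
    user_agents: List[str]
    disallow: List[str] = field(default_factory=list)
    allow: List[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def build_robots_txt(groups: Sequence[Group], sitemaps: Sequence[str] = ()) -> str:
    """Render groups and sitemap URLs as robots.txt text."""
    blocks = []
    for group in groups:
        lines = [f"User-agent: {agent}" for agent in group.user_agents]
        lines += [f"Disallow: {path}" for path in group.disallow]
        lines += [f"Allow: {path}" for path in group.allow]
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        blocks.append("\n".join(lines))
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


robots_txt = build_robots_txt(
    [
        Group(["*"], disallow=["/admin", "/api"], allow=["/"]),
        Group(["GPTBot"], disallow=["/"]),
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(robots_txt)
```

Keeping each group as structured data until the final render makes it straightforward to validate paths or deduplicate agents before any text is written.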

User-agent directives and wildcard rules

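Each robots.txt block starts with a User-agent: directive identifying which crawler the rules apply to. User-agent: * applies rules to all robots that have no block of their own. Specific agents like Googlebot, Bingbot, or GPTBot can have their own rule blocks; a crawler that finds a block naming it follows only that block and ignores the wildcard block, so any shared rules must be repeated there. Within a block, the order of rules does not matter in Google's implementation or in RFC 9309: the most specific rule, meaning the one with the longest matching path, wins, and when an Allow and a Disallow rule match equally, Allow takes precedence. Some older parsers apply the first matching rule instead, so listing more specific rules first is the safest ordering.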
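The group selection and longest-match behavior described by Google and RFC 9309 can be sketched as follows. This is a simplified illustration that handles plain path prefixes only (no wildcards); `select_group` and `is_allowed` are hypothetical helper names.

```python
def select_group(groups, crawler_token):
    """Pick the rules for a crawler: a group naming it, else the "*" group.

    `groups` is a list of (user_agents, rules) pairs.
    """
    token = crawler_token.lower()
    for agents, rules in groups:
        if token in (agent.lower() for agent in agents):
            return rules
    for agents, rules in groups:
        if "*" in agents:
            return rules
    return []


def is_allowed(rules, path):
    """Apply longest-match precedence to (kind, prefix) rules.

    `kind` is "allow" or "disallow". On a tie in length, allow wins.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for kind, prefix in rules:
        if prefix and path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_len = len(prefix)
                allowed = kind == "allow"
    return allowed


wildcard_rules = [("disallow", "/admin"), ("disallow", "/api"), ("allow", "/")]
groups = [(["*"], wildcard_rules), (["GPTBot"], [("disallow", "/")])]

print(is_allowed(select_group(groups, "Googlebot"), "/admin/users"))
print(is_allowed(select_group(groups, "Googlebot"), "/blog/post"))
print(is_allowed(select_group(groups, "GPTBot"), "/blog/post"))
```

Because the longest match wins, reordering the rules in `wildcard_rules` does not change the result under this model.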

Blocking AI training bots

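Several AI companies operate crawlers that collect content for model training. Common user agents to block include GPTBot (OpenAI), CCBot (Common Crawl, whose datasets are widely used for training), anthropic-ai (Anthropic), and PerplexityBot. Google-Extended is not a separate crawler but a control token: disallowing it tells Google not to use content fetched by its regular crawlers for Gemini training. The generator includes a one-click option to add Disallow: / rules for all known AI training crawlers, a popular choice for publishers who want control over how their content is used in AI training datasets. User agent tokens change over time, so check each vendor's current documentation for the tokens it honors.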
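One way to confirm that a generated file blocks the intended agents is to run it through Python's standard-library parser. The sketch below assumes a shortened version of the sample file above; `blocked_agents` is a hypothetical helper. Note that `urllib.robotparser` applies rules in file order and does not implement wildcards, so it only approximates how Google evaluates a file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /api
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def blocked_agents(parser, agents, url="https://example.com/"):
    """Return the agents that may not fetch `url` according to `parser`."""
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(parser, ["GPTBot", "CCBot", "Googlebot"]))
```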

Protecting private pages and admin areas

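Disallow: /admin/ asks search engines not to crawl your CMS backend. Disallow: /checkout/ keeps crawlers away from cart and payment pages. Disallow: /search? blocks internal search result pages that create low-value duplicate content. Disallow: /*.pdf$ excludes URLs ending in .pdf; the * and $ wildcards are supported by Google, Bing, and RFC 9309, but not by every crawler. Note that Disallow stops crawling, not indexing: a blocked URL can still appear in search results if other pages link to it, so to keep a page out of the index, use a noindex directive on a page that crawlers are allowed to fetch. robots.txt is also advisory only: well-behaved crawlers follow it, but it provides no security. Sensitive pages must also be access-controlled.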
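To make the pattern syntax concrete, here is a minimal sketch of how a path pattern with * and $ could be translated into a regular expression. `pattern_to_regex` and `matches` are hypothetical helpers written for this example; real crawlers may differ in details such as percent-encoding.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if `path` is covered by the robots.txt `pattern`."""
    # re.match anchors at the start of the path, giving prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


for pattern, path in [
    ("/admin/", "/admin/users"),
    ("/admin/", "/administrator"),
    ("/search?", "/search?q=shoes"),
    ("/*.pdf$", "/docs/report.pdf"),
    ("/*.pdf$", "/docs/report.pdf?download=1"),
]:
    print(pattern, path, matches(pattern, path))
```

The last pair shows why the $ anchor matters: without it, /*.pdf would also match PDF URLs that carry a query string.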

Crawl-delay and Sitemap directives

Crawl-delay: 10 asks crawlers to wait 10 seconds between requests, reducing server load from aggressive crawling. Crawl-delay is a non-standard extension: Google ignores it and adjusts its crawl rate automatically (the Search Console crawl rate limiter was retired in early 2024), but Bing and many other bots respect it. The Sitemap: directive (e.g., Sitemap: https://example.com/sitemap.xml) can appear anywhere in robots.txt and tells all crawlers where to find your sitemap, so you do not have to submit it to each search engine separately.
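From the crawler's side, both directives can be read with Python's standard-library parser. The snippet below is a small sketch using a made-up file and a made-up agent name (ExampleBot); `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds to wait between requests, or None if no Crawl-delay applies.
delay = parser.crawl_delay("ExampleBot")

# List of sitemap URLs declared in the file, or None if there are none.
sitemaps = parser.site_maps()

print(delay, sitemaps)
```

A polite crawler would sleep for `delay` seconds between requests when the value is not None.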

Frequently asked questions

What is robots.txt?
robots.txt is a text file at the root of your website that tells search engine crawlers which pages or paths they may and may not crawl. It is a voluntary standard: well-behaved bots respect it, but malicious bots may ignore it.
Can I block AI training bots with robots.txt?
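Yes, for crawlers that choose to comply. OpenAI (GPTBot), Common Crawl (CCBot, whose data is widely used for training), Anthropic (anthropic-ai), and Google (Google-Extended, which controls use of content for Gemini, formerly Bard) state that they honor Disallow: / for these user agents. As with all robots.txt rules, compliance is voluntary.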
Does Disallow: / block all bots?
Disallow: / for a specific User-agent blocks that bot from crawling your entire site. To block all bots, set User-agent: * followed by Disallow: /. However, blocking Googlebot stops Google from crawling your site, so your pages will drop out of search results or appear only as bare URLs without descriptions.
What is Crawl-delay?
Crawl-delay tells the bot to wait a specified number of seconds between requests. This prevents aggressive crawlers from overloading your server. Note: Googlebot ignores Crawl-delay and sets its crawl rate automatically; the Search Console crawl rate limiter was retired in early 2024.
Should I add my sitemap to robots.txt?
Yes. Adding a Sitemap: directive to robots.txt tells all crawlers (not just Google) where your sitemap lives. The directive can go anywhere in the file, though the bottom is conventional. This is in addition to submitting it in Google Search Console.
