robots.txt Generator

How the Robots.txt Generator Works

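The robots.txt generator builds a syntactically correct robots.txt file for your website. Select user agents, configure Disallow and Allow rules, set a Crawl-delay, and add sitemap URLs; the generator then produces a finished file that is ready to place at your domain root. A built-in validator checks for common syntax errors that cause crawlers to misread or ignore directives.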
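As a rough illustration of the assembly step described above, the following Python sketch renders a list of rule groups and sitemap URLs into robots.txt text. The `Group` dataclass and `build_robots_txt` function are hypothetical names chosen for this example; they are not the generator's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One robots.txt group: a user agent plus its rules (hypothetical structure)."""
    user_agent: str
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def build_robots_txt(groups: list[Group], sitemaps: list[str]) -> str:
    """Render groups and sitemap URLs as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        lines.append(f"User-agent: {group.user_agent}")
        lines.extend(f"Disallow: {path}" for path in group.disallow)
        lines.extend(f"Allow: {path}" for path in group.allow)
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        lines.append("")  # a blank line separates groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines) + "\n"


robots = build_robots_txt(
    [Group("*", disallow=["/admin/", "/checkout/"], crawl_delay=10)],
    ["https://example.com/sitemap.xml"],
)
print(robots)
```

The resulting text would be saved as robots.txt at the domain root, for example https://example.com/robots.txt.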

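Each robots.txt block starts with a User-agent: directive that identifies which crawler the rules apply to. User-agent: * applies the rules to all robots. Specific agents such as Googlebot, Bingbot, and GPTBot can have their own rule blocks; a crawler that finds a block naming it follows that block and ignores the wildcard block. In Google's implementation, when both an Allow and a Disallow rule match the same path, the rule with the longer (more specific) path wins, and Allow wins if the two are equally specific.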
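The Allow/Disallow precedence can be sketched in Python as follows. This is a simplified model of the longest-match behavior documented for Google's parser, using plain prefix matching only; `is_allowed` and the sample rules are assumptions made for illustration, and other crawlers may resolve conflicts differently.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under `rules`.

    `rules` holds ("allow" | "disallow", path_prefix) pairs from one group.
    Plain prefix matching only; wildcards are out of scope for this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty pattern matches nothing
        is_allow = kind == "allow"
        # A longer match wins; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/private/"), ("allow", "/private/press/")]
print(is_allowed("/private/press/kit.html", rules))  # True: the longer Allow wins
print(is_allowed("/private/notes.html", rules))      # False: only Disallow matches
print(is_allowed("/blog/", rules))                   # True: no rule matches
```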

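Several AI companies run crawlers that scrape content for model training. Representative bots to block: GPTBot (OpenAI), CCBot (Common Crawl), Google-Extended (Google Gemini training), anthropic-ai (Anthropic Claude), and PerplexityBot. The generator includes a one-click option that adds a Disallow: / rule for every known AI training crawler, a popular choice among publishers who want to control how their content is used in AI training datasets.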
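A minimal sketch of what such a one-click option might emit is shown below. The `AI_CRAWLERS` list contains only the five user agents named above, not a complete or current registry, and `block_ai_crawlers` is a hypothetical helper. It relies on the fact that several User-agent lines may share one rule group.

```python
# User agents named in the text above; an illustrative list, not an exhaustive one.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "anthropic-ai", "PerplexityBot"]


def block_ai_crawlers(agents: list[str]) -> str:
    """Return one robots.txt group that disallows the whole site for `agents`."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")  # one rule applies to every agent listed above it
    return "\n".join(lines) + "\n"


ai_block = block_ai_crawlers(AI_CRAWLERS)
print(ai_block)
```

Emitting a separate group per crawler would work equally well and may be easier to edit by hand later.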

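Disallow: /admin/ keeps search engine crawlers out of the CMS backend. Disallow: /checkout/ keeps them out of cart and checkout pages. Disallow: /search? blocks internal search result pages, which generate low-quality duplicate content. Disallow: /*.pdf$ excludes PDF files. Note that Disallow stops crawling, not indexing: a blocked URL can still appear in search results if other pages link to it. robots.txt is also advisory only: it keeps well-behaved crawlers away but provides no security. Sensitive pages always need access control as well.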
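To see how the patterns above match URLs, the following Python sketch translates a robots.txt path pattern into a regular expression, treating `*` as "any run of characters" and a trailing `$` as "end of URL". The helper names are hypothetical, and real crawlers may differ in edge cases such as percent-encoding.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if `path` (including any query string) matches `pattern`."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/admin/", "/admin/users"))                 # True
print(matches("/search?", "/search?q=shoes"))             # True
print(matches("/search?", "/search/"))                    # False
print(matches("/*.pdf$", "/docs/report.pdf"))             # True
print(matches("/*.pdf$", "/docs/report.pdf?download=1"))  # False
```

The last case shows why the `$` anchor matters: with it, a PDF URL that carries a query string is no longer covered by the rule.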

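Crawl-delay: 10 asks crawlers to wait 10 seconds between requests, reducing server load from aggressive crawling. Google ignores Crawl-delay, but Bing and many other bots honor it. (The crawl rate setting in Search Console, formerly the Google alternative, was retired in early 2024; Googlebot now adjusts its crawl rate automatically.) The Sitemap directive (for example, Sitemap: https://example.com/sitemap.xml) can appear anywhere in robots.txt and tells all crawlers where your sitemap is, without submitting it to each search engine by hand.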
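One way to check how a parser reads these two directives is Python's standard `urllib.robotparser` module (`site_maps()` requires Python 3.8 or later). The file content and the `ExampleBot` user agent below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample file; the Sitemap line is independent of any User-agent group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("ExampleBot")  # falls back to the "*" group
sitemaps = parser.site_maps()

print(delay)     # 10
print(sitemaps)  # ['https://example.com/sitemap.xml']
```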

Frequently Asked Questions

What is robots.txt?

robots.txt is a text file at the root of your website that tells search engine crawlers which pages or paths they may and may not crawl. It is a voluntary standard: well-behaved bots respect it, but malicious bots may ignore it.

Can I block AI training bots with robots.txt?

Yes, for crawlers that honor it. The major AI companies state that their crawlers respect robots.txt disallow rules: GPTBot (OpenAI), CCBot (Common Crawl, whose data is used for training), anthropic-ai (Anthropic), and Google-Extended (Google Gemini, formerly Bard) all respect Disallow: / for their user agents.

Does Disallow: / block all bots?

Disallow: / for a specific User-agent blocks that bot from crawling your entire site. To block all bots, set User-agent: * followed by Disallow: /. Be careful, though: blocking Googlebot stops Google from crawling your site, so your pages will largely disappear from Google search.
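The difference between blocking one bot and blocking all bots can be checked with Python's standard `urllib.robotparser`. This is a sketch: `can_crawl` is a hypothetical helper, and the standard-library parser is only an approximation of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Parse `robots_txt` and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


BLOCK_ONE = "User-agent: GPTBot\nDisallow: /\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
URL = "https://example.com/page"

print(can_crawl(BLOCK_ONE, "GPTBot", URL))     # False: GPTBot is blocked
print(can_crawl(BLOCK_ONE, "Googlebot", URL))  # True: no group applies to Googlebot
print(can_crawl(BLOCK_ALL, "Googlebot", URL))  # False: the wildcard blocks everyone
```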

What is Crawl-delay?

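Crawl-delay tells the bot to wait a specified number of seconds between requests. This prevents aggressive crawlers from overloading your server. Note: Googlebot ignores Crawl-delay. The crawl rate setting in Google Search Console that used to serve this purpose was retired in early 2024; Googlebot now adjusts its crawl rate automatically based on how your server responds.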

Should I add my sitemap to robots.txt?

Yes. Adding a Sitemap: directive to robots.txt tells all crawlers (not just Google) where your sitemap lives. The directive can go anywhere in the file; the bottom is a common convention. This is in addition to submitting the sitemap in Google Search Console.
