Ooty

AI-native tools that replace expensive dashboards: SEO, Amazon, YouTube, and social analytics inside your AI assistant.

Product

  • Features
  • Pricing
  • Get started

Resources

  • Free Tools
  • Docs
  • About
  • Blog
  • Contact

Legal

  • Privacy
  • Terms
  • Refund Policy
  • Security


© 2026 Ooty. All rights reserved.


Free Robots.txt Generator with AI Crawler Presets


Robots.txt Generator

Control which AI crawlers can access your content. Block training bots, allow browsing bots, or configure each one individually. Live preview, download, and deploy.

Choose a starting point, then fine-tune below

Training Bots

These bots collect your content to train AI models

GPTBot (OpenAI)

Collects data for AI model training

Google-Extended (Google)

Feeds content into Gemini AI training

CCBot (Common Crawl)

Builds open dataset used by many AI labs

Bytespider (ByteDance)

ByteDance's crawler for AI training (parent company of TikTok)

Browsing Bots

These bots fetch pages live when users ask AI assistants questions

ChatGPT-User (OpenAI)

Browses pages live when ChatGPT users ask

PerplexityBot (Perplexity)

Powers Perplexity's AI search answers

ClaudeBot (Anthropic)

Fetches pages for Claude AI web access

Intelligence Bots

These bots feed AI-powered features like Siri and Google AI

Applebot-Extended (Apple)

Feeds Apple Intelligence, Siri, Spotlight

GoogleOther (Google)

Google's catch-all for non-search AI tasks

Sitemap reference: points crawlers to your XML sitemap.

Live Preview


Test a URL Path

Enter a path to see which bots would be allowed or blocked.

Frequently Asked Questions

What is a robots.txt file?
A robots.txt file is a plain text file placed at the root of your website (e.g. example.com/robots.txt) that tells web crawlers which pages or directories they may access. It follows the Robots Exclusion Protocol and is typically the first file crawlers check before fetching any page on your site. Every crawler that respects the protocol reads it, including Googlebot, GPTBot, ClaudeBot, and PerplexityBot.
How do I block AI crawlers like GPTBot and ClaudeBot?
Add a User-agent directive for each AI crawler followed by Disallow: /. For example, to block GPTBot: User-agent: GPTBot then Disallow: /. Do the same for ChatGPT-User, ClaudeBot, and PerplexityBot if you want to block all major AI crawlers. This generator includes a one-click preset that adds the correct blocks for all nine AI training and browsing crawlers it covers.
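As a concrete sketch, the directives described above look like this in a robots.txt file (one User-agent block per crawler you want to stop):

```text
# Block major AI training and browsing crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```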
What is the difference between blocking AI training crawlers and AI browsing crawlers?
AI companies run two types of crawlers. Training crawlers (like GPTBot and CCBot) collect data to train language models. Browsing crawlers (like ChatGPT-User and PerplexityBot) fetch pages in real time when a user asks an AI assistant about your content. You can block training crawlers while allowing browsing crawlers if you want to appear in AI-generated answers without contributing to model training data. This generator lets you configure each crawler individually.
Does robots.txt prevent my content from appearing in search results?
Not exactly. Robots.txt tells crawlers not to fetch your pages, but if other sites link to a blocked page, search engines may still list the URL in results without a snippet. This is called a URL-only result. To fully remove a page from search results, you need a noindex meta tag or X-Robots-Tag HTTP header on the page itself, not a robots.txt disallow directive.
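To illustrate the difference, here is a sketch of the two noindex mechanisms the answer above refers to (the HTTP header would be emitted by your server or application, not written into robots.txt):

```text
<!-- Option 1: meta tag inside the page's <head> -->
<meta name="robots" content="noindex">

Option 2: equivalent HTTP response header
X-Robots-Tag: noindex
```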
Where do I put my robots.txt file?
The file must be placed at the exact root of your domain, for example https://example.com/robots.txt. Crawlers will not find it at a subdirectory path. Make sure it is served as plain text with Content-Type: text/plain. If your site is on a subdomain (e.g. blog.example.com), you need a separate robots.txt at the subdomain root.
Can I set different rules for different crawlers?
Yes. Each User-agent block in robots.txt targets a specific crawler by name. You can allow Googlebot full access while blocking GPTBot entirely, or set a crawl delay for Bytespider while allowing everything else. A crawler looks for a block matching its own user agent string first, then falls back to the wildcard User-agent: * block if no specific match is found. This generator lets you configure rules per crawler.
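You can check how per-crawler rules resolve without deploying anything, using Python's standard-library robots.txt parser. This sketch (crawler names and paths are illustrative) allows Googlebot everywhere, blocks GPTBot entirely, and lets every other bot fall back to the wildcard block:

```python
from urllib.robotparser import RobotFileParser

# Googlebot: allowed everywhere (empty Disallow allows all).
# GPTBot: blocked everywhere.
# Everyone else: falls back to the wildcard block.
rules = """\
User-agent: Googlebot
Disallow:

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/blog/post"))     # named block allows all
print(rp.can_fetch("GPTBot", "/blog/post"))        # named block disallows all
print(rp.can_fetch("SomeOtherBot", "/private/x"))  # wildcard rule applies
print(rp.can_fetch("SomeOtherBot", "/blog/post"))  # wildcard allows this path
```

Note that the parser matches a crawler against its own named block first, mirroring how real crawlers resolve robots.txt groups.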


You wrote the rules. Ooty SEO shows who actually follows them.

Ooty SEO shows which bots visit every page, how often they return, and which content they skip entirely. Query your crawl data and AI visibility inside ChatGPT, Gemini, or Claude.

See Ooty SEO

14-day money-back guarantee. No questions asked.

Next Steps

  1. Copy and deploy your robots.txt

     Save the generated file as robots.txt at your domain root (e.g. yoursite.com/robots.txt). Every bot checks this location.

  2. Test it in Google Search Console

     Use the robots.txt report in Google Search Console to confirm Googlebot can reach the pages you intend to allow.

  3. Check AI crawler access

     Run the AI Readiness Checker to confirm GPTBot, ClaudeBot, and PerplexityBot have the access your robots.txt intends to grant.

  4. Validate your sitemap

     Make sure your sitemap is referenced in your robots.txt and accessible. Run the Sitemap Validator to check it is well-formed and indexed.

Check these next


AI Readiness Checker

Check if AI crawlers can access your site

SEO Content Analyzer

44-check SEO audit for any URL

Schema Markup Validator

Validate JSON-LD and check rich result eligibility

Meta Tag Analyzer

Analyze title, description, and OG tags

Sitemap Validator

Validate XML sitemap structure and URL count

Topic Cluster Analyzer

Visualize your site's topic distribution

HTTP Status Checker

Bulk check with AI crawler user-agent testing

The AI Crawler Landscape in 2026

Nine AI crawlers now visit websites regularly. They fall into two broad categories: training crawlers that collect data to improve AI models, and browsing crawlers that fetch pages in real time when users ask AI assistants questions. A few serve both roles.

Training Crawlers

GPTBot (OpenAI), CCBot (Common Crawl), Bytespider (ByteDance), Google-Extended (Google). These collect content to train language models. Blocking them means your content is not used for training, but also not available when the AI needs to reference it from memory.

Browsing Crawlers

ChatGPT-User (OpenAI), PerplexityBot (Perplexity). These fetch pages live during conversations. If a user asks ChatGPT to check your pricing page, ChatGPT-User visits it in real time. Blocking these bots makes your site invisible during live AI interactions.

Hybrid Crawlers

GoogleOther, ClaudeBot, and Applebot-Extended serve multiple purposes. GoogleOther feeds experimental Google products. ClaudeBot collects training data and powers search. Applebot-Extended feeds Apple Intelligence.

Training vs Browsing: How to Decide What to Block

  • Allow everything if you want maximum AI visibility. Your content appears in training data AND real-time AI answers. Best for businesses that want AI referrals.
  • Block training, allow browsing if you want AI citation without contributing to training data. Block GPTBot and CCBot, allow ChatGPT-User and PerplexityBot. Your content appears in live AI answers but not in model training.
  • Block everything if you want no AI involvement. Your site disappears from all AI products. Consider the competitive cost: if your competitors allow AI crawlers, they get the citations and you do not.
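The middle option, which this generator's preset produces, looks roughly like this (an empty Disallow line means the crawler may fetch everything):

```text
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow AI browsing crawlers
User-agent: ChatGPT-User
Disallow:

User-agent: PerplexityBot
Disallow:
```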

Use the AI Readiness Checker to verify your robots.txt is working as intended and that crawler access matches your strategy.

How to Deploy Your robots.txt

  1. Generate your file using the tool above.
  2. Copy the output (one-click button in the preview panel).
  3. Save it as robots.txt at the root of your domain (e.g., https://example.com/robots.txt).
  4. Verify it serves with Content-Type: text/plain.
  5. Test it with the robots.txt report in Google Search Console.
  6. Run the AI Readiness Checker to confirm crawler access matches your intent.

For specific CMS platforms: WordPress places it at the site root automatically (but check for conflicting plugin rules). Shopify manages robots.txt through the admin panel. For static sites (Next.js, Gatsby, Hugo), place the file in your public directory.

Common robots.txt Mistakes

  • Blocking Googlebot by accident. This blocks ALL Google indexing, not just AI. Your entire site disappears from Google Search.
  • Using Disallow: without a path. This blocks nothing. You need Disallow: / to block the entire site.
  • Assuming robots.txt prevents indexing. It prevents crawling, not indexing. If other sites link to a blocked page, Google may still list the URL. Use a noindex meta tag for that.
  • Forgetting that robots.txt is publicly readable. Do not use it to hide sensitive URLs. Anyone can visit example.com/robots.txt and read your rules.
  • Conflicting rules across User-agent blocks. When a crawler matches multiple blocks, it uses the most specific one. Wildcard rules do not override named user-agent rules.
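The second mistake can be verified directly with Python's standard-library robots.txt parser: a bare Disallow: blocks nothing, while Disallow: / blocks the whole site. A minimal sketch:

```python
from urllib.robotparser import RobotFileParser

# "Disallow:" with no path is an allow-all rule.
empty_rule = RobotFileParser()
empty_rule.parse("User-agent: *\nDisallow:".splitlines())

# "Disallow: /" blocks every path on the site.
full_rule = RobotFileParser()
full_rule.parse("User-agent: *\nDisallow: /".splitlines())

print(empty_rule.can_fetch("AnyBot", "/page"))  # True: nothing is blocked
print(full_rule.can_fetch("AnyBot", "/page"))   # False: whole site blocked
```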

After generating your robots.txt, validate your XML sitemap to make sure it is referenced correctly. Run the SEO Analyzer on key pages to check crawlability, and use the HTTP Status Checker to confirm pages return the expected status codes.

Related Guides

  • Crawl budget explained covers why robots.txt rules directly affect how search engines allocate crawls.
  • JavaScript SEO explained covers crawler access challenges specific to JS-heavy sites.
  • The complete SEO audit checklist puts robots.txt in context of a broader technical SEO review.
  • AI Overviews and crawler access explains how AI crawlers connect to Google's AI features.