1 April 2026 · 12 min read

What the Claude Code Source Reveals About How AI Actually Searches

We analyzed the Claude Code source code. Here is what it reveals about AI search architecture, content extraction, and which websites get preferential treatment.

By Maya Torres

On March 31, 2026, a clean-room reimplementation of Claude Code hit GitHub and collected 50,000 stars in two hours. Alongside the Rust port, the actual npm package (@anthropic-ai/claude-code) has been sitting in node_modules since release, minified but readable.

We analyzed both. Not because the code itself is particularly surprising, but because it is the first time anyone outside Anthropic can see exactly how a major AI assistant searches the web, processes HTML, and decides what to cite. If you work in SEO, this is the closest thing to reading the ranking algorithm of the next generation of search.

Here is what we found, and what it means for your content strategy.

The search engine correction everyone got wrong

Within hours of the leak, the narrative across Reddit and Twitter was that Claude Code uses DuckDuckGo for web search. That is wrong, or at least deeply misleading.

The open-source Rust port does use DuckDuckGo. It makes direct HTTP requests to html.duckduckgo.com/html/, parses the results, and feeds them to the model. That is a standalone replacement built by the reimplementation author, not Anthropic's architecture.

The real Claude Code works differently. When it searches the web, it creates a nested API call to Anthropic's messages endpoint with a tool type called web_search_20250305. The search happens server-side on Anthropic's infrastructure. Results come back with encrypted_content and encrypted_index fields. The search provider is never disclosed.

Third-party reports have suggested Brave Search API, but Anthropic has not confirmed this publicly. The honest answer is: we do not know which search engine powers it. What we do know is the architecture around it, and that architecture has real implications for anyone trying to be visible in AI-generated answers.
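The nested call described above can be sketched as an API payload. The tool type (web_search_20250305), tool name, and the max_uses cap come from the analyzed source; the model name and query are illustrative, and this is our reconstruction, not Anthropic's code:

```python
# Sketch of the server-side web search request shape.
# Assumed/illustrative: model name, query text. From the source: tool type and max_uses.
search_request = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "max_tokens": 1024,
    "tools": [{
        "type": "web_search_20250305",  # server-side tool; provider undisclosed
        "name": "web_search",
        "max_uses": 8,                  # hard cap per API request
    }],
    "messages": [{"role": "user", "content": "latest nextjs release"}],
}
```

The results come back inside the same response, with encrypted_content and encrypted_index fields the client cannot inspect.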

Only 8 slots exist

Claude Code limits itself to 8 search results per query. There is no pagination and no automatic expansion of the result set; the model can choose to search again, but every search counts against a hard cap of 8 uses of the search tool per API request.

Compare that to a human using Google, who might scan 30 results across three pages before choosing sources. Claude gets 8 shots and works with what it has.

The implication is straightforward. If your page is not in the top 8 results from whatever search engine Anthropic uses, it does not exist to Claude. Position 9 is functionally identical to position 900. This makes traditional SEO ranking more important for AI visibility, not less. The pages that already rank well in conventional search are the ones AI assistants will cite.

The two-tier web: pre-approved domains get preferential treatment

This is the finding that matters most for content strategy.

Claude Code maintains a hardcoded list of approximately 131 "pre-approved" domains. When it fetches content from one of these domains, three things happen differently:

  1. No permission prompt. The user is not asked whether Claude can access the site.
  2. No domain preflight check. Claude skips the API call to api.anthropic.com/api/web/domain_info that it normally makes to verify whether a domain is allowed.
  3. Full content extraction. The extraction prompt allows "relevant details, code examples, and documentation excerpts as needed."

For every other website on the internet, a different extraction prompt applies: "Enforce a strict 125-character maximum for quotes from any source document."

Read that again. Pre-approved domains get unlimited extraction. Everyone else gets 125 characters.
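As a sketch of what that two-tier cap looks like in code (our illustration; the real limit is enforced through the extraction prompt, and the domain list here is a tiny sample of the ~131):

```python
PRE_APPROVED = {"docs.python.org", "react.dev", "kubernetes.io"}  # sample of ~131

def clip_quote(quote: str, domain: str, limit: int = 125) -> str:
    """Pre-approved domains keep full quotes; everyone else is clipped."""
    if domain in PRE_APPROVED or len(quote) <= limit:
        return quote
    return quote[: limit - 1] + "…"  # clip to the 125-character budget

long_quote = "x" * 300
assert clip_quote(long_quote, "react.dev") == long_quote     # unlimited extraction
assert len(clip_quote(long_quote, "example.com")) == 125     # everyone else
```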

Who is on the list

We compiled the full list from both the TypeScript source structure and the npm package. The pattern is unmistakable:

  • Language docs: docs.python.org, developer.mozilla.org, doc.rust-lang.org, learn.microsoft.com, go.dev
  • Frameworks: react.dev, nextjs.org, vuejs.org, angular.io, tailwindcss.com, docs.djangoproject.com, laravel.com
  • Infrastructure: kubernetes.io, docs.aws.amazon.com, cloud.google.com, www.docker.com, www.terraform.io
  • Databases: www.postgresql.org, www.mongodb.com, redis.io, prisma.io
  • ML/AI: pytorch.org, huggingface.co, www.tensorflow.org
  • Anthropic's own: platform.claude.com, code.claude.com, modelcontextprotocol.io

The complete list runs to about 131 domains. Every single one is developer documentation or a canonical technical reference.

Who is not on the list

Zero marketing sites. Zero e-commerce platforms. Zero news outlets. Zero SEO tools. Zero general content publishers.

If you run a SaaS product, an agency blog, a media publication, or any website that is not developer documentation, your content is subject to the 125-character extraction limit. Claude can still cite you, but it can only quote tiny fragments of what you wrote.

There is no submission process. The list is manually curated and hardcoded. For domains not on the list, Claude falls back to a dynamic API check at api.anthropic.com/api/web/domain_info?domain=your-domain to determine whether fetching is allowed at all.

You can check your domain's AI trust status for free using our AI Domain Trust Checker, which cross-references the pre-approved list, verifies your robots.txt against 12 AI crawlers, and checks HTTPS validity.
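You can run the robots.txt part of that check yourself with the standard library. A minimal sketch, parsing an in-memory robots.txt rather than fetching a live one (the crawler names are a sample of the 12 mentioned above):

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["ClaudeBot", "GPTBot", "PerplexityBot"]  # sample of the 12

def crawler_access(robots_txt: str, url: str) -> dict:
    """Map each AI crawler to whether this robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}

robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(crawler_access(robots, "https://example.com/blog/post"))
# → {'ClaudeBot': True, 'GPTBot': False, 'PerplexityBot': True}
```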

What survives the HTML pipeline (and what does not)

When Claude fetches a web page, it does not read your HTML the way a browser does. The content goes through a processing pipeline that strips most of what web developers consider important.

What survives:

  • Headings (converted to markdown #, ##, ###)
  • Paragraphs (plain text)
  • Lists (ordered and unordered)
  • Links (anchor text and href)
  • Bold and italic emphasis
  • Image alt text and src attributes
  • Code blocks with language hints

What dies:

  • Tables. The HTML-to-markdown converter (Turndown.js in the real npm package) has no conversion rules for tables. All tabular data loses its structure entirely.
  • Everything in <head>. Meta descriptions, Open Graph tags, canonical URLs, JSON-LD schema markup. All invisible.
  • CSS and layout. Visual hierarchy, colors, spacing. Gone.
  • JavaScript output. If your content is rendered client-side, Claude sees an empty or near-empty document.

The schema markup finding is counterintuitive. SEOs have spent years adding structured data to help machines understand their content. But Claude's content pipeline strips <head> entirely before the model sees anything. Your schema helps Google. It does not help Claude.

The table finding is the one that should change how you write. If you are presenting data in HTML tables (comparison charts, pricing grids, feature matrices), that information is lost in AI processing. Convert critical tabular data to descriptive lists or inline statements if you want AI tools to understand it.
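Since tables lose their structure, one practical fix is to emit the same data as list items. A standard-library sketch of that conversion (illustrative only, not the converter Claude uses):

```python
from html.parser import HTMLParser

class TableFlattener(HTMLParser):
    """Collects rows from a <table> so they can be re-emitted as list items."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def table_to_list(html: str) -> list:
    """Flatten the first header row plus body rows into prose-like lines."""
    p = TableFlattener()
    p.feed(html)
    header, *body = p.rows
    return ["; ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in body]

html = """
<table>
  <tr><th>Plan</th><th>Price</th></tr>
  <tr><td>Starter</td><td>$19/mo</td></tr>
  <tr><td>Pro</td><td>$49/mo</td></tr>
</table>
"""
print(table_to_list(html))
# → ['Plan: Starter; Price: $19/mo', 'Plan: Pro; Price: $49/mo']
```

The output lines survive any markdown conversion intact, because they are plain list items rather than table markup.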

The 900-character window

The Rust port reveals specific content preview limits: 900 characters for general content summaries, 600 for title-focused queries. After 100,000 characters of markdown, a summarization model condenses everything further.

In practical terms, AI reads your first two to three paragraphs in full detail. Everything after that gets progressively compressed. The first 900 characters of your page's visible text content are by far the most important for AI citation.

This reinforces what good SEO practice already recommends: answer the query in the first paragraph. But it adds a mechanical urgency to that advice. It is not just about user experience or featured snippets anymore. It is about the literal architecture of how AI processes your content. Front-load your key claims, your differentiating data, and your most quotable statements.
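The windowing described above can be sketched in a few lines (limits from the Rust port; the real pipeline also summarizes everything past 100,000 characters of markdown):

```python
def ai_preview(text: str, limit: int = 900) -> str:
    """Return the slice an AI reads in full detail before compression kicks in."""
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit)  # avoid cutting mid-word
    return text[:cut] if cut > 0 else text[:limit]

article = "Answer the query first. " * 200  # several thousand characters
assert len(ai_preview(article)) <= 900      # only this much gets full attention
assert ai_preview("short page") == "short page"
```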

Recency bias is built into the system

The WebSearch system prompt includes this instruction: "Use the current month/year in search queries." Every search Claude performs is date-stamped with the current period.

This is an explicit recency bias baked into the architecture. Content published or updated recently has a structural advantage in Claude's search results, not because of a quality signal, but because the queries themselves filter for it.

For content teams, this means the publish date and last-updated date on your pages are not just cosmetic. They affect whether AI search surfaces your content. Regular content refreshes have always been good SEO practice. Now they are an AI visibility requirement.
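Mechanically, the instruction amounts to something like this (our sketch of the behavior, not the actual prompt machinery):

```python
from datetime import date

def date_stamped(query, today=None):
    """Append the current month/year, per the WebSearch system prompt."""
    d = today or date.today()
    return f"{query} {d.strftime('%B %Y')}"

print(date_stamped("best crm for startups", today=date(2026, 4, 1)))
# → best crm for startups April 2026
```

Every query carries that suffix, so pages whose visible dates match the current period are structurally favored.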

JavaScript is invisible

Claude's WebFetch tool does not execute JavaScript. It fetches the raw HTML response and processes it through the markdown conversion pipeline. If your page relies on client-side rendering (React SPAs without SSR, dynamically loaded content, infinite scroll), Claude sees whatever the server returns before JavaScript runs.

For many modern web applications, that means Claude sees a loading spinner or an empty div.

This is not unique to Claude. Most AI crawlers and tools have the same limitation. But the source code confirms it explicitly. There is no headless browser, no Puppeteer, no JavaScript execution environment in the content pipeline.

Server-side rendering is not optional for AI visibility. If your framework supports it (Next.js, Nuxt, Remix, Astro), use it. If it does not, ensure that the critical content your page targets is present in the initial HTML response.
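You can demonstrate the gap with two stand-in pages, a client-rendered shell and a server-rendered page (both illustrative):

```python
SPA_SHELL = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
SSR_PAGE = "<html><body><h1>Pricing</h1><p>Pro plan: $49/mo</p></body></html>"

def ai_can_see(initial_html: str, phrase: str) -> bool:
    """No JS execution: the raw server response is all the pipeline gets."""
    return phrase.lower() in initial_html.lower()

assert not ai_can_see(SPA_SHELL, "Pro plan")  # content only exists after JS runs
assert ai_can_see(SSR_PAGE, "Pro plan")       # present in the initial response
```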

The domain preflight API

For domains not on the pre-approved list, Claude makes a check to api.anthropic.com/api/web/domain_info?domain=your-domain before fetching content. This endpoint returns whether the domain is allowed, blocked, or requires user confirmation.

This is a dynamic allowlist maintained by Anthropic, separate from the hardcoded pre-approved list. We do not know the full criteria for inclusion, but its existence means there are three tiers of domain treatment:

  1. Pre-approved (131 domains). Full extraction, no checks, no limits.
  2. Dynamically allowed. Passes the API check, but subject to 125-character quote limits.
  3. Blocked or unknown. User must confirm before Claude can fetch.

Your robots.txt also matters here. If your robots.txt blocks ClaudeBot or the relevant AI crawlers, that is the first gate. The domain preflight API is the second gate. Pre-approved status skips both.
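Putting the gates in order, the decision flow reads roughly like this. This is our reconstruction from the described behavior, not Anthropic's code; the callables and verdict strings are hypothetical stand-ins for the robots.txt check and the domain_info API:

```python
PRE_APPROVED = {"docs.python.org", "react.dev"}  # sample of the ~131

def fetch_decision(domain, robots_allows, preflight):
    """robots_allows and preflight are stand-ins for the robots.txt check
    and the domain_info API (hypothetical signatures)."""
    if domain in PRE_APPROVED:
        return "full extraction, no checks"     # gate 0: hardcoded list skips both
    if not robots_allows(domain):
        return "blocked by robots.txt"          # gate 1
    verdict = preflight(domain)                 # gate 2: domain_info API
    if verdict == "allowed":
        return "fetch with 125-char quote limit"
    if verdict == "ask_user":
        return "requires user confirmation"
    return "blocked"

assert fetch_decision("react.dev", lambda d: True, lambda d: "allowed") == "full extraction, no checks"
assert fetch_decision("example.com", lambda d: True, lambda d: "allowed") == "fetch with 125-char quote limit"
```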

MCP tools bypass the restrictions

Here is the finding that most people will overlook. The 125-character quote limit and the content processing rules are enforced in the client, not the server. When Claude accesses content through MCP (Model Context Protocol) tools rather than WebFetch, these restrictions do not apply.

MCP tools can return up to 400,000 characters of content per result. No quote limits. No paraphrasing requirements. The copyright and content restrictions are conventions in the system prompt, not technical enforcement.

This means the content extraction rules create a two-speed system. Standard web browsing is tightly restricted. Tool-based access (MCP, API integrations, direct data feeds) is essentially unrestricted. For businesses building AI integrations, this is a significant architectural detail.

What this means for your SEO strategy

The Claude Code source confirms several things that were previously theoretical:

AI search amplifies existing rankings. With only 8 result slots and no re-ranking, the pages that rank well in conventional search are the same pages AI assistants will cite. SEO is not less important in an AI search world. It is the prerequisite.

Content structure matters more than metadata. Schema markup, meta descriptions, and Open Graph tags are invisible to AI content processing. What matters is the text content itself: clear headings, front-loaded answers, quotable statements under 125 characters.

Speed of content processing beats depth. AI reads your first 900 characters carefully and compresses everything after. The most important information needs to be at the top of the page, not buried in section seven.

Server-side rendering is mandatory. JavaScript-dependent content is invisible. This is no longer a progressive enhancement discussion. It is a visibility requirement.

Freshness is mechanically rewarded. Date-stamped queries mean recently published or updated content has a structural advantage in AI search results.

The 10-point AI visibility checklist

Based on our analysis of both codebases, here are the concrete actions that improve your visibility in AI-assisted search:

  1. Ensure your site renders server-side. No client-side-only content for pages you want AI to find.
  2. Front-load your key content. The first 900 characters of visible text are what AI reads most carefully.
  3. Write quotable statements. Key claims and data points should work as standalone sentences under 125 characters.
  4. Convert tables to prose or lists. HTML tables lose all structure in AI processing. Use descriptive text for critical data.
  5. Update your robots.txt. Allow ClaudeBot, GPTBot, and other AI crawlers. Blocking them makes you invisible to AI search entirely.
  6. Keep content fresh. Update publish dates when you refresh content. AI search queries are date-stamped.
  7. Prioritize HTTPS. Claude auto-upgrades HTTP to HTTPS. If your HTTPS is broken or misconfigured, the page fails silently.
  8. Do not rely on schema for AI. JSON-LD and meta tags help Google. They are invisible to Claude's content pipeline. Ensure your key information exists as visible text.
  9. Win conventional search first. With only 8 result slots, AI visibility starts with ranking well in traditional search engines.
  10. Check your AI trust status. Use the free AI Domain Trust Checker to see how Claude classifies your domain and whether AI crawlers can access your site.
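Item 3 on the list is easy to automate. A rough check that flags sentences too long to survive the 125-character quote cap (naive sentence splitting, illustrative only):

```python
import re

def flag_unquotable(text: str, limit: int = 125) -> list:
    """Return sentences that exceed the quote budget for non-pre-approved domains."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s) > limit]

copy = "Ooty replaces five dashboards. " + "x" * 150 + "."
flagged = flag_unquotable(copy)
print(len(flagged))
# → 1
```

Run it over your key landing pages; anything flagged is a claim AI can only paraphrase, never quote.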

Check your domain now

We built the AI Domain Trust Checker to make this analysis actionable. Enter your domain and get:

  • Your trust tier (pre-approved, allowed, partial, or issues detected)
  • Whether you are on the 131-domain pre-approved list
  • AI crawler compliance across 12 major crawlers (ClaudeBot, GPTBot, GoogleOther, PerplexityBot, and more)
  • HTTPS validation
  • Prioritized recommendations for improving your AI visibility

The tool is free, no account required. The pre-approved domain list is fully searchable and filterable.

AI search is not replacing traditional search. It is layering on top of it, with its own rules and its own architecture. The Claude Code source is the first time we have been able to read those rules directly. The sites that adapt to them early will have a compounding advantage as AI-assisted search grows.

Maya Torres

SEO Strategist at Ooty. Covers search strategy, GEO, and agentic SEO.

