What the Claude Code Source Reveals About How AI Actually Searches
We analyzed the Claude Code source code. Here is what it reveals about AI search architecture, content extraction, and which websites get preferential treatment.
By Maya Torres
On March 31, 2026, a clean-room reimplementation of Claude Code hit GitHub and collected 50,000 stars in two hours. Alongside the Rust port, the actual npm package (@anthropic-ai/claude-code) has been sitting in node_modules since release, minified but readable.
We analyzed both. Not because the code itself is particularly surprising, but because it is the first time anyone outside Anthropic can see exactly how a major AI assistant searches the web, processes HTML, and decides what to cite. If you work in SEO, this is the closest thing to reading the ranking algorithm of the next generation of search.
Here is what we found, and what it means for your content strategy.
Within hours of the leak, the narrative across Reddit and Twitter was that Claude Code uses DuckDuckGo for web search. That is wrong, or at least deeply misleading.
The open-source Rust port does use DuckDuckGo. It makes direct HTTP requests to html.duckduckgo.com/html/, parses the results, and feeds them to the model. That is a standalone replacement built by the reimplementation author, not Anthropic's architecture.
The real Claude Code works differently. When it searches the web, it creates a nested API call to Anthropic's messages endpoint with a tool type called web_search_20250305. The search happens server-side on Anthropic's infrastructure. Results come back with encrypted_content and encrypted_index fields. The search provider is never disclosed.
Third-party reports have suggested Brave Search API, but Anthropic has not confirmed this publicly. The honest answer is: we do not know which search engine powers it. What we do know is the architecture around it, and that architecture has real implications for anyone trying to be visible in AI-generated answers.
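For readers unfamiliar with what that nested call looks like, here is a minimal sketch of the request shape. Only the tool type string and the encrypted result fields come from the source analysis; the surrounding field names follow Anthropic's public Messages API conventions, and the model name is a placeholder.

```python
# Hedged sketch of the server-side search request Claude Code constructs.
# "web_search_20250305" and the 8-use cap come from the source reading;
# everything else mirrors Anthropic's documented API shape and may differ.
search_request = {
    "model": "claude-sonnet-4",           # placeholder model name
    "max_tokens": 1024,
    "tools": [{
        "type": "web_search_20250305",    # server-side search tool
        "name": "web_search",
        "max_uses": 8,                    # hard cap per API request
    }],
    "messages": [
        {"role": "user", "content": "latest Next.js SSR guidance"}
    ],
}
# Results return with encrypted_content / encrypted_index fields;
# the underlying search provider is never exposed to the client.
```

The search itself runs on Anthropic's infrastructure, so nothing in the client reveals which engine answers the query.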
Claude Code limits itself to 8 search results per query. No pagination. No follow-up searches to expand the result set (unless the model explicitly decides to search again, which counts against the same limit). There is a hard cap of 8 uses of the search tool per API request.
Compare that to a human using Google, who might scan 30 results across three pages before choosing sources. Claude gets 8 shots and works with what it has.
The implication is straightforward. If your page is not in the top 8 results from whatever search engine Anthropic uses, it does not exist to Claude. Position 9 is functionally identical to position 900. This makes traditional SEO ranking more important for AI visibility, not less. The pages that already rank well in conventional search are the ones AI assistants will cite.
This is the finding that matters most for content strategy.
Claude Code maintains a hardcoded list of approximately 131 "pre-approved" domains. When it fetches content from one of these domains, the treatment changes in two ways that are visible in the source:

- The quote restrictions are lifted: extraction from pre-approved domains is not capped.
- Claude skips the preflight call to api.anthropic.com/api/web/domain_info that it normally makes to verify whether a domain is allowed.

For every other website on the internet, a different extraction prompt applies: "Enforce a strict 125-character maximum for quotes from any source document."
Read that again. Pre-approved domains get unlimited extraction. Everyone else gets 125 characters.
We compiled the full list from both the TypeScript source structure and the npm package. The pattern is unmistakable:
The complete list runs to about 131 domains. Every single one is developer documentation or a canonical technical reference.
Zero marketing sites. Zero e-commerce platforms. Zero news outlets. Zero SEO tools. Zero general content publishers.
If you run a SaaS product, an agency blog, a media publication, or any website that is not developer documentation, your content is subject to the 125-character extraction limit. Claude can still cite you, but it can only quote tiny fragments of what you wrote.
There is no submission process. The list is manually curated and hardcoded. For domains not on the list, Claude falls back to a dynamic API check at api.anthropic.com/api/web/domain_info?domain=your-domain to determine whether fetching is allowed at all.
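The two-path flow can be sketched as a small decision function. The pre-approved domain shown here is a hypothetical placeholder, not a confirmed list entry, and the endpoint URL is quoted from the article; the real check happens inside Claude Code's client.

```python
# Illustrative subset -- a placeholder, not a confirmed entry on the real list.
PRE_APPROVED = {"docs.trusted-framework.dev"}

def fetch_policy(domain):
    """Sketch of the two-path fetch flow: pre-approved domains skip the
    preflight check and the quote cap; everyone else gets both."""
    if domain in PRE_APPROVED:
        return {"preflight": None, "quote_limit": None}
    preflight = f"https://api.anthropic.com/api/web/domain_info?domain={domain}"
    return {"preflight": preflight, "quote_limit": 125}
```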
You can check your domain's AI trust status for free using our AI Domain Trust Checker, which cross-references the pre-approved list, verifies your robots.txt against 12 AI crawlers, and checks HTTPS validity.
When Claude fetches a web page, it does not read your HTML the way a browser does. The content goes through a processing pipeline that strips most of what web developers consider important.
What survives:

- Headings (converted to markdown #, ##, ###)

What dies:

- Everything inside <head>: meta descriptions, Open Graph tags, canonical URLs, JSON-LD schema markup. All invisible.
- HTML tables.

The schema markup finding is counterintuitive. SEOs have spent years adding structured data to help machines understand their content. But Claude's content pipeline strips <head> entirely before the model sees anything. Your schema helps Google. It does not help Claude.
The table finding is the one that should change how you write. If you are presenting data in HTML tables (comparison charts, pricing grids, feature matrices), that information is lost in AI processing. Convert critical tabular data to descriptive lists or inline statements if you want AI tools to understand it.
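You can model the stripping behavior described above in a few lines. This is a toy approximation for illustration, built on Python's standard-library HTML parser, not Anthropic's actual converter:

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Toy model of the extraction pipeline: everything inside <head>
    and <table> is discarded; ordinary body text survives."""
    SKIP = {"head", "table", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside skipped elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = """<html><head><title>Meta only</title>
<script type="application/ld+json">{"@type":"Article"}</script></head>
<body><h1>Pricing</h1><table><tr><td>Pro</td><td>$49</td></tr></table>
<p>The Pro plan costs $49 per month.</p></body></html>"""

p = VisibleText()
p.feed(html)
print(p.chunks)  # ['Pricing', 'The Pro plan costs $49 per month.']
```

Note what happened to the pricing table: "Pro" and "$49" vanished, while the same fact written as an inline sentence survived intact. That is the argument for converting critical tabular data to prose.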
The Rust port reveals specific content preview limits: 900 characters for general content summaries, 600 for title-focused queries. After 100,000 characters of markdown, a summarization model condenses everything further.
In practical terms, AI reads your first two to three paragraphs in full detail. Everything after that gets progressively compressed. The first 900 characters of your page's visible text content are by far the most important for AI citation.
This reinforces what good SEO practice already recommends: answer the query in the first paragraph. But it adds a mechanical urgency to that advice. It is not just about user experience or featured snippets anymore. It is about the literal architecture of how AI processes your content. Front-load your key claims, your differentiating data, and your most quotable statements.
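Those limits can be summarized in a sketch. The 900 and 100,000 figures come from the Rust port; the truncation here stands in for the real summarization step, which this illustration does not reproduce:

```python
def ai_preview(page_text, preview_limit=900, summarize_after=100_000):
    """Sketch of the preview limits: the first ~900 characters are read
    in full detail; content past 100,000 characters is handed to a
    summarization model (represented here by simple truncation)."""
    if len(page_text) > summarize_after:
        page_text = page_text[:summarize_after]  # stand-in for summarization
    return page_text[:preview_limit]

print(len(ai_preview("x" * 250_000)))  # 900
```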
The WebSearch system prompt includes this instruction: "Use the current month/year in search queries." Every search Claude performs is date-stamped with the current period.
This is an explicit recency bias baked into the architecture. Content published or updated recently has a structural advantage in Claude's search results, not because of a quality signal, but because the queries themselves filter for it.
For content teams, this means the publish date and last-updated date on your pages are not just cosmetic. They affect whether AI search surfaces your content. Regular content refreshes have always been good SEO practice. Now they are an AI visibility requirement.
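The instruction is one line in the system prompt, and its mechanical effect is easy to model. The exact formatting Claude uses is not disclosed; appending "Month YYYY" is one plausible reading:

```python
from datetime import date

def date_stamped(query, today=None):
    """Mirrors the system-prompt instruction 'Use the current month/year
    in search queries'. The 'Month YYYY' suffix format is an assumption."""
    today = today or date.today()
    return f"{query} {today.strftime('%B %Y')}"

print(date_stamped("best static site generators", date(2026, 3, 31)))
# best static site generators March 2026
```

Every query carrying a date suffix means pages whose titles and copy mention the current period have a built-in matching advantage.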
Claude's WebFetch tool does not execute JavaScript. It fetches the raw HTML response and processes it through the markdown conversion pipeline. If your page relies on client-side rendering (React SPAs without SSR, dynamically loaded content, infinite scroll), Claude sees whatever the server returns before JavaScript runs.
For many modern web applications, that means Claude sees a loading spinner or an empty div.
This is not unique to Claude. Most AI crawlers and tools have the same limitation. But the source code confirms it explicitly. There is no headless browser, no Puppeteer, no JavaScript execution environment in the content pipeline.
Server-side rendering is not optional for AI visibility. If your framework supports it (Next.js, Nuxt, Remix, Astro), use it. If it does not, ensure that the critical content your page targets is present in the initial HTML response.
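A quick way to audit this yourself: check whether your key phrases appear in the raw HTML response, before any JavaScript runs. The function below is a sketch; a real audit would fetch the page with a plain HTTP client rather than a browser.

```python
def critical_content_in_initial_html(raw_html, key_phrases):
    """No-JavaScript visibility check: since the fetcher never executes
    scripts, a phrase only counts if it is in the raw HTML response."""
    return [p for p in key_phrases if p.lower() in raw_html.lower()]

spa_shell = '<div id="root"></div><script src="/bundle.js"></script>'
print(critical_content_in_initial_html(spa_shell, ["Pricing", "Enterprise plan"]))
# [] -- the SPA shell carries none of the page's actual content
```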
For domains not on the pre-approved list, Claude makes a check to api.anthropic.com/api/web/domain_info?domain=your-domain before fetching content. This endpoint returns whether the domain is allowed, blocked, or requires user confirmation.
This is a dynamic allowlist maintained by Anthropic, separate from the hardcoded pre-approved list. We do not know the full criteria for inclusion, but its existence means there are three tiers of domain treatment:
Your robots.txt also matters here. If your robots.txt blocks ClaudeBot or the relevant AI crawlers, that is the first gate. The domain preflight API is the second gate. Pre-approved status skips both.
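The three tiers and two gates described above reduce to a small decision function. The response values echo the preflight outcomes the source describes (allowed, blocked, requires confirmation); the tier names here are illustrative.

```python
def domain_tier(domain, pre_approved, robots_allows, domain_info):
    """Sketch of the gating order: pre-approved status skips both gates;
    otherwise robots.txt is checked first, then the preflight API."""
    if domain in pre_approved:
        return "pre-approved"            # skips robots gate and preflight
    if not robots_allows:
        return "blocked-by-robots"       # first gate
    return {"allowed": "fetchable",      # second gate: preflight response
            "blocked": "blocked",
            "confirm": "needs-user-confirmation"}.get(domain_info, "unknown")
```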
Here is the finding that most people will overlook. The 125-character quote limit and the content processing rules are enforced in the client, not the server. When Claude accesses content through MCP (Model Context Protocol) tools rather than WebFetch, these restrictions do not apply.
MCP tools can return up to 400,000 characters of content per result. No quote limits. No paraphrasing requirements. The copyright and content restrictions are conventions in the system prompt, not technical enforcement.
This means the content extraction rules create a two-speed system. Standard web browsing is tightly restricted. Tool-based access (MCP, API integrations, direct data feeds) is essentially unrestricted. For businesses building AI integrations, this is a significant architectural detail.
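The gap between the two channels is worth stating in numbers, using the figures from this analysis:

```python
# The two-speed system in numbers (figures from the source reading above).
CHANNEL_LIMITS = {
    "webfetch_quote_chars": 125,      # non-pre-approved web content
    "mcp_result_chars": 400_000,      # MCP tool results, no quote cap
}
ratio = CHANNEL_LIMITS["mcp_result_chars"] // CHANNEL_LIMITS["webfetch_quote_chars"]
print(ratio)  # 3200 -- an MCP result can carry 3,200x more text than a web quote
```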
The Claude Code source confirms several things that were previously theoretical:
AI search amplifies existing rankings. With only 8 result slots and no re-ranking, the pages that rank well in conventional search are the same pages AI assistants will cite. SEO is not less important in an AI search world. It is the prerequisite.
Content structure matters more than metadata. Schema markup, meta descriptions, and Open Graph tags are invisible to AI content processing. What matters is the text content itself: clear headings, front-loaded answers, quotable statements under 125 characters.
Speed of content processing beats depth. AI reads your first 900 characters carefully and compresses everything after. The most important information needs to be at the top of the page, not buried in section seven.
Server-side rendering is mandatory. JavaScript-dependent content is invisible. This is no longer a progressive enhancement discussion. It is a visibility requirement.
Freshness is mechanically rewarded. Date-stamped queries mean recently published or updated content has a structural advantage in AI search results.
Based on our analysis of both codebases, here are the concrete actions that improve your visibility in AI-assisted search:
We built the AI Domain Trust Checker to make this analysis actionable. Enter your domain and get:
The tool is free, no account required. The pre-approved domain list is fully searchable and filterable.
AI search is not replacing traditional search. It is layering on top of it, with its own rules and its own architecture. The Claude Code source is the first time we have been able to read those rules directly. The sites that adapt to them early will have a compounding advantage as AI-assisted search grows.
SEO Strategist at Ooty. Covers search strategy, GEO, and agentic SEO.