The standard technical SEO audit checks crawlability, indexability, site speed, mobile-friendliness, and structured data. That checklist was designed for one client: Googlebot.
That’s how it has always been.
In 2026, your website has at least a dozen additional non-human users. AI crawlers like GPTBot, ClaudeBot, and PerplexityBot train models and power AI search results. User-triggered agents like the newly announced Google-Agent, or its “siblings” Claude-User and ChatGPT-User, browse websites on behalf of specific individuals in real time. A Q1 2026 analysis across Cloudflare’s network found that 30.6% of all web traffic now comes from bots, with AI crawlers and agents making up a growing share. Your technical audit needs to account for all of them.
Here are the five layers to add to your existing technical SEO audit.
Layer 1: AI Crawler Access
Your robots.txt was probably written for Googlebot, Bingbot, and maybe a few scrapers. AI crawlers need their own robots.txt rules, separate from Googlebot and Bingbot.
What To Check
Review your robots.txt for rules targeting AI-specific user agents: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, Applebot-Extended, CCBot, and ChatGPT-User. If none of these appear, you’re running on defaults, and those defaults might not reflect what you actually want. Never accept the defaults unless you know they’re exactly what you need.
The key is making a conscious decision per crawler rather than blanket allowing or blocking everything. Not all AI crawlers serve the same purpose. AI crawler traffic can be split into three categories: training crawlers that collect data for model training (89.4% of AI crawler traffic according to Cloudflare data), search crawlers that power AI search results (8%), and user-triggered agents like Google-Agent and ChatGPT-User that browse on behalf of a specific human in real time (2.2%). Each category warrants a different robots.txt decision, as the sketch below illustrates.
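Here is a minimal robots.txt sketch along those lines. The user agent names are real; the allow/block choices are illustrative examples, not recommendations.

```
# Training crawler that sends no referral traffic back: blocked.
User-agent: CCBot
Disallow: /

# Opts out of Google's AI training without affecting Googlebot's search crawling.
User-agent: Google-Extended
Disallow: /

# Search crawler that powers ChatGPT Search citations: allowed.
User-agent: OAI-SearchBot
Allow: /

# User-triggered agent acting on behalf of a specific human: allowed.
User-agent: ChatGPT-User
Allow: /
```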

The crawl-to-referral ratios from Cloudflare’s Radar report can make this an informed decision for you. Anthropic’s ClaudeBot crawls 20,600 pages for every single referral it returns. OpenAI’s ratio is 1,300:1. Meta sends no referrals. Blocking OpenAI’s OAI-SearchBot or PerplexityBot reduces your visibility in ChatGPT Search and Perplexity’s AI answers. Blocking training-focused crawlers like CCBot or Meta’s crawler prevents data extraction by a provider that sends zero traffic back. The crawl-to-referral ratios tell you who’s taking without giving.
There’s one crawler that requires special attention. Google added Google-Agent to its official list of user-triggered fetchers on March 20, 2026. Google-Agent identifies requests from AI systems running on Google infrastructure that browse websites on behalf of users. Unlike traditional crawlers, Google-Agent ignores robots.txt. Google’s position is that since a human initiated the request, the agent acts as a user proxy rather than an autonomous crawler. Blocking Google-Agent requires server-side authentication, not robots.txt rules. That’s both interesting and important for the future, even if it’s outside the scope of this article.
Official documentation for each crawler:
Layer 2: JavaScript Rendering
Googlebot renders JavaScript using headless Chromium. There’s nothing new about that. What’s new and different is that virtually every major AI crawler does not render JavaScript.
| Crawler | Renders JavaScript |
|---|---|
| GPTBot (OpenAI) | No |
| ClaudeBot (Anthropic) | No |
| PerplexityBot | No |
| CCBot (Common Crawl) | No |
| AppleBot | Yes |
| Googlebot | Yes |
AppleBot (which uses a WebKit-based renderer) and Googlebot are the only major crawlers that render JavaScript. Four of the six major web crawlers (GPTBot, ClaudeBot, PerplexityBot, and CCBot) fetch static HTML only, making server-side rendering a requirement for AI search visibility, not an optimization. If your content lives in client-side JavaScript, it’s invisible to the crawlers training OpenAI’s, Anthropic’s, and Perplexity’s models and powering their AI search products.
What To Check
Run curl -s [URL] on your important pages and search the output for key content like product names, prices, or service descriptions. If that content isn’t in the curl response, GPTBot, ClaudeBot, and PerplexityBot can’t see it either. Alternatively, use View Source in your browser (not Inspect Element, which shows the rendered DOM after JavaScript execution) and check whether the critical information is present in the raw HTML.
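To run that check from the command line, a quick sketch (the URL and the phrase “per month” are placeholders for your own page and key content):

```bash
# Fetch the raw HTML that non-rendering AI crawlers receive (no JavaScript executed)
# and check whether key content is present.
curl -s https://example.com/pricing | grep -i "per month"

# Repeat with an AI crawler's user agent (shortened token here) to catch
# responses that vary by user agent.
curl -s -A "GPTBot" https://example.com/pricing | grep -i "per month"
```

If grep returns nothing, that content doesn’t exist for these crawlers.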

Single-page applications (SPAs) built with React, Vue, or Angular are particularly at risk unless they use server-side rendering (SSR) or static site generation (SSG). A React SPA that renders product descriptions, pricing, or key claims only on the client side is sending AI crawlers a blank page with a link to the JavaScript bundle.
The fix isn’t complicated. Server-side rendering (SSR), static site generation (SSG), or pre-rendering solves this for every major framework. Next.js supports SSR and SSG natively for React, Nuxt provides the same for Vue, and Angular Universal handles server rendering for Angular applications. The audit just needs to flag which pages depend on client-side JavaScript for essential content; the sketch below shows what the server-rendered pattern looks like.
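For orientation, a minimal sketch of server-rendered content in Next.js (App Router, 13/14-style); fetchProduct, the route, and all data are placeholders for your own stack:

```tsx
// app/products/[id]/page.tsx: a server component, rendered on the server by default.
type Product = { name: string; price: string; description: string };

// Placeholder data fetcher; swap in your CMS or database call.
async function fetchProduct(id: string): Promise<Product> {
  return { name: `Product ${id}`, price: "$99", description: "Example description." };
}

export default async function ProductPage({ params }: { params: { id: string } }) {
  const product = await fetchProduct(params.id);
  // Everything below arrives as static HTML, visible to non-rendering crawlers.
  return (
    <main>
      <h1>{product.name}</h1>
      <p>{product.price}</p>
      <p>{product.description}</p>
    </main>
  );
}
```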
Layer 3: Structured Data For AI
Structured data has been part of technical SEO audits for years, but the evaluation criteria need updating. The question is no longer just “does this page have schema markup?” It’s “does this markup help AI systems understand and cite this content?”
What To Check
- JSON-LD implementation (preferred over Microdata and RDFa for AI parsing).
- Schema types that go beyond the basics: Organization, Article, Product, FAQ, HowTo, Person.
- Entity relationships: sameAs, author, publisher connections that link your content to known entities (see the sketch after this list).
- Completeness: are all relevant properties populated, or are you just checking a box with skeleton schemas containing only a name and URL?
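As a reference point, here is a sketch of “complete and connected” for an Article, with author and publisher linked to known entities. Every name and URL is a placeholder:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article headline",
  "datePublished": "2026-01-15",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "sameAs": "https://www.linkedin.com/in/janedoe-example"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
    "sameAs": ["https://en.wikipedia.org/wiki/Example_Co"]
  }
}
```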
Why This Matters Now
Microsoft’s Bing principal product manager Fabrice Canel confirmed in March 2025 that schema markup helps LLMs understand content for Copilot. The Google Search team said in April 2025 that structured data provides an advantage in search results.
No, you can’t win with schema alone. Yes, it can help.
The data density angle matters too. The GEO research paper by Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi (presented at ACM KDD 2024, the first to publicly use the term “GEO”) found that adding statistics to content improved AI visibility by 41%. Yext’s analysis found that data-rich websites earn 4.3x more AI citations than directory-style listings. Structured data contributes to data density by giving AI systems machine-readable facts rather than requiring them to extract meaning from prose.
An important caveat: No peer-reviewed academic studies exist yet on schema’s impact on AI citation rates specifically. The industry data is promising and consistent, but treat these numbers as indicators rather than guarantees.
W3Techs reports that roughly 53% of the top 10 million websites use JSON-LD as of early 2026. If your website isn’t among them, you’re missing signals that both traditional and AI search systems use to understand your content.
Duane Forrester, who helped build Bing Webmaster Tools and co-launched Schema.org, argues that schema markup is just step one. As AI agents continue moving from merely interpreting pages to making decisions, brands will also need to publish operational truth (pricing, policies, constraints) in machine-verifiable formats with versioning and cryptographic signatures. Publishing machine-verifiable source packs is beyond the scope of a typical audit today, but auditing structured data completeness and accuracy is the foundation verified source packs build on.
Layer 4: Semantic HTML And The Accessibility Tree
The first three layers of the AI-readiness audit cover crawler access (robots.txt), JavaScript rendering, and structured data. The final two address how AI agents actually read your pages and which signals help them discover and evaluate your content.
Most SEOs evaluate HTML for search engine consumption. Agentic browsers like ChatGPT Atlas, Chrome with auto browse, and Perplexity Comet don’t parse pages the way Googlebot does. They read the accessibility tree instead.
The accessibility tree is a parallel representation of your page that browsers generate from your HTML. It strips away visual styling, layout, and decoration, keeping only the semantic structure: headings, links, buttons, form fields, labels, and the relationships between them. Screen readers like VoiceOver and NVDA have used the accessibility tree for decades to make websites usable for people with visual impairments. AI agents now use the same tree to understand and interact with web pages.
And the reason is simple: efficiency. Processing screenshots is both more expensive and slower than working with the accessibility tree.

This matters because the accessibility tree exposes what your HTML actually communicates, not what your CSS (or JS) makes it look like. A div styled to look like a button is still just a generic div in the accessibility tree, and an agent parsing that tree has no way to know it can be clicked.
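A minimal illustration (the class and handler names are placeholders):

```html
<!-- Visually identical after styling; semantically opposite. -->

<!-- Exposed in the accessibility tree as a generic container, not a control: -->
<div class="btn" onclick="addToCart()">Add to cart</div>

<!-- Exposed as role "button" with the accessible name "Add to cart": -->
<button type="button" onclick="addToCart()">Add to cart</button>
```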
Microsoft’s Playwright MCP, the standard tool for connecting AI models to browser automation, uses accessibility snapshots rather than raw HTML or screenshots. Playwright MCP’s browser_snapshot function returns an accessibility tree representation because it’s more compact and semantically meaningful for LLMs. OpenAI’s documentation states that ChatGPT Atlas uses ARIA tags to interpret page structure when browsing websites.
Web accessibility and AI agent compatibility are now the same discipline. Proper heading hierarchy (H1-H6) creates meaningful sections that AI systems use for content extraction. Semantic elements like nav, main, and article tell machines what role each content block plays. Form labels and descriptive button text make interactive elements understandable to agents that parse the accessibility tree instead of rendering visual design.
What To Check
- Heading hierarchy: a logical H1-H6 structure that machines can use to understand content relationships.
- Semantic elements: nav, main, article, section, aside, header, footer, used appropriately.
- Form inputs: every input has a label, every button has descriptive text.
- Interactive elements: clickable things use button or a, not div.
- Accessibility tree: run a Playwright MCP snapshot (or the script sketched after this list) or test with VoiceOver/NVDA to see what agents actually see.
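If you’d rather script that last check than run a full MCP setup, here is a minimal sketch using Playwright’s ariaSnapshot() (available in recent Playwright releases); the URL is a placeholder:

```ts
// dump-a11y.ts: print the accessibility-tree outline that agents work from.
import { chromium } from "playwright";

async function main() {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com/");
  // ariaSnapshot() returns a YAML-like outline of roles, accessible names,
  // and structure: roughly what tree-based agents see instead of your design.
  console.log(await page.locator("body").ariaSnapshot());
  await browser.close();
}

main();
```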
Somehow, things are getting worse on this front. The WebAIM Million 2026 report found that the average web page now has 56.1 accessibility errors, up 10.1% from 2025.
ARIA (Accessible Rich Internet Applications) usage increased 27% in a single year. ARIA is a set of HTML attributes that add extra semantic information to elements, telling screen readers and AI agents things like “this div is actually a dialog” or “this list functions as a menu.” But here’s what’s critical: pages with ARIA present had significantly more errors (59.1 on average) than pages without ARIA (42 on average). Adding ARIA without understanding it makes things worse, not better, because incorrect ARIA overrides the browser’s default accessibility tree interpretation with wrong information. Start with proper semantic HTML. Add ARIA only when native elements aren’t sufficient.
Technical SEOs don’t have to become accessibility experts. But treating accessibility as someone else’s problem is no longer viable when the same tree that screen readers parse is now the primary interface between AI agents and your website.
Sidenote: The Markdown Shortcut Doesn’t Work
Serving raw markdown files to AI crawlers instead of HTML can produce a 95% reduction in token usage per page. However, Google Search Advocate John Mueller called this “a stupid idea” in February 2026 on Bluesky. Mueller’s argument was this: “Meaning lives in structure, hierarchy and context. Flatten it and you don’t make it machine-friendly, you make it meaningless.” LLMs were trained on normal HTML pages from the beginning and have no problem processing them. The answer isn’t to create a flat, simplified version for machines. It’s to make the HTML itself properly structured. Well-written semantic HTML already is the machine-readable format. Besides, that simplified version already exists in the accessibility tree, and it’s what AI agents already use.
Layer 5: AI Discoverability Signals
The final layer covers signals that don’t fit neatly into traditional audit categories but directly affect how AI systems discover and evaluate your website.
llms.txt (dishonorable mention). Listed first for one reason only: ask any LLM what you should do to make your website more visible to AI systems, and llms.txt will be at or near the top of the list. It’s their world, I guess. The llms.txt specification provides a simple markdown file that helps AI agents understand your website’s purpose, structure, and key content. No large-scale adoption data has been published yet, and its actual impact on AI citations is unproven. But LLMs consistently recommend it, which means AI-powered audit tools and consultants will flag its absence. It takes minutes to create and costs nothing to maintain.
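If you do add one, the format is plain markdown: an H1 with the site name, a blockquote summary, and H2 sections listing key URLs. A placeholder sketch:

```markdown
# Example Co

> Example Co sells industrial sensors and publishes calibration guides.

## Products

- [Sensor catalog](https://www.example.com/products): full line with specifications

## Docs

- [Calibration guides](https://www.example.com/docs): step-by-step setup instructions
```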
OK, now that we’ve got that out of the way, let’s look at what might really matter.
AI crawler analytics. Are you monitoring AI bot traffic? Cloudflare’s AI Audit dashboard shows which AI crawlers visit, how often, and which pages they hit. If you’re not on Cloudflare, check your server logs for Google-Agent, ChatGPT-User, and ClaudeBot user agent strings. Google publishes a user-triggered-agents.json file containing the IP ranges Google-Agent uses, so you can verify whether incoming requests are genuinely from Google rather than spoofed user agent strings.
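A quick server-log pass might look like this sketch; the log path and format are placeholders for your own setup:

```bash
# Count requests per AI user agent in a combined-format access log.
for ua in GPTBot ChatGPT-User OAI-SearchBot ClaudeBot PerplexityBot Google-Agent; do
  printf "%-16s %s\n" "$ua" "$(grep -c "$ua" /var/log/nginx/access.log)"
done
```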
Entity definition. Does your website clearly define what the business is, who runs it, and what it does? Not in marketing copy, but in structured, machine-parseable markup. Organization schema should include name, URL, logo, founding date, and sameAs links to verified profiles on LinkedIn, Crunchbase, and Wikipedia. Person schema for key people should connect them to the organization via author and employee properties. AI systems need to resolve your identity as a distinct entity before they can confidently recommend you over competitors with similar names or offerings. Don’t slap this on top of your website when your designer is done with their work. Start here; it will make your life easier.
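A minimal sketch of that identity layer; every name and URL is a placeholder:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "foundingDate": "2012",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://www.crunchbase.com/organization/example-co"
  ],
  "employee": [
    {
      "@type": "Person",
      "name": "Jane Doe",
      "jobTitle": "CEO",
      "sameAs": "https://www.linkedin.com/in/janedoe-example"
    }
  ]
}
```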
Content position. Where you place information on the page directly impacts whether AI systems cite it. Kevin Indig’s analysis of 98,000 ChatGPT citation rows across 1.2 million responses found that 44.2% of all AI citations come from the top 30% of a page. The bottom 10% earns only 2.4-4.4% of citations regardless of industry. Duane Forrester calls this “dog-bone thinking”: strong at the beginning and end, weak in the middle, a pattern Stanford researchers have confirmed as the “lost in the middle” phenomenon. Audit your key pages: are the most important claims and data points in the first 30%, or buried in the middle?
Content extractability. Pull any key claim from your page and read it in isolation. Does it still make sense without the surrounding paragraphs? AI retrieval systems like ChatGPT, Perplexity, and Google AI Overviews extract and cite individual passages, and sentences that rely on “this,” “it,” or “the above” for meaning become unusable when pulled from their original context. Ramon Eijkemans’ excellent utility-writing framework maps these ideas to documented retrieval mechanisms: self-contained sentences, explicit entity relationships, and quotable anchor statements that AI systems can confidently cite without additional inference.
The Audit Checklist
| Check | Tool/Method | What You’re Looking For |
|---|---|---|
| AI crawler robots.txt | Manual review | Conscious per-crawler decisions |
| JavaScript rendering | curl, View Source, Lynx browser | Critical content in static HTML |
| Structured data | Schema validator, Rich Results Test | Complete, connected JSON-LD |
| Semantic HTML | axe DevTools, Lighthouse | Proper elements, heading hierarchy |
| Accessibility tree | Playwright MCP snapshot, screen reader | What agents actually see |
| AI bot traffic | Cloudflare, server logs | Volume, pages hit, patterns |

From Audit To Action
This audit identifies gaps. Fixing them requires a sequence, because some fixes depend on others. Optimizing content structure before establishing a machine-readable identity means agents can extract your information but can’t confidently attribute it to your brand. I wrote Machine-First Architecture to provide that sequence: identity, structure, content, interaction, with each pillar building on the previous one.
Why The Technical SEO Audit Is Where This Belongs
None of this is technically SEO. Robots.txt rules for AI crawlers don’t affect Google rankings. Accessibility tree optimization doesn’t move keyword positions. Content position scoring has nothing to do with search indexing.
But most of it did grow out of technical SEO. Crawl management, structured data, semantic HTML, JavaScript rendering, server log analysis: these are skills technical SEOs already have. The audit methodology transfers directly. The client it serves is what changed.
The websites that get cited in AI responses, that work when Chrome auto browse visits them, that show up when someone asks ChatGPT for a recommendation: they won’t be the ones with the best content alone. They’ll be the ones whose technical foundation made that content accessible to machines. Technical SEOs are the people best equipped to build that foundation. The old audit template just needs a new section to reflect it.
More Resources:
Featured Image: Anton Vierietin/Shutterstock

