A Technical AI Search Audit

This post was sponsored by JetOctopus. The opinions expressed in this article are the sponsor’s own.

How do I optimize my website for ChatGPT and Perplexity, not just Google?

How do I know if AI bots are actually crawling my site?

How should my technical SEO strategy change for AI Search?

A significant portion of your site’s search impressions in 2026 is generated by machines researching on behalf of humans.

These machines don’t care about your keyword rankings. They care whether your:

  • HTML loads cleanly in under 200 milliseconds
  • Product detail page is reachable in fewer than four clicks
  • Content answers a specific, nine-word question that has never appeared in any keyword research tool in your career.

This isn’t speculation. It’s what our server log data across hundreds of enterprise websites has shown us, consistently, since mid-2025.

What’s Actually Happening On Your Site

My colleague, Stan, flagged a pattern in a Slack message: query lengths were growing at rates that didn’t correlate with human behavior.

A 161% year-over-year growth rate in 10-word queries is not driven by users who suddenly became more verbose. It’s driven by AI agents decomposing a single user prompt into dozens of parallel sub-queries, a process researchers now call “fan-out.”

Query Length Growth in 2025

Image created by JetOctopus. Aggregated GSC data across hundreds of enterprise properties, 2025.

The gradient is the tell. Human search behavior doesn’t scale this cleanly by word count. Machines do. By October 2025, 7-plus-word queries reached nearly 1% of total query volume, roughly triple their historical share.

More revealing than the volume is the CTR. While impression counts for 10-word queries spiked 161%, click-through rate collapsed to 2.26%, down from 8–11% in 2023.

The AI reads your page, extracts the answer, and synthesizes it for the user. Your site never gets the visit.

We call these “phantom impressions.” They’re real signals that your content is being evaluated inside AI reasoning chains. If you’re filtering them out of your reporting because they don’t drive traffic, you are flying blind.

The Three Bots Visiting Your Site & Their Impact On SERP Visibility

Not all AI crawlers are equal, and treating them as a single category is the first mistake most technical SEOs make.

Training bots crawl broadly and ignore click depth. A training visit means the AI knows your content exists, not that users will ever see it.

AI search bots drop off quickly beyond two or three clicks from the homepage and often visit each page only once a month.

AI user bots are initiated when a real person asks a question in ChatGPT, Perplexity, or Claude, and the AI researches the answer on their behalf. These are the only visits that translate to actual AI visibility.

| Bot Type | What Triggers It | Crawl Depth | Impact on AI Visibility |
| --- | --- | --- | --- |
| Training bots | Model training cycles | Deep (ignores click distance) | None directly. Awareness only. |
| AI search bots | New URL discovery & fresh content | Shallow (~1 visit/month beyond 2–3 clicks) | Critical gatekeeper. If it misses a page, user bots won’t find it either. |
| AI user bots | Real user query in ChatGPT / Claude / Perplexity | Selective (driven by speed and structure) | High. Closest proxy to an AI impression. |

Your site can receive heavy crawling from training and search bots and still be completely absent from AI-generated answers. If you’re not segmenting AI bot traffic by type in your log analysis, you have no idea which third of the iceberg you’re measuring.

Which SEO Signals Do LLMs Respect?

Robots.txt is your primary lever.

Most major AI platforms (ChatGPT, Claude, Gemini) follow robots.txt directives. Perplexity is a partial exception: PerplexityBot respects robots.txt, but Perplexity-User, the user-triggered bot, does not. Cloudflare confirmed this in an investigation. Most sites haven’t audited their robots.txt with AI access in mind. Do it.
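You can spot-check your own directives programmatically. A minimal sketch using Python’s standard urllib.robotparser, parsing a hypothetical robots.txt body in place (in practice you would fetch your live file; the user-agent tokens are the ones the vendors publish, the paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; fetch your real file in practice.
ROBOTS_TXT = """
User-agent: GPTBot
Disallow: /private/

User-agent: PerplexityBot
Disallow: /checkout/
"""

def can_fetch(agent: str, url: str) -> bool:
    """Would this user agent be allowed to crawl this URL?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

print(can_fetch("GPTBot", "https://example.com/private/page"))  # False
print(can_fetch("GPTBot", "https://example.com/blog/post"))     # True
```

Running a handful of key URLs through a check like this for each AI user agent takes minutes and removes the guesswork.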

Sitemaps are widely supported.

ChatGPT, Claude, and PerplexityBot all use XML sitemaps for URL discovery. Keep them accurate.

Signals Best Kept For SEO & Ranking Efforts

The signals below don’t appear to influence AI visibility, but they are still key for ranking in queries that still trigger traditional SERPs.

Canonical tags and noindex directives do nothing for AI bots.

AI crawlers don’t build a search index, so they have no use for these meta-signals. Content hidden from Google using noindex is fully visible to ChatGPT’s crawler.

LLM.txt does nothing.

Our log data shows major AI bots don’t read this file. Don’t invest time here.

JavaScript rendering is a critical blind spot.

Most AI crawlers (ChatGPT, Claude, Perplexity) don’t render JavaScript. If your product pages load key content client-side, these agents read an empty shell. Server-side rendering is the only architecture that works universally. The exception is Google Gemini, which uses the same Web Rendering Service as Googlebot.
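You can test this blind spot without a crawler. A minimal sketch, assuming your must-have facts are known strings (the sample HTML and phrases below are hypothetical): fetch the raw server response with no JavaScript execution, then check which facts are missing from it.

```python
import urllib.request

def fetch_raw_html(url: str) -> str:
    """Fetch server-rendered HTML only; no JavaScript is executed."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def missing_phrases(raw_html: str, must_have: list[str]) -> list[str]:
    """Phrases absent from the raw HTML are invisible to non-rendering AI bots."""
    return [p for p in must_have if p not in raw_html]

# A client-side-rendered shell: the price only appears after JS runs.
shell = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'
print(missing_phrases(shell, ["$49.99", "Free shipping"]))
# → ['$49.99', 'Free shipping']
```

If the list comes back non-empty for a product page, the facts you want cited live only in the rendered DOM, and most AI agents will never see them.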

How To Make Sure ChatGPT, Perplexity & LLMs Can Reach Your Content

AI search bots visit deep pages roughly once a month and drop off sharply beyond three clicks from the homepage. The pages with the most specific, answerable information are often the hardest for agents to reach.

The fix: elevate your most valuable deep pages through internal linking, ensuring they’re reachable within four clicks.
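Click depth is cheap to compute if you already have an internal link graph from a crawl. A sketch, assuming a simple adjacency-list representation (the URLs below are hypothetical): breadth-first search from the homepage gives each page’s minimum click depth, and anything deeper than three clicks is a candidate for better internal linking.

```python
from collections import deque

def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """BFS from the homepage; depth = minimum number of clicks to reach a page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical internal link graph from a site crawl.
links = {
    "/": ["/category", "/blog"],
    "/category": ["/category/widgets"],
    "/category/widgets": ["/product/widget-9000"],
    "/product/widget-9000": ["/product/widget-9000/specs"],
}
depths = click_depths(links)
too_deep = [url for url, d in depths.items() if d > 3]
print(too_deep)  # → ['/product/widget-9000/specs']
```

One link from a category hub straight to a page like that specs URL would pull it from four clicks to two.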

Pages crawled by training bots but never reached by user bots are your highest-priority targets. Pages AI user bots visit often are telling you what to scale: more content covering the same topic cluster and depth.

Optimize Content For Longer, Fan-Out Queries

95% of the queries driving AI citations have zero monthly search volume. They’re synthetic sub-queries generated by AI models. But they show up in GSC: impressions, no clicks, query lengths you’d never target voluntarily.

How To Find Fan-Out Query Opportunities

To surface fan-out queries that are worth chasing, connect your GSC API to JetOctopus (to bypass the 1,000-row UI limit) and filter for: query length greater than 7 words, impressions under 50, clicks at 0, over the last 3 months. That’s your Fan-Out Opportunity Matrix, the exact questions AI agents are asking about your content.
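The same filter is easy to apply to any raw GSC export. A sketch against a hypothetical CSV export (query, clicks, impressions columns), using only the standard library:

```python
import csv
import io

def fan_out_candidates(rows, min_words: int = 8, max_impressions: int = 50):
    """Long-tail, zero-click, low-impression queries: likely AI fan-out sub-queries."""
    return [
        r for r in rows
        if len(r["query"].split()) >= min_words
        and int(r["clicks"]) == 0
        and int(r["impressions"]) < max_impressions
    ]

# Hypothetical GSC export (via the API or chunked CSVs, since the UI caps at 1,000 rows).
sample = """query,clicks,impressions
best running shoes,120,5400
what is the best waterproof running shoe for wide feet under 100,0,12
buy shoes,45,900
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print([r["query"] for r in fan_out_candidates(rows)])
# → ['what is the best waterproof running shoe for wide feet under 100']
```

Each surviving row is a question an AI agent generated about your niche; the content answering it probably doesn’t exist yet on your site.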

Prompt Types That Fan Out Most

Image created by JetOctopus, 2025.

If your content isn’t structured to answer list and comparison queries, with explicit rankings, pros/cons, and side-by-side specs, you’re leaving the highest fan-out surface area unoptimized.

“Product review” intent queries surged from 239 in June 2025 to over 40,000 by September 2025. That 16,000% increase was AI agents systematically harvesting structured review data. If your product pages lack this depth, you’re invisible to that harvest.

The Technical Audit: Where To Start

Step 1: Identify AI User Bot Traffic In Logs

Pull raw server logs (Apache/Nginx) and export all lines containing these user agents: OAI-SearchBot and ChatGPT-User, PerplexityBot and Perplexity-User, Claude-SearchBot and Claude-User. Then manually group hits by user-agent patterns and endpoints in a spreadsheet. To distinguish training bots from user bots, you’ll need to maintain your own classification list, one that changes often and isn’t standardized.

In JetOctopus Log Analyzer, this segmentation is built in: filter by bot type (training, search, and user) in a few clicks and immediately see which pages AI user bots visit (your AI-visible content, ready to scale) versus pages training bots hit but user bots never reach (your highest-priority fix targets).

Step 2: Audit Technical Accessibility Of Deep Pages

Select a sample of deep URLs and check HTML payload size, confirm key content isn’t injected via JavaScript by viewing the raw HTML, simulate crawl depth by counting clicks from the homepage, and test load time in Chrome DevTools or Lighthouse. Also check whether critical content sits behind accordions or “View More” elements; these require JavaScript execution that AI bots skip entirely. For large sites with thousands of deep pages, this sampling approach misses a lot. AI agents don’t click. If information only appears after user interaction, it doesn’t exist for these crawlers.
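The per-URL checks can be bundled into one fetch. A sketch, with the metric summary split out as a pure function so it can be inspected offline (URL, key phrase, and thresholds are placeholders):

```python
import time
import urllib.request

def summarize(body: bytes, elapsed_s: float, key_phrase: str) -> dict:
    """Metrics a non-rendering AI bot cares about: speed, weight, extractable facts."""
    return {
        "elapsed_ms": round(elapsed_s * 1000, 1),
        "payload_kb": round(len(body) / 1024, 1),
        "phrase_in_raw_html": key_phrase.encode() in body,
    }

def audit_page(url: str, key_phrase: str) -> dict:
    """Fetch the raw HTML (no JavaScript execution) and summarize it."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = resp.read()
    return summarize(body, time.monotonic() - start, key_phrase)

# Offline illustration with a canned response body:
print(summarize(b"<html><body>Widget 9000 costs $49.99</body></html>", 0.142, "$49.99"))
```

Run audit_page over your sampled deep URLs and flag anything slow, bloated, or missing its key fact in the raw HTML.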

Step 3: Clean Up Your Robots.txt

Open your robots.txt and review all Disallow and Allow directives for each user-agent, line by line. AI bots follow Disallow rules, so make sure you’re not accidentally blocking important URLs. Manually test key URLs to confirm they aren’t blocked. A 30-minute audit here can prevent you from blocking crawlers you want in, or exposing content you’d rather keep out.
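As a reference point, a policy along these lines separates the bot types deliberately. The paths are placeholders and the user-agent tokens should be verified against each vendor’s current documentation before you rely on them:

```
# Let AI search and user bots reach everything (empty Disallow = allow all)
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
Disallow:

# Keep training crawlers out of gated content only (placeholder path)
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /members/
```

Grouping several User-agent lines above one rule block is valid robots.txt syntax and keeps the file auditable.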

Step 4: Map Your Phantom Impressions

Export data from GSC Performance reports filtered to impressions with zero clicks. Because of the 1,000-row UI limit, you’ll need to use the GSC API or export in chunks by date and query, then merge the datasets in spreadsheets or BigQuery. Also consider query frequency: long queries appearing daily are likely not fan-outs.
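The Search Analytics API pages results via startRow and rowLimit (25,000 rows per request is the documented maximum). A sketch that builds the sequence of request bodies for one date range; the dates and total-row estimate are placeholders:

```python
def query_bodies(start_date: str, end_date: str, total_rows: int,
                 page_size: int = 25000) -> list[dict]:
    """Request bodies for the Search Analytics API, paged via startRow/rowLimit."""
    return [
        {
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": ["query"],
            "rowLimit": page_size,
            "startRow": start_row,
        }
        for start_row in range(0, total_rows, page_size)
    ]

pages = query_bodies("2025-10-01", "2025-12-31", total_rows=60000)
print(len(pages), [p["startRow"] for p in pages])
# → 3 [0, 25000, 50000]
```

With the official Python client, each body would then be sent as service.searchanalytics().query(siteUrl=site, body=body).execute(), and the merged rows filtered down to zero-click entries.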

Connect your GSC API to JetOctopus to bypass the row limit and build your Fan-Out Opportunity Matrix automatically: the exact questions AI agents are asking about your content, ready to act on.

Step 5: Monitor The Changes

Set up a recurring export process: pull GSC data monthly and compare impressions over time, re-run log analysis scripts and diff bot activity, and track Core Web Vitals separately in PageSpeed Insights or CrUX. You’ll end up stitching together multiple data sources with no unified alerting, making it hard to catch regressions early.

JetOctopus Alerts covers exactly this: unified notifications for changes in AI bot activity alongside Googlebot behavior, Core Web Vitals, on-page SEO issues, and SERP performance drops, so you catch regressions before they compound.

The New KPI: Technical Accessibility

SEO in 2026 is restructuring around one constraint: can an AI agent crawl, reach, and extract a fact from your 50,000th product page in under 200 milliseconds?

If the answer is no, your rankings, backlinks, and content quality become irrelevant for a growing share of search interactions. The machines are searching. The question is how quickly you can see what’s actually happening.

Start with your logs. Everything else follows from there.

Want to see exactly how AI bots are interacting with your site: which pages they reach, which they skip, and where your fan-out opportunities are hiding? Book a live walkthrough of the JetOctopus platform. We’ll pull your actual log data and show you what your GSC reports aren’t telling you.

Image Credit

Featured Image: Image by JetOctopus. Used with permission.


