How web.run and fan-out queries shape AI visibility

When OpenAI switched default models on March 4, the number of websites cited per response dropped by a fifth and never recovered. But the citation drop is only part of the story.

We also reverse-engineered ChatGPT’s internal browsing tools, ran a honeypot experiment, reconstructed its system prompt, and released a new version of our ChatGPT Search Capture plugin.

What happened

On March 4, ChatGPT switched its default model from GPT-4o/5.2 to GPT-5.3 Instant. The result: the average number of unique domains cited per response dropped from 19 to 15, a decline of more than 20%.

Unique URLs per response followed the same trajectory, falling from 24 to 19. We tracked 400 daily prompts over 14 weeks, using monitoring data provided by Meteoria.

Why we care

ChatGPT has 900 million weekly active users. The citation surface in each response hasn’t changed, but fewer websites are sharing it. Same pie, fewer slices.

This likely reflects a structural shift toward higher-authority sources, but it also means fewer winners overall. Sites that don’t make the cut are losing visibility that was previously within reach.

We named this phenomenon the Bigfoot Effect, after the “Bigfoot update” (identified by Dr. Peter J. Meyers of Moz in 2012), when Google would sometimes let a single domain occupy the entire first page of results.

ChatGPT now retrieves fewer domains per response, but the URL-to-domain ratio has remained stable at 1.26. Crawl depth per domain hasn’t changed. What has changed is how many distinct websites get a seat at the table.
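The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the rounded per-response averages quoted in the text (19 and 15 domains, 24 and 19 URLs); the function and variable names are illustrative, not part of the study.

```python
# Rounded per-response averages quoted in the article, before and after the switch.
domains_before, domains_after = 19, 15
urls_before, urls_after = 24, 19


def pct_decline(before: float, after: float) -> float:
    """Relative decline from `before` to `after`, in percent."""
    return (before - after) / before * 100


def urls_per_domain(urls: float, domains: float) -> float:
    """Average number of distinct URLs cited per cited domain."""
    return urls / domains


domain_drop = pct_decline(domains_before, domains_after)
url_drop = pct_decline(urls_before, urls_after)
ratio_before = urls_per_domain(urls_before, domains_before)
ratio_after = urls_per_domain(urls_after, domains_after)

print(f"domains: -{domain_drop:.1f}%  URLs: -{url_drop:.1f}%")
print(f"URL-to-domain ratio: {ratio_before:.3f} -> {ratio_after:.3f}")
```

Both declines come out slightly above 20%, and the ratio moves by less than 0.01, which is what "same depth per domain, fewer domains" looks like numerically. Because the inputs are rounded, the third decimal of the ratio should not be over-interpreted.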

GPT-5.4 Thinking amplifies the concentration further. The model uses “site:” operators to restrict searches to trusted domains and spreads its queries across often more than 10 “fan-out queries” per response, each targeting a specific source.

Independent log analysis by Jérôme Salomon (Oncrawl) confirms the trend. ChatGPT-User bot crawl volume has settled at a lower level since the switch to 5.3. Some pages simply aren’t being crawled anymore.

The cause goes beyond model updates: more than 90% of ChatGPT’s weekly users are on the free plan, and the default experience triggers fewer web searches, uses fewer queries, and produces fewer citations.

How ChatGPT Search actually works

Our study also includes a full reverse engineering of ChatGPT’s internal search system, called web.run. Before 5.3, the model sent compact text commands separated by pipes (quick|query|recency). After 5.3, it sends structured JSON objects with typed parameters.

This isn’t just a format change. It reflects a different architecture in how the model formulates and distributes its web operations.

The web tool now supports 12 operations, up from 4 (plus a separate widget system called genui). These include:

  • search_query
  • open
  • find
  • click
  • screenshot
  • product_query
  • Specialized widgets for sports, finance, weather, and more.

GPT-5.4 can chain 5 to more than 10 rounds of search per response, refining queries based on previous results. GPT-5.3 Instant typically runs 2 or 3.

Google’s fingerprints are still visible: Google tracking markers (strlid) appear in product URLs, and SearchAPI ID-to-token matches reveal the backend’s reliance on third-party search providers, with Google behind the scenes.

A new type of fan-out for product queries

We uncovered a previously undocumented fan-out type: browse_rewritten_queries. It appears only on product queries, on 5.4 Instant, and is visible in conversation code.

When a user asks something like [best 3D printer to buy in 2026], ChatGPT first runs a single rewrite fan-out to build the full list of candidate products. Then it launches a separate shopping fan-out for each individual product, fetching specs, reviews, and pricing one by one.

Before 5.3, product searches were bundled into a single call. Each product now gets its own dedicated retrieval command.
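The two-stage pattern described above can be sketched as a simple planner. This is only an illustration of the control flow (one rewrite call, then one call per product); `plan_product_fanout`, the `stage` labels, and the stand-in product names are hypothetical and are not fields or operations documented for web.run.

```python
from typing import Callable, Dict, List


def plan_product_fanout(
    user_query: str,
    list_candidates: Callable[[str], List[str]],
) -> List[Dict[str, str]]:
    """Return the ordered retrieval calls for a product question.

    Stage 1: a single rewrite query that yields the candidate products.
    Stage 2: one dedicated lookup per candidate product.
    """
    calls = [{"stage": "rewrite", "q": user_query}]
    for product in list_candidates(user_query):
        calls.append({"stage": "product", "q": product})
    return calls


def fake_candidates(_query: str) -> List[str]:
    """Stand-in for the rewrite step; these product names are invented."""
    return ["Printer A", "Printer B", "Printer C"]


plan = plan_product_fanout("best 3D printer to buy in 2026", fake_candidates)
for call in plan:
    print(call)
```

The practical consequence is that the number of retrieval calls grows with the number of candidate products, so a page is only fetched in stage 2 if its product made the stage 1 list.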

ChatGPT-User is the retrieval agent

Our honeypot experiment confirmed an important detail. When ChatGPT browses the web following a search during a conversation, the ChatGPT-User crawler, not OAI-SearchBot, fetches the page content.

OpenAI describes OAI-SearchBot as the agent that builds ChatGPT’s search index, but in practice, the model relies on third-party scraping APIs for search results, then sends ChatGPT-User to retrieve the actual content from selected URLs.
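One way to check this on your own site is to count which of the two agents actually requests your pages. The sketch below assumes access logs in the common "combined" format, where the User-Agent is the last quoted field, and matches on the bot token only. The sample log lines, IP addresses, and full User-Agent strings are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Tokens to look for inside the User-Agent field. Full UA strings vary,
# so this sketch matches on the bot name only.
BOT_TOKENS = ("ChatGPT-User", "OAI-SearchBot")

# Combined log format: the request is the first quoted field,
# the User-Agent is the last quoted field on the line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*".*"(?P<ua>[^"]*)"\s*$'
)


def count_bot_hits(lines: Iterable[str]) -> Counter:
    """Count requests per (bot token, path) for the two OpenAI agents."""
    hits: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a request line this sketch understands
        for token in BOT_TOKENS:
            if token in match.group("ua"):
                hits[(token, match.group("path"))] += 1
    return hits


# Invented sample lines, for illustration only.
sample_lines = [
    '203.0.113.5 - - [10/Mar/2026:10:00:00 +0000] "GET /honeypot HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '203.0.113.9 - - [10/Mar/2026:10:05:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0"',
    '198.51.100.7 - - [10/Mar/2026:10:06:00 +0000] "GET /honeypot HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = count_bot_hits(sample_lines)
for (bot, path), n in sorted(hits.items()):
    print(bot, path, n)
```

User-Agent strings can be spoofed, so a serious audit would also verify the requesting IP addresses against the ranges the operator publishes; that step is left out here.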

The namespace blind spot

This may be our most surprising finding.

The trail started with classic reverse engineering. We decompiled the ChatGPT mobile app, dissected the web client source code, and sniffed network packets on both platforms. That gave us the names of internal tools and some calling conventions.

Armed with these specifics, we were able to ask ChatGPT the right questions, and discovered that the model answered without any restrictions.

OpenAI has real safeguards around its system prompts. But the internal tool configuration layer has none.

ChatGPT’s namespaces (the groups of internal tools the model can call during a conversation) are freely describable. As long as you avoid the words “system prompt,” the model will disclose tool schemas, operation lists, output channels, and namespace structures with perfect consistency.

We published ready-to-use prompts that anyone can paste into ChatGPT to audit its internal environment. To verify that the model wasn’t hallucinating these descriptions, we ran a participatory study with dozens of users across separate sessions. Every participant received exactly the same tool names, parameter schemas, and operation lists. The model consistently and reliably describes its own tooling.

The study also includes a reconstructed system prompt, extracted progressively, along with several notable findings:

  • Reddit is the only domain exempted from copyright word limits.
  • There’s a granular list of banned products.
  • A “verbosity score” operates on a 1–10 scale.
  • A full advertising policy paragraph governs ad display by subscription tier.

Practical use: running your own crawlability audit

The web.run syntax we documented isn’t just a technical curiosity. It works, and it opens a direct path for testing how ChatGPT interacts with your content.

Here’s a concrete example. You can force ChatGPT to search your domain and read specific pages by pasting JSON commands directly into a conversation. First, trigger a targeted search on your site, then force it to fetch the first two results, then ask it to return the title, main topic, and key points from each page.

"Search for this query, then open the first two results and summarize what you find on each page.

Step 1: Search:

{ "search_query": [ { "q": "site:abondance.com seo" } ], "response_length": "short" }

Step 2: Open the first two results:

{ "open": [ { "ref_id": "turn0search0" }, { "ref_id": "turn0search1" } ] }

Step 3: Give me a structured recap of what you found on each URL. For each page: the title, the main topic, and 3–5 key points."

What you get is a view of your content through ChatGPT’s eyes: what it can actually reach, what it extracts, and how it interprets your pages.

If ChatGPT can’t access a page, returns garbled content, or completely misses your main messages, that’s a signal to act on.
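To run the same audit across several domains or keywords, the three-step prompt above can be generated rather than typed by hand. This is a small sketch: the helper name `build_audit_prompt` and its parameters are made up for this example, the `"short"` value for `response_length` is an assumption about the accepted values, and the `turn0search{i}` reference pattern is copied from the example above, which assumes the search runs in the first turn of a fresh conversation.

```python
import json


def build_audit_prompt(
    domain: str,
    keyword: str,
    n_results: int = 2,
    response_length: str = "short",  # assumed value, see lead-in
) -> str:
    """Assemble the three-step audit prompt for one domain and keyword."""
    search_cmd = {
        "search_query": [{"q": f"site:{domain} {keyword}"}],
        "response_length": response_length,
    }
    # ref_id pattern taken from the article's example (first conversation turn).
    open_cmd = {
        "open": [{"ref_id": f"turn0search{i}"} for i in range(n_results)]
    }
    parts = [
        f"Search for this query, then open the first {n_results} results "
        "and summarize what you find on each page.",
        "Step 1: Search:",
        json.dumps(search_cmd),
        f"Step 2: Open the first {n_results} results:",
        json.dumps(open_cmd),
        "Step 3: Give me a structured recap of what you found on each URL. "
        "For each page: the title, the main topic, and 3-5 key points.",
    ]
    return "\n\n".join(parts)


prompt = build_audit_prompt("abondance.com", "seo")
print(prompt)
```

Building the commands with `json.dumps` guarantees straight quotes and valid JSON, which is easy to break when copying a prompt through a word processor or CMS that substitutes typographic quotes.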

Same model family, different citations

GPT-5.2, 5.3, and 5.4 share the same knowledge cutoff (August 2025) and belong to the same GPT-5 family. Yet the same prompt sent to each produces different fan-out queries, retrieves different sources, and surfaces different passages in the final response.

Several layers of divergence come into play after pre-training: RLHF reward shaping, supervised fine-tuning data, system prompt configurations, and inference-time compute budgets. GPT-5.4 Pro explicitly gets more compute to “think harder,” and that alone can change which sources are cited.

This is why we recommend testing model by model. A single prompt can produce radically different citations depending on whether the user is on GPT-5.3 Instant, 5.4 Thinking, or 5.4 Extended. Free-plan users may also be silently routed to a lighter model.
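A simple way to quantify "different citations" when testing model by model is to compare the sets of cited domains pairwise. The sketch below uses Jaccard similarity (shared domains divided by all domains cited by either model), which is a choice made for this example and not a metric taken from the study. The URLs in the sample data are placeholders.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple
from urllib.parse import urlparse


def cited_domains(urls: Iterable[str]) -> Set[str]:
    """Reduce cited URLs to a set of hostnames, ignoring a leading 'www.'."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared items divided by all items; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_overlap(
    citations_by_model: Dict[str, Iterable[str]],
) -> Dict[Tuple[str, str], float]:
    """Jaccard overlap of cited domains for every pair of models."""
    domain_sets = {m: cited_domains(u) for m, u in citations_by_model.items()}
    return {
        (m1, m2): jaccard(domain_sets[m1], domain_sets[m2])
        for m1, m2 in combinations(sorted(domain_sets), 2)
    }


# Placeholder citations for one prompt, collected by hand from two models.
citations = {
    "GPT-5.3 Instant": ["https://www.example.com/a", "https://example.org/b"],
    "GPT-5.4 Thinking": ["https://example.com/c", "https://example.net/d"],
}

overlap = pairwise_overlap(citations)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

A score close to 1 means the two models cite largely the same sites for that prompt; a score close to 0 means visibility on one model says little about visibility on the other.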

Two types of AI visibility

Our study introduces a framework that separates parametric visibility (what the model learns from training data with search disabled) from dynamic visibility (what it retrieves in real time with search enabled).

  • Parametric visibility: E-E-A-T for LLMs. Parametric visibility is the E-E-A-T equivalent for large language models. It’s authority encoded across billions of training examples, shaped by press coverage, Wikipedia presence, other high-authority sites, and the overall training corpus. It’s stable and measurable through one-shot API audits.
  • Dynamic visibility: shifting ground. Dynamic visibility is volatile. It’s model-dependent and requires continuous monitoring. It’s closer to traditional SEO, and can collapse overnight with a model update, as the Bigfoot Effect shows.
  • The link between the two matters. The model formulates its web queries by targeting sources it already knows. A brand absent from parametric memory won’t even be considered as a search candidate. Being unknown to the model means being invisible before the search even begins.

Knowledge cutoff updates are the “Google Dance” of LLMs. When the cutoff date changes, parametric rankings are redistributed in bulk. But this only happens roughly once a year, because retraining at that scale is extremely expensive. The strategic window for influencing what the model knows about your brand sits between two cutoff dates.

Dan Petrovic’s (DEJAN) AI Brand Authority Index illustrates parametric measurement at scale. Our study complements it with a lighter, reproducible testing framework based on five prompts run multiple times for a one-shot audit.
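The "five prompts, run multiple times" idea can be sketched as a small loop that measures how often a brand appears in answers produced with search disabled. Everything here is an assumption made for illustration: the mention-rate metric, the function names, the sample prompts, and the fictional brands. `ask_model` stands for whatever function sends a prompt to a model and returns its text answer; it is replaced by a deterministic stub so the example runs offline.

```python
from typing import Callable, List


def mention_rate(
    brand: str,
    prompts: List[str],
    ask_model: Callable[[str], str],
    runs_per_prompt: int = 3,
) -> float:
    """Share of answers that mention `brand` (case-insensitive substring match)."""
    total = 0
    mentions = 0
    for prompt in prompts:
        for _ in range(runs_per_prompt):
            answer = ask_model(prompt)
            total += 1
            if brand.lower() in answer.lower():
                mentions += 1
    return mentions / total if total else 0.0


def fake_model(prompt: str) -> str:
    """Offline stub standing in for a real model call; answers are invented."""
    if "best" in prompt:
        return "Popular options include Acme and Globex."
    return "Globex is a common choice."


# Five hypothetical audit prompts for a fictional product category.
audit_prompts = [
    "What are the best project tracking tools?",
    "Which project tracking tool is best for small teams?",
    "Name some project tracking vendors.",
    "Who makes project tracking software?",
    "Recommend a project tracking tool.",
]

rate = mention_rate("Acme", audit_prompts, fake_model, runs_per_prompt=3)
print(f"Acme mention rate: {rate:.0%}")
```

Repeating each prompt matters because real model output varies between runs; with a real model, the rate over several runs is more informative than a single answer. Substring matching is deliberately naive and would need to handle brand aliases in practice.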

Dig deeper

The full study, including reverse-engineered documentation, the honeypot experiment, DIY audit prompts, and the reconstructed system prompt, is available at think.resoneo.com/chatgpt/5.3-5.4/.

Bottom line

ChatGPT Search is not a black box. This study maps its internal architecture, from the web.run tool that powers every search to the fan-out logic that decides which domains are fetched and which are ignored.

The 20% drop in cited domains after the switch to 5.3 shows how fast the citation landscape can shift with a single model update. But the deeper issue is structural: ChatGPT is concentrating citations on fewer websites and applying source selection logic shaped by training data, post-training fine-tuning, and system prompt rules that change from one model to the next.

Tracking visibility in ChatGPT means understanding two distinct layers (parametric and dynamic), testing across multiple models, and monitoring a system whose internal tools are documentable but whose behavior can change overnight.

The full study provides the data, methodology, and tools to get started.

Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. Contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.

