For years, SEOs have operated on a simple assumption: The more ground your content covers, the more likely it is to surface in AI-generated answers. In fact, every "best practice" in classic SEO content pushes you toward more: more subtopics, more sections, more words. Build the "ultimate guide."
An analysis of 815,000 query-page pairs across 16,851 queries and 353,799 pages says otherwise:
- Fan-out coverage is nearly irrelevant to citation rates.
- Two signals actually predict whether ChatGPT cites your page.
- Six concrete changes to your existing content library help.
1. The Study
AirOps ran 16,851 queries through ChatGPT three times each via the UI, capturing every fan-out sub-query, every URL searched, every citation made, and every page scraped. Oshen Davidson built the pipeline. I analyzed the data.
Each query generates an average of two fan-out queries. ChatGPT retrieves roughly 10 URLs per sub-search, reads through them, then selects which ones to cite. We scored how well each page's H2-H4 subheadings matched these fan-out queries using cosine similarity on bge-base-en-v1.5 embeddings. That score is what we call fan-out coverage: the share of subtopics a page addresses at a 0.80 similarity threshold. (The 0.80 cutoff decides whether a subheading counts as a match to a fan-out query. Think of it as a relevance bar.)
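As a rough sketch of how a coverage score like this can be computed: embed each H2-H4 heading and each fan-out query, then count the share of queries that at least one heading matches above the threshold. The embedding step is omitted here; toy 2-D vectors stand in for bge-base-en-v1.5 outputs, and all names are illustrative, not the actual AirOps pipeline.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fan_out_coverage(heading_vecs, query_vecs, threshold=0.80):
    """Share of fan-out queries matched by at least one heading.

    A heading 'matches' a fan-out query when their cosine similarity
    clears the threshold (the relevance bar described above).
    """
    if not query_vecs:
        return 0.0
    matched = sum(
        1 for q in query_vecs
        if any(cosine(h, q) >= threshold for h in heading_vecs)
    )
    return matched / len(query_vecs)

# Toy 2-D vectors standing in for real embeddings.
headings = [np.array([1.0, 0.0]), np.array([0.7, 0.7])]
queries = [np.array([1.0, 0.1]), np.array([0.0, 1.0])]
print(fan_out_coverage(headings, queries))  # one of two queries matched -> 0.5
```

The second query here clears 0.707 similarity against its nearest heading but not the 0.80 bar, which is exactly the kind of near-miss the threshold is meant to filter out.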
The question: Do pages with higher fan-out coverage get cited more?
You'll find much more data in the co-written AirOps report.
2. Density Barely Moves The Needle
Across 815,484 rows, the relationship between fan-out coverage and citation is weak.
Covering 100% of subtopics adds 4.6 percentage points over covering none. That gap shrinks further when you control for query match (how well the page's best heading matches the original query). Among pages with strong query match (>= 0.80 cosine similarity):

Moderate coverage (26-50%) outperforms exhaustive coverage. Pages that cover everything score lower than pages that cover a quarter of the subtopics. The "ultimate guide" strategy produces worse results than a focused article that covers two to three related angles well.
3. What Actually Predicts Citation
Two signals dominate: retrieval rank and query match.
1. Retrieval rank is the strongest predictor by a wide margin. A page at position 0 in ChatGPT's web search results (the first URL returned by its search tool) has a 58% citation rate. By position 10, that drops to 14%. We ran each prompt three times consecutively for this analysis, and pages cited in all three runs have a median retrieval rank of 2.5. Pages never cited: median rank 13.
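The rank-versus-citation relationship is just a grouped mean over the query-page rows. A minimal pandas sketch with made-up data (the dataframe and column names are illustrative, not the AirOps schema):

```python
import pandas as pd

# Toy query-page pairs: retrieval rank of the page and whether it was cited.
rows = pd.DataFrame({
    "retrieval_rank": [0, 0, 0, 0, 10, 10, 10, 10],
    "cited":          [1, 1, 1, 0, 1, 0, 0, 0],
})

# Citation rate per rank: the mean of the 0/1 cited flag.
rate_by_rank = rows.groupby("retrieval_rank")["cited"].mean()
print(rate_by_rank[0], rate_by_rank[10])  # 0.75 0.25
```

On the real dataset, the same aggregation produces the 58%-at-position-0 versus 14%-at-position-10 curve described above.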

2. Query match (cosine similarity between the query and the page's best heading) is the strongest content signal. Pages with a 0.90+ heading match have a 41% citation rate compared to the 30% rate for pages below 0.50. Even among top-ranked pages (positions 0-2), higher query match adds 19 percentage points.
Fan-out coverage, word count, heading count, domain authority: all secondary. Some are flat. Some are inversely correlated.
4. The Wikipedia Exception
One site type breaks the pattern. Wikipedia has the worst retrieval rank in the dataset (median 24) and the lowest query match score (0.576). It still achieves the highest citation rate: 59%.
Wikipedia pages average 4,383 words, 31 lists, and 6.6 tables. They're encyclopedic in the literal sense. ChatGPT cites Wikipedia from deep in the search results where every other site type gets ignored.
That is density working as a signal, but at a scale no publisher can replicate. Wikipedia's content is exhaustive, richly structured, and cross-linked across millions of topics. A 3,000-word corporate blog post with 15 subheadings is not the same thing.
5. The Bimodal Reality
58% of pages retrieved by ChatGPT in this dataset are never cited. 25% are always cited when they appear. Only 17% fall in between.
The always-cited and never-cited groups look nearly identical on most content metrics: similar word counts (~2,200), similar heading counts (~20), similar readability scores (~12 FK grade), similar domain authority (~54). The on-page signals we can measure don't separate winners from losers.
What separates them is retrieval rank. Always-cited pages rank near the top when they surface. Never-cited pages rank in the bottom half. The retrieval system, whatever signals it uses internally, is the gatekeeper. Everything else is a tiebreaker.
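The always/never/mixed split falls out of counting, per page, how many of the three runs cited it. A small pandas sketch with hypothetical pages a, b, and c (names and columns are made up for illustration):

```python
import pandas as pd

# Each page appears once per run; cited is 1 if that run cited it.
runs = pd.DataFrame({
    "page":  ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "cited": [1, 1, 1, 0, 0, 0, 1, 0, 1],
})

counts = runs.groupby("page")["cited"].agg(cited_runs="sum", total_runs="count")

def bucket(row):
    """Classify a page as always, never, or mixed cited across its runs."""
    if row["cited_runs"] == row["total_runs"]:
        return "always"
    if row["cited_runs"] == 0:
        return "never"
    return "mixed"

counts["bucket"] = counts.apply(bucket, axis=1)
print(counts["bucket"].to_dict())  # {'a': 'always', 'b': 'never', 'c': 'mixed'}
```

Applied to the real runs, this bucketing yields the 25% / 58% / 17% distribution reported above.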
6. What This Means For Your Content
Conventional SEO content writing wisdom says cover more subtopics, add more sections, build density. The data says the conventional approach produces "mixed" pages, the 17% in the middle that get cited sometimes and ignored other times.
Mixed pages have the highest word counts, the most headings, and the highest domain authority in the dataset. They're the "ultimate guides." They're also the least reliable performers in ChatGPT.
The pages that win consistently are focused. They:
- Match the query directly in their headings,
- Tend to be shorter (the citation sweet spot is 500-2,000 words), and
- Have enough structure (7-20 subheadings) to organize the content without diluting it.
Build the page that's the best answer to one question. Not the page that adequately answers 20.
Featured Image: Tero Vesalainen/Shutterstock; Paulo Bobita/Search Engine Journal

