Welcome to this week’s Pulse for SEO: updates cover how to track AI visibility, how a ghost page can break your site name in search results, and what new crawl data reveals about Googlebot’s file size limits.
Here’s what matters for you and your work.
Bing Webmaster Tools Adds AI Citation Dashboard
Microsoft introduced an AI Performance dashboard in Bing Webmaster Tools, giving publishers visibility into how often their content gets cited in Copilot and AI-generated answers. The feature is now in public preview.
Key Facts: The dashboard tracks total citations, average cited pages per day, page-level citation activity, and grounding queries. Grounding queries show the phrases AI used when retrieving your content for answers.
Why This Matters
Bing is now offering a dedicated dashboard for AI citation visibility. Google includes AI Overviews and AI Mode activity in Search Console’s overall Performance reporting, but it doesn’t break out a separate report or provide citation-style URL counts. AI Overviews also assign all linked pages to a single position, which limits what you can learn about individual page performance in AI answers.
Bing’s dashboard goes further by tracking which pages get cited, how often, and what phrases triggered the citation. The missing piece is click data. The dashboard shows when your content is cited, but not whether those citations drive traffic.
Now you can confirm which pages are referenced in AI answers and identify patterns in grounding queries, but connecting AI visibility to business outcomes still requires combining this data with your own analytics.
What SEO Professionals Are Saying
Wil Reynolds, founder of Seer Interactive, celebrated the feature on X and focused on the new grounding queries data:
“Bing is now giving you grounding queries in Bing Webmaster tools!! Just confirmed, now I gotta understand what we’re getting from them, what it means and how to use it.”
Koray Tuğberk GÜBÜR, founder of Holistic SEO & Digital, compared it directly to Google’s tooling on X:
“Microsoft Bing Webmaster Tools has always been more useful and efficient than Google Search Console, and once again, they’ve proven their commitment to transparency.”
Fabrice Canel, principal product manager at Microsoft Bing, framed the launch on X as a bridge between traditional and AI-driven optimization:
“Publishers can now see how their content shows up in the AI era. GEO meets SEO, power your strategy with real signals.”
The reaction across social media centered on a shared frustration: this is the data practitioners have been asking for, but it comes from Bing rather than Google. Several people expressed hope that Google and OpenAI would follow with comparable reporting.
Read our full coverage: Bing Webmaster Tools Adds AI Citation Performance Data
Hidden HTTP Homepage Can Break Your Site Name In Google
Google’s John Mueller shared a troubleshooting case on Bluesky where a leftover HTTP homepage was causing unexpected site-name and favicon problems in search results. The issue is easy to miss because Chrome can automatically upgrade HTTP requests to HTTPS, hiding the problematic page from normal browsing.
Key Facts: The site used HTTPS, but a server-default HTTP homepage was still accessible. Chrome’s auto-upgrade meant the publisher never saw the HTTP version, but Googlebot doesn’t upgrade requests the way Chrome does, so it was pulling site name and favicon signals from the wrong page.
Why This Matters
This is the kind of problem you wouldn’t find in a standard site audit because your browser never shows it. If your site name or favicon in search results doesn’t match what you expect, and your HTTPS homepage looks correct, the HTTP version of your domain is worth checking.
Mueller suggested running curl from the command line to see the raw HTTP response without Chrome’s auto-upgrade. If it returns a server-default page instead of your actual homepage, that’s the source of the problem. You can also use the URL Inspection tool in Search Console with a Live Test to see what Google retrieved and rendered.
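If you prefer to script that check, here’s a minimal Python sketch of the same idea (the domain and User-Agent string are placeholders, and the requests library is an assumed dependency; Mueller’s actual suggestion was curl). It requests the plain-HTTP URL without following redirects, so you can see the status code, any redirect target, and the title of the page the server actually returns:

```python
# Minimal sketch: request the plain-HTTP homepage the way a crawler would,
# without Chrome's automatic HTTPS upgrade. "example.com" is a placeholder.
import requests

http_url = "http://example.com/"

# Don't follow redirects, so the output reflects exactly what the HTTP URL serves.
response = requests.get(
    http_url,
    allow_redirects=False,
    headers={"User-Agent": "site-name-debug/1.0"},  # illustrative UA string
    timeout=10,
)

print("Status:", response.status_code)
print("Location:", response.headers.get("Location"))  # present if the server redirects
print("Content-Type:", response.headers.get("Content-Type"))

# A quick look at the <title> helps spot a server-default page
# (for example, an Apache welcome page) instead of your real homepage.
body = response.text
lower = body.lower()
start = lower.find("<title>")
if start != -1:
    end = lower.find("</title>", start)
    print("Title:", body[start + len("<title>"):end].strip())
else:
    print("No <title> tag found in the response body.")
```

A clean setup will usually return a redirect to the HTTPS homepage here; a 200 response with an unfamiliar title is the kind of leftover default page Mueller describes.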
Google’s documentation on site names specifically mentions duplicate homepages, including HTTP and HTTPS versions, and recommends using the same structured data for both. Mueller’s case shows what happens when the HTTP version serves content that differs from the HTTPS homepage you intended.
What People Are Saying
Mueller described the case on Bluesky as “a weird one,” noting that the core problem is invisible in normal browsing:
“Chrome automatically upgrades HTTP to HTTPS so you don’t see the HTTP page. However, Googlebot sees and uses it to influence the sitename & favicon selection.”
The case highlights a broader pattern: browser features often hide what crawlers see. Examples include Chrome’s auto-upgrade, reader modes, client-side rendering, and JavaScript-generated content. To debug site name and favicon issues, check the server response directly rather than relying on what loads in your browser.
Read our full coverage: Hidden HTTP Page Can Cause Site Name Problems In Google
New Data Shows Most Pages Fit Well Within Googlebot’s Crawl Limit
New research based on real-world webpages suggests most pages sit well below Googlebot’s 2 MB fetch cutoff. The data, analyzed by Search Engine Journal’s Roger Montti, draws on HTTP Archive measurements to put the crawl limit question into practical context.
Key Facts: HTTP Archive data suggests most pages are well below 2 MB. Google recently clarified in updated documentation that Googlebot’s limit for supported file types is 2 MB, while PDFs get a 64 MB limit.
Why This Matters
The crawl limit question has been circulating in technical SEO discussions, particularly after Google updated its Googlebot documentation earlier this month.
The new data answers the practical question that documentation alone couldn’t: does the 2 MB limit matter for your pages? For most sites, the answer is no. Standard webpages, even content-heavy ones, rarely approach that threshold.
Where the limit could matter is on pages with extremely bloated markup, inline scripts, or embedded data that inflates HTML size beyond typical ranges.
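If you want to sanity-check your own templates against that threshold, here is a rough Python sketch (the URLs are placeholders and the requests dependency is an assumption, not something from Google’s documentation). It fetches each page and compares the size of the returned HTML to 2 MB; it measures only the HTML document itself, since CSS and JavaScript files are fetched separately and, per Smart’s description below, fall under the same per-file limit:

```python
# Rough check of raw HTML size against the 2 MB fetch limit discussed above.
# The URLs are placeholders; swap in representative pages from your own site.
import requests

LIMIT_BYTES = 2 * 1024 * 1024  # the 2 MB figure from Google's documentation

urls = [
    "https://example.com/",
    "https://example.com/blog/very-long-article/",
]

for url in urls:
    html = requests.get(url, timeout=10).content  # decompressed HTML bytes
    status = "over limit" if len(html) > LIMIT_BYTES else "ok"
    print(f"{url}: {len(html) / 1024:.0f} KB ({status})")
```

Note that requests decompresses gzipped responses automatically, so the figure reflects the full HTML payload rather than compressed bytes on the wire.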
The broader pattern here is Google making its crawling systems more transparent. Moving documentation to a standalone crawling site, clarifying which limits apply to which crawlers, and now having real-world data to validate those limits gives a clearer picture of what Googlebot handles.
What Technical SEO Professionals Are Saying
Dave Smart, technical SEO consultant at Tame the Bots and a Google Search Central Diamond Product Expert, put the numbers in perspective in a LinkedIn post:
“Googlebot will only fetch the first 2 MB of the initial html (or other resource like CSS, JavaScript), which seems like a huge reduction from 15 MB previously reported, but honestly 2 MB is still huge.”
Smart followed up by updating his Tame the Bots fetch and render tool to simulate the cutoff. In a Bluesky post, he added a caveat about the practical risk:
“At the risk of overselling how much of a real world issue this is (it really isn’t for 99.99% of sites I’d imagine), I added functionality to cap text based files to 2 MB to simulate this.”
Google’s John Mueller endorsed the tool on Bluesky, writing:
“If you’re curious about the 2MB Googlebot HTML fetch limit, here’s a way to check.”
Mueller also shared Web Almanac data on Reddit to put the limit in context:
“The median on mobile is at 33kb, the 90-percentile is at 151kb. This means 90% of the pages out there have less than 151kb HTML.”
Roger Montti, writing for Search Engine Journal, reached a similar conclusion after reviewing the HTTP Archive data. He noted that the real-world measurements show most sites sit well under the limit and called it “safe to say it’s okay to scratch off HTML size from the list of SEO things to worry about.”
Read our full coverage: New Data Shows Googlebot’s 2 MB Crawl Limit Is Enough
Theme Of The Week: The Diagnostic Gap
Each story this week points to something practitioners either couldn’t see before or were checking the wrong way.
Bing’s AI citation dashboard fills a measurement gap that has existed since AI answers started citing website content. Mueller’s HTTP homepage case reveals an invisible page that standard site audits and browser checks would miss entirely because Chrome hides it. And the Googlebot crawl limit data answers a question that documentation updates raised, but couldn’t resolve on their own.
The connecting thread isn’t that these are new problems. AI citations have been happening without measurement tools. Ghost HTTP pages have been confusing site name systems since Google introduced the feature. And crawl limits have been listed in Google’s docs for years without real-world validation. What changed this week is that each gap got a concrete diagnostic: a dashboard, a curl command, and a dataset.
The takeaway is that the tools and data for understanding how search engines interact with your content are getting more specific. The challenge is knowing where to look.
Featured Image: Accogliente Design/Shutterstock

