Everything looked normal in the SEO data. Google Search Console, traffic, and indexing: no red flags. Then I opened Scrunch, our AI citation tracking tool, and looked at platform-by-platform presence for searchinfluence.com over the prior 30 days:
- Google AI Mode: 37.8%
- Copilot: 22.2%
- Google Gemini: 16.3%
- ChatGPT: 9.6%
- Perplexity: 7.8%
- Claude: 0.0%
- Meta AI: 0.0%
Two platforms at zero. Every crawler reads the same site, so content quality and topical authority can't account for that gap. They're identical for every platform on the list.
What varies is access: whether each platform's crawler is allowed in. Nothing else explains how Google AI Mode hits 37.8% while Claude lands at 0%. So I opened the logs.
What 7 days of Cloudflare logs showed
Seven days of Cloudflare data (April 4-10) for searchinfluence.com revealed 29,099 bot requests, 65.8% of them AI bots. Here's the per-bot share of those requests that were rate-limited (HTTP 429, "too many requests"), broken out by bot user-agent (UA, the identifier each request sends):


- Amazonbot: 51% rate-limited
- ClaudeBot: 29%
- GPTBot: 29%
- Bytespider: 61% blocked (different mechanism: 403/5xx, not 429)
- ChatGPT-User: 0%
- PerplexityBot: 0%
The split isn't random. Training crawlers, the ones that pull entire sites in massive bursts, get throttled. User-facing crawlers, the ones that fire human-paced requests during a live user query, don't.
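For context on how per-bot shares like these get computed: the aggregation is just a group-by over (UA, status). Here is a minimal awk sketch assuming a simplified two-field log format (real Cloudflare exports are JSON, so you would extract the UA and status fields first, e.g. with jq); the sample lines are illustrative, not our actual logs:

```shell
# Share of each bot's requests that were rate-limited (HTTP 429).
# Input format assumed here: "<user-agent> <status>" per line.
shares=$(awk '
  { total[$1]++; if ($2 == 429) limited[$1]++ }
  END {
    for (ua in total)
      printf "%s: %.0f%% rate-limited (%d/%d)\n",
             ua, 100 * limited[ua] / total[ua], limited[ua], total[ua]
  }
' <<'EOF'
ClaudeBot 429
ClaudeBot 200
GPTBot 429
GPTBot 200
PerplexityBot 200
PerplexityBot 200
EOF
)
echo "$shares"
```

Run against a full week of extracted log lines, this yields the per-bot percentages quoted above.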
For context: Cloudflare's Q1 2026 crawl-to-referral research shows ClaudeBot makes 20,583 crawl requests for every referral it sends back.
- GPTBot: 1,255 to 1.
- Perplexity: 111 to 1.
- Google: 5 to 1.
AI training crawlers take far more than they give back, so it makes sense that hosting infrastructure has started fighting back. Whether that's the right fight for your site is a separate question.
The 429s in our logs were being passed by Cloudflare with a cache status of dynamic or bypass. So I wrote them off as downstream of Cloudflare: it must be a web application firewall (WAF) or security plugin. That assumption sent me down a multi-hour rabbit hole through the wrong layers.


Where we looked first, and why we were wrong


Suspect 1: Solid Security's HackRepair default ban list
A WordPress security plugin we use for hardening, with a built-in bot UA blocklist. Toggled it off, ran a 24-hour before/after on per-bot 429 counts. No change.
Two bots even spiked higher in the post-toggle window. Coincidental crawl bursts, not a regression. Ruled out.
Suspect 2: Solid Security's other firewall subsystems
24,538 firewall log entries over 30 days. Every single one was a /wp-login.php brute-force lockout. Zero entries for ClaudeBot, GPTBot, or Amazonbot. Rules empty. IP Management clean. Ruled out.
Suspect 3: Sucuri Cloud WAF
SI has a Sucuri subscription. Logged into the portal and saw warnings across every service column (Monitoring, Firewall/CDN, SSL, Backups). A dig and curl confirmed why: DNS resolved to Cloudflare ranges, and response headers showed no x-sucuri-id. Sucuri was never in the request path. The subscription existed; the activation never happened. Ruled out.
Suspect 4: Cloudflare itself
Initially written off because the cache-status was dynamic/bypass. That inference was sloppy: Cloudflare can return 429 from rate-limit rules with the same cache-status. Going back to the right view (Security → Analytics → Events tab, filtered by ClaudeBot UA, last 24 hours): zero events. Cloudflare took no security action on ClaudeBot in 24 hours while passing through 608 ClaudeBot 429s. Ruled out.
At that point, we were out of suspects on layers we could see.
The reproduction test that changed everything
We ran 60 rapid curl requests with a ClaudeBot UA against three different paths. 60 x 429 every time. Control runs: same paths, browser UA → 60 x 200 (HTTP "OK"). Same paths, Googlebot UA → 60 x 200. The block was unambiguously UA-based: not path-based, not rate-based.
The headers gave it away. A single curl -I showed x-powered-by: WP Engine. We were on a managed host, and the block was firing from a layer that hadn't been on the suspect list: the host's own platform infrastructure, sitting between Cloudflare and WordPress. The hosting platform itself.
The bot-by-bot fingerprint
Once we knew which question to ask, we ran the rest of the AI bot UA list through the same curl harness.


| Bot UA | Result | Status |
|---|---|---|
| ClaudeBot | 60/60 x 429 | Blocked |
| GPTBot | 8/10 x 429, 2/10 cached (200) | Blocked |
| Amazonbot | 10/10 x 429 | Blocked |
| Bytespider | 10/10 x 520 | Blocked (520 is a Cloudflare-specific error: origin returned an invalid response, possibly IP-blackholed) |
| anthropic-ai (older Anthropic UA) | 10/10 x 200 | Not blocked |
| CCBot (Common Crawl) | 10/10 x 200 | Not blocked |
Two findings:
- The blocklist is dated: It targets the AI training crawler set as of mid-2024. The older anthropic-ai UA is allowed. CCBot, the Common Crawl bot that feeds many LLM training pipelines, is allowed. If the intent is "no LLM training data," this gap defeats it. Scrapers can use CCBot's UA, or Common Crawl can pull the site directly, and the data ends up in training sets anyway. The named-bot list is a fence with a gate left open.
- Cached responses serve through the block: WP Engine's edge cache returns cached pages to ClaudeBot just fine (x-cache: HIT in the headers). Cache-miss requests hit the origin handler and get 429. This explains the Cloudflare data exactly: in 24 hours, 1,054 ClaudeBot requests returned 200 (cache hits) and 608 returned 429 (cache misses). Same UA, same site, two outcomes.
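You can spot this hit/miss split yourself by pairing each response's HTTP status with its x-cache header. Here is a minimal sketch; the here-doc stands in for the output of `curl -sI` against a live site (the header values are illustrative):

```shell
# Classify one response by "HTTP status + x-cache value". On the pattern
# described above, cache HITs come back 200 while cache MISSes reach the
# origin handler and get 429.
classify() {
  awk '/^HTTP/ { s = $2 }
       tolower($1) == "x-cache:" { c = $2 }
       END { print s, c }'
}
result=$(classify <<'EOF'
HTTP/2 200
content-type: text/html
x-cache: HIT
x-powered-by: WP Engine
EOF
)
echo "$result"
```

In a live check you would pipe `curl -sI -A "ClaudeBot/1.0 (+https://www.anthropic.com/claudebot)" https://yourdomain.com/` into classify in a loop; a mix of "200 HIT" and "429" (cache miss) lines matches the two-outcome behavior described above.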
It's worth flagging that ~100% of our 24-hour "ClaudeBot" traffic came from a single Microsoft/Azure IP (AS8075, Microsoft's network), not Anthropic's published AWS ranges. It's almost certainly a spoofed UA: a scraper on Azure pretending to be ClaudeBot. A meaningful slice of "AI crawler 429s" in WAF reports may be appropriately blocked impostor traffic, not legitimate Anthropic crawl.
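A rough way to separate real crawler traffic from impostors is to check each claimed-ClaudeBot client IP against the ranges the real crawler is expected to use. A sketch follows; the prefix list is a hypothetical placeholder (pull the vendor's currently published ranges before relying on it), and prefix string matching is a stand-in for proper CIDR math (grepcidr or similar):

```shell
# Flag "ClaudeBot" client IPs outside the expected egress ranges.
# expected_prefixes is a PLACEHOLDER, not a verified Anthropic range list.
expected_prefixes="160.79."
is_expected() {
  for p in $expected_prefixes; do
    case "$1" in "$p"*) return 0 ;; esac
  done
  return 1
}
# Sample IPs, illustrative only (the second is in Microsoft/Azure space).
checks=$(
  for ip in 160.79.104.10 20.171.12.34; do
    if is_expected "$ip"; then
      echo "$ip plausible"
    else
      echo "$ip suspect (outside expected ranges)"
    fi
  done
)
echo "$checks"
```

Feed it the unique client IPs from your logs; anything "suspect" that still carries a crawler UA is the impostor traffic described above.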
Why this is hard to find
Start with what WP Engine itself says about its firewall. From their support page on the security environment:
- "Additional information cannot be provided around our firewall, as this can compromise its secure integrity."
That's the company's own statement, verbatim. Whatever the rules are, customers don't get to see them.
Their 2025 Year in Review reports 75 billion bot requests mitigated via Cloudflare-powered bot management. No documented customer portal control opts you out per-site or per-bot. I checked every customer-facing setting that could plausibly fire AI bot 429s:
- Utilities → Redirect bots: Off (default).
- Web rules: Empty.
- Robots.txt setting: Not customized. The live /robots.txt only disallows a few specific PDFs.
All clear. The block is somewhere customers can't reach.
A few more reasons it's invisible:
- It returns 429, not 403: Returning "forbidden" can get a site flagged by search engines as a site-wide failure, so 429 is the safer choice. But 429 reads as "rate limit" in every WAF analytics tool, which sends investigators chasing rate-limit configurations at the wrong layer.
- It fires beneath the WAF plugins: Wordfence, Sucuri, and Solid Security all log at the WordPress application layer. WP Engine's block fires at the platform edge, before the request reaches WordPress. Plugin logs show nothing.
- It fires beneath customer Cloudflare, too: WP Engine runs its own Cloudflare-backed bot management at the hosting edge. That's a separate Cloudflare layer behind your own Cloudflare zone. Events fired there don't appear in your Cloudflare dashboard.
- WP Engine's billing already accounts for the block: They exclude "suspected bots" from billable metrics. From a hosting-cost perspective, the customer benefits. From a GEO/AEO perspective, the customer pays in citation absence, without ever knowing they signed up for it.
What WP Engine confirmed when I asked
After several rounds of canned auto-replies, I reached a live agent. The relevant exchanges:
On the policy itself:
- "WP Engine does implement platform-wide rate limiting on certain high-impact bots to protect overall server performance, and that part can't be selectively disabled per bot."
On whether the customer-facing Web Rules Engine could route around it:
- "Allowing AI bot IPs via Web Rules Engine doesn't override WP Engine's platform-wide rate limiting rules, which operate at the infrastructure level."
On whether the SEO downside was acknowledged anywhere internally:
- "The documentation acknowledges that blocking or rate limiting bots like Amazonbot and similar user agents can impact their crawling and indexing… It emphasizes balancing bot management with SEO considerations and suggests customers be empathetic as many didn't configure these bots themselves."
Read that last bit twice. The internal framing assumes the customer is being protected from bots they didn't ask for. For agencies, content sites, B2B SaaS, and anyone whose growth depends on AI search citations, the assumption inverts. These bots are the audience the customer is trying to reach.
There's an escalation path:
- "If you have an exceptional use case or need a bot to behave differently than the platform defaults allow, we can escalate it to ProdEng (product engineering) for review."
So the policy isn't immutable. It's just not a self-service setting.
WP Engine appears to be the outlier here
We assumed every managed host did this. The public record on the other three top managed WordPress hosts contradicts that:
- Kinsta's CTO said in March 2026 that they will not block at the platform level and won't bill for bot bandwidth. Their Bot Protection feature is opt-in, with four customer-controllable levels.
- Pressable explicitly states in its knowledge base: "Pressable doesn't currently disallow these bots by default." The customer manages it via robots.txt.
- Pantheon explicitly states: "We don't block known bot traffic from entering the platform." They detect and exclude bots from billing only.
Outside managed WP, the closest analog is SiteGround, which blocks training crawlers by default but is more transparent about the policy and distinguishes training bots from user-action bots.
One wrinkle: Flywheel, a managed WP host owned by WP Engine since 2019, has no documented AI bot block. Same parent company, two products, two different stated policies. Not a corporate-wide stance. A product-level decision specific to WP Engine.
Caveat on the comparison: we confirmed WP Engine's block empirically with curl. We didn't run the same diagnostic against Kinsta, Pressable, or Pantheon. What we have for them is their public documentation, which is reliable but not the same as a live test.
The precise claim: based on what each host publicly discloses, WP Engine appears to be the only top-tier managed WP host with a default-on, non-disableable, platform-level AI bot block.
The question shifts. It's not "Are other hosts doing this?" It's "Why is WP Engine, and apparently only WP Engine, doing it this way?"
How to check whether it's happening to you
The standard advice to audit your WAF logs doesn't catch this. Below are three steps that don't require root access.
Step 1: Reproduce with curl (a command-line tool that fetches URLs)
    for i in $(seq 1 30); do
      curl -sI -A "ClaudeBot/1.0 (+https://www.anthropic.com/claudebot)" \
        "https://yourdomain.com/" \
        -o /dev/null -w "%{http_code}\n"
      sleep 0.05
    done | sort | uniq -c

Then run the same loop with a Mozilla/5.0 … browser UA. If the browser run returns 200s and the ClaudeBot run returns 429s, the block is UA-based and somebody in your stack is doing it. If both return the same code, you don't have this problem.
Step 2: Identify your host
Run curl -I https://yourdomain.com/ and look at the response headers for x-powered-by or server. They usually name the host (WP Engine, Pressable, Kinsta, etc.). If your host is unmanaged or self-hosted, this article likely doesn't apply. Check your WAF instead.
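A quick sketch of that header check. The here-doc below is a sample of what `curl -sI https://yourdomain.com/` might return on a WP Engine site behind Cloudflare (illustrative values, not a live capture):

```shell
# Extract the host-identifying headers from a response dump.
host_hint=$(grep -iE '^(x-powered-by|server):' <<'EOF'
HTTP/2 200
server: cloudflare
x-powered-by: WP Engine
content-type: text/html
EOF
)
echo "$host_hint"
```

Seeing both `server: cloudflare` and `x-powered-by: WP Engine` in one response is the two-layer stack described earlier: Cloudflare in front, the host's own platform edge behind it.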
Step 3: Check what the host actually controls
For WP Engine specifically, check that Utilities > Redirect Bots is off and that Web Rules has no AI UA blocks, then open a support ticket. Here's recommended wording:
- "We've reproduced via curl that requests with ClaudeBot/GPTBot/Amazonbot user-agent strings receive HTTP 429 responses for cache-miss requests on our environment. Cloudflare and our security plugins are not the source. Is this WP Engine's platform-level AI crawler mitigation? Can it be disabled or scoped per-bot for our environment?"
For other hosts, the equivalent path is their portal's security section first, then a support ticket with the same evidence.
What to do once you know
Four real options, in order of effort.
Escalate to your host's product engineering
WP Engine's support agent named an "exceptional use case" escalation path. The policy isn't immutable; it's just not a self-service toggle. SEO and AI search visibility is exactly the kind of case that escalation path is built for.
Allowlist via the customer-controllable Web Rules Engine
WREn lets you allowlist UAs at the site level, but the support agent confirmed it doesn't override the platform rules. It's useful for the bots not on the platform list (CCBot, anthropic-ai), but not a fix for the ones that are.
Move to a host that doesn't impose this
A nuclear option, but worth costing out if AI search visibility is a strategic priority and ProdEng escalation goes nowhere. Kinsta's and Pressable's documented stances both leave AI crawler access to the customer.
And to be clear: AI search visibility absolutely should be a strategic priority right now. ChatGPT alone handles billions of queries per week, and the answers cite a small set of sources. If your category is being decided in those answers and your site can't be crawled, you don't get cited.
There is no "I'll just rank later" backup plan, because the citation set hardens fast. Treating AI access as optional in 2026 is the same call as treating organic search as optional in 2008. It worked for a while. Then it didn't.
Accept the block as a deliberate policy
Some companies will conclude that staying out of AI training data is the right call. The honest version: tell the team that's what's happening, factor it into AI-search expectations, and stop running GEO/AEO audits that score you on missing citations you were never going to get anyway.
The wrong move is to keep running the WAF audit playbook and concluding that nothing's wrong. The block fires invisibly, and the citation absence shows up months later in dashboards that no one connects back to it.
The citation correlation


- Googlebot ~100% access → Google AI Mode 37.8% citation presence
- GPTBot 54% access → ChatGPT 9.6%
- PerplexityBot 100% access → Perplexity 7.8%
- ClaudeBot 57% access → Claude 0.0%
The platform-by-platform split in citations matches the platform-by-platform split in crawl access. Where the bot can read the site, the AI cites it at meaningful rates. Where the bot is blocked, citation presence collapses.
Suggestive, not proof: a 7-day correlation on a single site, with no controlled before/after. Part 2 publishes the post-fix numbers if we get the block lifted (or move hosts). The intuition: crawl access is the floor; content quality, topical authority, and freshness are the ceiling. If the bot can't read you, the ceiling doesn't matter.
Perplexity is the wrinkle: 100% access, 7.8% citation. Full access alone doesn't guarantee citation. But the absence of access (Claude at 0%) is decisive.
Caveats
- Single-site case study: The diagnostic generalizes; the exact numbers don't.
- AI citation is multi-factor: Content quality, topical authority, entity coverage, freshness, schema, brand recognition: all of those matter. Crawl access is the floor, not the whole game.
- Bot UAs can be spoofed: Roughly 100% of our "ClaudeBot" traffic was from a non-Anthropic IP. The host-level block is doing the right thing for those impostors.
- AI bots don't fully respect crawl-delay: InMotion's policy is a good reference: GPTBot and ClaudeBot only partially honor crawl-delay in robots.txt, so the 429 is one of the few signals they actually act on. That's a feature, until they improve crawl-delay compliance.
- WP Engine's defaults aren't malicious: They're protecting customers who didn't ask for AI bot traffic. The opacity is the problem, not the intent. Customers who do want the traffic should have a way to say so without escalating to product engineering.


What you should do next
If you're on WP Engine, run the diagnostic above. If the curl reproduction shows the same pattern, you've got the same issue. Open a ticket and see where that goes, or switch providers.
If you're on a different managed host, run it anyway. The diagnostic takes three minutes.
If you're spending months on content updates, schema markup, and llms.txt files while a default-on platform setting silently blocks the crawlers you're trying to reach, you're optimizing the ceiling of a building with no floor.
Full disclosure on method: An AI assistant (Claude) ran the curl tests, parsed headers, and walked the architecture with me. Where this piece says "we" tested or reproduced something, that's me plus the AI. Where it says "I," it was me directly: portal logins, the WP Engine support chat.
Contributing authors are invited to create content for Search Engine Land and are selected for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial team and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. The contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.
#managed #WordPress #blocking #bots

