Anthropic updated its crawler documentation this week, clarifying how its Claude bots access websites and how to block them.
- Anthropic’s doc explains what each bot does, how it affects AI training and search visibility, and how to opt out via robots.txt.
Why we care. If you publish or own content, you want control over how AI systems use it. Anthropic separates training crawlers, user-triggered fetches, and search indexing. Blocking one bot doesn’t block the others. Each choice carries different visibility and training trade-offs.
The robots. Anthropic uses three separate user agents:
- ClaudeBot collects public web content that may be used to train and improve Anthropic’s generative AI models. If you block ClaudeBot in robots.txt, Anthropic said it will exclude your site’s future content from AI training datasets.
- Claude-User retrieves content when a user asks Claude a question that requires access to a webpage. If you block Claude-User, Anthropic can’t fetch your pages in response to user queries. The company says this may reduce your visibility in user-directed search responses.
- Claude-SearchBot crawls content to improve the quality and relevance of Claude’s search results. If you block Claude-SearchBot, Anthropic won’t index your content for search optimization, which may reduce visibility and accuracy in Claude-powered search answers.
How to block them. The bots respect standard robots.txt directives, including “Disallow” rules and the non-standard “Crawl-delay” extension, Anthropic said. To block a bot across your entire site:
User-agent: ClaudeBot
Disallow: /
- You must add directives for each bot and each subdomain you want to restrict.
- IP blocking may not work reliably because its bots use public cloud provider IP addresses, Anthropic said. Blocking those ranges may prevent the bot from accessing robots.txt. The company doesn’t publish IP ranges.
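Because each bot needs its own rules, a site can treat the three agents differently. The following is a minimal sketch, not a recommendation from Anthropic: it assumes a site that wants to opt out of training while staying available for user-triggered fetches and search indexing. The Crawl-delay value of 10 is an arbitrary placeholder, not a figure from Anthropic’s documentation.

```text
# Hypothetical robots.txt (illustrative only)

# Opt out of AI training collection
User-agent: ClaudeBot
Disallow: /

# Leave user-triggered fetches allowed (an empty Disallow blocks nothing)
User-agent: Claude-User
Disallow:

# Leave search indexing allowed, but ask the crawler to slow down.
# The value 10 is an assumed placeholder.
User-agent: Claude-SearchBot
Crawl-delay: 10
```

A file like this would need to be served from the root of each subdomain you want covered, since rules on one host do not carry over to another.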
The doc. Does Anthropic crawl data from the web, and how can site owners block the crawler?
Search Engine Land is owned by Semrush. We remain committed to providing high-quality coverage of marketing topics. Unless otherwise noted, this page’s content was written by either an employee or a paid contractor of Semrush Inc.

