Complete Crawler List For AI User-Agents [Dec 2025]

AI visibility now plays a crucial role in SEO, and it starts with controlling AI crawlers. If AI crawlers can’t access your pages, you’re invisible to AI discovery engines.

On the flip side, unmonitored AI crawlers can overwhelm servers with excessive requests, causing crashes and unexpected hosting bills.

User-agent strings are essential for controlling which AI crawlers can access your website, but official documentation is often outdated, incomplete, or missing entirely. So, we curated a verified list of AI crawlers from our actual server logs as a useful reference.

Every user-agent is validated against official IP lists when available, ensuring accuracy. We will maintain and update this list to catch new crawlers and changes to existing ones.

The Complete Verified AI Crawler List (December 2025)

Each entry below lists the crawler’s purpose, its crawl rate on SEJ (pages/hour), whether an official verified IP list is available, a sample robots.txt block, and the complete user-agent string as it appears in server logs.

GPTBot
Purpose: AI training data collection for GPT models (ChatGPT, GPT-4o)
Crawl rate on SEJ (pages/hour): 100
Verified IP list: Official IP list available
Robots.txt example:
User-agent: GPTBot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)

ChatGPT-User
Purpose: AI agent for real-time web browsing when users interact with ChatGPT
Crawl rate on SEJ (pages/hour): 2400
Verified IP list: Official IP list available
Robots.txt example:
User-agent: ChatGPT-User
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

OAI-SearchBot
Purpose: AI search indexing for ChatGPT search features (not for training)
Crawl rate on SEJ (pages/hour): 150
Verified IP list: Official IP list available
Robots.txt example:
User-agent: OAI-SearchBot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot

ClaudeBot
Purpose: AI training data collection for Claude models
Crawl rate on SEJ (pages/hour): 500
Verified IP list: Official IP list available
Robots.txt example:
User-agent: ClaudeBot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

Claude-User
Purpose: AI agent for real-time web access when Claude users browse
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Not available
Robots.txt example:
User-agent: Claude-User
Disallow: /sample-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-User/1.0; +Claude-User@anthropic.com)

Claude-SearchBot
Purpose: AI search indexing for Claude search capabilities
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Not available
Robots.txt example:
User-agent: Claude-SearchBot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com)

Google-CloudVertexBot
Purpose: AI agent for Vertex AI Agent Builder (crawls at site owners’ request only)
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Official IP list available
Robots.txt example:
User-agent: Google-CloudVertexBot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Google-CloudVertexBot; +https://cloud.google.com/enterprise-search)

Google-Extended
Purpose: Token controlling AI training usage of Googlebot-crawled content
Robots.txt example:
User-agent: Google-Extended
Allow: /
Disallow: /private-folder

Gemini-Deep-Research
Purpose: AI research agent for Google Gemini’s Deep Research feature
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Official IP list available
Robots.txt example:
User-agent: Gemini-Deep-Research
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini.google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36

(User-agent name not listed)
Purpose: Google Gemini’s chat, when a user asks Gemini to open a webpage
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Google

Bingbot
Purpose: Powers Bing Search and Bing Chat (Copilot) AI answers
Crawl rate on SEJ (pages/hour): 1300
Verified IP list: Official IP list available
Robots.txt example:
User-agent: BingBot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36

Applebot-Extended
Purpose: Doesn’t crawl itself but controls how Apple uses Applebot data
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Official IP list available
Robots.txt example:
User-agent: Applebot-Extended
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)

PerplexityBot
Purpose: AI search indexing for Perplexity’s answer engine
Crawl rate on SEJ (pages/hour): 150
Verified IP list: Official IP list available
Robots.txt example:
User-agent: PerplexityBot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)

Perplexity-User
Purpose: AI agent for real-time browsing when Perplexity users request information
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Official IP list available
Robots.txt example:
User-agent: Perplexity-User
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)

Meta-ExternalAgent
Purpose: AI training data collection for Meta’s LLMs (Llama, etc.)
Crawl rate on SEJ (pages/hour): 1100
Verified IP list: Not available
Robots.txt example:
User-agent: meta-externalagent
Allow: /
Disallow: /private-folder
Complete user agent:
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

Meta-WebIndexer
Purpose: Used to improve Meta AI search
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Not available
Robots.txt example:
User-agent: Meta-WebIndexer
Allow: /
Disallow: /private-folder
Complete user agent:
meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

Bytespider
Purpose: AI training data for ByteDance’s LLMs for products like TikTok
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Not available
Robots.txt example:
User-agent: Bytespider
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)

Amazonbot
Purpose: AI training for Alexa and other Amazon AI services
Crawl rate on SEJ (pages/hour): 1050
Verified IP list: Not available
Robots.txt example:
User-agent: Amazonbot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36

DuckAssistBot
Purpose: AI search indexing for DuckDuckGo search engine
Crawl rate on SEJ (pages/hour): 20
Verified IP list: Official IP list available
Robots.txt example:
User-agent: DuckAssistBot
Allow: /
Disallow: /private-folder
Complete user agent:
DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html)

MistralAI-User
Purpose: Mistral’s real-time citation fetcher for the “Le Chat” assistant
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Not available
Robots.txt example:
User-agent: MistralAI-User
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MistralAI-User/1.0; +https://docs.mistral.ai/robots)

Webz.io
Purpose: Data extraction and web scraping used by other AI training companies (formerly known as Omgili)
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Not available
Robots.txt example:
User-agent: webzio
Allow: /
Disallow: /private-folder
Complete user agent:
webzio (+https://webz.io/bot.html)

Diffbot
Purpose: Data extraction and web scraping used by companies all over the world
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Not available
Robots.txt example:
User-agent: Diffbot
Allow: /
Disallow: /private-folder
Complete user agent:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)

ICC-Crawler
Purpose: AI and machine learning data collection
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Not available
Robots.txt example:
User-agent: ICC-Crawler
Allow: /
Disallow: /private-folder
Complete user agent:
ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)

CCBot
Purpose: Open-source web archive used as training data by multiple AI companies
Crawl rate on SEJ (pages/hour): <10
Verified IP list: Official IP list available
Robots.txt example:
User-agent: CCBot
Allow: /
Disallow: /private-folder
Complete user agent:
CCBot/2.0 (https://commoncrawl.org/faq/)

The user-agent strings above have all been verified against Search Engine Journal server logs.

Popular AI Agent Crawlers With Unidentifiable User Agents

We’ve found that the following don’t identify themselves in their user-agent strings:

  • you.com.
  • ChatGPT’s agent Operator.
  • Bing’s Copilot chat.
  • Grok.
  • DeepSeek.

There is no way to track or block these crawlers other than by identifying their IP addresses explicitly.

We set up a trap page (e.g., /specific-page-for-you-com/) and used the on-page chat to prompt you.com to visit it, allowing us to locate the corresponding visit record and IP address in our server logs. Below is the screenshot:

Screenshot by author, December 2025
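If you want to reproduce this kind of check on your own site, a few lines of scripting are enough to pull the trap page visits out of a raw access log. The sketch below is a minimal example (not our exact workflow), assuming the common Apache/Nginx combined log format where the client IP is the first field; the trap path and log filename are placeholders you would replace with your own.

# Minimal sketch: print the client IPs that requested a trap page.
# Assumes Apache/Nginx combined log format (client IP is the first field).
trap_path = "/specific-page-for-you-com/"  # example trap URL; use your own

with open("access.log", encoding="utf-8", errors="replace") as log_file:
    for line in log_file:
        if trap_path in line:
            client_ip = line.split()[0]
            print(client_ip, "-", line.strip())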

What About Agentic AI Browsers?

Unfortunately, AI browsers such as Comet or ChatGPT’s Atlas don’t differentiate themselves in the user-agent string, so you can’t identify them in server logs; their visits blend in with normal users’ visits.

ChatGPT’s Atlas browser user agent string from server logs records (Screenshot by author, December 2025)

This is disappointing for SEOs because tracking agentic browser visits to a website is important from a reporting point of view.

How To Check What’s Crawling Your Server

Depending on your hosting service, your provider may offer a user interface (UI) that makes it easy to access and review server logs.

If your host doesn’t offer this, you can download the server log files (usually located at /var/log/apache2/access.log on Linux-based Apache servers) via FTP, or ask your server support team to send them to you.

Once you have the log file, you can view and analyze it in Google Sheets (if the file is in CSV format) or Screaming Frog’s Log File Analyser, or, if the file is smaller than 100 MB, you can try analyzing it with Gemini.
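If you prefer to script it, a short Python pass over the raw log is often quicker than a spreadsheet. The sketch below is a rough example, assuming the Apache/Nginx combined log format where the user agent is the last quoted field; the bot tokens come from the verified list above, and the filename is a placeholder.

# Rough sketch: count requests per AI crawler token in an access log.
# Assumes combined log format, where the user agent is the last quoted field.
import re
from collections import Counter

BOT_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
    "bingbot", "Amazonbot", "Bytespider", "meta-externalagent", "CCBot",
]

counts = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log_file:
    for line in log_file:
        quoted_fields = re.findall(r'"([^"]*)"', line)
        user_agent = quoted_fields[-1] if quoted_fields else ""
        for token in BOT_TOKENS:
            if token.lower() in user_agent.lower():
                counts[token] += 1

for token, hits in counts.most_common():
    print(f"{token}: {hits} requests")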

How To Verify Legitimate Vs. Fake Bots

Fake crawlers can spoof legitimate user agents to bypass restrictions and scrape content aggressively. For example, anyone can impersonate ClaudeBot from a laptop and initiate a crawl request from the terminal. In your server logs, it will look as if ClaudeBot is crawling your site:

curl -A 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)' https://example.com

Verification helps save server bandwidth and prevents your content from being harvested illegitimately. The most reliable verification method is checking the requesting IP address.

Check each request’s IP against the officially declared IP lists linked above. If it matches, you can allow the request; otherwise, block it.
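As a rough illustration of what that check looks like in code, the sketch below uses Python’s ipaddress module to test whether a requesting IP falls inside a bot’s published CIDR ranges. The ranges shown are placeholders (reserved documentation networks), not any vendor’s real addresses; load the actual ranges from the official IP lists linked in the table above.

import ipaddress

# Placeholder CIDR ranges (reserved documentation networks), NOT real vendor ranges.
# Replace them with the ranges published in each vendor's official IP list.
OFFICIAL_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24"],
}

def is_verified(bot_name: str, request_ip: str) -> bool:
    """Return True if request_ip sits inside one of the bot's declared ranges."""
    ip = ipaddress.ip_address(request_ip)
    ranges = OFFICIAL_RANGES.get(bot_name, [])
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)

print(is_verified("GPTBot", "192.0.2.45"))    # True with the placeholder range
print(is_verified("GPTBot", "203.0.113.7"))   # False: likely an impersonator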

Various firewalls can help with this by allowlisting verified IPs, which lets legitimate bot requests pass through while blocking all other requests that impersonate AI crawlers in their user-agent strings.

For example, in WordPress, you can use the free Wordfence plugin to allowlist legitimate IPs from the official lists above and add custom blocking rules as shown below:

The allowlist approach is the stronger one: it lets legitimate crawlers pass through while blocking any impersonation request that comes from a different IP.

However, note that IP addresses can also be spoofed; when both the bot’s user agent and its IP are spoofed, you won’t be able to block it this way.

Conclusion: Stay In Control Of AI Crawlers For Reliable AI Visibility

AI crawlers are now part of our web ecosystem, and the bots listed here represent the major AI platforms currently indexing the web, although this list is likely to grow.

Check your server logs regularly to see what’s actually hitting your site, and make sure you don’t inadvertently block AI crawlers if visibility in AI search engines is important for your business. If you don’t want AI crawlers to access your content, block them via robots.txt using the user-agent name.

We’ll keep this list updated as new crawlers emerge and existing ones change, so bookmark this URL or revisit this article regularly to keep your AI crawler list up to date.



Featured Image: BestForBest/Shutterstock


