
tl;dr
Perplexity AI's web crawlers ignored explicit blocks from tens of thousands of websites, leading Cloudflare to delist Perplexity from its verified bot program and block its deceptive scraping practices. Perplexity used stealth tactics like generic browser user-agents, rotating IP addresses, and auto...
Perplexity AI's web crawlers continued to access content on tens of thousands of websites despite explicit blocks, according to Cloudflare. In response, Cloudflare delisted Perplexity from its verified bot program and began blocking what it described as deceptive scraping practices. Perplexity, founded in 2022 by former AI researchers and engineers, recently raised $100 million at an $18 billion valuation.
The conflict escalated after Cloudflare customers reported that Perplexity was still reaching their content even though they had disallowed its declared crawlers in robots.txt and blocked them with firewall rules. Cloudflare engineers confirmed that once Perplexity’s declared crawlers were blocked, the company switched to stealth tactics, including a generic browser user-agent that impersonates Google Chrome on macOS.
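robots.txt rules, like firewall rules keyed on user-agent, are matched against whatever user-agent string the client chooses to send, and honoring robots.txt is voluntary. The minimal sketch below uses Python's standard `urllib.robotparser` to show why a generic browser string slips past a rule written for a declared bot token. The robots.txt contents and the exact Chrome version string are illustrative assumptions, not taken from any real site or log.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block the declared "PerplexityBot" token,
# allow every other agent via the catch-all "*" group.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Declared crawler token versus a generic Chrome-on-macOS browser string
# (the version number here is an assumption for illustration).
DECLARED_AGENT = "PerplexityBot"
GENERIC_CHROME = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/articles/1"
    # The declared token matches the PerplexityBot group and is disallowed.
    print(DECLARED_AGENT, rp.can_fetch(DECLARED_AGENT, url))
    # The browser string matches no named group, so the "*" group applies.
    print("generic Chrome", rp.can_fetch(GENERIC_CHROME, url))
```

The parser can only apply the rules to the identity it is handed; nothing in robots.txt can tell a real Chrome user from a crawler borrowing Chrome's user-agent.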
These undeclared crawlers also evaded blocks by rotating through IP addresses outside Perplexity’s officially published ranges and by switching across different autonomous system numbers (ASNs). Cloudflare observed the activity across tens of thousands of domains: Perplexity’s declared crawlers generate 20-25 million requests per day, and the stealth crawlers add another 3-6 million.
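Because a user-agent can be spoofed, sites and CDNs can also check whether a request originates from a crawler's published IP ranges. The minimal sketch below uses Python's standard `ipaddress` module; the ranges are RFC 5737 documentation prefixes standing in for a real published list, and the `classify` function and its labels are hypothetical. A crawler that sends a browser user-agent from unlisted IPs in other ASNs lands in the "unattributed" bucket, indistinguishable by these two signals from an ordinary visitor, which is why detecting it requires other signals.

```python
import ipaddress
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical "published" ranges for a declared crawler.
# These are RFC 5737 documentation prefixes, NOT real crawler ranges.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),  # covers .0 through .127
]


def in_declared_range(client_ip: str, ranges: Iterable[Network]) -> bool:
    """Return True if client_ip falls inside any published crawler range."""
    addr = ipaddress.ip_address(client_ip)
    # Membership across IPv4/IPv6 versions simply evaluates to False.
    return any(addr in net for net in ranges)


def classify(user_agent: str, client_ip: str, bot_token: str = "PerplexityBot") -> str:
    """Hypothetical two-signal classification: claimed identity plus source IP."""
    claims_bot = bot_token.lower() in user_agent.lower()
    from_range = in_declared_range(client_ip, DECLARED_RANGES)
    if claims_bot and from_range:
        return "verified declared crawler"
    if claims_bot:
        return "spoofed: bot user-agent from unlisted IP"
    if from_range:
        return "declared range, browser user-agent"
    # Browser user-agent from an unlisted IP: nothing here links it to the bot.
    return "unattributed"
```

Usage: `classify("PerplexityBot/1.0", "192.0.2.10")` passes both checks, while a Chrome user-agent from `203.0.113.5` is simply "unattributed".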
Cloudflare CEO Matthew Prince argued that AI companies’ extraction of web content is unsustainable, pointing to a sharp decline in search referral traffic as users increasingly rely on AI summaries instead of clicking through to the source. He said AI companies crawl far more pages for each visitor they refer back to a site than traditional search engines do, and that the crawl-to-referral ratios for OpenAI and Anthropic have worsened.
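The ratio Prince describes is simple division: pages crawled divided by visitors referred back to the site. A higher number means the site absorbs more crawl load for each visitor it receives, and a "worsening" ratio is one that rises over time. The sketch below uses made-up counts purely for illustration; it is not Cloudflare's methodology or data.

```python
def crawl_to_referral_ratio(pages_crawled: int, visitors_referred: int) -> float:
    """Pages crawled per visitor referred back to the site."""
    if pages_crawled < 0 or visitors_referred < 0:
        raise ValueError("counts must be non-negative")
    if visitors_referred == 0:
        # Crawling with zero referrals: the ratio is unbounded.
        return float("inf")
    return pages_crawled / visitors_referred


def ratio_change(before: float, after: float) -> float:
    """How many times larger the ratio became between two periods."""
    return after / before


# Hypothetical monthly counts for one site (NOT real data).
search_engine = crawl_to_referral_ratio(9_000, 3_000)  # 3.0 pages per visitor
ai_bot_jan = crawl_to_referral_ratio(40_000, 40)       # 1000.0 pages per visitor
ai_bot_jun = crawl_to_referral_ratio(60_000, 20)       # 3000.0 pages per visitor

# More crawling and fewer referrals compound: the ratio tripled.
worsening = ratio_change(ai_bot_jan, ai_bot_jun)       # 3.0
```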
In response, Cloudflare launched "Content Independence Day," blocking AI crawlers by default on new domains and giving more than a million websites, including major publishers such as The Associated Press and BuzzFeed, the means to block unwanted crawlers. Cloudflare insists that crawlers be transparent, serve a clear purpose, and respect website directives.
In contrast to Perplexity, Cloudflare praised OpenAI for respecting robots.txt rules and halting crawling when blocked. To counter deceptive crawling, Cloudflare added signature-based blocks for the stealth crawlers, available to all customers, and has introduced tools such as an "AI Labyrinth" that traps non-compliant bots and a "pay-per-crawl" marketplace that lets publishers charge AI companies for access to their content.
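A pay-per-crawl scheme can be built on HTTP 402 Payment Required, a standard status code reserved for payment. The sketch below is a toy origin-side policy, not Cloudflare's actual protocol: the header names `X-Crawl-Price`, `X-Crawl-Max-Price`, and `X-Crawl-Charged`, the crawler token list, and the `handle_request` function are all hypothetical. It also illustrates the article's point that such schemes only work against crawlers that identify themselves: a crawler sending a generic browser user-agent is served as an ordinary visitor.

```python
from typing import Dict, Tuple

# Hypothetical header names; NOT Cloudflare's pay-per-crawl protocol.
PRICE_HEADER = "X-Crawl-Price"
MAX_PRICE_HEADER = "X-Crawl-Max-Price"
CHARGED_HEADER = "X-Crawl-Charged"

# Illustrative list of declared AI crawler tokens.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def is_ai_crawler(user_agent: str) -> bool:
    """Match declared crawler tokens case-insensitively in the user-agent."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def handle_request(
    user_agent: str, headers: Dict[str, str], price_usd: float
) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers) under a toy pay-per-crawl policy."""
    if not is_ai_crawler(user_agent):
        # Undeclared clients, including disguised crawlers, are served normally.
        return 200, {}
    offered = headers.get(MAX_PRICE_HEADER)
    if offered is not None:
        try:
            max_price = float(offered)
        except ValueError:
            return 400, {}  # malformed payment offer
        if max_price >= price_usd:
            # Crawler agreed to pay at least the asking price: serve and record the charge.
            return 200, {CHARGED_HEADER: f"{price_usd:.2f}"}
    # No acceptable offer: quote the price with 402 Payment Required.
    return 402, {PRICE_HEADER: f"{price_usd:.2f}"}
```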