
***** Controlling AI scraping *****

Cloudflare's plan to give its users ways to block and/or monetize AI scraping is interesting, but there are ethical and other reasons to avoid using Cloudflare, since it continues to support some of the most disreputable sites on the Net.

This does, however, suggest the concept of an open source mechanism providing the same sorts of features broadly (e.g., in conjunction with Apache servers) to any sites, anywhere. It could be paired with a system to keep sites updated about the discovered source IP addresses of AI scrapers that are not adhering to robots.txt directives. Sidenote: #Google announced an effort to expand robots.txt to better deal with AI scraping issues, a concept I had suggested earlier. I signed up for that effort, but never heard another word about it after the earliest days.
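As a minimal sketch of what such a mechanism might look like today, a site can already declare its wishes in robots.txt using the published user-agent tokens of some well-known AI crawlers (GPTBot is OpenAI's, CCBot is Common Crawl's), and an Apache server can refuse those agents outright for crawlers that ignore the directives. The agent names here are illustrative, not an exhaustive or guaranteed-current list:

```
# robots.txt — advisory only; compliant crawlers honor it
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

```apache
# Apache config — enforced server-side for crawlers that ignore robots.txt
# (illustrative user-agent patterns; requires mod_setenvif and mod_authz_core)
SetEnvIfNoCase User-Agent "GPTBot|CCBot" ai_scraper
<Location "/">
    <RequireAll>
        Require all granted
        Require not env ai_scraper
    </RequireAll>
</Location>
```

The robots.txt half is purely advisory; the Apache half is where an open, shared blocklist (of user agents and, per the above, misbehaving source IPs) could plug in.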

Time to get serious about controlling AI scraping.