New page on seirdy.one: Scrapers I block (and allow), with explanations.

I’ve replaced all the comments in my robots.txt file with a more readable and detailed web page on scrapers I block. It includes info on the multiple blocking approaches and criteria I use, commonly blocked scrapers I allow, and more fact-checking than most of the more comprehensive alternatives.


#RobotsTxt #Scrapers #POSSE


in reply to Seirdy

yo, I'm looking into additional things to block in the robots.txt for my website (basing a decent bit of it off yours, plus any additional stuff I find), and I figured I'd just throw this your way.

I'm personally planning to block everything from the first URL, as well as the following from the second URL:

  • all AI related tools
  • several of the "Intelligence Gatherers"
  • possibly several of the "Scrapers"
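For anyone assembling a list like this, here is a minimal sketch of sanity-checking the resulting robots.txt before deploying it, assuming Python's standard-library urllib.robotparser. The user-agent tokens (GPTBot, CCBot), the example.com URL, and the helper names build_parser and is_blocked are illustrative placeholders, not entries taken from either list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the user agents here are illustrative only,
# not the actual entries from either linked list.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching a URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(text.splitlines())
    return parser


def is_blocked(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules forbid user_agent from fetching url."""
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        blocked = is_blocked(rules, agent, "https://example.com/page")
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Note that urllib.robotparser only does plain prefix matching on paths, so it won't evaluate wildcard patterns the way some major crawlers do; it's a quick check of which agents a group applies to, not a full emulation of every crawler.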

I would also like to note that BLEXBot is listed on the second site as an "SEO Crawler", and the site indicates that it does not believe it is AI-related. (nvm, I misremembered: I thought you had blocked it for being AI-related.)

I'll mention any other resources as I find them.
