New page on seirdy.one: Scrapers I block (and allow), with explanations.
I’ve replaced all the comments in my robots.txt file with a more readable and detailed web page on scrapers I block. It describes the multiple blocking approaches and criteria I use and the commonly blocked scrapers I allow, and it does more fact-checking than most of the more comprehensive alternatives.
Scrapers I block (and allow), with explanations
Here’s my thought process when deciding whether to block a scraper from seirdy.one, the scrapers I block, the scrapers I allow, and the ways I block them.
Seirdy’s Home
solo
in reply to Seirdy
might want to edit it
you wrote
I assume here you intended to write "with no incentive for compliance"
Seirdy
in reply to Seirdy
NoCache to my X-Robots and documented why.
solo
in reply to Seirdy
yo, I'm looking into additional things to block in the robots.txt for my website (basing a decent bit of it off of yours, plus any additional stuff I find), and I felt like I'd want to just throw this your way. I'm personally planning to block everything from the first URL, as well as the following from the second URL:
I would also like to note (nvm, I misremembered and thought you had blocked it for being AI-related) that BLEXBot is listed on the second site as an "SEO Crawler", and the site does not consider it AI-related.
I'll mention any other resources as I find them.
Agents | Dark Visitors
solo
in reply to Seirdy
ah, I see
did not look at the things you cite lol
do you have some examples of things that are incorrect?
there are several on that list that would probably be good to block, which you don't block (unsure if they're actually used in practice anymore or if they're just historical), such as
Claude-Web
cohere-ai
anthropic-ai
there is also the aiHitBot one that I mentioned
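Blocking the agents named in this thread comes down to a robots.txt group that lists each user agent and disallows everything. The following is a minimal sketch, assuming a hypothetical robots.txt that blocks only Claude-Web, cohere-ai, anthropic-ai, and aiHitBot and allows everyone else. It is not the file seirdy.one actually serves. The sketch uses Python's standard `urllib.robotparser` to check which agents a compliant crawler would let through. `build_parser` and the agent name `SomeOtherBot` are illustrative names, not part of any real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group disallowing the four user agents named
# in the thread, plus a catch-all group that allows everyone else.
# This is NOT the file seirdy.one actually serves.
ROBOTS_TXT = """\
User-agent: Claude-Web
User-agent: cohere-ai
User-agent: anthropic-ai
User-agent: aiHitBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # "SomeOtherBot" is a made-up agent that matches only the catch-all group.
    for agent in ["Claude-Web", "cohere-ai", "anthropic-ai", "aiHitBot", "SomeOtherBot"]:
        # can_fetch() reports what a *compliant* crawler would do; it cannot
        # stop a scraper that ignores robots.txt.
        print(agent, parser.can_fetch(agent, "https://example.com/"))
```

Running it prints `False` for the four listed agents and `True` for `SomeOtherBot`. `urllib.robotparser` matches agent names case-insensitively and by substring, so treat it as an approximation of how a given crawler interprets the file. It is not an authoritative check.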