Items tagged with: RobotsTxt



New page on seirdy.one: Scrapers I block (and allow), with explanations.

I’ve replaced all the comments in my robots.txt file with a more readable and detailed web page on scrapers I block. It includes info on the multiple blocking approaches and criteria I use, commonly blocked scrapers I allow, and more fact-checking than most of the more comprehensive alternatives.


#RobotsTxt #Scrapers #POSSE


New scraper just dropped (well, an old scraper was renamed):

Facebook/Meta updated its robots.txt entry for opting out of GenAI data scraping. If you blocked FacebookBot before, you should block meta-externalagent now:

User-Agent: meta-externalagent
Disallow: /
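
To sanity-check that the entry above parses the way you expect, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed` and the `https://example.com/page` URL are placeholders invented for illustration. This only tests how the rules are read; it says nothing about whether a given crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry from the post, inlined so nothing is fetched over the network.
ROBOTS_TXT = """\
User-Agent: meta-externalagent
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: report whether user_agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder; substitute a URL on your own site.
    for agent in ("meta-externalagent", "FacebookBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

With only this entry in place, the parser treats `meta-externalagent` as blocked and leaves `FacebookBot` unaffected, so keep any existing `FacebookBot` rule if you still want the old name covered.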

Official references:

#RobotsTxt #Scraper