

New scraper just dropped (well, an old scraper was renamed):

Facebook/Meta updated the robots.txt user-agent token it honors for opting out of GenAI data scraping. If you blocked FacebookBot before, you should block meta-externalagent now:

User-Agent: meta-externalagent
Disallow: /
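To sanity-check a robots.txt before deploying it, Python's standard-library urllib.robotparser can evaluate the rules for a given user-agent token. This is a minimal sketch. It assumes you keep the old FacebookBot group alongside the new token. The example.com URLs are placeholders, and is_blocked is a hypothetical helper name:

```python
from urllib.robotparser import RobotFileParser

# robots.txt content under test. One group may list several user-agent tokens;
# keeping FacebookBot here is an assumption for sites that blocked it before.
ROBOTS_TXT = """\
User-Agent: FacebookBot
User-Agent: meta-externalagent
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without a network read().
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("meta-externalagent", "FacebookBot", "Googlebot"):
        blocked = is_blocked(ROBOTS_TXT, agent, "https://example.com/post")
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Running it should report both Meta tokens as blocked and Googlebot as allowed, since no group applies to Googlebot. Note that robotparser matches tokens by substring and isn't a full RFC 9309 implementation, so treat it as a quick check rather than proof of how Meta's crawler behaves.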

Official references:

#RobotsTxt #Scraper

This entry was edited (1 month ago)

Seirdy reshared this.

in reply to Seirdy

Obligatory “the W3C/EU effort for a standard TDM Reservation Protocol would solve this by letting sites globally opt out of datamining for purposes like this, without having to play robots.txt whack-a-mole”.
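Under the TDM Reservation Protocol (TDMRep), a site declares its reservation once, for example in an HTTP response header, rather than naming individual crawlers. This minimal WSGI sketch assumes the specification's `tdm-reservation: 1` header. The middleware and demo app names are hypothetical. The spec also describes a `/.well-known/tdm-reservation.json` file and an HTML meta tag, and this sketch covers neither:

```python
from typing import Callable, Iterable

StartResponse = Callable[..., Callable[[bytes], object]]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def tdm_reservation_middleware(app: WSGIApp) -> WSGIApp:
    """Wrap a WSGI app so every response carries `tdm-reservation: 1`.

    Assumes the TDMRep header name and value; an optional `tdm-policy`
    header pointing at a licensing policy could be appended the same way.
    """

    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        def start_with_reservation(status, headers, exc_info=None):
            # Drop any reservation header the app set, then reserve TDM rights.
            headers = [(k, v) for k, v in headers if k.lower() != "tdm-reservation"]
            headers.append(("tdm-reservation", "1"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_reservation)

    return wrapped


def demo_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Hypothetical stand-in for a real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


app = tdm_reservation_middleware(demo_app)

if __name__ == "__main__":
    # Call the app directly with a fake start_response to inspect the headers.
    def show_headers(status, headers, exc_info=None):
        print(status, headers)
        return lambda data: None

    print(b"".join(app({}, show_headers)))
```

Because the header applies to every response, new scrapers that honor TDMRep are covered without editing robots.txt each time one appears.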

Scraper operators really ought to honor the NoAI and NoImageAI X-Robots-Tag directives in the meantime. They probably won’t, though. Unlike with the TDM Reservation Protocol, there’s no legal incentive to do so; unlike with noindex, there’s no self-interested reason to do so (noindex is often used on pages that don’t belong in search results, like duplicates and non-public pages, so search engines benefit from honoring it).
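On the crawler side, honoring the NoAI and NoImageAI directives takes only a few lines. This minimal sketch makes three assumptions. The directives arrive as one or more comma-separated X-Robots-Tag header values. `noai` reserves all content. `noimageai` reserves only images. It ignores user-agent-scoped directives (like `somebot: noai`), and ai_use_reserved is a hypothetical name:

```python
from typing import Iterable

# Assumed semantics: "noai" opts all content out of AI use,
# "noimageai" opts out images only. Matching is case-insensitive.
ALL_CONTENT = "noai"
IMAGES_ONLY = "noimageai"


def ai_use_reserved(x_robots_tag_values: Iterable[str], is_image: bool) -> bool:
    """Return True if the X-Robots-Tag values opt this resource out of AI use.

    Each value is a comma-separated directive list, e.g. "noindex, noai".
    User-agent-scoped directives ("somebot: noai") are not handled.
    """
    directives = {
        part.strip().lower()
        for value in x_robots_tag_values
        for part in value.split(",")
    }
    if ALL_CONTENT in directives:
        return True
    return is_image and IMAGES_ONLY in directives


if __name__ == "__main__":
    print(ai_use_reserved(["noindex, NoImageAI"], is_image=True))   # True
    print(ai_use_reserved(["noindex, NoImageAI"], is_image=False))  # False
```

A crawler would call this with every X-Robots-Tag value from a response (HTTP allows the header to repeat) and skip the resource for training when it returns True.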

