New scraper just dropped (well, an old scraper was renamed):
Facebook/Meta updated its robots.txt entry for opting out of GenAI data scraping. If you blocked FacebookBot before, you should block meta-externalagent now:
User-Agent: meta-externalagent
Disallow: /

Official references:
- Facebook developer documentation for FacebookBot no longer mentions GenAI.
- Facebook developer documentation for web crawlers, including Meta-ExternalAgent, mentions “AI”.
Meta Web Crawlers - Sharing - Documentation - Meta for Developers
This page lists the User Agent (UA) strings that identify Meta’s most common web crawlers and what each of those crawlers are used for.
developers.facebook.com
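
If you want to sanity-check that your rules say what you think they say, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_blocked` and the `example.com` URL are made up for illustration, and this only shows how Python's parser reads the file; it says nothing about how Meta's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry from the post above.
ROBOTS_TXT = """\
User-Agent: meta-externalagent
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url.

    Hypothetical helper; it reflects Python's parser, not any real crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


for agent in ("meta-externalagent", "FacebookBot"):
    state = "blocked" if is_blocked(ROBOTS_TXT, agent) else "allowed"
    print(f"{agent}: {state}")
```

With only the new entry in place, this parser reports FacebookBot as still allowed, so keep the old entry too if you want both names covered.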
Seirdy
in reply to Seirdy
Obligatory “the W3C/EU effort for a standard TDM Reservation Protocol would solve this by letting sites globally opt out of datamining for purposes like this, without having to play robots.txt whack-a-mole”.
People really ought to support the NoAI and NoImageAI X-Robots tags in the meantime. They probably won’t, though. Unlike the TDM Reservation Protocol, there’s no legal incentive to do so; unlike noindex, there’s no self-interested reason to do so (noindex is often used on pages that don’t belong in search results, like duplicates and non-public pages).
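
For anyone who wants to send those tags anyway, here is a rough sketch of a WSGI middleware that appends an `X-Robots-Tag` header to every response. The names `add_noai_header` and `demo_app` are hypothetical, the lowercase `noai, noimageai` spelling is my assumption, and these directives are non-standard, so nothing here makes a crawler obey them.

```python
# Assumed header value: the NoAI and NoImageAI directives, written lowercase.
X_ROBOTS_VALUE = "noai, noimageai"


def add_noai_header(app):
    """Wrap a WSGI app so every response carries an X-Robots-Tag header."""
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Copy the header list rather than mutating the app's own list.
            headers = list(headers) + [("X-Robots-Tag", X_ROBOTS_VALUE)]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped


def demo_app(environ, start_response):
    """Tiny placeholder app, present only so the sketch is self-contained."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


application = add_noai_header(demo_app)
```

Most web servers can add the same header with a line or two of config, which is usually simpler than doing it in application code.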