New scraper just dropped (well, an old scraper was renamed):
Facebook/Meta updated its robots.txt entry for opting out of GenAI data scraping. If you blocked FacebookBot before, you should block meta-externalagent now:
User-Agent: meta-externalagent
Disallow: /

Official references:
- Facebook developer documentation for FacebookBot no longer mentions GenAI.
- Facebook developer documentation for web crawlers, including Meta-ExternalAgent, mentions “AI”.
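You can check that a robots.txt file actually refuses Meta's renamed crawler before deploying it. Python's standard-library `urllib.robotparser` can evaluate the rule offline.

This is a minimal sketch. `ROBOTS_TXT` and `is_allowed` are illustrative names, and example.com is a placeholder URL. robotparser's matching only approximates how a real crawler reads robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rule from this post; example.com below is only a placeholder site.
ROBOTS_TXT = """\
User-Agent: meta-externalagent
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (per urllib.robotparser)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("meta-externalagent", "FacebookBot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/some/page"))
```

With only this group present, robotparser blocks `meta-externalagent` and still allows `FacebookBot`. If you want both blocked, keep your old FacebookBot group next to the new one.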
Seirdy, in reply to Seirdy:
Obligatory “the W3C/EU effort for a standard TDM Reservation Protocol would solve this by letting sites globally opt out of data mining for purposes like this, without having to play robots.txt whack-a-mole”.
People really ought to support the NoAI and NoImageAI X-Robots tags in the meantime. They probably won’t, though. Unlike with the TDM Reservation Protocol, there’s no legal incentive to do so. Unlike with noindex, there’s no self-interested reason to do so (noindex is often used on pages that don’t belong in search results, like duplicates and non-public pages).
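Until something like TDMRep is widely honored, a site can at least send both opt-out signals on every response. Nothing obliges a crawler to respect either one.

This is a minimal sketch, assuming a Python WSGI app. The hypothetical `opt_out_middleware` appends `X-Robots-Tag: noai, noimageai` and the TDMRep `tdm-reservation: 1` header to every response. `hello_app` is a stand-in application.

```python
# Opt-out signals: the non-standard noai/noimageai robots directives, plus the
# TDMRep header that reserves text-and-data-mining rights ("1" = reserved).
OPT_OUT_HEADERS = [
    ("X-Robots-Tag", "noai, noimageai"),
    ("tdm-reservation", "1"),
]


def opt_out_middleware(app):
    """Wrap a WSGI app (hypothetical helper) so every response carries the opt-out headers."""
    def wrapped(environ, start_response):
        def start_response_with_opt_out(status, headers, exc_info=None):
            # Append rather than replace, so existing X-Robots-Tag values
            # such as "noindex" are preserved alongside the new ones.
            return start_response(status, list(headers) + OPT_OUT_HEADERS, exc_info)
        return app(environ, start_response_with_opt_out)
    return wrapped


def hello_app(environ, start_response):
    """Stand-in application; replace with your real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


app = opt_out_middleware(hello_app)

if __name__ == "__main__":
    from wsgiref.util import setup_testing_defaults

    # Call the app directly with a synthetic request instead of starting a server.
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def fake_start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers

    body = b"".join(app(environ, fake_start_response))
    print(captured["status"], captured["headers"], body)
```

To serve it for real, you could pass `app` to `wsgiref.simple_server.make_server`. Most sites would set the same headers in their web server or CDN configuration instead.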