New scraper just dropped (well, an old scraper was renamed):
Facebook/Meta updated its robots.txt entry for opting out of GenAI data scraping. If you blocked FacebookBot before, you should block meta-externalagent now:
User-Agent: meta-externalagent
Disallow: /
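A quick way to sanity-check a robots.txt group like the one above is Python's standard-library `urllib.robotparser`. This is only an illustrative sketch: `build_block_group` and `is_blocked` are hypothetical helpers, `example.com` is a placeholder, and it assumes you want a single group that covers both the old FacebookBot token and meta-externalagent. `robotparser` matches user agents with a lenient substring check on the product token, so treat it as a rough check, not as a model of how Meta's crawler actually reads the file.

```python
from urllib.robotparser import RobotFileParser


def build_block_group(user_agents):
    """Hypothetical helper: one robots.txt group disallowing the whole site."""
    lines = [f"User-Agent: {ua}" for ua in user_agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


# Cover both the old token and the renamed one in a single group.
robots_txt = build_block_group(["FacebookBot", "meta-externalagent"])

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse() also marks the file as read


def is_blocked(user_agent, url="https://example.com/some/page"):
    """True if the parsed robots.txt forbids user_agent from fetching url."""
    return not parser.can_fetch(user_agent, url)


print(robots_txt)
for ua in ["FacebookBot", "meta-externalagent/1.1", "Googlebot"]:
    print(ua, "blocked" if is_blocked(ua) else "allowed")
# → FacebookBot blocked
# → meta-externalagent/1.1 blocked
# → Googlebot allowed
```

Stacking several User-Agent lines above one Disallow line is ordinary robots.txt grouping, so a single rule covers both tokens.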
Official references:
- Facebook developer documentation for FacebookBot no longer mentions GenAI.
- Facebook developer documentation for web crawlers, including Meta-ExternalAgent, mentions “AI”.
Seirdy, in reply to Seirdy:
Obligatory “the W3C/EU effort for a standard TDM Reservation Protocol would solve this by letting sites globally opt out of datamining for purposes like this, without having to play robots.txt whack-a-mole”.
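For comparison, TDMRep expresses the opt-out once per site rather than once per crawler. One of the mechanisms it defines is a `tdm-reservation: 1` HTTP response header. The sketch below assumes a Python WSGI app; `tdm_reservation_middleware`, `hello_app`, and `fake_start_response` are hypothetical names, and it only sets the reservation header (no `tdm-policy` link).

```python
# Hypothetical WSGI middleware: adds the TDMRep "tdm-reservation: 1" header
# to every response so the whole site reserves text-and-data-mining rights.
def tdm_reservation_middleware(app):
    def wrapped(environ, start_response):
        def start_with_reservation(status, headers, exc_info=None):
            # Drop any existing value so the header appears exactly once.
            headers = [(k, v) for k, v in headers if k.lower() != "tdm-reservation"]
            headers.append(("tdm-reservation", "1"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_reservation)

    return wrapped


# Minimal usage example with a stand-in app and start_response.
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


body = tdm_reservation_middleware(hello_app)({}, fake_start_response)
print(captured["headers"])
# → [('Content-Type', 'text/plain'), ('tdm-reservation', '1')]
```

Doing this in middleware keeps the reservation in one place instead of in every handler, which is the whole point of a site-wide opt-out.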
People really ought to support the NoAI and NoImageAI X-Robots tags in the meantime. They probably won’t, though. Unlike the TDM Reservation Protocol, there’s no legal incentive to do so; unlike noindex, there’s no self-interested reason to do so (noindex is often used on pages that don’t belong in search results, like duplicates and non-public pages).
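On the crawler side, honoring these directives would take very little code. This is a simplified sketch, not any real crawler's logic: `robots_directives` and `opts_out_of_ai` are hypothetical, header values scoped to one crawler (such as `googlebot: noindex`) are skipped rather than interpreted, directives are compared case-insensitively, and it assumes the commonly described meaning that noai covers all content while noimageai covers images only.

```python
def robots_directives(header_values):
    """Collect lowercase directives from one or more X-Robots-Tag values."""
    directives = set()
    for value in header_values:
        pieces = [p.strip().lower() for p in value.split(",")]
        # Assumption: a leading "name: ..." piece means the whole value is
        # scoped to one crawler (e.g. "googlebot: noindex, nofollow"), so
        # skip it. This also skips values starting with "unavailable_after:".
        if pieces and ":" in pieces[0]:
            continue
        directives.update(p for p in pieces if p)
    return directives


def opts_out_of_ai(header_values, is_image=False):
    """True if the response signals it should not be used for AI training.

    Assumed semantics: "noai" covers all content, "noimageai" covers images.
    """
    directives = robots_directives(header_values)
    if "noai" in directives:
        return True
    return is_image and "noimageai" in directives


print(opts_out_of_ai(["noindex", "NoAI"]))               # → True
print(opts_out_of_ai(["noimageai"]))                     # → False
print(opts_out_of_ai(["NoImageAI"], is_image=True))      # → True
```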