Seirdy

Seirdy@pleroma.envs.net

Skim before following: seirdy.one/about/fediverse-gre…. It describes how I accept follow requests, block people, etc.

puppy.

I follow and unfollow extremely liberally.

Interested in fitness, nutrition, accessibility, privacy, security.

I am made of microplastics and can be trusted with your forklift.

tech-stuff: check my "uses" page: seirdy.one/about/uses/
Other tech interests in no particular order: linked data, the #IndieWeb, the #Gemini protocol (more into the community than the technology).

Politics: Leftist, capitalism bad, anti-consumerism. Vegan.

Neuro-atypicality: #anxiety, #ADHD, #ActuallyAutistic.

:QueerCat_Pansexual: :neodog_flag_androgyne:

don't flirt unless i said it's ok

[Verifying my OpenPGP key: openpgp4fpr:AC6AF1F838DF3DCC2E47A6CF1E892DB2A5F84479]

Opinions are those of your employer.

akkoma

Seirdy

1 year ago

New scraper just dropped (well, an old scraper was renamed):

Facebook/Meta updated its robots.txt entry for opting out of GenAI data scraping. If you blocked FacebookBot before, you should block meta-externalagent now:

User-Agent: meta-externalagent
Disallow: /
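One way to sanity-check an entry like this before deploying it is to feed the rules to Python's standard-library robots.txt parser. This is a minimal sketch, not an official tool: the `is_allowed` helper, the `example.com` URL, and the `SomeOtherBot` agent are made up for illustration, and the sample file assumes you keep the older FacebookBot entry alongside the new one.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: the old FacebookBot opt-out plus the new entry from the post.
ROBOTS_TXT = """\
User-Agent: FacebookBot
Disallow: /

User-Agent: meta-externalagent
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url.

    Hypothetical helper for illustration; it only wraps the stdlib parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeOtherBot" is a placeholder for any crawler the file does not name.
    for agent in ("meta-externalagent", "FacebookBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, "https://example.com/post") else "blocked"
        print(f"{agent}: {verdict}")
```

This only shows how a crawler that honors robots.txt would read the rules; it does nothing to stop a scraper that ignores them.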

Official references:

Facebook developer documentation for FacebookBot no longer mentions GenAI.
Facebook developer documentation for web crawlers, including Meta-ExternalAgent, mentions “AI”.

#RobotsTxt #Scraper

Meta Web Crawlers - Sharing - Documentation - Meta for Developers

This page lists the User Agent (UA) strings that identify Meta’s most common web crawlers and what each of those crawlers are used for.

developers.facebook.com

This entry was edited (1 year ago)

Seirdy reshared this.

Seirdy

in reply to Seirdy 1 year ago

Obligatory “the W3C/EU effort for a standard TDM Reservation Protocol would solve this by letting sites globally opt out of datamining for purposes like this, without having to play robots.txt whack-a-mole”.

People really ought to support the NoAI and NoImageAI X-Robots tags in the meantime. They probably won’t, though. Unlike the TDM Reservation Protocol, there’s no legal incentive to do so; unlike noindex, there’s no self-interested reason to do so (noindex is often used on pages that don’t belong in search results, like duplicates and non-public pages).
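For site owners who want to send those tags anyway, here is a rough sketch of a WSGI middleware that appends them as an `X-Robots-Tag` response header. The names `add_noai_header` and `hello_app` are invented for this example, and the lowercase `noai, noimageai` spelling is an assumption about how the directives are usually written; since no standard defines them, whether a scraper honors them is up to the scraper.

```python
def add_noai_header(app):
    """Wrap a WSGI app so every response carries the NoAI opt-out header.

    Hypothetical middleware for illustration only.
    """

    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Copy the header list so the wrapped app's own list is not mutated.
            headers = list(headers) + [("X-Robots-Tag", "noai, noimageai")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped


def hello_app(environ, start_response):
    """Tiny placeholder app standing in for a real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


application = add_noai_header(hello_app)

if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve locally; inspect the header with: curl -I http://127.0.0.1:8000/
    with make_server("127.0.0.1", 8000, application) as server:
        server.serve_forever()
```

Most sites would set the same header in their web server or CDN configuration instead; the middleware form is just the shortest self-contained way to show it.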

This entry was edited (1 year ago)

Seirdy reshared this.
