Should ActivityPub software check robots.txt when delivering activities or reading data?

#EvanPoll #poll

  • Strong yes (44%, 15 votes)
  • Qualified yes (32%, 11 votes)
  • Qualified no (17%, 6 votes)
  • Strong no (5%, 2 votes)
34 voters. Poll end: 1 month ago

in reply to Evan Prodromou

The abstract of RFC 9309 defines the scope for robots.txt and an AP server is outside it:

This document specifies and extends the "Robots Exclusion Protocol" method originally defined by Martijn Koster in 1994 for service owners to control how content served by their services may be accessed, if at all, by automatic clients known as crawlers.
in reply to modulux

@modulux "automatic clients" seems like a pretty broad category.
in reply to Evan Prodromou

Sure, but the restriction seems aimed at crawlers recursively fetching resources and the like, rather than at servers performing ActivityPub delivery on behalf of (presumably human) users. I might agree that botsin.space should honour robots.txt, but it doesn't seem otherwise relevant. Only my interpretation, but that's what I get from:

Crawlers are automated clients. Search engines, for instance, have crawlers to recursively traverse links for indexing as defined in [RFC8288].

It may be inconvenient for service owners if crawlers visit the entirety of their URI space. This document specifies the rules originally defined by the "Robots Exclusion Protocol" [ROBOTSTXT] that crawlers are requested to honor when accessing URIs.


From the same RFC.
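
For implementers who take the "qualified yes" position, the distinction drawn above suggests consulting robots.txt only for crawler-like fetches (link previews, full-text indexing), not for ordinary activity delivery. A minimal sketch of such a check using Python's standard-library `urllib.robotparser`; the user-agent string and URLs are hypothetical examples, not anything defined by ActivityPub or RFC 9309:

```python
from urllib.robotparser import RobotFileParser

def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    In practice the robots.txt body would be fetched from the remote
    host's /robots.txt; here it is passed in directly so the check is
    self-contained.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# Hypothetical remote instance that disallows crawling of user profiles:
robots = "User-agent: *\nDisallow: /users/\n"

crawl_allowed(robots, "example-ap-indexer", "https://example.social/about")        # allowed
crawl_allowed(robots, "example-ap-indexer", "https://example.social/users/alice")  # disallowed
```

Per the interpretation in the thread, a server would apply this gate before indexing or previewing, while delivering an activity to an inbox on behalf of a user would bypass it entirely.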