So rblind.com didn't go down, and it isn't any more broken today than it usually is. Cloudflare breaks #accessibility. It always has, and it always will. Now the rest of you know how it feels to be locked out because you don't run JavaScript and can't solve a captcha. Will you stop using #cloudflare? No, of course not! If it blocks #AI, you're completely fine with denying that blind users might be human at all. As AI becomes more and more capable, the definition of "human" is going to become more and more restrictive, until everyone who isn't completely able-bodied and doesn't have absolutely perfect vision, hearing, and cognition is completely excluded.
Urzl
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Urzl • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •I think the problem is that "AI" scrapers are getting around that by a) ratelimiting themselves and b) botnetting.
IMO it's 100% legit to block "AI" scrapers based on what they're going to do with your stuff, in addition to the whole resource use thing. And as for resource use, it sounds like some of them do botnetting without ratelimiting themselves, so you still get hammered, it's just from _everywhere._
🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Jason the fox 🔜 Giant fox maw
in reply to Frost, glow wolf 🐺 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Jason the fox 🔜 Giant fox maw • • •@foxbutt @IceWolf In general, you can start by tarpitting. And you can also rate-limit by geographic areas. For example, 99 percent of my visitors are from the US and Canada. But obviously, I don't want to block the entire rest of the world. But I do have all other countries on a much, much stricter rate limit. There are ways around this if you care. But most people don't; accessibility is a sacrifice they are willing to make on my behalf.
The other problem, of course, is all of these solutions will block legitimate scripts. For example, the Internet Archive, scripts that mirror resources on physical media to ship to underdeveloped countries, and that thing that I use to download multi-page articles for offline reading on my phone because the subway doesn't have internet access.
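To make the geographic rate-limit idea concrete, here is a minimal sketch using a per-IP token bucket whose size depends on the visitor's country. Everything in it is an assumption for illustration: the `GeoRateLimiter` name, the per-minute numbers, and the idea that a two-letter country code has already been looked up elsewhere (for example from a GeoIP database). It does not cover tarpitting, and it is not how any particular site does it.

```python
import time
from dataclasses import dataclass

# Hypothetical settings: countries that get the generous limit,
# and requests allowed per minute for each group.
HOME_COUNTRIES = {"US", "CA"}
HOME_RATE_PER_MIN = 120
OTHER_RATE_PER_MIN = 12


@dataclass
class Bucket:
    capacity: float        # maximum number of stored tokens
    refill_per_sec: float  # tokens added back per second
    tokens: float          # tokens currently available
    updated: float         # clock value at the last update


class GeoRateLimiter:
    """Token bucket per client IP, sized by the client's country."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._buckets = {}

    def _limit_for(self, country):
        if country in HOME_COUNTRIES:
            return HOME_RATE_PER_MIN
        return OTHER_RATE_PER_MIN

    def allow(self, ip, country):
        """Return True if this request fits within the client's limit."""
        now = self._clock()
        bucket = self._buckets.get(ip)
        if bucket is None:
            limit = self._limit_for(country)
            bucket = Bucket(
                capacity=limit,
                refill_per_sec=limit / 60.0,
                tokens=limit,
                updated=now,
            )
            self._buckets[ip] = bucket

        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(
            bucket.capacity, bucket.tokens + elapsed * bucket.refill_per_sec
        )
        bucket.updated = now

        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

A request handler would call `limiter.allow(ip, country)` and answer with a 429 (or a slow response) when it returns `False`. Nobody is blocked outright, which is the point: visitors outside the home countries still get in, just with less headroom.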
Jason the fox 🔜 Giant fox maw
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •agreed that blocking real bots ends up being an issue sadly. i kinda just allowlist some user agents since the bots i get are more interested in faking old browser user agents.
i do agree with the alt text comparison to an extent. i dont know if this is entirely true but i feel like images with alt text would be more valuable to ai scrapers building image generation tools. HOWEVER, despite that, i want to have alt text on my site and fedi posts for accessibility.
🇨🇦Samuel Proulx🇨🇦
in reply to Jason the fox 🔜 Giant fox maw • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •...So you're feeding everyone's posts into some sort of LLM without their consent? That sounds kinda shitty.
Unless it's local-only, of course.
🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •Fortunately, fedi's decentralized nature provides a little bit of defense against this
if they get access to mastodon.social's federated timeline, a) that only covers stuff mastodon.social sees, and b) that's a normal account that can be banned if the mastodon.social people find out (and actually have moderation)
and you could absolutely spin up your own instance to scrape fedi, but someone tries that every few months and everyone blocks the hell out of them. :3
You can't just scrape the activitypub API without being a legit server (and hence blockable), that's what authorized fetch is for.
🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •🇨🇦Samuel Proulx🇨🇦
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •Fortunately, "it" isn't one single thing. :3
Sure Google and Facebook could totally spin up random instances. But getting people to _federate_ with their instances might be tricky!
(also Facebook's pretty blatantly doing this with Threads, and fortunately a lot of servers blocked them on sight.)
Like, to federate, you'll need to have people on your instance worth talking to. That kinda requires actual users.
🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •Well now I'm curious about our instance (basically single user, though technically not since we're plural and different people have different accounts!).
> select count(*) from statuses;
> count
> ---------
> 1856225
> (1 row)
Okay, so we don't federate quite as widely as you. :3
Frost, glow wolf 🐺
in reply to Frost, glow wolf 🐺 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •Yeah, relays are kinda weird.
Huh on backfilling. "Most implementations" being "anything that's not Mastodon" I take it, just like with all the other useful features Masto doesn't have?
Masto actually added backfilling super recently with 4.5 or something, I gotta backport it because we're running a patched Masto 3 but ugghh it's gonna be a pain.
🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •@IceWolf @foxbutt Nah, it depends on how your implementation is configured. Some server owners turn off backfilling because they want to save disk space and don't care about search. And some server owners configure things so that their server will only show your server a certain subset of posts from a user, rather than all of them when it asks. And then authorized fetch and how it interacts with blocking and post privacy adds another layer of complexity.
And, of course, none of this stuff is (or can be) enforced by any kind of technical server. Someone could easily write/patch an "evil mastodon" to suck up as many posts as it can, while fooling the other server into thinking the requests are legit. Kind of like how some torrent clients are written to upload as little as possible.
🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to Frost, glow wolf 🐺 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to 🇨🇦Samuel Proulx🇨🇦 • • •🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to Frost, glow wolf 🐺 • • •like, this is absolutely a thing that happens and it's also a problem but it's a cultural problem rather than a technical problem
I don't want my stuff to wind up in a Slop Machine database, because it's just creepy to have things mimicking me (along with everyone else) like that
but of course I can't technically stop you from doing that to my posts. But I can still ask people, hey please don't do that.
🇨🇦Samuel Proulx🇨🇦
in reply to Frost, glow wolf 🐺 • • •Frost, glow wolf 🐺
in reply to Frost, glow wolf 🐺 • • •But yeah, detecting the ones that are ratelimiting themselves (or botnetting to spread the source IPs) is tricky.
Anubis does it with proof of work, "do a small thing that's inexpensive for any single person but a botnetting scraper would have to do it thousands of times". But that all falls over without javascript.
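As a rough illustration of the proof-of-work idea, here is a generic hashcash-style sketch: the server hands out a random challenge, the client searches for a nonce whose SHA-256 hash starts with a given number of zero hex digits, and the server checks the result with a single hash. This is not Anubis's actual code; the function names and the difficulty value are assumptions, and in a browser the solving step would run in JavaScript, which is exactly the part that breaks for people who have it turned off.

```python
import hashlib
import secrets


def make_challenge():
    """Server side: a random string the client must work on."""
    return secrets.token_hex(16)


def _digest(challenge, nonce):
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge, difficulty):
    """Client side: brute-force the smallest nonce that passes.

    Each extra hex digit of difficulty multiplies the expected
    number of hashes by 16.
    """
    prefix = "0" * difficulty
    nonce = 0
    while not _digest(challenge, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(challenge, nonce, difficulty):
    """Server side: one hash to check the client's answer."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=4)  # hypothetical difficulty
    print(nonce, verify(challenge, nonce, difficulty=4))
```

The asymmetry is the whole trick: solving takes thousands of hashes, verifying takes one. A single visitor pays that cost once per challenge, while a scraper pays it for every challenge it has to answer.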
It'd be nice to see an alternate challenge for if JS doesn't work, some kind of actual question form that asks you a thing that LLMs are bad at. Maybe a randomized simple math problem or something, I dunno. A thing that isn't difficult to solve, if you're an actual person with reasoning and logic instead of a statistical word-slapper-together. (Though that doesn't help cognitive-deficiency people...)
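For what a no-JavaScript fallback along those lines might look like, here is a minimal sketch of a randomized arithmetic question served as a plain form. All names are hypothetical, the secret key and number ranges are placeholders, and whether such a question actually stops an LLM-backed scraper is an open assumption, not something this code demonstrates. The signed token lets the server check the answer without storing state; a real deployment would also need an expiry time and single-use tokens to stop replays.

```python
import hashlib
import hmac
import random
import secrets

# Hypothetical per-server secret used to sign the expected answer.
SECRET_KEY = secrets.token_bytes(32)


def _sign(answer, salt):
    message = f"{salt}:{answer}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_question(rng=random):
    """Return (question text, salt, token) for a plain HTML form.

    The salt and token travel in hidden fields; the answer never does.
    """
    a, b = rng.randint(2, 12), rng.randint(2, 12)
    op = rng.choice(["plus", "times"])
    answer = a + b if op == "plus" else a * b
    salt = secrets.token_hex(8)
    return f"What is {a} {op} {b}?", salt, _sign(answer, salt)


def check_answer(submitted, salt, token):
    """Return True if the submitted text matches the signed answer."""
    try:
        value = int(submitted.strip())
    except ValueError:
        return False
    return hmac.compare_digest(_sign(value, salt), token)
```

The question is plain text, so it works with a screen reader and needs no images, audio, or scripts. As the post notes, it still leaves out anyone for whom the arithmetic itself is the barrier.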
genstar.service
in reply to Frost, glow wolf 🐺 • • •What I got from Anubis is that it was inexpensive for a single machine. So what difference does it make in a botnet?
Especially with AI-scraping malware on infected computers.
🇨🇦Samuel Proulx🇨🇦
in reply to genstar.service • • •