🇨🇦Samuel Proulx🇨🇦

1 day ago

🇨🇦Samuel Proulx🇨🇦
1 day ago

So rblind.com didn't go down, and it isn't anymore broken today than it usually is. Cloudflare breaks #accessibility. It always has, and it always will. Now the rest of you know how it feels to be locked out because you don't run JavaScript and can't solve a captcha. Will you stop using #cloudflare? No, of course not! If it blocks #AI, you're completely fine with denying that blind users might be human at all. As AI becomes more and more capable, the definition of "human" is going to become more and more restrictive, until everyone who isn't completely able bodied and doesn't have absolutely perfect vision, hearing, and cognition is completely excluded.

in reply to 🇨🇦Samuel Proulx🇨🇦

Urzl

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

Neurodivergence is on the same track. They keep narrowing the definition of "normal" and put more and more of us on the outs.

in reply to Urzl

🇨🇦Samuel Proulx🇨🇦

in reply to Urzl 1 day ago

@gooba42 Yup. Though your post reminds me of the time a childhood friend of mine got banned from a game for being a bot, because she was Neurodivergent and played too regularly for too many hours. This sort of thing has been happening since the dawn of statistics. I think the game was runescape or neopets or one of those 90's browser games all the cool kids were playing back in the internet stone ages.

@Urzl

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

And there are other solutions to block "AI" scrapers! Though, not running Javascript probably means Anubis stuff blocks you too...

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf Yes, it does. The problem is that all of these are intended to destroy the open web, by blocking a particular type of use someone doesn't like. The only way to preserve accessibility is to instead focus on blocking whomever or whatever is using an unfair share of server resources. If that's AI, block it. If that's me hammering your server for some reason, block me, too. Setting up rate limitting is a well understood problem. Blocking AI because it's slamming the server is an excuse. And you already have no reliable way to tell the difference between an AI, a script, and a screen reader. Block based on resource use.

@Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

I think the problem is that "AI" scrapers are getting around that by a) ratelimiting themselves and b) botnetting.

IMO it's 100% legit to block "AI" scrapers based on what they're going to do with your stuff, in addition to the whole resource use thing. And as for resource use, it sounds like some of them do botnetting without ratelimiting themselves, so you still get hammered it's just from _everywhere._

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf I strongly disagree with this. In the 90's, publishers insisted that all ebook platforms block text to speech, because blind people should be forced to purchase the more expensive audiobooks. This didn't stop until multiple court cases were filed against Amazon and others, and a UN treaty got signed. If we allow authors to say "You can't do X, Y, and Z with my otherwise freely available content", that will soon be abused to disallow using translation software on it, then turning it into Braille, then using screen readers on it, etc. Some directors still don't allow audio description of there films to be produced. Should an artist be allowed to say "no alt text for my photo can ever be produced by anyone"? And it's quite easy to not just rate limit, but tarpit abusive connections. So soon all the botnets resources are taken up by holding open worthless connections.

@Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

Hmm. I'd say there's a difference between "no screen readers!!" (that reminds me of the Youtube anti-adblock nonsense) and going "don't scrape my site to shove it into the Slop Machine".

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf Right, but as long as capitalism exists, any limits we set to stop one will always be abused to stop the other. As someone with accessibility needs, the only way for me to have a life and job is for the internet to stay completely open. Not just "open to whatever types of use authors and companies feel like granting", because that always excludes accessibility.

@Frost, glow wolf 🐺

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf Twitter and Reddit are more good examples. They decided to shut down there API's, in order to sell content to the AI machine. So most blind people are on Mastodon now. But if the fediverse decides they want to stop all AI training as well, a side-effect of blocking that will be shutting out people using assistive technology. Just like Reddit and Twitter did. Only for the opposite reason.

@Frost, glow wolf 🐺

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf And just like how there is no DRM scheme that can stop someone from copying digital bits, and no encryption scheme with a backdoor that only the good guys can use, there is no way to block AI scrapers, but not screen readers and other accessibility programs. Because they're both just software.

@Frost, glow wolf 🐺

in reply to Frost, glow wolf 🐺

Jason the fox 🔜 Giant fox maw

in reply to Frost, glow wolf 🐺 1 day ago

THIS. i have my own custom bot blocker that works well with the scrapers i get. it actually doesn't use JS even. i wont deny that it probably still reduces accessibility sadly. but it works. and i cant limit based on usage when every bot visit is a different IP. i try to block their datacenter IP ranges and they switch to residential IP botnets.

in reply to Jason the fox 🔜 Giant fox maw

🇨🇦Samuel Proulx🇨🇦

in reply to Jason the fox 🔜 Giant fox maw 1 day ago

@foxbutt@IceWolf In general, you can start by tarpitting. And you can also rate-limit by geographic areas. For example, 99 percent of my visitors are from the US and Canada. But obviously, I don't want to block the entire rest of the world. But I do have all other countries on a much, much quicker rate limit. There are ways around this if you care. But most people don't; accessibility is a sacrifice they are willing to make on my behalf.

The other problem, of course, is all of these solutions will block legitimate scripts. For example, The Internet Archive, scripts that mirror resources on physical media to ship to underdevelopped countries, and that thing that I use to download multi-page articles for offline reading on my phone because the subway doesn't have internet access.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦

Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

agreed that blocking real bots ends up being an issue sadly. i kinda just allowlist some user agents since the bots i get are more interested in faking old browser user agents.

i do agree with the alt text comparison to an extent. i dont know if this is entirely true but i feel like images with alt text would be more valuable to ai scrapers building image generation tools. HOWEVER, despite that, i want to have alt text on my site and fedi posts for accessibility.

in reply to Jason the fox 🔜 Giant fox maw

🇨🇦Samuel Proulx🇨🇦

in reply to Jason the fox 🔜 Giant fox maw 1 day ago

@foxbutt@IceWolf Right, because an open internet means that you don't really get to decide what accesses your content: my screen reader, someone on an ebook reader from eight years ago, a smart TV, or a fridge. Abusive bots are a problem, and need to be stopped. But that's to save the server resources, rather than to limit content use. Because if you limit AI ability to scrape your content, you will always lock me out. The entire fediverse is wonderful for AI! It's got an open AI that my specialized accessibility client can access, and so can any AI training tool. To block the AI, you'd have to take away the API, or put up something that would also block other human and automated uses of it as well.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

...So you're feeding everyone's posts into some sort of LLM without their consent? That sounds kinda shitty.

Unless it's local-only, of course.

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt I'm not at all. But I'm absolutely certain some other company is! All they have to do is spin up an instance and join the major relays. Or just buy access to the local timeline on mastodon.social. My point is that in order to prevent these things, the fediverse would have to block all automated access of any kind: screen readers, third party clients, everything.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

Fortunately, fedi's decentralized nature provides a little bit of defense against this

if they get access to mastodon.social's federated timeline, a) that only covers stuff mastodon.social sees, and b) that's a normal account that can be banned if the mastodon.social people find out (and actually have moderation)

and you could absolutely spin up your own instance to scrape fedi, but someone tries that every few months and everyone blocks the hell out of them. :3

You can't just scrape the activitypub API without being a legit server (and hence blockable), that's what authorized fetch is for.

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt Everyone blocks the people who admit they're doing that. Do you really think Google and Metta aren't running some random instance under a quirky name? Do you really think every single server admin will refuse money for access to their timelines? The thing about a distributed system is...it's distributed. Once dozens and dozens of servers have it, you really don't have any hope of controlling where it goes or what happens to it.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦

🇨🇦Samuel Proulx🇨🇦

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

@IceWolf@foxbutt There are, of course, ways to "fix" this. We could require relay operators to request photo ID of every single server operator who joins the relay. And we could only federate with other servers that are willing to provide their ID's to us. And we'll only allow API access for approved organizations. The cure sounds a lot worse than the problem, to me!

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

Fortunately, "it" isn't one single thing. :3

Sure Google and Facebook could totally spin up random instances. But getting people to _federate_ with their instances might be tricky!

(also Facebook's pretty blatantly doing this with Threads, and fortunately a lot of servers blocked them on sight.)

Like, to federate, you'll need to have people on your instance worth talking to. That kinda requires actual users.

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt Currently, my small single-user instance is receiving over fifty thousand posts a day. The database currently contains slightly over twenty million posts. And I'm just one dude. If an organization wanted visibility into the entire fediverse, I'm sure they could do a lot better!

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

Well now I'm curious about our instance (basically single user, though technically not since we're plural and different people have different accounts!).

> select count(*) from statuses;
> count
> ---------
> 1856225
> (1 row)

Okay, so we don't federate quite as widely as you. :3

in reply to Frost, glow wolf 🐺

Frost, glow wolf 🐺

in reply to Frost, glow wolf 🐺 1 day ago

Course, to federate you also (IIRC) have to follow people, and you're not a(n automated) bot, and followbots tend to be looked down on here. So a Secretly Facebook instance spewing follow requests so they can federate and start sucking up people's posts is pretty likely to be rejected!

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt No, all you have to do is join a relay server. Then it will send you every post from every other server that also joined that relay server. Almost no relay servers currently require approval to join. Also, because of how threading works, once you become aware of a user (perhaps because they boosted you or whatever), most fediverse implementations will happily let you "backfill": IE allow your server to download every public post of that user so you can view it locally.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

Yeah, relays are kinda weird.

Huh on backfilling. "Most implementations" being "anything that's not Mastodon" I take it, just like with all the other useful features Masto doesn't have?

Masto actually added backfilling super recently with 4.5 or something, I gotta backport it because we're running a patched Masto 3 but ugghh it's gonna be a pain.

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt Nah, it depends on how your implementation is configured. Some server owners turn off backfilling because they want to save disc space and don't care about search. And some server owners configure things so that there server will only show your server a certain subset of posts from a user, rather than all of them when it asks. And then authorized fetch and how it interacts with blocking and post privacy adds another layer of complexity.

And, of course, none of this stuff is (or can) be enforced by any kind of technical server. Someone could easily write/patch an "evil mastodon" to suck up as many posts as it can, while fooling the other server into thinking the requests are legit. Kind of like how some torrent clients are written to upload as little as possible.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt And relays are weird because they're trying to solve the discovery problem. IE I want to make sure I see all posts with the blind hashtag. But in a distributed system, there's really no good way to do that without giving everyone everything.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to Frost, glow wolf 🐺

Frost, glow wolf 🐺

in reply to Frost, glow wolf 🐺 1 day ago

and since fedi has a "know your admin" culture, you don't really see a lot of servers ran by strange faceless people who could be secretly siphoning off everything to Facebook.

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt I dunno. I have no idea who runs tech.lgb or mastodon.online, two servers picked completely at random from the 15,603 instances currently federating with me. The fact that I'm a good person doesn't mean that everyone else on the network is. And let's be real: if I was going hungry and had little or no access to healthcare, and open AI said "Hey, buddy, we'll give you a million dollars for that post archive!" how many people would choose to be sick and homeless rather than make that deal?

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt Also, if you're using Google Chrome, you're already feeding everyone's everything into an LLM without there consent.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to 🇨🇦Samuel Proulx🇨🇦

Frost, glow wolf 🐺

in reply to 🇨🇦Samuel Proulx🇨🇦 1 day ago

Yepp, this is one reason we don't use Chrome.

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt Doesn't matter. Does anyone reading this thread use Chrome? Congrats! We're both in the AI training material.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to Frost, glow wolf 🐺

Frost, glow wolf 🐺

in reply to Frost, glow wolf 🐺 1 day ago

like, this is absolutely a thing that happens and it's also a problem but it's a cultural problem rather than a technical problem

I don't want my stuff to wind up in a Slop Machine database, because it's just creepy to have things mimicking me (along with everyone else) like that

but of course I can't technically stop you from doing that to my posts. But I can still ask people, hey please don't do that.

in reply to Frost, glow wolf 🐺

🇨🇦Samuel Proulx🇨🇦

in reply to Frost, glow wolf 🐺 1 day ago

@IceWolf@foxbutt Right, and once again, I think DRM is an instructive example, here. Companies can ask people not to pirate there stuff, and usually, most of the people will request that most of the time. But it only takes one! And every single method of DRM that attempts to block piracy makes everything both less accessible and worse for everyone.

@Frost, glow wolf 🐺 @Jason the fox 🔜 Giant fox maw

in reply to Frost, glow wolf 🐺

Frost, glow wolf 🐺

in reply to Frost, glow wolf 🐺 1 day ago

But yeah, detecting the ones that are ratelimiting themselves (or botnetting to spread the source IPs) is tricky.

Anubis does it with proof of work, "do a small thing that's inexpensive for any single person but a botnetting scraper would have to do it thousands of times". But that all falls over without javascript.

It'd be nice to see an alternate challenge for if JS doesn't work, some kind of actual question form that asks you a thing that LLMs are bad at. Maybe a randomized simple math problem or something, I dunno. A thing that isn't difficult to solve, if you're an actual person with reasoning and logic instead of a statistical word-slapper-together. (Though that doesn't help cognitive-deficiency people...)

in reply to Frost, glow wolf 🐺

genstar.service

in reply to Frost, glow wolf 🐺 1 day ago

What I got from Anubis is that it was inexpensive for a single machine. So what diference does it in a bot net?

Specially with AI scraping malware infected computers.

in reply to genstar.service

🇨🇦Samuel Proulx🇨🇦

in reply to genstar.service 1 day ago

@Genstar@IceWolf This, too. It also feels like...one step away from cryptocurrency mining. Remember when captchas were just to tell humans and computers apart? Then they were to help digitize public domain books for the general good. Then they were to help digitize books for Google's ebook library. Now they're to help train Google's AI to recognize photos! How long until someone goes "Hey, as long as we're making someone's computer do work to prove it's a real person...hmmm...why don't we have them mine some bitcoin for us? For charity! Well, at first..."

@genstar.service @Frost, glow wolf 🐺

⇧

🇨🇦Samuel Proulx🇨🇦 1 day ago • •

🇨🇦Samuel Proulx🇨🇦
1 day ago