Is the fediverse about to get Fryed? (Or, “Why every toot is also a potential denial of service attack”)

ar.al/2022/11/09/is-the-fedive…

CC @stephenfry @gretathunberg

#fediverse #mastodon #stephenFry #gretaThunberg #smallTech #smallWeb

in reply to Aral Balkan

If I understand correctly, this entire post is predicated on the assumption that folks are running github.com/mastodon/mastodon. I wonder if this would be the case with a more efficient implementation of ActivityPub, e.g. github.com/superseriousbusines…

Which is to say, the implementation might be the limiting factor, not ActivityPub itself.

in reply to Katherine Cox-Buday

@katco The assumption that multi-tenant servers will be used is baked into the protocol (which is, to a degree, understandable. It would never have passed otherwise. I say this as someone who has seen every raised eyebrow possible when I suggest single-tenant servers). But yes, I’m sure there’s a lot that can be done at the app level as well. I’m actually keeping an eye on the task queue as I write these replies and every reply creates 1,377 tasks. I’m assuming those are the number of instances.
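To make that guess concrete, here is a minimal Python sketch. It assumes, as the reply above guesses rather than confirms, that one delivery job is queued per distinct remote instance hosting at least one follower. The function name and the example handles are hypothetical.

```python
def delivery_jobs(follower_handles, local_domain):
    """Count delivery jobs, assuming one job per distinct remote instance."""
    remote_domains = set()
    for handle in follower_handles:
        # Handles are assumed to look like "user@domain".
        domain = handle.rsplit("@", 1)[1].lower()
        if domain != local_domain:
            remote_domains.add(domain)
    return len(remote_domains)


followers = [
    "a@one.example",
    "b@one.example",   # same instance as "a": no extra job
    "c@two.example",
    "d@home.example",  # local follower: no remote delivery
]
print(delivery_jobs(followers, "home.example"))  # 2
```

Under this assumption the job count follows the number of instances, not the number of followers, which is why followers spread across many small servers cost more to reach than the same number on a few large ones.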
in reply to Katherine Cox-Buday

@katco Yeah, it is. I think it’s a combination of the protocol and the multi-tenant aspect that creates the bottleneck but I do see your point. For the single-tenant Small Web stuff I’m working on, I was running experiments with WebSockets – which are so lightweight – and I could run hundreds of thousands of them on a tiny instance (think: if we had constant connections between instances of one). Something like that would likely provide a very different experience, even with ActivityPub.
in reply to Aral Balkan

@katco

It's a truism in performance engineering that if you free up one bottleneck, say the overhead on the queue itself, the bottleneck simply moves elsewhere. For example, waits on read or write locks may turn into mutual deadlocks.

Or, more prosaically, you run out of connections or handles to a dependent service. All of those are resolvable with effort.

But it could be considerable effort.

in reply to Paul Wilde :blobcatnim_new: :dontpanic_nobg:

@paul Thanks, Paul. Hopefully, it will also get all of us thinking about what we can do to combat the centralisation tendencies present in the ActivityPub protocol and Mastodon server designs. (As well as considering what a web where we all own and control our own places might be like.)

@gretathunberg @stephenfry

in reply to Aral Balkan

A long time ago, before even Geocities, I had a blog that had enough traffic to get noticed, and I was approached by a researcher at the University of Georgia offering me free hosting so long as uptime wasn't critical.

The site was called Dragonfire. The experiment was squeezing as much power out of a commodity white box as technically possible. It was overclocked to the hilt, water cooled, and the name was chosen after some coffee spilled on the case and started boiling.

in reply to Aral Balkan

that's the screen I (and probably every #mastoadmin) was checking regularly in the last few days 😁
Our instance is not that big, but it seems to be well integrated in the network: we hit 500k 1.5 weeks ago and are steering towards 1M events/day.
It's really interesting when you're hosting the hardware yourself (especially as a hobby project) and can't easily scale up CPUs, but switching to NVMe really paid off. We currently run 2 sidekiq processes at 10 threads each, with basically zero queue backlog.
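A rough back-of-the-envelope check of those figures, as a sketch only. It assumes each event is one Sidekiq job and that events arrive evenly through the day; neither is stated in the post, and real traffic is bursty. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_job_budget_seconds(events_per_day, processes, threads_per_process):
    """Average time each job may take before the queue starts to back up."""
    threads = processes * threads_per_process
    events_per_second = events_per_day / SECONDS_PER_DAY
    return threads / events_per_second


# 1M events/day on 2 processes x 10 threads
print(round(average_job_budget_seconds(1_000_000, 2, 10), 3))  # 1.728
```

So at 1M events/day, 20 threads keep up as long as the average job finishes in under roughly 1.7 seconds.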
in reply to Jack Deeth

@JackDeeth The Small Web stuff I’m working on is basically optimised for 1.

(The reason being that there’s a huge amount of complexity that is added when you move from 1 to 2. Or, conversely, a lot you can simplify in the experience if you’re optimising for just one person who is the owner of the server.)

But yes, it would be very interesting to see what design properties a solution for small instances would have.

in reply to Aral Balkan

@JackDeeth I think this toot is the best portrayal of the need for a small-web alternative to the corporate insulation/advertisement model. Interaction on the web has a fundamental cost.

I'm with you on the single-tenant solution being the better one, so how can we build a one-click product for nontechnical users to easily own, maintain, and bear the cost burden of their own instances?

Though I'm not sure a really efficient algorithm can solve the high cost for Stephen Fry's instance

in reply to Aral Balkan

We came to more or less this conclusion as well, and spent yesterday spinning up a single user instance for @Raspberry_Pi. With the added benefit that we ended up running it on a #RaspberryPi. raspberrypi.com/news/an-escape…

in reply to Aral Balkan

I was wondering about real "big-world" scalability issues here. Thanks for posting this clear illustration of the problem. Yikes.

I would have thought AP would also include some gossip-style ability to spread posts around without the one instance being responsible for pinging ALL the followers. Guess not?

PS: apologies for the x Sidekiq jobs I have just triggered with this reply.
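For a feel of what a gossip-style scheme could buy, here is an idealised sketch. Nothing in this thread says ActivityPub or Mastodon works this way. It assumes that in each round every instance that already has the post forwards it to `fanout` instances that do not have it yet, with no duplicates or failures. The function name and the fanout of 3 are hypothetical.

```python
def gossip_rounds(instances, fanout):
    """Rounds needed until every instance has the post (idealised model)."""
    informed = 1  # the origin instance
    rounds = 0
    while informed < instances:
        informed += informed * fanout  # each informed instance tells `fanout` new ones
        rounds += 1
    return rounds


rounds = gossip_rounds(1377, 3)
print(rounds)      # 6
print(rounds * 3)  # 18 deliveries sent by the origin itself, instead of 1,377
```

The total number of deliveries across the network stays about the same. What changes is that the origin instance no longer has to make all of them.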

in reply to Adam Dalliance

@pre (And every reply.)

I mean, of course, everything is always a work in progress and hindsight is always 20/20. I just hope we can acknowledge some of the core design decisions that we take for granted that we’ve actually inherited from Big Tech and see how we can go forward differently. (Or at least apply social pressure where protocols and server designs might incentivise centralisation.)

in reply to Aral Balkan

Could it make sense, for instances that want to have this setting on, to introduce a limit on the number of followers/interactions one can have when hosted there? With an invitation to contact the admin if that number is reached, either to contribute financially to the server’s hosting or to get help moving to another, possibly personal, instance. It would make creating new public instances a less financially risky project in case someone big joins (or someone who joins becomes big later).
in reply to Aral Balkan

At the risk of causing these problems by replying to a popular person with an engaging post, I have a couple of questions...

1. Is there a possibility that code solutions exist to optimize against this behavior that arises from popular users?

2. Would setting replies to unlisted help resolve some of this storm?

3. If (2), could a configuration change to Mastodon instances that made replies unlisted by default (but changeable to public on a per-post basis) help resolve the issue?

in reply to Alx 🐈

@alx Thanks, Alessandra, that’s very kind of you to say. And, hey, we’re all newbies at all this to one degree or other. Here’s to figuring things out and making things better together :)

PS. If there’s anything that doesn’t make sense about the Small Web stuff I’ve written, etc., please just ask. I’m always trying to improve/simplify how I explain things and it’ll help me to know which bits are confusing.

in reply to Aral Balkan

This is a really helpful framing for the problem, and seems like something we're going to have to grapple with sooner rather than later. A few questions spring to mind:

- What is the current cost of maintaining an identity, as a function of followers, activity, media, etc?
- How much can that be optimized in the code? Through protocol improvements?
- Assuming the optimized cost is still non-trivial, are there significant advantages to shared hosting?

in reply to Aral Balkan

If I understand you correctly, single-tenant instances (with no other changes) would mean N followers = N sidekiq jobs per interaction. It seems like the only way to ensure processing power grows with engagement is some kind of bittorrent-driven thing: servers hosting your post promise to seed to a certain ratio, and users reading posts seed them as long as they remain on the page, similarly to PeerTube.
in reply to Aral Balkan

There’s some tension between the points you make and the notion of the small instance hosting a community of exercise enthusiasts or esperantistoj.

As both you and @profcarroll point out, a well connected small instance can be quite resource intensive. A poorly connected instance is a virtual ghost town. Could some of this be solved by using lighter weight ActivityPub software? (Misskey or Pleroma)

in reply to Aral Balkan

In your post you talk about your server (which I assume is some sort of virtual private server) having 12 sidekiq threads, which can deliver an ActivityPub update to 12 other servers at a time.

Do you have any idea how much performance the virtual server provided by mastohost has in values that allow a comparison to other VPSes? As in, what is "a sidekiq thread" for the server running it, do you get one sidekiq thread per CPU? Is CPU speed of the server, the web connection, or speed of the receiving server (e.g. if it needs to acknowledge receiving the ActivityPub push) the limiting factor for the speed at which such a sidekiq job executes? Maybe @hugo can provide some info on this?
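One way to frame the question is a simple model, sketched below. It assumes each thread handles one delivery at a time and that every delivery takes the same time. The 0.5 seconds per delivery is a made-up placeholder, not a measurement of any host, and the function name is hypothetical.

```python
import math


def drain_time_seconds(jobs, threads, seconds_per_delivery):
    """Time to work through `jobs` deliveries with `threads` running in parallel."""
    waves = math.ceil(jobs / threads)  # each wave delivers to `threads` servers at once
    return waves * seconds_per_delivery


# 1,377 deliveries on 12 threads at an assumed 0.5 s each
print(drain_time_seconds(1377, 12, 0.5))  # 57.5
```

In this model, whichever of CPU, network or the receiving server dominates `seconds_per_delivery` is the limiting factor. Adding threads only helps until one of those saturates.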

in reply to Aral Balkan

I'm very new to mastodon, and still trying to learn the basics of how it is structured. Your article has been very informative and, though coming from a very different and more-practical perspective, increasingly seems to raise concerns along the same lines as I've had as I've been reading about this.

Specifically: that this thing seems to be too centralised (lots of little centralised servers, but still centralised), and (if I'm reading correctly) only a single level of federation?

in reply to Aral Balkan

Could I ask you a few naive questions, as I am not up to speed on the fediverse architecture.

It sounds like the problem of exponential job growth is technically solvable in code (rather than inherent in the system), is that correct?

Can the jobs not get gracefully queued and balanced, with the result that popular people are just processed more slowly?

Can the jobs be batched so newer jobs can do batch updates and allow older jobs to be dropped?

Thank you
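The batching idea in the questions above can be sketched as follows. This illustrates the general technique only and makes no claim about how Mastodon's queue behaves. It assumes each pending job is a dict with hypothetical `inbox` and `object_id` keys, and that a newer job for the same pair fully supersedes an older one. That holds for things like edits or counters, not for distinct posts.

```python
def coalesce(jobs):
    """Keep only the newest pending job per (inbox, object_id), in queue order."""
    latest_index = {}
    for index, job in enumerate(jobs):
        latest_index[(job["inbox"], job["object_id"])] = index  # later jobs overwrite earlier ones
    keep = set(latest_index.values())
    return [job for index, job in enumerate(jobs) if index in keep]


pending = [
    {"inbox": "one.example", "object_id": "post-1", "version": 1},
    {"inbox": "two.example", "object_id": "post-1", "version": 1},
    {"inbox": "one.example", "object_id": "post-1", "version": 2},
]
print([(j["inbox"], j["version"]) for j in coalesce(pending)])
# [('two.example', 1), ('one.example', 2)]
```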

in reply to Aral Balkan

This is concerning. It seems to imply that decentralization is impossible long term for most people.

I see this going in the direction of celebrities creating Mastodon instances and starting to post ads to users on their instance, and possibly charge users for the benefit of getting priority access to their posts. (You can see them before the sync jobs run for other instances).

I guess if you want freedom from that you just need to live with a worse experience, at least you have the choice.

in reply to Aral Balkan

Thanks, genuinely interesting and thought provoking. I’m barely known anywhere and my first instinct was to set up my own instance, if only to have some control over my presence.

This does present other challenges you don’t cover: natural follows from things like being in a local timeline just aren’t there. There really is a strong incentive for both providers and users to centralise.

Short of a fully P2P (à la Chord/Pastry) system, I'm genuinely stumped on technical solutions here.

in reply to Aral Balkan

very interesting Aral thanks. Something I don't quite follow though - I understand why large instances are socially bad, and should be discouraged for that reason, but I don't understand how smaller / single-user instances solve the @stephenfry problem. Isn't that problem fundamentally that each of his followers needs to be contacted? How do smaller instances change that?
in reply to Aral Balkan

Maybe it's just me, but if it's that expensive to be moderately popular, I just don't see that working for the majority of people, at all. We should work to help trustworthy orgs see the opportunity to be valuable to their stakeholders and possibly to have a revenue stream. I imagine professional orgs, to which people are already paying dues, running servers and adding rel=me tags to pages in their professional directories.
in reply to Aral Balkan

thanks for sharing your insights! I read somewhere that single user servers were inefficient, but can't remember where. I think it had to do with how there's some economy of scale built in when multiple people follow others from the same other server or something. It seems to me that we could scale efficiently without everyone having to have their own server. Like, there must be a middle ground between too large and too small
in reply to Aral Balkan

Ah, the very sudden and extreme growing pains of Mastodon. Issues that will have to be addressed, if large-scale adoption and repeat usage is desired (is it desired?). Also, HOW do you get to see number of favorites, responses, et al on your posts? Mine are always blank, no matter how much engagement a post gets. It's like I throw a post out into the ether and have to guess about what's happening with it.
in reply to Aral Balkan

I've just been arguing with someone else that we need a protocol that keeps the size of any "frictionless" community to a small multiple of Dunbar's number, so it's very nice to see your argument along the same lines.

I'm more interested in creating (small) common spaces than personal ones, though

And I'm struck by the irony of "I want to flee Twitter because Musk might charge for it" being met with, "The best way to Mastodon is to pay $ for your own instance"

in reply to Aral Balkan

Since I've been in the information-centric networking group at #IETF115, I'd like to add that this kind of tech would be a good complement to the small web notion.

The basic principle is that you're not looking to speak to a server, but asking the network where a piece of information is. So if my machine has already received Aral's post, then it's as legitimate a source of this post as his own, etc. Load spreads out.

@stephenfry @gretathunberg
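The principle described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not an implementation of any particular information-centric networking protocol. A post is named by the SHA-256 hash of its bytes, each peer is modelled as a plain dict, and `content_name` and `fetch` are hypothetical helper names.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Name a piece of content by the hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch(name, peers):
    """Return the content from the first peer holding a copy that matches its name."""
    for peer in peers:
        data = peer.get(name)
        # Any copy is acceptable as long as its hash matches the requested name.
        if data is not None and content_name(data) == name:
            return data
    return None


post = b"Is the fediverse about to get Fryed?"
name = content_name(post)
origin_offline = {}            # the author's own server is unreachable
my_machine = {name: post}      # a machine that already received the post
print(fetch(name, [origin_offline, my_machine]) == post)  # True
```

Because the name is derived from the content, the requester can verify a copy no matter which machine served it. That is what lets the load spread out.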

in reply to Aral Balkan

a fun solution mostly for you Aral is to run your web service on a cheap VPS and your sidekiq in a cluster. so each time you say something you can scale up your sidekiq cluster, until the wave passes and then scale it back down again. it's still insane that you should have to pay so much just to participate in social media though. I run my mastodon in k8s so I'll create an autoscaler setup that does this for me.
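The scaling rule such an autoscaler might apply can be sketched as a pure function. This is a hypothetical sizing calculation, not Kubernetes or Sidekiq configuration. All names, the per-worker throughput and the five-minute drain target are assumptions chosen for illustration.

```python
import math


def desired_workers(queue_depth, jobs_per_worker_per_minute, target_drain_minutes,
                    min_workers=1, max_workers=20):
    """Workers needed to drain the current queue within the target time, clamped to limits."""
    capacity_per_worker = jobs_per_worker_per_minute * target_drain_minutes
    needed = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


print(desired_workers(3000, 60, 5))  # 10 workers while the wave is in the queue
print(desired_workers(0, 60, 5))     # 1 worker once the queue is empty again
```

The upper clamp is there to bound cost: without it, one very popular post could scale the cluster, and the bill, without limit.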
in reply to Aral Balkan

Somewhat agree, strongly disagree with many conclusions.

Single-tenant #instances are a ruthless attack on our planet.

#Mastodon is conceptually flawed with topical communities expressing themselves mainly via a local #timeline. Lists: poorly implemented. Federated #groups: a.gup.pe hacks

Look at the self-descriptions of any instance focused on a large #community, and ask yourself who could afford to *not* register at the *one* place where the *action* is.

@stephenfry @gretathunberg

in reply to Aral Balkan

judging by how many of the replies to this post still include at-mentions to Fry and Thunberg despite their replies not actually being relevant to them, I also wonder if support for BCC rather than CC would be useful. ;)

Aside from that, it does sound (as others in the replies have also remarked) like more, smaller instances would actually increase the traffic?

Final note: you mistyped 'extinguish' as 'entinguish'. ;)

in reply to Aral Balkan

Maybe a solution, especially for accounts with a lot of followers like you, is to be more intentional in your chattiness (and this could be a habit or norm across the fediverse).

You want to reply to everyone, but do all those replies have to be public posts? If you just want to say a "thank you" or a small courtesy back to someone you know ("thanks! how's your cat?"), you could make it a Direct Post. The way Mastodon puts these in thread for you and the other person like any other post (rather than just in a separate DM inbox) works great for this!

Unless I am really wrong about how Mastodon handles these, it would only be one job going out to the one instance of that user, rather than broadcasting to everyone who follows you.

There's probably an analogy here about how you are using a high power concert PA system to broadcast a separate reply to every single person in the stadium audience, when instead you could just mingle into the crowd later and directly chat with each attendee (and Hugo is powering the PA by running on a treadmill backstage).

If you reply to this message from me with a simple "thanks, great point!", it could be a Direct Post. If you have some commentary or correction of value to your followers, keep it Public!

in reply to Aral Balkan

can i say: awesome article👏 ? Sorry but not sorry, i don’t think limiting oneself per se is the solution. One should defo think what they are hoping to achieve by communicating, and people could abide by some basic principles (power surge hours etc) if/when needed (think Ukraine). But otherwise it is by interacting that we learn and grow, incl learn about #communication itself, so let’s not shut up just yet!
in reply to Aral Balkan

Self hosting is the way. After I joined, my first thoughts were "K neat, how do I host my own?"
Great article, my initial afterthoughts:

Instances need to be capable of a kind of mitosis for when they get too large.

Throw more lvls of decentralization at instance level blocking?

With this 3000 requests loading up issue, perhaps there is a way to redistribute those requests so other instances can share a set amount of resources when they are below a certain load threshold? Hmm.

@lashman

in reply to Aral Balkan

Fediverse technical details, guesswork

> Well, there’s only one thing you can do when you find yourself in such a pickle: scale up your Mastodon instance … Or start blocking followers, or unfollowing people, or staying quiet.

Maybe I know too little about how ActivityPub works and/or I've been drinking too much solar.lowtechmagazine.com but would another option be to *wait*?

The queue will resolve itself *eventually*, right? — so what if we treat this slower pace as a Feature, Not A Bug?

in reply to Aral Balkan

I have to ask, how much of what you describe is intrinsic in the problem space, how much is intrinsic in the particulars of the design, and how much is intrinsic in the implementation of Mastodon in particular? Sidekiq seems a particularly poor choice to fire off 3000 tasks a few times an hour. Would different technology choices help, or is the problem space itself just inherently prone to this kind of scale problem?
in reply to TallBaldDesi

@TallBaldDesi It’s a network of people who communicate with each other using a protocol called ActivityPub. We’re doing it right now as we have this conversation. When you wrote your question, your Mastodon server used the ActivityPub protocol to find me and send it to me. My server (which could be running Mastodon or any other piece of software that runs the ActivityPub protocol) would do the same to get my message to you. A protocol is just a set of rules for communicating that we agree on.
in reply to IAmDarthMole

@darthmole Afaik, it should be far more efficient to distribute intra-server messages. That’s one of the erroneous incentives for vertical scale. If I was on mastodon.social and my ~23,000 followers were also on mastodon.social, there wouldn’t be a problem. Mastodon is designed to handle that without breaking a sweat. And hey, did you notice, we just recreated Twitter.com?… oops! :)
in reply to Aral Balkan

thanks for the reply! I see I misunderstood your article earlier.

The comment about having your own server if you had a lot of followers was meant to be a workaround to limit resource issues for the many due to the way mastodon was designed to handle interserver communications?

Would I be accurate in stating that your article was stating that the fact it handles intraserver communications so much better actually hurts itself and its mission as a decentralized champion?

in reply to Aral Balkan

I just read your post about the overall benefits of more Mastodon instances and how people should be encouraged to create their own.

My son has been working on his own instance by following the official docs. This really needs to be a lot easier to do, though.

#Linode and #DigitalOcean have very easy installation options although these options are pretty expensive compared to a similar VM option at a hosting provider like #Hetzner, for example.

I'm sure this is just a matter of time.