I mostly agree here. I haven't read the Doctorow piece. But I've been having a similar conversation within my professional circles. Yes, CrowdStrike screwed up. But humans are gonna screw up. We know this. So rather than discussing who to blame, the better discussion is how so many companies found themselves exposed with no way of taking control of what was happening to their systems.
hachyderm.io/@jenniferplusplus…
in reply to Marco Rogers

I would make a stronger statement. For a long time, I've been feeling like this is the major flaw in our current trajectory. The vendorization of everything is a trap. Not only because of the huge "blast zones," as Jennifer puts it, but because we've also created an environment where dumping risk onto vendors means accountability is diffused to the point of uselessness.
in reply to Marco Rogers

CrowdStrike is gonna see some consequences for this. But I suspect they won't be as severe as people think. Okta is still around after failing at their one job multiple times. So is LastPass. Vendor failures are becoming normalized at a rapid pace. And meanwhile the companies who are delivering direct service largely dodge accountability by just blaming their vendors.
in reply to Marco Rogers

My mom has been trying to fly back from San Francisco to Atlanta since Friday. Delta Air Lines has been totally hosed by this issue. I guess they were deeply invested in Windows and CrowdStrike.

But what decision does Delta get to make now? What can they change that won't leave them exposed to another CrowdStrike-style vendor failure? I don't think they have that option. The whole ecosystem is set up to shed risk in a way that makes accountability impossible.

Richard Johnson

@jrconlin @lightweight

Case in counterpoint: you could call it a broken ticket-tossing, buck-passing subculture with perverse incentives.

We said "no agent complexity, or if you must, it will have phased and tested/metered roll-outs of changes". We were adamantly overruled. They said "we accept the risk of total revenue outage if this agent breaks catastrophically" and (Catch-22) "you must still ensure no outage" and "you must get budget elsewhere to completely re-engineer your service".
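For anyone unsure what "phased and tested/metered roll-outs" means in practice, here is a minimal sketch, not anything we actually ran. The names `phased_rollout`, `apply_update`, `phase_fractions`, and `max_failure_rate` are hypothetical, and the thresholds are made-up illustrations. The idea is simply to push a change to a small slice of hosts first, check the failure rate, and stop before the rest of the fleet is touched.

```python
from typing import Callable, Sequence


def phased_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],  # hypothetical: returns True if the host is healthy after the update
    phase_fractions: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # illustrative cumulative fleet fractions
    max_failure_rate: float = 0.02,  # illustrative per-phase halt threshold
) -> dict:
    """Apply an update in widening phases; halt if any phase fails too often."""
    updated: list = []
    failed: list = []
    done = 0  # number of hosts already attempted

    for fraction in phase_fractions:
        # Cumulative target for this phase; always advance by at least one host.
        target = min(len(hosts), max(done + 1, int(len(hosts) * fraction)))
        batch = hosts[done:target]
        if not batch:
            break

        # Call apply_update exactly once per host in this phase.
        results = [(host, apply_update(host)) for host in batch]
        batch_failed = [host for host, ok in results if not ok]
        updated += [host for host, ok in results if ok]
        failed += batch_failed
        done = target

        # Metering step: stop before widening if this phase failed too often.
        if len(batch_failed) / len(batch) > max_failure_rate:
            return {"halted": True, "updated": updated,
                    "failed": failed, "untouched": list(hosts[done:])}

    return {"halted": False, "updated": updated,
            "failed": failed, "untouched": list(hosts[done:])}
```

With 100 hosts and an update that breaks every machine it touches, this sketch stops after the first host and leaves the other 99 untouched. That is the whole point of the request: the blast zone is the size of the first phase, not the size of the fleet.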

Marco Rogers

@jrconlin @lightweight I'm not saying you're wrong about what happens sometimes. But I think it's way too easy to just blame "some exec". In my experience, engineers do not stand up and say "we shouldn't do this". And even if the decision is made for them, they still have autonomy to do lots of things that might mitigate the worst-case scenarios. We can't fix "some exec". We can look critically at our own actions and what we do have control over.