So: the matrix.org database secondary lost its FS due to a RAID failure earlier today (11:17 UTC). Then we lost the primary at 17:26. We're trying to restore the primary DB FS (which could be fastish), while also doing a point-in-time backup restore from last night (which takes >10h). However, we believe the incremental DB traffic since last night is intact. Apologies for the downtime; folks on their own homeserver are of course not impacted.
Sorry, but it's bad news: we haven't been able to restore the DB primary filesystem to a state we're confident in running as a primary (especially given our experiences with slow-burning postgres db corruption). So we're having to do a full 55TB DB snapshot restore from last night, which will take >10h to recover the data, and then >4h to actually restore, and then >3h to catch up on missing traffic. Huge apologies for the outage. Again, folks using their own homeservers are not impacted.
Conduit is a simple, fast and reliable
chat server powered by Matrix. Conduit is an alternative to Synapse and
tries to be lightweight and easy to install, but it is still in development.
Any plans to migrate away from a centralized RDBMS? There are so many blob stores that can scale to petabytes and tolerate the loss of multiple nodes without going offline.
Status update: we’re 47TB through restoring the 55TB db snapshot of the matrix.org DB, but then have to rebuild the DB and replay the subsequent 17h of DB traffic, which will take several hours. Thank you for your patience, and apologies once again for the outage.
Weirdly, this actually feels like a positive example reinforcing the idea of a decentralized fediverse, as other instances are unaffected. Also, we had been discussing running our own instance at the @chaotikumev just before the outage. I just wish there were an easy, neat account migration feature like the one @Mastodon has. (And I guess I can't just export and import chats + keys and use SRV records to have a seamless migration?)
Status update: we've restored the 55TB snapshot and subsequent incremental backups, and are about to replay the remaining traffic since the backup. There are still several unknowns, but if things go well the matrix.org instance should be back in 3-4 hours.
Right, matrix.org is back online as of 17:00 UTC. The server is struggling a bit as it catches up. Huge apologies again for the outage; postmortem + ways to avoid a repeat will be forthcoming. See also theregister.com/2025/09/03/mat… & heise.de/en/news/Matrix-main-s…. Thanks all for your patience.
Lambda
in reply to The Matrix.org Foundation
The Matrix.org Foundation
in reply to Lambda
The Matrix.org Foundation
in reply to The Matrix.org Foundation
Solinvictus
in reply to The Matrix.org Foundation
good luck on the remediation actions 🫡
#matrixdown
hisold
in reply to The Matrix.org Foundation
Vincentimes 🚈 WHY2025 📞9669
in reply to The Matrix.org Foundation
The Matrix.org Foundation
in reply to The Matrix.org Foundation
MoveFastAndFixThings
in reply to The Matrix.org Foundation
Frisk
in reply to The Matrix.org Foundation
crispycat
in reply to The Matrix.org Foundation
M. Kittel
in reply to The Matrix.org Foundation
Conduit - Your own chat server
conduit.rs
Thibaultmol 🌈
in reply to The Matrix.org Foundation
Simon Carpentier
in reply to The Matrix.org Foundation
Loafer
in reply to The Matrix.org Foundation
Dusty Mabe
in reply to The Matrix.org Foundation
AJCxZ0
in reply to The Matrix.org Foundation
iooioio
in reply to The Matrix.org Foundation
Bernie
in reply to The Matrix.org Foundation
The Matrix.org Foundation
in reply to The Matrix.org Foundation
MatMaul
in reply to The Matrix.org Foundation
T_X
in reply to The Matrix.org Foundation
The Matrix.org Foundation
in reply to The Matrix.org Foundation
Kalos
in reply to The Matrix.org Foundation
PoLiTiPeT
in reply to The Matrix.org Foundation
The Matrix.org Foundation
in reply to The Matrix.org Foundation
Matrix.org homeserver grinds to a halt after RAID meltdown
Richard Speed (The Register)
kontrollierterWahnwitz
in reply to The Matrix.org Foundation
altf4
in reply to The Matrix.org Foundation
Vincentimes 🚈 WHY2025 📞9669
in reply to The Matrix.org Foundation
Daniel 黄法官 CyReVolt 🐢
in reply to The Matrix.org Foundation
🥺
That must have been rough and tough.
We love you! 🧡
Cavallo Pazzo
in reply to The Matrix.org Foundation
Thomas Frans 🇺🇦
in reply to The Matrix.org Foundation
С.
in reply to The Matrix.org Foundation
The Matrix.org Foundation
in reply to С.
Bart
in reply to The Matrix.org Foundation
OMG 🇪🇺 🇺🇦
in reply to The Matrix.org Foundation
I don't think it's true that folks on other servers are not affected.
As a result of the 18-years policy, I moved my community to other servers, and we are now partly having issues with unshared security keys and lots of undecryptable messages.
When you restore an old save state, every key generated and shared in the meantime is gone.
Right?
Marcos Dione
in reply to The Matrix.org Foundation
Just an idea to improve backups:
Make exponential-backoff-like backups: last month, 2-3 months ago, 4-6 months ago, 7-12 months ago, 2-3 years ago, etc. Or with N messages instead of N days.
Sounds like you could recover the fresher data first, then catch up, then restore backwards.
#backup #SysAdmin
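A minimal sketch of the tiering idea above, not anything the Foundation has described. It assumes tier boundaries at 1, 3, 6, 12 and 36 months (read off the ranges in the post) plus a catch-all tier for older data; the names `backup_tier` and `restore_order` are hypothetical.

```python
from bisect import bisect_left

# Upper bounds (in months) of each backup tier, following the ranges in the
# post: last month, months 2-3, 4-6, 7-12, 2-3 years. These values are an
# assumption; anything older than the last bound lands in one final tier.
TIER_UPPER_BOUNDS_MONTHS = [1, 3, 6, 12, 36]


def backup_tier(age_months: float) -> int:
    """Return the tier index for data of the given age (0 = most recent)."""
    if age_months < 0:
        raise ValueError("age must be non-negative")
    # bisect_left finds the first bound that is >= the age.
    return bisect_left(TIER_UPPER_BOUNDS_MONTHS, age_months)


def restore_order(ages_months):
    """Group ages by tier and return the groups with the newest tier first."""
    tiers = {}
    for age in ages_months:
        tiers.setdefault(backup_tier(age), []).append(age)
    return [tiers[t] for t in sorted(tiers)]
```

With this grouping, a restore could bring the tier 0 data back first, catch up on live traffic, and then work backwards through the older tiers.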
amy
in reply to The Matrix.org Foundation
The Matrix.org Foundation
in reply to amy
FeedMe Doughnuts
in reply to The Matrix.org Foundation
Demi Marie Obenour
in reply to The Matrix.org Foundation
Stefano Zacchiroli
in reply to The Matrix.org Foundation
as an advertisement for decentralization this is a bit harsh, but definitely effective!
(J/k, of course. Good luck with the recovery and thanks!)
Vera
in reply to The Matrix.org Foundation
Askaaron
in reply to The Matrix.org Foundation
Oh, that's a pity. Good luck with the repair and thanks for your work ❤️
But this is also a good reminder to use your own server. I am totally new to Matrix but started using it with my own instance (based on Synapse) from the beginning.
MrClon
in reply to The Matrix.org Foundation
Use another homeserver, my dudes
beta3
in reply to The Matrix.org Foundation
Hubert Figuière
in reply to The Matrix.org Foundation
Mazhar Hussain
in reply to The Matrix.org Foundation