A few days ago, a client’s data center (well, actually a server room) "vanished" overnight. My monitoring showed that all devices were unreachable. Not even the ISP routers responded, so I assumed a sudden connectivity drop. The strange part? Nothing was reachable over 4G either.
I then suspected a power failure, but the UPS should have sent an alert.
The office was closed for the holidays, but I contacted the IT manager anyway. He was at home, sick and dealing with a serious family issue, but he got moving.
To make a long story short: the company deals in gold and precious metals. They have an underground bunker with two-meter-thick walls. They were targeted by a professional gang, which used a tactic seen in similar hits: identify the main power line, tamper with it at night, and send a massive voltage spike through it.
The goal is to fry all alarm and surveillance systems. Even if battery-backed, they rarely survive a surge like that. Thieves count on the fact that during holidays, owners are away and fried systems can't send alerts. Monitoring companies often have reduced staff and might not notice the "silence" immediately.
That is exactly what happened here. But there is a "but": they didn't account for my Uptime Kuma instance monitoring their MikroTik router, installed just weeks ago. Since it is an external check, it flagged the lack of response from all IPs without needing any alert to be sent from inside the site.
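To make the "external check" idea concrete, here is a minimal sketch in Python. It is not how Uptime Kuma works internally, just the same principle: a machine outside the site probes several public endpoints and treats total silence as an alarm in its own right. The host addresses, ports, and the names `tcp_reachable` and `classify` are assumptions made up for illustration.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def classify(results: dict[str, bool]) -> str:
    """Summarize probe results as seen from outside the site."""
    if not results:
        return "no-targets"
    if all(results.values()):
        return "ok"
    if not any(results.values()):
        # Every endpoint is silent: no internal system could report this.
        return "site-silent"
    return "partial"


if __name__ == "__main__":
    # Hypothetical targets using documentation-only addresses (RFC 5737).
    targets = {
        "router-wan": ("192.0.2.10", 443),
        "isp-router": ("192.0.2.1", 443),
        "lte-backup": ("198.51.100.7", 443),
    }
    results = {name: tcp_reachable(host, port) for name, (host, port) in targets.items()}
    print(classify(results), results)
```

The point of the `site-silent` state is that it needs no cooperation from the monitored site: if power, alarms, and uplinks all die at once, the absence of answers is the alert.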
The team rushed to the site and found the mess. Luckily, they found an emergency electrical crew to bypass the damage and restore the cameras and alarms. They swapped the fried server UPS with a spare and everything came back up.
The police warned that the chances of the crew returning the next night to "finish" the job were high, though seeing the systems back online would likely make them move on. They also warned that thieves sometimes break in just to destroy servers to wipe any video evidence.
Nothing happened in the end. But in the meantime, I had to sync all their data off-site (thankfully they have dual 1Gbps FTTH), set up an emergency cluster, and ensure everything was redundant.
Never rely only on internal monitoring. Never.
feld (in reply to Uriel Fanelli):
@uriel ✋ I worked for years for an ISP/datacenter whose primary datacenter space was on the first level of our office building. We had only one power service for the building. It's technically possible to get two, but they would come from the same power company... so when the drunk driver crashed into the transformer and took out our power in winter, it would have taken out both anyway. That crash also caused a power surge that destroyed our transfer switch, which is another problem that having two services wouldn't have solved. We did have diesel backup generators, though.
We didn't even have diverse entrances into the building for our fiber for a long long time either. But we were definitely a datacenter. (my brother still works there; nothing has really changed except increased bandwidth)
I have never heard of any rules or regulations that require a "datacenter" to have two buildings and independent power. Sounds like something someone made up...
feld (in reply to feld):
@uriel there are different "Tiers" of datacenters though, which is probably what people get confused about:
copy/pasted definitions from the first search hit:
Tier 1: A data center with a single path for power and cooling, and no backup components. This tier has an expected uptime of 99.671% per year.
Tier 2: A data center with a single path for power and cooling, and some redundant and backup components. This tier offers an expected uptime of 99.741% per year.
Tier 3: A data center with multiple paths for power and cooling, and redundant systems that allow the staff to work on the setup without taking it offline. This tier has an expected uptime of 99.982% per year.
Tier 4: A completely fault-tolerant data center with redundancy for every component. This tier comes with an expected uptime of 99.995% per year.
We would have been a Tier 2.
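For a sense of scale, the uptime percentages quoted above can be turned into allowed downtime per year. This is only arithmetic on the quoted figures, assuming a 365-day year; the function name `downtime_hours_per_year` is made up for this sketch.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

# Expected uptime per tier, as quoted in the definitions above.
TIER_UPTIME_PERCENT = {
    "Tier 1": 99.671,
    "Tier 2": 99.741,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into hours of downtime per year."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


for tier, pct in TIER_UPTIME_PERCENT.items():
    print(f"{tier}: about {downtime_hours_per_year(pct):.1f} hours of downtime per year")
```

That works out to roughly 28.8 hours for Tier 1, 22.7 hours for Tier 2, 1.6 hours for Tier 3, and 0.4 hours (about 26 minutes) for Tier 4.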