Items tagged with: Monitoring


A few days ago, a client’s data center (well, actually a server room) "vanished" overnight. My monitoring showed that all devices were unreachable. Not even the ISP routers responded, so I assumed a sudden connectivity drop. The strange part? Nothing responded even via 4G.

I then suspected a power failure, but the UPS should have sent an alert.

The office was closed for the holidays, but I contacted the IT manager anyway. He was home sick with a serious family issue, but he got moving.

To make a long story short: the company deals in gold and precious metals. They have an underground bunker with two-meter-thick walls, and they were targeted by a professional gang. The gang used a tactic seen in similar hits: identify the main power line, tamper with it at night, and send a massive voltage spike through it.

The goal is to fry all alarm and surveillance systems. Even if battery-backed, they rarely survive a surge like that. Thieves count on the fact that during holidays, owners are away and fried systems can't send alerts. Monitoring companies often have reduced staff and might not notice the "silence" immediately.

That is exactly what happened here. But there is a "but": they didn't account for my Uptime Kuma instance, set up just weeks earlier, monitoring their MikroTik router from the outside. Because it is an external check, it flagged the lack of response from all IPs without relying on any alert being triggered from inside the site.
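As an aside, the heart of such an outside-in check is trivially simple. Here is a minimal sketch in Python of the kind of probe involved (Uptime Kuma adds retries, dashboards, and notification channels on top); the addresses and ports below are hypothetical placeholders, not the client's actual setup:

```python
#!/usr/bin/env python3
# Minimal sketch of an outside-in reachability probe, in the spirit of
# what Uptime Kuma does for you (it adds retries, dashboards, and
# notification channels). Addresses and ports below are hypothetical.
import socket

SITE_HOSTS = [
    ("203.0.113.10", 8291),  # e.g. a MikroTik router (Winbox port)
    ("203.0.113.11", 443),   # e.g. a camera NVR web interface
    ("203.0.113.12", 22),    # e.g. a server's SSH port
]

def reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Attempt a TCP connect; False on timeout, refusal, or routing error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def main() -> None:
    down = [hp for hp in SITE_HOSTS if not reachable(*hp)]
    if len(down) == len(SITE_HOSTS):
        # Total silence is the interesting signal: it points at power or
        # uplink trouble, not a single crashed box. A real deployment
        # would page someone here (SMS, push, e-mail), not print.
        print("ALERT: entire site unreachable:", down)
    elif down:
        print("WARNING: some hosts unreachable:", down)

if __name__ == "__main__":
    main()
```

The point is where it runs, not what it runs: because the probe lives off-site, a power event at the monitored site cannot silence it.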

The team rushed to the site and found the mess. Luckily, they found an emergency electrical crew to bypass the damage and restore the cameras and alarms. They swapped the fried server UPS with a spare and everything came back up.

The police warned that the chances of the crew returning the next night to "finish" the job were high, though seeing the systems back online would likely make them move on. They also warned that thieves sometimes break in just to destroy servers to wipe any video evidence.

Nothing happened in the end. But in the meantime, I had to sync all their data off-site (thankfully they have dual 1Gbps FTTH), set up an emergency cluster, and ensure everything was redundant.

Never rely only on internal monitoring. Never.

#IT #SysAdmin #HorrorStories #ITHorrorStories #Monitoring



#FreeBSD recommendations for #monitoring #alerting #observability sought. I have a much-loved collectd + riemann setup that needs an upgrade.

Target is about 10 servers and 200 jails.

No apache2/php, nagios, or clones thereof, please. I don’t have these in my stack today, and my expertise in managing them is about 20 years out of date. I prefer to avoid JVM stuff but I’m not violently against it.

Doesn’t have to be in ports yet (like the sensu.io server) if it’s in a friendly language.


I recently wrote a #monitoring dashboard for myself, and figured I'd share the experience: asylum.madhouse-project.org/bl…

Code dump is at the bottom. The dashboard itself is a fairly straightforward #React app in truly awful JS. It's fed by #Riemann, which is in turn fed by a collector I wrote myself in #fennel.
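For readers unfamiliar with that pipeline, here is a minimal sketch of the middle hop: a collector pushing an event into Riemann. This is not the author's actual collector (that one is in Fennel); it uses the Python `bernhard` Riemann client instead, and the host, service name, and TTL are hypothetical placeholders:

```python
# Minimal sketch of the collector -> Riemann hop, for illustration only.
# The real collector described above is written in Fennel; this uses the
# Python "bernhard" Riemann client (pip install bernhard) instead.
# Host, service, and TTL values are hypothetical placeholders.
import os
import time

import bernhard

client = bernhard.Client(host="riemann.example.org", port=5555)

while True:
    # Report the 1-minute load average, a typical collector metric.
    load1, _, _ = os.getloadavg()
    client.send({
        "host": "myhost.example.org",
        "service": "load/1min",
        "metric": load1,
        "ttl": 30,  # seconds before Riemann considers the event expired
    })
    time.sleep(10)
```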

The blog post isn't about the tools, but about the thought process that led me to have my dashboard look and function like it does. More monitoring stuff will follow, as time permits.