dch

5 months ago


#FreeBSD recommendations for #monitoring #alerting #observability sought. I have a much loved collectd + riemann that needs an upgrade.

Target is about 10 servers and 200 jails.

No apache2/php, nagios or clones thereof please. I don’t have these in my stack today, and my expertise in managing them is about 20 years out of date. I prefer to avoid JVM stuff but I’m not violently against it.

Doesn’t have to be in ports yet (like the sensu.io/ server) if it’s in a friendly language.

feld

in reply to dch 5 months ago

I am not familiar with Sensu and should check that out, but I've been gravitating towards the "Prometheus/OpenTSDB, jam it all into time series somehow (VictoriaMetrics for low resource usage), use webhooks to issue the alerts configured in Grafana" method.
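
As a rough illustration of the webhook side of that approach, here is a minimal Python sketch that turns a webhook body into one line per alert. It assumes an Alertmanager-style JSON payload (an `alerts` list whose entries carry `status` and `labels`); the `summarize_alerts` helper, the `JailDown` alert name and the host name are hypothetical, so check the actual payload your Grafana version sends before relying on the field names.

```python
import json


def summarize_alerts(payload: str) -> list[str]:
    """Turn an Alertmanager-style webhook body into one line per alert.

    The field names ("alerts", "status", "labels") are an assumption about
    the payload shape, not something confirmed in this thread.
    """
    body = json.loads(payload)
    lines = []
    for alert in body.get("alerts", []):
        labels = alert.get("labels", {})
        name = labels.get("alertname", "unknown")
        instance = labels.get("instance", "unknown")
        status = alert.get("status", "unknown")
        lines.append(f"{status.upper()}: {name} on {instance}")
    return lines


# Hypothetical payload with a single firing alert.
example = json.dumps({
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "JailDown", "instance": "jail01.example.org"},
        }
    ],
})

summary = summarize_alerts(example)
print(summary)
```

The receiver that calls this could be anything that accepts an HTTP POST; the parsing is kept separate so it can be tested without a running server.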

dch

in reply to feld 5 months ago

@feld @maikel is a huge fan of sensu. We don’t have the server in FreeBSD ports but I poked at it a bit about 18 months ago and it seems feasible to port it.

I also tried openobserve. A port again seems doable, but I get the feeling they are still experimenting with licensing and business models. I really want a simple store of logs that doesn’t have elastic/opensearch/mongo at the core.

feld

in reply to dch 5 months ago

you want VictoriaLogs then -- simple single Go binary

victoriametrics.com/products/v…

VictoriaLogs: Scalable | Open Source | Logs DB & Logging Solution

VictoriaLogs is a fast and easy-to-use, open source logs solution. Highly scalable on cloud, kubernetes or on-premise setups.

^{VictoriaMetrics}

dch

in reply to feld 5 months ago

@feld thanks, that is exactly what I want. I can’t believe I’ve not seen this already. I actually like graylog a lot, but it requires both mongodb for config data and *search for data, and it’s an operational PITA to upgrade @maikel

feld

in reply to dch 5 months ago

lots of people seem unaware of Victoria products, but they are simpler and better than what they're using... and often a drop-in replacement!

ltning

in reply to feld 5 months ago

And the community edition is not crippled? I'm happy to pay (and pay well) for support but we *must* run publicly available and OSS-licensed code. No exceptions.

feld

in reply to ltning 5 months ago

paying them only gets you support, really. it's not crippled.

Tom

in reply to feld 5 months ago

@feld @maikel Is VictoriaMetrics a competitor to Prometheus/Grafana?

feld

in reply to Tom 5 months ago

more or less a direct competitor to Prometheus, InfluxDB, OpenTSDB, Graphite, and Datadog

It implemented all of their APIs and has an improved storage design. Very lightweight and high performance.

Then you just use Grafana or whatever you want with VictoriaMetrics

okapi

in reply to feld 5 months ago

@feld @maikel I recently started using #VictoriaLogs and have been impressed, at least compared to my previous usage of a central `syslogd` and `grep`. Needed a syslogd_flags containing `-O rfc5424` on BSD systems to prevent it mixing host and app names. A colleague installed Prometheus/Grafana some years back and while it produced pretty and entertaining graphs, none of it seemed actually useful and I've never seen decent use case examples for such metrics.
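
For context on why the RFC 5424 output format helps here: in that format the hostname and app name are separate positional header fields, so a receiver does not have to guess where one ends and the other begins. The Python sketch below splits such a line into its header fields. The `parse_rfc5424_header` helper and the sample line are hypothetical, and the parser is deliberately naive (it does not handle structured data that contains spaces).

```python
def parse_rfc5424_header(line: str) -> dict:
    """Split an RFC 5424 syslog line into its header fields.

    Header layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID ...
    Everything after MSGID (structured data and message) is kept in "rest".
    """
    if not line.startswith("<"):
        raise ValueError("missing <PRI> prefix")
    pri_end = line.index(">")
    pri = int(line[1:pri_end])
    parts = line[pri_end + 1:].split(" ", 6)
    if len(parts) < 6:
        raise ValueError("truncated RFC 5424 header")
    version, timestamp, hostname, app_name, procid, msgid = parts[:6]
    return {
        "facility": pri // 8,   # PRI = facility * 8 + severity
        "severity": pri % 8,
        "version": version,
        "timestamp": timestamp,
        "hostname": hostname,
        "app_name": app_name,
        "procid": procid,
        "msgid": msgid,
        "rest": parts[6] if len(parts) > 6 else "",
    }


# Hypothetical log line, as a jail might emit it with RFC 5424 output enabled.
sample = "<30>1 2024-01-01T12:00:00Z jail01.example.org nginx 1234 - - worker started"
record = parse_rfc5424_header(sample)
print(record["hostname"], record["app_name"])
```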

feld

in reply to okapi 5 months ago

It's not just about metrics themselves. You can just centralize all your stuff here. Most people use multiple systems: one for metrics (cpu % perhaps), and another for monitoring processes etc (is nginx running)

But you can just leverage a time series database to do both. You can use the systemd exporter, for example, to get the services into the time series database, then set up alert rules for those. The process is running if the series has a value of 1, and stopped or absent if the value is 0 or null.

You get a lot of power if you apply a little creativity
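
The "1 means running, 0 or null means stopped or absent" rule above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how any particular alerting engine implements it; `service_state` and `should_alert` are hypothetical names, and `None` stands in for a missing sample.

```python
from typing import Optional


def service_state(sample: Optional[float]) -> str:
    """Map the latest sample of a service-up series to a state.

    None models a missing series (no sample was scraped at all).
    """
    if sample is None:
        return "absent"
    return "running" if sample == 1 else "stopped"


def should_alert(sample: Optional[float]) -> bool:
    """Alert on anything that is not positively known to be running."""
    return service_state(sample) != "running"


for value in (1, 0, None):
    print(value, service_state(value), should_alert(value))
```

Treating a missing sample as alert-worthy is the design choice that lets one time series cover both "the service stopped" and "the host stopped reporting".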

Psyhackological

in reply to feld 5 months ago

@feld I have experience with Prometheus and AlertManager, and I saw that Victoria Metrics and Logs can be more efficient and replace many things. Is it really just a better 1:1 alternative?

feld

in reply to Psyhackological 5 months ago

well, VictoriaMetrics supports:

- prometheus
- influxdb
- opentsdb
- graphite
- (maybe datadog? can't remember)
- their own API
- all of PromQL, plus some improvements to it that allow you to do certain queries that are literally impossible in normal PromQL

The client/agent side is a 7.5MB binary right now vs 100MB+ for Prometheus

It's faster, uses fewer resources, and has a far more efficient storage implementation (like 10:1 compression)

and it's basically a complete drop-in replacement. It supports almost all of the prometheus.yaml config file. There are a few things you may have to remove, but all your target configuration etc will still work as-is so you don't have to re-engineer anything

It's good. Someone sat down and said "we can do this better with less bloat" and they did it. And I hope they become filthy rich for it.

Psyhackological

in reply to feld 5 months ago

@feld so to sum up, it feels like they saw Prometheus, did it themselves, and now it looks like a worse version of it.

At my job I use a Prometheus, AlertManager and Grafana stack, and I'm wondering if this can be done better. Also, if I understand correctly, it comes with more easily accessible logs? Sorry, I'm trying to grasp it, and I see you have experience with the Victoria products.

feld

in reply to Psyhackological 5 months ago

> looks like a worse version of it.

I'm not sure what you mean by this. Are you referring to my screenshot? That's just the basic web interface to Victoria Logs. There's also one for Victoria Metrics. But I use Grafana with Victoria. You just add it as a Prometheus data source and it works.
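
Adding it as a Prometheus data source works because the same HTTP query API is exposed. As a small Python sketch, this builds an instant-query URL for the Prometheus-style `/api/v1/query` endpoint. The base URL `http://localhost:8428` (the usual single-node VictoriaMetrics port) and the `up{job="node"}` query are assumptions for illustration; `instant_query_url` is a hypothetical helper and no request is actually sent.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus-compatible instant query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Assumed local single-node instance and an example PromQL expression.
url = instant_query_url("http://localhost:8428", 'up{job="node"}')
print(url)
```

Pointing the same helper at a Prometheus server instead only changes the base URL, which is the practical meaning of "drop-in" for anything that speaks this API.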

I have never used AlertManager, just the alerts+webhooks directly in Grafana. But Victoria does have their own AlertManager equivalent.

Also, I have hated every implementation of "Logs" in Grafana. Loki is terrible and the Grafana support is bad IMHO. But I *think* you can also use VictoriaLogs with all of those things in case you want alerts based on certain log events

Tom

in reply to feld 5 months ago

@feld @psyhackological You've not seen terrible logs until you've had AWS CloudWatch ingest logs. The interface is atrocious to use!

Psyhackological

in reply to Tom 5 months ago

@pertho @feld is that so? I was slowly learning the AWS stack, but good to know.

Tom

in reply to Psyhackological 5 months ago

@psyhackological I also added about 8 EC2 instances reporting metrics to CloudWatch and it cost me $200. So I dumped CloudWatch really fast. Do NOT recommend!
@feld @dch

feld

in reply to Tom 5 months ago

I see companies throwing millions at AWS services that would cost maybe tens of thousands to self-host, even if still on EC2

It drives me nuts

dch

in reply to feld 5 months ago

@feld the big attraction for me with sensu is that it uses an etcd backend, so you don’t have a SPoF in monitoring. Other than that, @maikel knows it far better than I do.

feld

in reply to dch 5 months ago

Victoria supports clustering to remove the SPoF and also horizontally scale I think?

I haven't used their clustering capabilities yet

docs.victoriametrics.com/victo…

VictoriaMetrics: Cluster version

Documentation for VictoriaMetrics, VictoriaLogs, Operator, Managed VictoriaMetrics and vmanomaly

^{docs.victoriametrics.com}
