Infra & metrics

Prometheus

open source / community (CNCF)

The de-facto standard pull-based metrics engine and time-series database for cloud-native and Kubernetes monitoring; everything else in the space orbits it.

Category: Infra & metrics
License: Open source
Deployment: Self-hosted
Cost: Free
Free tier: Yes
Self-host effort: Heavy
Maturity: Incumbent
Popularity: ≈64k GitHub stars; category center of gravity

The catch

Single-node by design — no native HA or long-term storage — so any serious deployment becomes a 4-5 component stack (Alertmanager, Grafana, Thanos/Mimir, exporters) you assemble and operate yourself.

Monitors

Metrics, Servers, K8s, Cloud, Network

Protocols

Prometheus, SNMP

Capabilities

Alerting, Autodiscovery, API, Config as code

Built for

SRE / DevOps, Enterprise, Homelab

The honest take

Prometheus is the cloud-native default, and the single most useful thing I can tell you about it is that it’s the wrong thing to call “a product.” When someone says they “use Prometheus,” what they actually run is a stack: Prometheus itself for scraping and storage, Alertmanager for routing alerts, Grafana for dashboards, an exporter per thing you want to watch, and — the moment you need high availability or more than a few weeks of history — Thanos, Mimir or VictoriaMetrics bolted on behind it. Adopting Prometheus is adopting an architecture you operate, not installing a tool. That’s not a criticism; it’s the thing people underestimate, and the source of most “Prometheus is hard” complaints.

Where it’s genuinely the right call: anything Kubernetes or container-native. The pull model, service discovery and PromQL are the lingua franca of cloud-native monitoring for a reason — the ecosystem, the exporters and the community knowledge are unmatched, and config-as-code means your monitoring lives in Git like everything else. If you’re building on the standard, you’re building on the thing everything else integrates with.
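If the pull model is new to you, it helps to see how little an application has to do: it serves its current numbers as plain text on an HTTP endpoint, and Prometheus comes and fetches them on a schedule. The sketch below uses only the Python standard library to render one counter in the Prometheus text exposition format and serve it at /metrics. The metric name demo_requests_total, its labels and port 8000 are made up for illustration, and in a real service you would more likely use an official client library than hand-roll this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, labels: dict, value) -> str:
    """Render one sample line: name{label="value",...} value."""
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


def render_metrics(name: str, help_text: str, metric_type: str, samples) -> str:
    """Render HELP and TYPE comments followed by one line per (labels, value) pair."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    lines += [render_sample(name, labels, value) for labels, value in samples]
    return "\n".join(lines) + "\n"


# Hypothetical in-memory state; a real application would update this as it works.
DEMO_SAMPLES = [({"path": "/", "code": "200"}, 3)]


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers a scrape: GET /metrics returns the current values as text."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(
            "demo_requests_total", "Requests handled.", "counter", DEMO_SAMPLES
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, waiting to be scraped on the given (assumed) port."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a scrape job at 127.0.0.1:8000 would be enough for Prometheus to start collecting the counter. The application never pushes anything, which is why service discovery matters so much: Prometheus has to know where every target lives.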

Where it’s the wrong call: a traditional, SNMP-heavy network. Yes, snmp_exporter exists, and yes, it works — but it’s the clunky path, and you’ll spend your time fighting generator configs instead of monitoring. If you’re escaping SolarWinds and you’re not going cloud-native, Zabbix is the gentler landing; reach for Prometheus when the destination is Kubernetes, not just “something free.”

Two traps worth naming up front. The first is long-term storage: vanilla Prometheus is single-node and short-memoried by design. Local retention defaults to 15 days and is usually kept to days or weeks, and there is no built-in HA, so any serious deployment eventually grows a remote-storage backend, and that decision (Thanos vs Mimir vs VictoriaMetrics) is its own project. The second is cardinality: every distinct combination of label values is its own time series, so a single label with unbounded values (a user ID, a full URL, a pod name baked into a metric) can explode your time-series count and your memory bill overnight. Get your labels right and it’s cheap to run; get them wrong and you’ll learn what “cardinality” means the hard way.
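The cardinality trap is easier to respect once you do the arithmetic. Each distinct combination of label values becomes its own time series, so the worst case for one metric is the product of how many values each label can take. A rough sketch, with label names and counts invented purely for illustration:

```python
from math import prod


def max_series(label_cardinalities: dict) -> int:
    """Upper bound on series for one metric: the product of per-label value counts.

    Real deployments usually see fewer, because not every combination occurs.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets for a single HTTP request counter.
bounded = {"method": 5, "status": 6, "service": 20}
with_user_id = {**bounded, "user_id": 50_000}

print(max_series(bounded))       # 600
print(max_series(with_user_id))  # 30000000
```

One extra label turned a 600-series metric into a potential 30 million. That is the whole trap: the cost multiplies, it does not add.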

The honest summary: free in license, expensive in ownership, and worth it when you’re cloud-native and treat operating the stack as work you want to own. If that sentence makes you tired, that’s useful information. Compare it head-to-head with Zabbix or managed Grafana Cloud before you commit.

Pricing in the real world

  • Software: $0
  • Grafana Enterprise support (optional): $10-30k/yr

Free software; cost is engineering time + the surrounding stack.