Observability / APM
Grafana Cloud / LGTM
Grafana Labs
The de-facto open-source observability stack — best-of-breed components for metrics/logs/traces unified under Grafana dashboards; adopt incrementally.
- Category: Observability / APM
- License: Open core
- Deployment: SaaS or self-hosted
- Cost: Low
- Free tier: Yes
- Self-host effort: Heavy
- Maturity: Incumbent
- Popularity: Gartner MQ Leader 2025; LGTM ubiquitous
The catch
It's a stack of separate systems, not one product. Metrics cardinality is the silent budget-killer (a single misconfigured K8s label can add $1k+/mo to the bill), and self-hosting LGTM well needs real distributed-systems expertise.
The honest take
Grafana Cloud is best understood as “the LGTM stack, operated for you.” LGTM — Loki for logs, Grafana for dashboards, Tempo for traces, Mimir for metrics (plus Pyroscope for profiling) — is the de-facto open-source observability stack, and its great virtue is that it’s best-of-breed and incremental: you can adopt one signal at a time, keep your existing Prometheus and just ship metrics to Mimir, and never do a rip-and-replace. The free tier is real (10k series, 50 GB per signal, 3 users), the dashboards are the ones your team already knows, and you’re building on open standards rather than a proprietary platform. For a lot of teams that’s the ideal “leave Datadog incrementally” path.
The honest catch is in the name: it’s a stack, not a product. Even managed, you’re reasoning about four separate systems with four separate meters, and the abstraction leaks the moment something costs more than you expected. Which brings us to the thing that actually bites people — metrics cardinality. Each backend bills on its own usage, and Mimir bills on active series; a single misconfigured Kubernetes label (a pod name, a request ID, anything unbounded) can multiply your series count and turn into a four-figure monthly line item without anyone deploying “more monitoring.” The silent budget-killer here isn’t the list price, it’s a label you didn’t think about.
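To make the cardinality point concrete, here is a rough sketch of how one extra label multiplies series count. The label names, the value counts, and the per-series price are all hypothetical and chosen only for illustration; check the current Grafana Cloud pricing page before budgeting. The product of label cardinalities is a worst-case upper bound, since real series counts depend on which label combinations actually occur.

```python
from math import prod

# Hypothetical price, for illustration only (NOT a quoted Grafana Cloud rate).
PRICE_PER_1K_SERIES_USD = 8.00


def active_series(label_cardinalities: dict[str, int]) -> int:
    """Worst-case series count for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_cardinalities.values())


def monthly_cost(series: int, price_per_1k: float = PRICE_PER_1K_SERIES_USD) -> float:
    """Monthly cost in USD under the hypothetical flat per-series price."""
    return series / 1000 * price_per_1k


# One metric with bounded labels (hypothetical values).
bounded = {"namespace": 5, "service": 20, "status_code": 5}

# The same metric after an unbounded label (pod name) leaks in.
# 200 stands in for the number of distinct pod names seen in the billing window.
with_pod_label = {**bounded, "pod": 200}

before = active_series(bounded)
after = active_series(with_pod_label)

print(f"before: {before:>7} series, ${monthly_cost(before):,.2f}/mo")
print(f"after:  {after:>7} series, ${monthly_cost(after):,.2f}/mo")
```

Under these assumptions a single metric goes from 500 series to 100,000. Repeat that across a handful of metrics and the four-figure line item appears without anyone adding a dashboard.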
Self-hosting the same stack is the other fork in the road, and it’s not a small one. Running LGTM well — sharding Mimir, tuning Loki’s label strategy, operating object storage, handling upgrades across four systems — needs genuine distributed-systems expertise. “Free” relocates the cost from a subscription to a senior engineer’s time, and for most teams under a certain size, Grafana Cloud’s managed tiers are the cheaper total cost once you price the person.
So: Grafana Cloud if you want the open-source stack without operating it and you’ll watch your cardinality; self-hosted LGTM if you have the platform expertise and the scale to justify it. Either way it’s a strong, standards-based answer. Worth comparing directly against the all-in-one SigNoz and against running Prometheus yourself.
First-hand data
data as of Jun 24, 2026
- Significant incidents (90d): 43 total (4 critical · 21 major · 18 minor)
- Incident-minutes logged: 27.4 days (cumulative, not downtime)
- Last incident: Jun 22, 2026
Polled first-hand from each vendor's public status page & GitHub. "Significant" excludes informational notices & planned maintenance; incident-minutes sum per-incident durations (not platform downtime).
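A small sketch of why summed incident durations overstate wall-clock impact: overlapping incidents are counted once each in the sum. The timestamps and function names below are made up for illustration and are not the site's actual pipeline. Even the merged figure is not platform downtime, because a single incident may affect only one component or region.

```python
from datetime import datetime, timedelta


def summed_duration(incidents):
    """Sum of per-incident durations; overlapping time is counted repeatedly."""
    return sum((end - start for start, end in incidents), timedelta())


def merged_duration(incidents):
    """Wall-clock time covered by at least one incident (overlaps merged)."""
    total = timedelta()
    cur_start = cur_end = None
    for start, end in sorted(incidents):
        if cur_end is None or start > cur_end:
            # Gap before this incident: close out the previous merged window.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current window: extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Two hypothetical incidents that overlap by one hour.
t0 = datetime(2026, 6, 1, 12, 0)
incidents = [
    (t0, t0 + timedelta(hours=2)),
    (t0 + timedelta(hours=1), t0 + timedelta(hours=3)),
]

summed = summed_duration(incidents)
merged = merged_duration(incidents)
print("summed:", summed)
print("merged:", merged)
```

Here the summed figure is four hours while only three hours of wall-clock time had an open incident, which is why the cumulative number should not be read as an availability figure.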