Grafana Cloud / LGTM

Grafana Labs

The de-facto open-source observability stack — best-of-breed components for metrics/logs/traces unified under Grafana dashboards; adopt incrementally.

Category: Observability / APM
License: Open core
Deployment: SaaS or self-hosted
Cost: Low
Free tier: Yes
Self-host effort: Heavy
Maturity: Incumbent
Popularity: Gartner MQ Leader 2025; LGTM ubiquitous

The catch

It's a stack of separate systems, not one product — metrics cardinality is the silent budget-killer (a misconfigured K8s label can be $1k+/mo), and self-hosting LGTM well needs real distributed-systems expertise.

Monitors

Servers · Metrics · Logs · Traces · Synthetics · K8s · Cloud · Profiling

Capabilities

Alerting · Dashboards · Distributed tracing · SLO tracking · API · RBAC

Built for

SRE / DevOps · Enterprise · Homelab

The honest take

Grafana Cloud is best understood as “the LGTM stack, operated for you.” LGTM (Loki for logs, Grafana for dashboards, Tempo for traces, Mimir for metrics, plus Pyroscope for profiling) is the de-facto open-source observability stack, and its great virtue is that it’s best-of-breed and incremental: you can adopt one signal at a time, keep your existing Prometheus and just ship metrics to Mimir, and never do a rip-and-replace. The free tier is real (10k metrics series, 50 GB each for logs, traces, and profiles, 3 users), the dashboards are the ones your team already knows, and you’re building on open standards rather than a proprietary platform. For a lot of teams that’s the ideal “leave Datadog incrementally” path.

The honest catch is in the name: it’s a stack, not a product. Even managed, you’re reasoning about four separate systems with four separate meters, and the abstraction leaks the moment something costs more than you expected. Which brings us to the thing that actually bites people — metrics cardinality. Each backend bills on its own usage, and Mimir bills on active series; a single misconfigured Kubernetes label (a pod name, a request ID, anything unbounded) can multiply your series count and turn into a four-figure monthly line item without anyone deploying “more monitoring.” The silent budget-killer here isn’t the list price, it’s a label you didn’t think about.
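To make the cardinality arithmetic concrete, here is a minimal Python sketch. It assumes the worst case, where every combination of label values actually occurs, so the series count for one metric is the product of the distinct values per label. The label names, their counts, and the per-1k-series price are all hypothetical placeholders, not Grafana Cloud's actual pricing; check your own plan for the real rate.

```python
from math import prod

# Hypothetical placeholder rate, NOT a quoted Grafana Cloud price.
ASSUMED_USD_PER_1K_SERIES = 5.00


def estimate_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on active series for one metric.

    Assumes every combination of label values is observed, so the
    bound is the product of the distinct-value counts per label.
    """
    return prod(label_cardinalities.values())


def monthly_cost(series: int, usd_per_1k: float = ASSUMED_USD_PER_1K_SERIES) -> float:
    """Monthly cost for a given active-series count at an assumed rate."""
    return series / 1000 * usd_per_1k


# Hypothetical metric with only bounded labels.
bounded = {"service": 20, "status_code": 5, "method": 4}

# Same metric after someone adds an unbounded label (one value per pod).
unbounded = {**bounded, "pod": 500}

for name, labels in (("bounded", bounded), ("with pod label", unbounded)):
    series = estimate_series(labels)
    print(f"{name}: {series} series, ~${monthly_cost(series):,.2f}/mo")
```

Under these assumed numbers, one extra label takes a single metric from 400 series to 200,000. Nothing about the workload changed; only the label set did.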

Self-hosting the same stack is the other fork in the road, and it’s not a small one. Running LGTM well — sharding Mimir, tuning Loki’s label strategy, operating object storage, handling upgrades across four systems — needs genuine distributed-systems expertise. “Free” relocates the cost from a subscription to a senior engineer’s time, and for most teams under a certain size, Grafana Cloud’s managed tiers are the cheaper total cost once you price the person.
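The "price the person" comparison can be sketched as a small Python calculation. Every figure below (salary, fraction of time spent on the stack, infrastructure spend, managed bills) is a hypothetical input chosen for illustration, not data from this review; substitute your own numbers.

```python
def self_host_monthly_cost(
    engineer_annual_usd: float,
    fraction_of_time: float,
    infra_monthly_usd: float,
) -> float:
    """Monthly cost of self-hosting: engineer time plus infrastructure."""
    return engineer_annual_usd / 12 * fraction_of_time + infra_monthly_usd


def cheaper_option(managed_monthly_usd: float, self_hosted_monthly_usd: float) -> str:
    """Return which option costs less per month (ties go to managed)."""
    if managed_monthly_usd <= self_hosted_monthly_usd:
        return "managed"
    return "self-hosted"


# Hypothetical inputs: half of one senior engineer plus storage and compute.
self_hosted = self_host_monthly_cost(
    engineer_annual_usd=180_000,
    fraction_of_time=0.5,
    infra_monthly_usd=2_000,
)

# Hypothetical managed bills at two different scales.
for managed_bill in (3_000, 40_000):
    print(managed_bill, "vs", self_hosted, "->", cheaper_option(managed_bill, self_hosted))
```

With these assumed inputs, the smaller managed bill wins easily and the larger one does not, which is the shape of the argument above: self-hosting pays off only once the managed bill outgrows the cost of the people running the stack.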

So: Grafana Cloud if you want the open-source stack without operating it and you’ll watch your cardinality; self-hosted LGTM if you have the platform expertise and the scale to justify it. Either way it’s a strong, standards-based answer. Worth comparing directly against the all-in-one SigNoz and against running Prometheus yourself.

First-hand data

data as of Jun 24, 2026

Significant incidents · 90d: 43 (4 critical · 21 major · 18 minor)
Incident-minutes logged: 27.4 days (cumulative, not downtime)
Last incident: Jun 22, 2026

Polled first-hand from each vendor's public status page & GitHub. "Significant" excludes informational notices & planned maintenance; incident-minutes sum per-incident durations (not platform downtime).
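To illustrate why summed incident-minutes can exceed wall-clock time, the Python sketch below compares a plain sum of per-incident durations with the merged (union) span of the same incidents. The incident timestamps are invented for the example, and the function names are hypothetical; this is not the site's actual pipeline, only one way to read the definition above. Even the merged figure is time with an open incident, which is still not the same as platform downtime.

```python
from datetime import datetime, timedelta

Interval = tuple[datetime, datetime]


def summed_minutes(incidents: list[Interval]) -> float:
    """Sum each incident's duration; overlapping incidents count twice."""
    return sum((end - start).total_seconds() / 60 for start, end in incidents)


def union_minutes(incidents: list[Interval]) -> float:
    """Wall-clock minutes covered by at least one incident (overlaps merged)."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(incidents):
        if cur_end is None or start > cur_end:
            # Gap before this incident: close out the previous merged span.
            if cur_end is not None:
                total += (cur_end - cur_start).total_seconds() / 60
            cur_start, cur_end = start, end
        else:
            # Overlaps the current span: extend it if this one ends later.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).total_seconds() / 60
    return total


# Invented example: two overlapping incidents and one separate incident.
t0 = datetime(2026, 6, 1, 12, 0)
incidents = [
    (t0, t0 + timedelta(minutes=60)),
    (t0 + timedelta(minutes=30), t0 + timedelta(minutes=90)),
    (t0 + timedelta(minutes=180), t0 + timedelta(minutes=200)),
]

print(summed_minutes(incidents), union_minutes(incidents))
```

In this invented case the summed figure is 140 minutes while only 110 minutes of wall-clock time had an open incident, which is why a cumulative total should not be read as downtime.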