The Lens

Thanos extends Prometheus into a highly available, horizontally scalable monitoring system with long-term retention and cross-cluster querying. It sits on top of your existing Prometheus setup and adds what Prometheus deliberately left out.

You keep your Prometheus instances. Thanos adds a global query view across all of them, long-term retention in object storage (S3, GCS, Azure Blob), downsampling to speed up queries over long time ranges, and high availability by deduplicating data from replicated Prometheus instances. It's not a replacement; it's the scaling layer.
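To make "high availability through deduplication" concrete, here is a minimal Python sketch of the idea. It assumes two Prometheus replicas scrape the same targets and differ only by an external label, here hypothetically named `replica` (in Thanos this label is chosen by the operator and passed to the Querier via `--query.replica-label`). The function names and the "first replica seen wins" merge rule are illustrative assumptions; Thanos's actual deduplication logic is more involved.

```python
from collections import defaultdict

REPLICA_LABEL = "replica"  # assumed label name; pick whatever your setup uses


def series_key(labels, replica_label=REPLICA_LABEL):
    """Identity of a series once the replica label is ignored."""
    return frozenset((k, v) for k, v in labels.items() if k != replica_label)


def deduplicate(series_list, replica_label=REPLICA_LABEL):
    """Merge series that differ only by their replica label.

    Each input item is (labels, samples), where labels is a dict and
    samples is a list of (timestamp, value) pairs. For any timestamp
    present in several replicas, the first replica seen wins.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        key = series_key(labels, replica_label)
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Replica "a" missed the scrape at t=30; replica "b" has it.
replica_a = (
    {"__name__": "up", "job": "api", "replica": "a"},
    [(0, 1.0), (15, 1.0), (45, 1.0)],
)
replica_b = (
    {"__name__": "up", "job": "api", "replica": "b"},
    [(0, 1.0), (15, 1.0), (30, 1.0), (45, 1.0)],
)

result = deduplicate([replica_a, replica_b])
for key, points in result.items():
    print(dict(key), points)
```

The two replica series collapse into a single series, and the gap in replica "a" is filled from replica "b". That is the property that lets you restart or lose one Prometheus replica without a hole in your graphs.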

Apache 2.0 licensed, CNCF incubating project. Production-proven at companies running hundreds of Prometheus instances.

The catch: Thanos adds real operational complexity. You're running a Sidecar next to each Prometheus, plus a Querier, Store Gateway, Compactor, and optionally a Query Frontend. Each is a separate component to deploy and monitor. The irony of needing monitoring for your monitoring system is not lost on anyone. If you have fewer than three Prometheus instances, you probably don't need this yet.