Prometheus

Prometheus is an open-source monitoring and alerting system built for reliability, scalability, and flexibility — particularly in cloud-native and microservices environments such as those orchestrated by Kubernetes. It is part of the Cloud Native Computing Foundation (CNCF) ecosystem and is widely used for metrics-based observability.

Below is a detailed description of the Prometheus architecture, components, data model, alerting system, and integrations — especially how it is used for event monitoring and alerting.

1. Core Concept: Metrics-Based Monitoring

Prometheus focuses on metrics, not logs or traces.
Metrics are time-stamped numerical data that describe the state of systems over time — e.g.:

  • CPU utilization (node_cpu_seconds_total)
  • HTTP request latency (http_request_duration_seconds)
  • Number of running pods (kube_pod_status_phase{phase="Running"})

Each metric has:

  • Name: e.g. http_requests_total
  • Labels (key-value pairs): e.g. method="GET", status="200", instance="node1"

This structure forms a multi-dimensional time series.

2. Prometheus Architecture Overview

Prometheus is composed of several components working together:

Component | Description
--- | ---
Prometheus Server | Core component responsible for scraping metrics, storing time-series data, and evaluating alert rules.
Exporters | Agents or services that expose metrics in Prometheus format (e.g. node_exporter, blackbox_exporter).
Pushgateway | Optional component for short-lived jobs that cannot be scraped directly (e.g. batch jobs).
Alertmanager | Handles alerts sent by Prometheus — deduplicates, groups, routes, and sends notifications (e.g. Slack, PagerDuty, email).
Service Discovery | Automatically discovers targets via Kubernetes, Consul, EC2, etc.
Visualization Tools | Usually Grafana, which queries Prometheus via PromQL for dashboards.

3. How Prometheus Collects Data (Pull Model)

Prometheus pulls metrics from monitored targets via HTTP on a configurable interval.

Each target exposes a /metrics endpoint returning plaintext data like:

# HELP http_requests_total The total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET", status="200"} 12345
http_requests_total{method="POST", status="500"} 42

Prometheus scrapes these endpoints at a defined interval (e.g. every 15 seconds) and stores the resulting metrics in its time-series database (TSDB).
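For illustration, a minimal prometheus.yml that would produce this behavior might look like the sketch below (job name and target hostnames are hypothetical placeholders):

global:
  scrape_interval: 15s          # how often every target is scraped

scrape_configs:
  - job_name: "node"            # hypothetical job scraping node_exporter
    static_configs:
      - targets: ["node1:9100", "node2:9100"]   # hypothetical hosts exposing /metrics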

This “pull-based model” ensures:

  • Each Prometheus instance is autonomous
  • Simpler debugging (you can visit /metrics directly)
  • Better scalability and resilience (no centralized push bottleneck)

4. PromQL: The Query Language

PromQL (Prometheus Query Language) allows you to:

  • Select and aggregate metrics
  • Compute derived metrics
  • Filter by labels
  • Generate expressions for alerting or visualization

Example queries:

# Average CPU usage per node
avg(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)

# Error rate for HTTP requests
rate(http_requests_total{status=~"5.."}[5m])

# Alert if high latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

5. Alerting in Prometheus

Prometheus has a two-stage alerting pipeline:

(a) Alerting Rules

Defined in YAML, Prometheus periodically evaluates alert expressions (PromQL queries) and triggers alerts when conditions are met.

Example (alerts.yml):

groups:
  - name: instance-health
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "No data received from {{ $labels.instance }} for 2 minutes."
  • expr — PromQL expression evaluated on schedule
  • for — Duration the condition must be true before firing
  • labels — Metadata for routing and severity
  • annotations — Human-readable info for notifications

(b) Alertmanager

When alerts fire, Prometheus sends them to the Alertmanager via HTTP.

The Alertmanager:

  • Deduplicates repeating alerts
  • Groups related alerts (e.g., multiple instances down in one alert)
  • Silences alerts temporarily
  • Routes alerts to destinations (email, Slack, PagerDuty, Opsgenie, etc.)

Example Alertmanager config (alertmanager.yml):

route:
  receiver: "slack_alerts"
  group_by: ["alertname", "cluster"]

receivers:
  - name: "slack_alerts"
    slack_configs:
      - channel: "#ops-alerts"
        send_resolved: true

6. Prometheus Time-Series Database (TSDB)

Prometheus has a built-in TSDB optimized for fast ingestion and querying:

  • Data stored on local disk (default retention ~15 days)
  • Supports chunk compression for efficiency
  • Allows remote storage integrations (e.g., Thanos, Cortex, VictoriaMetrics) for long-term retention and scaling
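These remote storage integrations are wired in through Prometheus's remote_write configuration. A minimal, hedged sketch (the endpoint URL is hypothetical; each backend documents its own write path):

# prometheus.yml (fragment)
remote_write:
  - url: "https://metrics-lts.example.com/api/v1/write"   # hypothetical long-term storage endpoint
    write_relabel_configs:
      # Optionally drop high-cardinality or uninteresting series before they leave Prometheus
      - source_labels: [__name__]
        regex: "go_gc_.*"       # hypothetical example pattern
        action: drop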

7. Exporters (Metric Sources)

Prometheus itself only scrapes data — it depends on exporters to expose metrics. Common ones include:

Exporter | Purpose
--- | ---
node_exporter | OS-level metrics (CPU, memory, disk, network)
blackbox_exporter | Endpoint probing (HTTP, TCP, ICMP)
cadvisor / kube-state-metrics | Kubernetes pod/node metrics
snmp_exporter | Network devices via SNMP
mysqld_exporter, postgres_exporter | Database metrics

Custom applications can expose /metrics directly using Prometheus client libraries (Go, Python, Java, etc.).

8. Event Monitoring Use Cases

Prometheus is used for event-based monitoring, where changes in metric values signal important state transitions or anomalies:

Event Type | Example Metric | Possible Alert
--- | --- | ---
Service Down | up == 0 | Instance down alert
High CPU/Memory | node_cpu_seconds_total, node_memory_MemAvailable_bytes | Resource exhaustion
Job Failure | batch_job_success_total | Job error spike
Network Issue | node_network_receive_errs_total | Packet loss
Latency Spike | http_request_duration_seconds | SLO violation

By continuously evaluating PromQL expressions, Prometheus detects these events in real time.
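As a sketch of how one such event becomes an alert, a hedged rule for the HTTP 5xx case could look like this (the 5% threshold and 10m duration are illustrative assumptions, not recommendations):

groups:
  - name: http-errors
    rules:
      - alert: HighErrorRate
        # Fraction of requests answered with a 5xx status over the last 5 minutes
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High 5xx error rate ({{ $value | humanizePercentage }})"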

9. Integrations & Visualization

Prometheus integrates tightly with:

  • Grafana → Dashboards & visualization
  • Kubernetes → Auto-discovery of pods/services
  • Alertmanager → Notification and routing
  • Thanos / Cortex / VictoriaMetrics → Long-term storage and global view

Grafana queries Prometheus to show dashboards of time-series metrics and alert statuses — ideal for NOC or SRE teams.
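As an example of the Kubernetes auto-discovery mentioned above, a common (hedged) pattern is to let Prometheus discover all pods and keep only those that opt in via an annotation:

scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod               # discover every pod in the cluster
    relabel_configs:
      # Keep only pods annotated prometheus.io/scrape: "true" (a common community convention)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"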

10. Strengths & Limitations

Strengths

  • Simple and reliable architecture (no external dependencies)
  • Multi-dimensional data model (labels)
  • Rich query language (PromQL)
  • Powerful alerting with routing and deduplication
  • Excellent Kubernetes integration
  • Strong ecosystem and community

Limitations

  • Short-term storage (default local retention)
  • Not ideal for high-cardinality data (too many unique label combinations)
  • Limited support for push-based metrics (via Pushgateway workaround)
  • No built-in log or trace collection (use Tempo/Loki for full observability)

Summary

Prometheus is the backbone of modern observability stacks, providing:

  • Event monitoring via metrics scraping
  • Powerful query-based alerting
  • Flexible integrations with Grafana and Alertmanager

Together, Prometheus + Alertmanager + Grafana (the “PGA stack”) deliver a complete open-source monitoring and alerting solution that is reliable, extensible, and cloud-native.

Remote Storage Backends for Prometheus

Comparing and contrasting Prometheus remote storage backends: Thanos, Cortex, and VictoriaMetrics — focusing on architecture, scalability, long-term retention, operational complexity, and ecosystem fit.

Why Remote Storage?

Prometheus by default:

  • Stores metrics locally on disk.
  • Retains data typically for 15 days.
  • Is standalone — no horizontal scaling or global querying.

For production-grade, enterprise or AI/HPC-scale monitoring, teams need:

  • Long-term metric retention (months/years)
  • Cross-cluster/global querying
  • High availability and scalability
  • Object storage (S3/GCS/Azure Blob) for cost efficiency

That’s where Thanos, Cortex, and VictoriaMetrics come in — each extends Prometheus into a global, horizontally scalable, long-term metrics platform.

1. Thanos

Overview

Thanos is an open-source project by Improbable, now part of CNCF, that “wraps” Prometheus with additional components to achieve global, scalable, long-term storage.

It extends Prometheus using a sidecar model.

Architecture Components

Component | Function
--- | ---
Sidecar | Runs next to Prometheus, uploads blocks to object storage (S3/GCS), exposes the Store API for queries.
Store Gateway | Reads historical data from object storage and serves it via gRPC.
Querier | Federates multiple data sources (local Prometheus, Store Gateways) and provides a unified query view.
Compactor | Optimizes data blocks (downsampling, deduplication).
Ruler | Handles alerting and recording rules globally.
Receive | Optional: can ingest remote-write data directly (turns Thanos into a central TSDB).

Storage

  • Object storage: Amazon S3, Google Cloud Storage, Azure Blob, Ceph, MinIO, etc.
  • Data retained forever at low cost.
  • Compaction & downsampling for efficiency.
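The Sidecar, Store Gateway, and Compactor are pointed at the bucket through an object-store configuration file. A hedged sketch for S3 (bucket name, region, and credentials are placeholders):

# objstore.yml, passed to Thanos components via --objstore.config-file
type: S3
config:
  bucket: "thanos-metrics"                  # hypothetical bucket name
  endpoint: "s3.us-east-1.amazonaws.com"
  region: "us-east-1"
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"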

Strengths

  • Seamless with existing Prometheus (non-intrusive, just add a sidecar)
  • Cost-efficient long-term storage on object stores
  • Global querying & deduplication across clusters
  • Native HA Prometheus support
  • Supports downsampling for faster queries on older data

Weaknesses

  • Complex multi-component deployment (many moving parts)
  • High operational overhead (especially Compactor, Querier, Store)
  • Query latency can increase for deep historical data
  • Harder to manage at massive multi-tenant scale

Best Fit

  • Kubernetes-heavy environments with multiple Prometheus instances
  • Organizations already using object storage (S3/GCS)
  • Teams wanting 100% Prometheus compatibility and federated global view

2. Cortex

Overview

Cortex is also a CNCF project, developed originally by Weaveworks and Grafana Labs.
It aims to provide a massively scalable, multi-tenant, Prometheus-compatible metrics platform — think “Prometheus-as-a-Service”.

It uses a microservices architecture and a chunk-based storage model.

Architecture Components

Cortex splits into multiple horizontally scalable services:

Component | Description
--- | ---
Distributor | Receives remote writes from Prometheus and forwards them to ingesters.
Ingester | Writes incoming data to backend storage and handles queries for recent data.
Querier | Reads data from both ingesters and long-term storage.
Store Gateway | Serves long-term blocks for queries (in newer, blocks-based architectures).
Ruler | Global alerting/recording-rules evaluation.
Query Frontend | Optional layer for caching and parallelizing queries.
Compactor | Merges small blocks to improve efficiency.

Storage

Cortex stores data in object stores (S3, GCS, Azure Blob) or databases like DynamoDB, Bigtable, Cassandra, depending on the mode.

Supports two modes:

  1. Chunks storage (older)
  2. Blocks storage (Thanos-like, newer and preferred)

Strengths

  • True multi-tenancy (strong isolation per tenant)
  • Horizontally scalable (microservices-based)
  • Integrated query caching, sharding, and parallelization
  • High availability and HA Prometheus ingestion
  • Enterprise-friendly; used by Grafana Cloud

Weaknesses

  • Operationally heavy (many microservices)
  • More complex than Thanos to run standalone
  • Requires coordination of storage and caching backends
  • Overhead of multi-tenancy if not needed

Best Fit

  • Large SaaS providers, enterprises, or cloud observability platforms
  • Multi-tenant environments (shared clusters per team/project)
  • Environments needing real-time ingestion and long-term storage
  • When operating at hundreds of Prometheus instances scale

3. VictoriaMetrics

Overview

VictoriaMetrics is a high-performance time-series database designed to be simpler, faster, and more resource-efficient than Prometheus + Thanos/Cortex.

It can act as:

  • A drop-in remote write storage for Prometheus
  • A Prometheus-compatible TSDB
  • A single-binary all-in-one system (cluster or single-node mode)
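A minimal sketch of the "drop-in remote write" option, assuming a single-node VictoriaMetrics instance at a hypothetical hostname (cluster mode exposes a different write path on vminsert):

# prometheus.yml (fragment)
remote_write:
  - url: "http://victoriametrics:8428/api/v1/write"   # single-node write endpoint; hostname is hypothetical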

Architecture Components

Mode / Component | Description
--- | ---
Single-node | Simple setup, stores all data locally, ideal for small/medium environments.
Cluster mode | Scalable version with vmstorage, vminsert, and vmselect components.
vmagent | Lightweight agent that can replace Prometheus for scraping and remote write.
vmalert | Rule-evaluation component for alerting and recording rules (typically paired with Alertmanager for notifications).

Storage

  • Optimized for local disk and object storage (via snapshots/backup).
  • Extremely efficient compression (often reported as roughly 10× smaller on-disk footprint than Prometheus).
  • Supports multi-year retention easily.

Strengths

  • Simplicity (can run as a single binary)
  • Blazing-fast ingestion and queries
  • Low resource consumption (RAM and CPU efficient)
  • Drop-in replacement for Prometheus remote write
  • Integrated alerting and scraping tools (vmalert, vmagent)
  • Supports PromQL-style queries via MetricsQL, plus ingestion in multiple protocols (Prometheus remote_write, Influx line protocol, etc.)

Weaknesses

  • No built-in object storage integration for native block compaction (uses snapshots)
  • Not officially CNCF or vendor-neutral (single-vendor project)
  • Some PromQL functions differ slightly from Prometheus semantics
  • Ecosystem less standardized than Thanos/Cortex

Best Fit

  • Teams wanting simplicity and performance
  • On-prem or hybrid environments (HPC, research clusters)
  • Cost-sensitive deployments (less infra, smaller footprint)
  • Self-hosted single-tenant setups

4. Comparison Table

Feature | Thanos | Cortex | VictoriaMetrics
--- | --- | --- | ---
Project Type | CNCF (open source) | CNCF (open source) | Independent OSS (VictoriaMetrics Ltd)
Architecture Style | Sidecar + components | Microservices | Monolithic (or light cluster)
Scalability | Horizontal (federation) | Full horizontal scale | Very efficient single-node or cluster
Multi-tenancy | Limited (label-based) | Native multi-tenant | Partial (via label isolation)
Storage Backend | Object stores (S3, GCS, etc.) | Object + DBs (S3, DynamoDB, etc.) | Local/cloud disks (snapshots)
HA / Deduplication | Yes | Yes | Yes
Query Layer | Federated (Querier) | Parallelized (Query Frontend) | Built-in
Alerting | Thanos Ruler | Cortex Ruler | vmalert
Operational Complexity | Moderate–High | High | Low
Best Use Case | Multiple Prometheus clusters | Multi-tenant cloud platform | Lightweight, high-speed self-hosted
Example Users | Red Hat, Grafana OSS, CNCF users | Grafana Cloud, Weaveworks | Cloudflare, GitLab, Aiven, many HPCs

5. Summary Insight

Need | Choose
--- | ---
Keep your Prometheus instances but want a global long-term view | Thanos
Operate Prometheus as a multi-tenant SaaS platform | Cortex
Need simple, high-performance, low-cost storage | VictoriaMetrics

6. Example Architectures

  • Thanos: Prometheus (sidecar) → Object Storage → Thanos Querier + Compactor → Grafana
  • Cortex: Prometheus (remote write) → Distributors → Ingester + Storage → Query Frontend → Grafana
  • VictoriaMetrics: Prometheus (remote write) → vmstorage (cluster) → vmselect → Grafana

Conclusion

All three — Thanos, Cortex, and VictoriaMetrics — extend Prometheus for long-term, scalable, reliable metric storage.

Simplicity → Complexity: VictoriaMetrics → Thanos → Cortex

  • Thanos excels for multi-cluster observability with minimal Prometheus disruption.
  • Cortex suits massive, multi-tenant, SaaS-grade deployments.
  • VictoriaMetrics wins on simplicity, efficiency, and performance for single-tenant or HPC-scale clusters.

Grafana Cortex/Mimir

Grafana Mimir, Grafana Labs' continuation of Cortex, implements long-term, horizontally scalable, multi-tenant storage for Prometheus metrics; the architecture described below applies to both projects.

1. Cortex Overview

Cortex is a CNCF project (incubating) designed to run Prometheus at massive scale — providing:

  • Long-term, durable storage for metrics
  • Horizontal scalability (both ingestion and queries)
  • True multi-tenancy (per-tenant isolation and authentication)
  • Global querying across all tenants and Prometheus instances
  • High availability and deduplication of metrics

Grafana Labs is the largest maintainer and operator of Cortex, which powers Grafana Cloud Metrics.

2. Architectural Overview

Cortex is microservices-based, with each component independently scalable.
It follows a write path and read path design similar to a distributed database.

Core Components

Component | Function
--- | ---
Distributor | Receives time series via the Prometheus remote_write API, validates and shards them, and sends them to ingesters.
Ingester | Buffers and writes metrics to persistent storage (chunks or blocks). Handles queries for recent data.
Querier | Handles read queries from Grafana or Prometheus federation. Fetches from both ingesters (recent) and object storage (historical).
Query Frontend | Caches and parallelizes queries for performance.
Compactor | Merges smaller blocks into larger ones, downsampling and optimizing storage.
Ruler | Evaluates alerting and recording rules at scale (multi-tenant).
Store Gateway | Serves data from long-term object storage to the Querier (blocks mode).
Alertmanager | Tenant-aware alert routing.
Auth & API Gateway | Enforces tenant isolation and access control.

Data Flow Summary

[Prometheus/Agent] → Distributor → Ingester → Object Store (S3/GCS/DynamoDB)
                                      ↓
                                   Querier
                                      ↓
                                   Grafana UI

3. Storage Modes

Cortex supports two main storage engines (evolved over time):

Storage Engine | Description | Status
--- | --- | ---
Chunks Storage | Original Cortex format using key-value stores | Legacy (deprecated)
Blocks Storage | Newer Thanos-like format using TSDB blocks | Current (preferred, default)

3.1 Chunks Storage (Legacy)

  • Each metric is split into chunks (typically 12-hour segments).
  • Stored in NoSQL backends (e.g. DynamoDB, Bigtable, Cassandra).
  • Index entries point to chunk locations.
  • Allowed fast ingestion but complex querying.

Problems:

  • Difficult to manage large indexes
  • Expensive and slow for long-term queries
  • High operational complexity (managing both chunks + index tables)

3.2 Blocks Storage (Current Standard)

Modern Cortex (and Grafana Cloud) now use Blocks Storage, modeled after Thanos’s format.

Key Concepts:

  • Metrics are grouped into TSDB blocks (2-hour data windows)
  • Blocks are uploaded to object storage (S3/GCS/Azure Blob)
  • Each block contains:
    • chunks/ → actual time series data
    • index → label/value metadata for querying
    • meta.json → metadata (time range, labels, compaction level)
    • tombstones → deleted series

Data Lifecycle:

  1. Ingesters hold in-memory TSDBs (short-term buffer)
  2. Every 2 hours → ingester flushes a block to object storage
  3. Compactor merges small blocks, removes duplicates, down-samples
  4. Store Gateway exposes blocks via Cortex Store API
  5. Querier merges data from:
    • Recent blocks (still in ingesters)
    • Historical blocks (in object storage)

4. Multi-Tenancy & Isolation

Cortex is designed as a multi-tenant system from the ground up.

Tenant Model

  • Each tenant (customer, team, environment) has a unique tenant ID
  • Tenant ID is passed in HTTP header:
    X-Scope-OrgID: <tenant-id>
  • All data in storage and memory is namespaced by tenant ID
  • Authentication and rate-limiting are per tenant
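A hedged remote_write fragment showing how a Prometheus server or agent could attach that header (the endpoint and tenant ID below are hypothetical):

# prometheus.yml (fragment)
remote_write:
  - url: "https://cortex.example.com/api/v1/push"   # hypothetical distributor endpoint
    headers:
      X-Scope-OrgID: "team-payments"                # hypothetical tenant ID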

Storage Isolation

  • Each tenant’s blocks/chunks are stored in their own object store prefix: s3://cortex-data/<tenant-id>/blocks/<block-id>
  • Queries are filtered by tenant ID to prevent cross-tenant visibility.

5. Query Path Internals

When Grafana (or Prometheus via federation) issues a query:

  1. Request hits Query Frontend
    • Splits long-range queries into smaller time shards
    • Applies caching and parallel execution
  2. Querier fetches:
    • Fresh data from ingesters
    • Historical data from Store Gateways (reading object storage)
  3. Results are merged, deduplicated, and returned.

Performance Features

  • Query result caching (Redis, Memcached)
  • Sharding and parallelism (per time range)
  • Chunk and index caching (object store acceleration)
  • Downsampling of historical data (via Compactor)

6. Data Storage Details

Object Storage Layer

Cortex relies on cloud-native object stores:

  • AWS S3
  • Google Cloud Storage
  • Azure Blob Storage
  • MinIO or Ceph (self-hosted)

Metadata Storage

  • Cortex stores block metadata in a key-value database:
    • DynamoDB, Bigtable, or Consul (depending on deployment)
    • Used for block discovery and tenant indexing

7. Operational Behavior in Grafana Cloud

Grafana Cloud’s Metrics service (built on Cortex) adds:

  • Centralized auth via Grafana Cloud’s identity system
  • Clustered distributors and ingesters running across regions
  • Compaction + retention policies per customer tier
  • Replication factor 3 for HA ingestion
  • Query Federation across multiple backends for global metrics

Grafana Labs operates tens of Cortex clusters, each serving thousands of tenants, with billions of active time series.

8. Scaling Characteristics

Scaling Dimension | Cortex Behavior
--- | ---
Ingestion | Scale horizontally by adding distributors/ingesters
Querying | Scale horizontally by adding queriers and query frontends
Storage | Object storage scales essentially without limit
Multi-tenancy | Strong isolation, per-tenant quotas, authentication
HA Prometheus | Deduplication based on external labels

Cortex can handle ingestion of millions of samples per second, depending on cluster sizing.

9. Comparison to Thanos

Feature | Thanos | Cortex (Grafana)
--- | --- | ---
Architecture | Sidecar + components | Microservices
Deployment | Add-on to Prometheus | Centralized system
Multi-tenancy | Basic (labels) | Strong tenant isolation
Storage Format | TSDB blocks | TSDB blocks
Scaling | Federation (per Prometheus) | Horizontal (microservices)
Operational Complexity | Moderate | High (many components)
Query Performance | Good for a few clusters | Excellent at large scale
Use Case | Multiple Prometheus clusters | SaaS metrics platform

Grafana Labs actually contributes to both Thanos and Cortex, but:

  • Grafana Cloud (Metrics) uses Cortex
  • Grafana Enterprise Metrics (GEM) is based on Cortex blocks storage

10. Summary Table

Category | Description
--- | ---
Data Model | Prometheus TSDB (blocks)
Write Path | Prometheus → Distributor → Ingester → Object Store
Read Path | Querier ← Ingester + Store Gateway
Storage Backend | Object stores (S3/GCS/Azure)
Index Storage | DynamoDB / Bigtable / etc.
Retention | Configurable (months/years)
Multi-Tenancy | Built-in (per-tenant isolation, quotas, auth)
Alerting | Cortex Ruler (multi-tenant)
Scaling | Horizontal for ingestion, querying, and compaction
Used By | Grafana Cloud Metrics, Grafana Enterprise Metrics

11. Key Advantages in Grafana Cortex Implementation

  • Multi-tenant, horizontally scalable long-term Prometheus storage
  • Supports billions of active time series
  • Integrated caching and query acceleration
  • Full compatibility with PromQL
  • Cloud-optimized (object storage, stateless microservices)
  • Battle-tested in Grafana Cloud production

12. Typical Production Architecture (Grafana Cloud Example)

                   ┌───────────────────────────┐
                   │     Prometheus Agents     │
                   └────────────┬──────────────┘
                                │ (remote_write)
                                ▼
                         ┌────────────┐
                         │ Distributor│
                         └────┬───────┘
                              │ Sharded writes
                              ▼
                        ┌────────────┐
                        │ Ingester   │
                        └────┬───────┘
                             │ Flushes blocks
                             ▼
               ┌────────────────────────────────┐
               │  Object Store (S3 / GCS / etc.)│
               └────────────────────────────────┘
                             ▲
                             │
                     ┌───────┴───────────┐
                     │ Store Gateway     │
                     │ Compactor / Ruler │
                     └────────┬──────────┘
                              │
                              ▼
                        ┌────────────┐
                        │ Querier    │
                        │ + Frontend │
                        └────┬───────┘
                             │
                             ▼
                         ┌──────────┐
                         │ Grafana  │
                         └──────────┘

In summary:

Grafana’s Cortex storage implementation transforms Prometheus into a cloud-native, multi-tenant, long-term metrics platform.
It uses block-based TSDB storage in object stores, horizontally scaled microservices for ingestion and querying, and per-tenant isolation — enabling Grafana Cloud to store and query years of metrics data from millions of Prometheus series efficiently.

Compare Grafana Mimir vs VictoriaMetrics

This comparison targets a 5 million+ active-series Prometheus workload and gives concrete, cited estimates for performance (ingest/query), storage (1-year retention), ops complexity, and rough cloud costs.

1) Assumptions

(If your environment differs, the numbers will change; I tried to pick realistic defaults.)

  • Active series: 5,000,000 (the target workload)
  • Scrape interval: 15 seconds (Prometheus default; I also note differences for 60s)
  • Retention: 1 year (typical LTR)
  • Query load: not heavily specified — I show ingestion-side sizing and give guidance for query CPU separately.
  • Cloud object-store price: Amazon S3 Standard ≈ $0.023 / GB / month (first tier) — used for the storage cost calculations.

2) Raw numbers & calculations (full detail)

2.1 Sample-rate math (common intermediate)

  • Seconds per year = 365 × 24 × 3600 = 31,536,000 s
  • Samples per series per year @15s = 31,536,000 ÷ 15 = 2,102,400 samples / series / year.
  • Total samples / year for 5M series = 5,000,000 × 2,102,400 = 10,512,000,000,000 samples / year.

(If you use 60s scrape: samples/series/year = 525,600; multiply accordingly.)

2.2 Storage — VictoriaMetrics (practical vendor numbers)

VictoriaMetrics publishes realistic compression numbers from production-like workloads: users and Managed VictoriaMetrics report ~0.4–0.6 bytes/sample for real-world metrics (node_exporter-type workloads), and Managed VM’s public benchmark shows ~0.6 bytes/sample at scale. Using that range is realistic for production data (it depends on value precision and series entropy).

Estimates (5M series, 15s, 1 year):

  • Using 0.6 bytes/sample → total bytes = 10,512,000,000,000 × 0.6 = 6.3072×10¹² bytes ≈ 5.74 TiB (~5,874 GiB).
  • Using 0.5 bytes/sample → ≈ 4.78 TiB.
  • Using 0.4 bytes/sample → ≈ 3.82 TiB.

(Computation shown above; I used VictoriaMetrics’ published bytes/sample values.)

Monthly S3 storage cost (if you snapshot/backup that data to S3 rather than keeping local disks):

  • 5.74 TiB (≈ 5,874 GiB) × $0.023 / GB ≈ $135 / month (≈ $1,620 / year), at the S3 Standard first-tier rate.

Note: VictoriaMetrics often stores data on local disks (vmstorage) and offers snapshot/backup utilities to object stores; many users keep VM on local disks (cheaper for active queries) and only archive snapshots to S3.

2.3 Storage — Grafana Mimir (official Grafana sizing guidance)

Grafana’s Mimir docs provide practical capacity rules-of-thumb for object-store-backed block storage. Grafana’s store-gateway disk estimate (their documented example) is:

  • 13 GB disk per 1 million active series (this is derived in their docs assuming: 2 bytes per sample for compacted blocks, 15s scrape, 1-year retention, store-gateway replication factor 3 — see the Grafana doc for exact assumptions). → 13 GB / 1M series / year.

Using that Grafana figure:

  • For 5M active series → 13 GB × 5 = 65 GB total object-store block footprint (for 1 year retention under the doc assumptions).

Monthly S3 storage cost (if you use S3 for Mimir blocks):

  • 65 GB × $0.023 / GB ≈ $1.50 / month (≈ $18 / year).

Important caveat (read carefully): Grafana’s “13 GB per 1M series” estimate is based on specific compaction/downsampling and index assumptions described in their docs (they assume 2 bytes per sample for compacted blocks and other compression parameters). That value is much lower than raw bytes-per-sample estimates (like VictoriaMetrics’ bytes/sample × samples/year). The discrepancy is because the two vendors use different post-compaction assumptions, index footprints, and accounting methods. In practice you should validate with your own dataset (label lengths, cardinality, histogram buckets and exemplars all change the outcome).

3) Ingestion & CPU/memory estimates (sizing the ingestion path)

We must size how much CPU/memory to ingest the 5M series at 15s.

Samples-per-second:

  • Samples/sec = active_series ÷ scrape_interval = 5,000,000 ÷ 15 ≈ 333,333 samples / second.

3.1 Grafana Mimir ingestion sizing (use Grafana guidance)

Grafana Mimir docs give component scaling rules:

  • Distributor: ~1 CPU and 1 GB RAM per 25,000 samples/sec → distributor needs ≈ 333,333 ÷ 25,000 ≈ 14 cores and ~14 GB RAM.
  • Ingester memory/disk: the docs say ~2.5 GB RAM + 5 GB disk per 300,000 in-memory series (useful for estimating how many ingesters you need; ingesters also carry a replication factor).

So for the ingestion path in Mimir you’d typically run dozens of cores across distributors/ingesters for 5M series (depending on replication factor and how many series are “hot” in memory), plus Store Gateway / Compactor resources for long-term blocks. The Mimir docs also recommend running roughly one compactor per ~20M active series (compactor sizing also consumes disk).

3.2 VictoriaMetrics ingestion sizing (practical observed numbers)

VictoriaMetrics publishes practical benchmark/managed numbers:

  • Managed VM benchmark: 2 vCPU + 4 GB RAM handled ~1M active series and ~211k samples/sec in their example. Scaling that linearly (approximately) to 5M series (~1.055M samples/sec at the benchmark’s ingest rate) gives roughly ~10 vCPU + 20 GB RAM for ingestion (this is an extrapolation — VM scales well vertically and horizontally). VM also offers cluster mode (vminsert/vmstorage/vmselect) for HA and scale.

VictoriaMetrics tends to need less memory per series than vanilla Prometheus and has very high single-node ingestion capability; however, for 5M series you will likely run cluster mode or beefy single-node hardware (fast NVMe).

4) Query performance & sizing notes

  • Mimir: The query path is microservice-based (query-frontend, querier, store-gateway). Grafana’s guidance: 1 CPU + 1 GB per ~10 queries/sec on the querier, with similar per-query sizing guidance for the store-gateway. Mimir is designed to scale horizontally for large query loads and to reduce index lookups; it is optimized for large, multi-tenant query traffic.
  • VictoriaMetrics: vmselect + vmstorage provide fast PromQL query responses; Managed VM examples show median query latencies of ~14 ms for certain workloads in benchmarks. In practice, VM often returns faster query latencies for many ad-hoc PromQL queries (depending on query complexity and whether you’re querying many historical blocks).

Practical rule: if you have heavy dashboard/query workloads (many concurrent dashboards, large range queries), Mimir’s horizontal query fanout and caching may be preferable at very large multi-tenant scale; for simpler or smaller-tenant setups VM often wins for raw query latency and simplicity.

5) Operational complexity (ops) — what you’ll run & manage

Grafana Mimir

  • Many services: distributors, ingesters, queriers, query-frontend, store-gateway, compactor, ruler, etc. (microservices architecture).
  • Requires object store (S3/GCS/Azure) for blocks + caches (memcached/Redis) for query acceleration.
  • More moving parts → more operational overhead (service orchestration, scaling policies, compactor tuning, caches, RBAC/tenant auth). Grafana provides production runbooks, but running at scale requires platform engineering effort.

VictoriaMetrics

  • Simple deployment options: single binary for single-node, or a small set of components (vminsert, vmstorage, vmselect) in cluster mode. Fewer components to operate.
  • Local-disk oriented (vmstorage), with optional snapshot/backup to object storage — fewer separate components (no compactor/store-gateway tier) compared with Mimir.
  • Lower ops cost for one-tenant or straightforward multi-tenant needs, but if you need very strong tenant isolation at SaaS scale, you may still choose to run multiple clusters or more orchestration.

6) Cost comparison (storage + infra) — rough, order-of-magnitude

Important: these are illustrative numbers — final numbers depend on your cloud pricing, instance types, IO profile, data cardinality (label sizes), and query load. I show one retention+storage cost axis and a rough compute estimate for ingestion CPUs.

Storage (1 year retention, 5M series, 15s scrape)

  • Grafana Mimir (Grafana doc estimate): ~65 GB total → S3 cost ≈ $1.5 / month (~$18 / year).
  • VictoriaMetrics (0.6 bytes/sample): ~5.74 TiB (≈ 5,874 GiB) → S3 cost ≈ $135 / month (≈ $1,620 / year).

Why such a big difference? Because Grafana’s Mimir estimates reflect their block-compaction and index assumptions (their doc used 2 bytes/sample post-compaction plus other factors and produced 13 GB per 1M series). VictoriaMetrics reports bytes/sample numbers from real ingestion benchmarks (0.4–0.6 bytes/sample) and tends to store more raw data locally (or snapshot to object store) — different approaches to compaction/encoding and where you pay (object store vs local disk) produce the variance. Don’t treat either single number as gospel — benchmark with your data.

Compute (ingest path) — rough cores

  • Mimir distributors: ~14 cores + 14 GB RAM (based on 333k samples/sec and 1 core per 25k samples/sec). Add ingesters and replication overhead.
  • VictoriaMetrics extrapolation: the managed example shows 2 vCPU / 4 GB handling 211k samples/sec (1M series). Extrapolating to 5M series → ~10 vCPU + 20 GB RAM (rough linear scaling). Use cluster mode for HA.

Rough cloud compute cost (very approximate): if you run 10–20 vCPUs continuously (VM/EC2 instances), cost could be hundreds of $/month depending on instance types and zones; Mimir’s multi-component cluster for high availability will typically cost more in compute/ops than a compact VM setup — but Mimir’s overall storage cost (object store) can be much lower depending on the assumptions they used. You should factor the cost of S3 requests and compaction IO for Mimir as well.

7) Practical interpretation & recommendations

  1. If you are a SaaS provider or need strong multi-tenant isolation and global aggregation (hundreds of tenants, many queries):
    • Grafana Mimir is purpose-built for this. It scales horizontally and is what Grafana Cloud / Grafana Enterprise Metrics uses in production at massive scale. Expect higher ops complexity but excellent multi-tenant controls and global querying.
  2. If you are a single tenant, self-hosted, want fastest ingestion and simplest ops at lower infra/engineering overhead:
    • VictoriaMetrics likely produces lower operational overhead and excellent ingestion/query performance; it also compresses very well for many real workloads (0.4–0.6 bytes/sample) so raw disk usage is low (but you must choose whether to keep data on local disks or snapshot to object storage).
  3. Be very careful about vendor numbers:
    • The Grafana Mimir “13 GB per 1M series” rule-of-thumb is explicit in Grafana’s docs and is calculated from certain compaction assumptions — but it can look “too good” compared to per-sample bytes metrics from other projects. The two approaches measure different things (compacted object-block footprint vs bytes/sample timeseries accounting). Do your own small-scale test with your actual metrics (label set, histogram buckets, exemplar use, and scrape intervals).
  4. If cost is the only driver: compare total cost = compute + storage + ops for your workload. Mimir may save object storage costs dramatically (per Grafana doc) but will cost more in compute and ops; VM will cost more raw storage (if you send snapshots to S3) but requires less orchestration.

8) Quick TL;DR table (5M active series, 15s scrapes, 1y retention)

Aspect | Grafana Mimir (doc-based) | VictoriaMetrics (practical benchmarks)
--- | --- | ---
Estimated storage (1 y) | ~65 GB (13 GB per 1M series × 5, per Grafana doc assumptions) | ~3.8–5.7 TiB (~3,916–5,874 GiB) at 0.4–0.6 bytes/sample
S3 storage cost (monthly) | ~$1.5 / month (~$18 / year) | ~$135 / month (~$1,620 / year) at $0.023/GB
Ingest CPU (rough) | Distributor ~14 cores, plus ingesters and other components | Extrapolated ~10 vCPU (from 2 vCPU handling 1M series @ 211k samples/sec in the Managed VM benchmark)
Ops complexity | High — many microservices, object-store glue, caching, compactor tuning | Low–Medium — single binary or small set of services; fewer moving parts
Best for | Massive multi-tenant SaaS, global queries, minimal per-tenant storage cost | High-performance single-tenant/cluster use, simple ops, efficient compression

9) Final notes & recommended next steps

  • Run a short benchmark with real data shape. I strongly recommend a 1–2 day trial ingest of a representative sample of your metrics (full label cardinality and histogram usage). That will remove uncertainty — both Mimir and VictoriaMetrics recommend testing with your real data. (Grafana provides capacity planning docs; VictoriaMetrics provides benchmark tooling.)
  • Decide where you want the cost: object storage (cheap, scalable) vs local disks (fast, but you manage the disks). Mimir leans on object storage and microservices; VM is disk-first with snapshot capabilities.
  • As a next step, build a one-page cost model covering compute + object storage + ops for your exact scrape interval, retention, and expected query load (using the same method as above), and script a 24–48 h benchmark (prometheus-bench or a vmagent-based generator) that computes storage and cost from the measured bytes/sample.

Sources / further reading (key items I used)

  • Grafana Mimir — planning capacity & component sizing (store-gateway disk & compactor guidance). Grafana Labs
  • VictoriaMetrics — Managed benchmark & bytes/sample guidance (0.6 bytes/sample example + ingestion/query results). VictoriaMetrics
  • VictoriaMetrics compression discussion / real-world bytes/sample explanation. Medium
  • VictoriaMetrics docs (single-node/cluster modes). VictoriaMetrics Docs
  • S3 Standard pricing (used $0.023 / GB / month). Amazon Web Services, Inc.