OpenTelemetry Collector — file_storage persistent queue during a backend outage

Two-tier queue: in-memory sending_queue in front of a persistent file_storage extension on a PVC — how queue_size (15 000 vs 25 000) and block_on_overflow (false / true) change what happens when the aggregator is unreachable

How the two-tier queue works
The exporter still has a small in-memory sending_queue for worker handoff; behind it, the file_storage extension persists every enqueued batch to a PVC before the producer is acknowledged.

Write path: producer → WAL on disk → ack. Read path: workers consume from disk, attempt export, ack + delete on success, retry on failure.

Two knobs set behaviour during a backend outage:

queue_size — capacity in batches (here 15 000 or 25 000). At 10 batches/s that's ~25 min or ~42 min of buffer.
block_on_overflow — when disk queue is full, drop (false) or block caller (true).
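
The buffer times quoted for queue_size follow from dividing capacity by ingest rate. A minimal sketch of that arithmetic, assuming nothing drains during the outage and that the 10 batches/s rate holds steady (the function name is illustrative only):

```python
def seconds_to_fill(queue_size: int, batches_per_second: float) -> float:
    """Seconds until a queue of `queue_size` batches is full when nothing drains."""
    return queue_size / batches_per_second


for size in (15_000, 25_000):
    secs = seconds_to_fill(size, batches_per_second=10)
    print(f"queue_size={size}: {secs:.0f} s (~{secs / 60:.0f} min)")
```

At a different ingest rate the budget scales inversely: doubling the rate halves the time to fill.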

Key benefit vs in-memory: on Collector restart the queue persists. Work already on disk resumes draining when the backend recovers.
[Figure: architecture overview.
 Son Testing K8s Cluster (source, telemetry producers): application Pods and host metrics emit logs, traces and metrics over OTLP (gRPC/HTTP).
 OpenTelemetry Collector (single pipeline, sending_queue + file_storage persistent extension): otlp receiver → processors (batching, resource detection, transformation) → otlp exporter with a two-tier queue. The in-memory sending_queue hands off to file_storage on a Persistent Volume Claim (/var/lib/otelcol); workers drain the disk queue to the backend. queue_size 15k ≈ 750 MB, 25k ≈ 1.25 GB; persists on restart. retry_on_failure uses exponential backoff with max_elapsed_time = ∞. Metrics: otelcol_exporter_queue_size, otelcol_exporter_queue_capacity, otelcol_exporter_enqueue_failed_*.
 Observability Dev K8s Cluster (observability platform): Otel Aggregator Service (OTLP receiver, unreachable during the outage; processing: batching, resource detection, transformation, routing) → Mimir (metrics), Loki (logs), Tempo (traces).
 During the backend outage, retries fail and the file queue fills.]

Important Notes
  • file_storage is a Collector extension; list it under service.extensions and reference it from the exporter's sending_queue.storage.
  • Queue survives restarts & crashes. On startup, workers resume from where they left off.
  • Every enqueue is a disk write — expect IOPS & fsync cost. Use a fast PVC class; avoid network storage with unpredictable latency.
  • Size the PVC at 1.5–2× the expected queue footprint (queue_size × average batch size) to leave compaction headroom. Monitor kubelet_volume_stats_used_bytes.
  • If PVC fills, enqueue fails the same way queue_size limits do — block_on_overflow still decides drop vs block.
  • Single-writer by design — bind to a StatefulSet, not a Deployment; one Pod per PVC.
  • Backpressure still possible at the memory tier in front of it; the memory queue is small and just a handoff buffer.
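
The wiring described in these notes can be sketched as a configuration fragment. The snippet below builds it as a Python dictionary and prints it as JSON, which YAML parsers generally accept. Treat it as an assumption-laden sketch: the endpoint is hypothetical, receivers and pipelines are omitted, and key names such as block_on_overflow should be checked against the Collector release in use, since that option exists only in recent versions.

```python
import json

config = {
    "extensions": {
        # Directory on the mounted PVC (path taken from the diagram).
        "file_storage": {"directory": "/var/lib/otelcol"},
    },
    "exporters": {
        "otlp": {
            "endpoint": "otel-aggregator.example:4317",  # hypothetical address
            "sending_queue": {
                "enabled": True,
                "storage": "file_storage",   # ID of the extension defined above
                "queue_size": 15000,         # or 25000
                "block_on_overflow": False,  # True = block the caller when full
            },
            "retry_on_failure": {
                "enabled": True,
                "max_elapsed_time": 0,  # 0 = never give up on a batch
            },
        },
    },
    "service": {
        # The extension must be enabled here as well as referenced by the exporter.
        "extensions": ["file_storage"],
    },
}

print(json.dumps(config, indent=2))
```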
1 NORMAL OPERATION (backend healthy)
  • Batches are written to the WAL, then drained immediately by workers; the PVC holds only a small working set.
  • Disk queue depth hovers near zero. Write amplification is roughly 1× enqueue-write + 1× delete-marker per batch.
  • queue_size & block_on_overflow have no observable effect.
[Figure: file_storage depth ~10 / 25 000. Ingest → sending_queue (handoff) → file_storage (near-empty) → Backend ✓]
Result: Steady state. Disk writes are absorbed; no drops; tiny PVC footprint.
2 queue_size = 15 000 · block_on_overflow = false: buffers ~25 min of outage, then drops
  • Backend unreachable; retries fail; every batch persists to disk. Queue climbs at 10 batches/s — full in ~1 500 s (~25 min).
  • After overflow, enqueue returns failure and newest batches are dropped at the exporter; enqueue_failed_* climbs. Already-queued data still waits on disk.
  • PVC footprint ≈ 750 MB (15 000 × ~50 KB).
[Figure: file_storage depth (queue_size = 15 000), ~25 min to full @ 10/s. Ingest → sending_queue (handoff) → file_storage (FULL, 15 000) → dropped, enqueue_failed++ → Backend ✕]
Result: Newest telemetry lost after 25 min; older queued data survives & will flush when backend returns.
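
A toy simulation of this scenario, assuming a steady 10 batches/s, no draining during the outage, and a hypothetical 60-minute outage. It models only the counting logic, not the Collector's actual queue implementation:

```python
def simulate_drop_on_overflow(queue_size: int, rate: int, outage_s: int):
    """Return (queued, dropped) batch counts after an outage with no draining."""
    queued = 0
    dropped = 0
    for _second in range(outage_s):
        for _batch in range(rate):
            if queued < queue_size:
                queued += 1   # persisted to the disk queue
            else:
                dropped += 1  # enqueue fails; the newest batch is lost
    return queued, dropped


queued, dropped = simulate_drop_on_overflow(queue_size=15_000, rate=10, outage_s=60 * 60)
print(f"queued={queued} dropped={dropped}")
```

Under these assumptions the first 25 minutes of data stay on disk and everything arriving in the remaining 35 minutes is dropped.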
3 queue_size = 15 000 · block_on_overflow = true: buffers ~25 min, then blocks the caller
  • Same fill behaviour — full in ~25 min at 10 batches/s.
  • Once full, enqueue blocks until a worker exports a batch successfully and frees a slot on disk, which cannot happen while the backend is down. Backpressure propagates: processors pause → receiver slows → OTLP clients see timeouts and retry or buffer themselves.
  • Good when producers can hold data; risky when upstream has no buffer of its own.
[Figure: file_storage depth (queue_size = 15 000), ~25 min then stalls. Ingest (backpressure) → sending_queue (caller waits) → file_storage (FULL, blocked) → Backend ✕]
Result: No exporter drops; loss moves upstream to OTLP clients. PVC still ~750 MB.
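
The blocking behaviour can be illustrated with Python's standard bounded queue. This is an analogy only: the tiny capacity of 3 stands in for queue_size, and the 50 ms timeout stands in for an OTLP client deadline, both chosen purely for illustration.

```python
import queue

disk_queue = queue.Queue(maxsize=3)  # tiny stand-in for queue_size


def enqueue(batch: str, client_timeout_s: float) -> str:
    """Block while the queue is full; give up when the caller's deadline passes."""
    try:
        disk_queue.put(batch, block=True, timeout=client_timeout_s)
        return "accepted"
    except queue.Full:
        return "client timeout"


# Backend is down: nothing drains, so the fourth and fifth callers time out.
results = [enqueue(f"batch-{i}", client_timeout_s=0.05) for i in range(5)]
print(results)

# Backend recovers: a worker removes one item, freeing a slot for the next caller.
disk_queue.get()
after_recovery = enqueue("batch-5", client_timeout_s=0.05)
print(after_recovery)
```

Nothing is dropped inside the queue itself; whether the timed-out batches are lost depends on what the caller does next, which is the sense in which loss moves upstream.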
4 queue_size = 25 000 · block_on_overflow = false: buffers ~42 min of outage, then drops
  • Same mechanics as scenario 2 — bigger headroom. Queue fills in ~2 500 s (~42 min) at 10 batches/s.
  • After overflow, new batches are dropped at the exporter; older batches on disk continue waiting and will flush when backend returns.
  • PVC footprint ≈ 1.25 GB (25 000 × ~50 KB) — plus compaction headroom.
[Figure: file_storage depth (queue_size = 25 000), ~42 min to full @ 10/s. Ingest → sending_queue (handoff) → file_storage (FULL, 25 000) → dropped, enqueue_failed++ → Backend ✕]
Result: Bigger outage budget (42 min) for the cost of 1.25 GB disk; past that, drops resume.
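
The disk figures for both queue sizes, with the 1.5–2× compaction headroom from the notes, can be worked out as below. The sketch assumes the illustrative ~50 KB average batch size and decimal units (1 MB = 10^6 bytes); the function name is hypothetical.

```python
def pvc_bytes(queue_size: int, avg_batch_bytes: int, headroom: float = 1.0) -> int:
    """Estimated disk footprint of a full queue, scaled by a headroom factor."""
    return round(queue_size * avg_batch_bytes * headroom)


AVG_BATCH_BYTES = 50_000  # ~50 KB per batch (illustrative)

for size in (15_000, 25_000):
    raw = pvc_bytes(size, AVG_BATCH_BYTES)
    low = pvc_bytes(size, AVG_BATCH_BYTES, headroom=1.5)
    high = pvc_bytes(size, AVG_BATCH_BYTES, headroom=2.0)
    print(f"queue_size={size}: raw {raw / 1e9:.2f} GB, "
          f"with headroom {low / 1e9:.3f} to {high / 1e9:.2f} GB")
```

Real batch sizes vary with signal type and batching settings, so measured values from kubelet_volume_stats_used_bytes should replace the 50 KB estimate once available.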
5 queue_size = 25 000 · block_on_overflow = true: buffers ~42 min, then blocks the caller
  • Same fill curve — full in ~42 min. This is the strongest durability configuration without sharding.
  • Once full, the exporter blocks; upstream sees timeouts. If outage outlasts ~42 min and clients can't hold data, loss happens at producers.
  • Risk: PVC must comfortably hold > 1.25 GB — watch for compaction and fsync pressure.
[Figure: file_storage depth (queue_size = 25 000), ~42 min then stalls. Ingest (backpressure) → sending_queue (caller waits) → file_storage (FULL, blocked) → Backend ✕]
Result: Maximum durability; collector stays bounded; loss pushed to producers after ~42 min.
6 RESTART DURING OUTAGE — QUEUE SURVIVES (the core reason to use file_storage)
  • Collector Pod crashes / is rolled / is OOM-killed mid-outage. In-memory queues would lose everything buffered.
  • With file_storage, the PVC is re-mounted on the new Pod. Exporter reads WAL head offset and resumes draining the same batches.
  • Only loss: anything in the small memory queue at the moment of crash (handoff window).
[Figure: file_storage depth before and after restart. Before: 12 000 batches on disk. Pod restart, PVC re-attached. After: ~12 000 batches preserved; new Pod → file_storage (PVC) → workers resume → Backend (when up)]
Result: Near-zero loss across restarts. The defining advantage over the in-memory queue.
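
To make the restart behaviour concrete, here is a toy append-only queue with a persisted head offset. It is a sketch of the general write-ahead idea only and does not mirror the file_storage extension's actual on-disk format; the class, file names and temporary directory standing in for the PVC are all hypothetical.

```python
import json
import os
import tempfile


class FileQueue:
    """Toy durable queue: one JSON line per batch plus a persisted head offset."""

    def __init__(self, directory: str):
        self.log_path = os.path.join(directory, "queue.log")
        self.offset_path = os.path.join(directory, "head.offset")
        open(self.log_path, "a").close()  # create the log if it does not exist

    def enqueue(self, batch: dict) -> None:
        with open(self.log_path, "a") as f:
            f.write(json.dumps(batch) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before the producer is acknowledged

    def _head(self) -> int:
        try:
            with open(self.offset_path) as f:
                return int(f.read())
        except FileNotFoundError:
            return 0

    def pending(self) -> list:
        with open(self.log_path) as f:
            return [json.loads(line) for line in f][self._head():]

    def ack(self) -> None:
        """Advance the head past one successfully exported batch."""
        head = self._head() + 1
        with open(self.offset_path, "w") as f:
            f.write(str(head))
            f.flush()
            os.fsync(f.fileno())


with tempfile.TemporaryDirectory() as pvc:  # stand-in for the mounted PVC
    q = FileQueue(pvc)
    for i in range(5):
        q.enqueue({"id": i})
    q.ack()
    q.ack()   # two batches exported before the crash
    del q     # simulate the Pod being killed

    restarted = FileQueue(pvc)  # new Pod, same volume
    remaining = [b["id"] for b in restarted.pending()]
    print(remaining)
```

The restarted instance sees exactly the batches that were persisted but not yet acknowledged, which is the property the scenario relies on.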
Behaviour matrix: in-memory only vs file_storage persistent queue, during a backend outage
Assumes steady ingest of 10 batches/s, ~50 KB per batch, default retry_on_failure. Times and disk sizes are illustrative; real throughput depends on batch size and processor load.
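Columns: queue_size · block_on_overflow · time to fill (@ 10 batches/s) · on overflow · effect upstream (receiver / client) · survives restart? · storage cost · trade-off, when to choose

sending_queue in-memory only (no file_storage)

queue_size = 250 · block_on_overflow = false
  • Time to fill (@ 10 batches/s): ~25 seconds
  • On overflow: Exporter drops incoming batches; enqueue_failed_* increments.
  • Effect upstream (receiver / client): None. Ingest continues at full rate; loss invisible to producers.
  • Survives restart? No (RAM-only)
  • Storage cost: ~250 batches in RAM (~12 MB)
  • Trade-off, when to choose: Cheap, OOM-safe, tiny outage budget. OK when SDK retry is strong.

queue_size = 250 · block_on_overflow = true
  • Time to fill (@ 10 batches/s): ~25 s, then stalls
  • On overflow: Caller blocks; no exporter-side drops.
  • Effect upstream (receiver / client): Backpressure ~25 s in; OTLP clients see timeouts.
  • Survives restart? No (RAM-only)
  • Storage cost: ~250 batches in RAM (~12 MB)
  • Trade-off, when to choose: Push loss upstream fast; use when producers can buffer.

queue_size = 1000 · block_on_overflow = false (default)
  • Time to fill (@ 10 batches/s): ~100 seconds
  • On overflow: Drops at exporter; enqueue_failed_* rises.
  • Effect upstream (receiver / client): None. Ingest unaffected; larger buffer delays drops.
  • Survives restart? No (RAM-only)
  • Storage cost: ~1000 batches in RAM (~50 MB)
  • Trade-off, when to choose: Common default. Rides short blips; outages longer than ~100 s still lose the newest batches.

queue_size = 1000 · block_on_overflow = true
  • Time to fill (@ 10 batches/s): ~100 s, then stalls
  • On overflow: Caller blocks; no exporter-side drops until retry window lapses.
  • Effect upstream (receiver / client): Backpressure arrives later but lasts until backend recovers.
  • Survives restart? No (RAM-only)
  • Storage cost: ~1000 batches in RAM (~50 MB)
  • Trade-off, when to choose: Max in-memory durability; risk of OOM on long outages.

file_storage persistent queue on a PVC (in-memory tier in front)

queue_size = 15 000 · block_on_overflow = false
  • Time to fill (@ 10 batches/s): ~25 minutes (1 500 s)
  • On overflow: After 25 min, newest batches are dropped at the exporter; enqueue_failed_* rises. Already-persisted batches still wait on disk and flush when backend returns.
  • Effect upstream (receiver / client): None. Ingest continues at full rate; loss invisible to producers.
  • Survives restart? Yes (persists and resumes)
  • Storage cost: ~750 MB PVC (plus compaction headroom)
  • Trade-off, when to choose: Good balance: ~25 min of buffer with bounded disk. Accepts tail-drops on long outages.

queue_size = 15 000 · block_on_overflow = true
  • Time to fill (@ 10 batches/s): ~25 min, then stalls
  • On overflow: Caller blocks when disk queue full; no exporter-side drops.
  • Effect upstream (receiver / client): Backpressure after ~25 min; OTLP clients retry or drop. Upstream becomes the bottleneck.
  • Survives restart? Yes (persists and resumes)
  • Storage cost: ~750 MB PVC
  • Trade-off, when to choose: Pushes loss upstream; best when producers can hold or slow down.

queue_size = 25 000 · block_on_overflow = false
  • Time to fill (@ 10 batches/s): ~42 minutes (2 500 s)
  • On overflow: Same mechanics as 15 000 but with ~17 more minutes of headroom before drops begin.
  • Effect upstream (receiver / client): None. Ingest unaffected for the full ~42 min window.
  • Survives restart? Yes (persists and resumes)
  • Storage cost: ~1.25 GB PVC
  • Trade-off, when to choose: Larger outage budget at modest disk cost; still bounded drop behaviour.

queue_size = 25 000 · block_on_overflow = true
  • Time to fill (@ 10 batches/s): ~42 min, then stalls
  • On overflow: Caller blocks; no exporter-side drops until retry window lapses on oldest items.
  • Effect upstream (receiver / client): Backpressure after ~42 min; hardest guarantee without sharding.
  • Survives restart? Yes (persists and resumes)
  • Storage cost: ~1.25 GB PVC (watch compaction / fsync)
  • Trade-off, when to choose: Maximum single-node durability. Risk: slow PVC = slow ingest even when backend is healthy.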