OpenTelemetry Collector with Failover Connector, sending_queue and file_storage

Resilient telemetry flow from Son Testing (Source) to Observability Dev (Aggregator + Backends)

How the Failover Connector Works
The failover connector is a component that acts as an exporter for the ingress pipeline and as a receiver for the downstream pipelines. It maintains a prioritized list of destinations and routes telemetry to the first healthy pipeline. It periodically retries higher-priority pipelines and routes traffic back to them once they become healthy again.
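The routing rule above can be modelled in a few lines. The Python below is a simplified sketch, not the connector's source code. `PipelineError`, `BoundedQueuePipeline` and `FailoverRouter` are hypothetical names. Each priority level here holds a single pipeline, although the real `priority_levels` setting takes a list of pipelines per level. A pipeline counts as unhealthy when it raises on a batch, which stands in for a full sending_queue. The sketch also assumes that a batch rejected by one level is offered to the next level.

```python
from dataclasses import dataclass, field
from typing import List


class PipelineError(Exception):
    """Hypothetical error a downstream pipeline raises when it cannot accept data."""


@dataclass
class BoundedQueuePipeline:
    """Hypothetical stand-in for a pipeline whose exporter has a bounded sending_queue."""
    name: str
    capacity: int
    queue: List[str] = field(default_factory=list)

    def consume(self, batch: str) -> None:
        # Enqueueing succeeds until the queue is full; then the error propagates upstream.
        if len(self.queue) >= self.capacity:
            raise PipelineError(f"{self.name}: sending queue is full")
        self.queue.append(batch)


class FailoverRouter:
    """Simplified model of priority-based failover routing with periodic retries."""

    def __init__(self, levels: List[BoundedQueuePipeline], retry_interval: float) -> None:
        self.levels = levels              # index 0 is the highest priority
        self.retry_interval = retry_interval
        self.active = 0                   # level currently receiving data
        self.last_probe = 0.0             # time of the last switch or retry

    def consume(self, batch: str, now: float) -> str:
        start = self.active
        # Once retry_interval has elapsed on a lower level, probe from the top again.
        if self.active > 0 and now - self.last_probe >= self.retry_interval:
            start = 0
            self.last_probe = now
        for idx in range(start, len(self.levels)):
            try:
                self.levels[idx].consume(batch)
            except PipelineError:
                continue                  # level is unhealthy; try the next one
            if idx != self.active:
                self.last_probe = now     # record the switch
                self.active = idx
            return self.levels[idx].name
        raise PipelineError("all priority levels rejected the batch")


primary = BoundedQueuePipeline("primary", capacity=2)
failover = BoundedQueuePipeline("failover", capacity=100)
router = FailoverRouter([primary, failover], retry_interval=30.0)

# Aggregator down: Primary's queue never drains, so the third batch fails over.
print([router.consume(f"batch-{t}", now=float(t)) for t in range(4)])
# -> ['primary', 'primary', 'failover', 'failover']

primary.queue.clear()                          # Aggregator back, Primary drained
print(router.consume("batch-20", now=20.0))    # -> failover (retry_interval not yet elapsed)
print(router.consume("batch-40", now=40.0))    # -> primary
```

The sketch keeps no data of its own. It only decides where each batch goes, and the queues live in the pipelines. This matches the note that buffering belongs to the exporters.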
Architecture Overview
Son Testing K8s Cluster (Source: Telemetry Producers)
  • Telemetry sources: applications, pods, host metrics, logs, and traces, sent over OTLP (gRPC/HTTP).
  • OpenTelemetry Collector: a single collector process running multiple pipelines.
    – Ingress pipeline (receivers): the otlp receiver accepts traces, metrics, and logs and exports them to the failover connector.
    – Failover connector (exporter for the ingress pipeline, receiver for the downstream pipelines): priority levels are 1. primary (highest priority) and 2. failover (lower priority), with retry_interval 30s (configurable). It routes data to the first healthy pipeline, so traffic goes to the failover pipeline when the primary pipeline is unhealthy.
    – Primary pipeline (fast path, no disk): otlp exporter with sending_queue (memory); sends live data over OTLP (gRPC/HTTP).
    – Failover pipeline (durable buffer path): otlp exporter with sending_queue (memory) and file_storage (write-ahead log on disk); sends buffered or live data over OTLP (gRPC/HTTP).
Observability Dev K8s Cluster (Observability Platform)
  • Otel Aggregator service: an OTLP receiver that ingests from both the Primary and Failover pipelines, followed by processing (batching, resource detection, transformation, routing).
  • Backends: Mimir (metrics), Loki (logs), Tempo (traces).
Important Notes
  • The connector does not buffer or persist data. It only routes, and it treats a pipeline that returns an error as unhealthy.
  • Buffering and persistence are provided by the exporters (sending_queue and file_storage) in each pipeline.
  • Both pipelines send to the same aggregator endpoint.
  • After recovery, the connector routes new traffic to Primary while the Failover pipeline drains its backlog.
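The notes above map onto a collector configuration roughly as follows. The example covers the traces signal only; metrics and logs would need analogous pipelines. To keep all examples in one language, the configuration is written as a Python dict that mirrors the collector's YAML layout. A hypothetical `check_wiring` helper checks four points:

- the connector sits between the ingress and downstream pipelines;
- persistence is attached through an enabled extension;
- both exporters share one endpoint;
- every prioritized pipeline receives from the connector.

The key names (`priority_levels`, `retry_interval`, `sending_queue.enabled`, `queue_size`, `storage`, `file_storage.directory`) follow the upstream component documentation. Verify them against the collector-contrib version in use. The endpoint, directory, and queue size are placeholders.

```python
from typing import Any, Dict, List

# Placeholder value (hypothetical): adjust to the real Aggregator service address.
AGGREGATOR_ENDPOINT = "otel-aggregator.observability-dev.svc:4317"

CONFIG: Dict[str, Any] = {
    "extensions": {"file_storage": {"directory": "/var/lib/otelcol/file_storage"}},
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "connectors": {
        "failover": {
            # One list per priority level; the first level is the highest priority.
            "priority_levels": [["traces/primary"], ["traces/failover"]],
            "retry_interval": "30s",
        }
    },
    "exporters": {
        # Fast path: in-memory sending_queue only.
        "otlp/primary": {
            "endpoint": AGGREGATOR_ENDPOINT,
            "sending_queue": {"enabled": True, "queue_size": 1000},
        },
        # Durable path: sending_queue persisted through the file_storage extension.
        "otlp/failover": {
            "endpoint": AGGREGATOR_ENDPOINT,
            "sending_queue": {"enabled": True, "storage": "file_storage"},
        },
    },
    "service": {
        "extensions": ["file_storage"],
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["failover"]},
            "traces/primary": {"receivers": ["failover"], "exporters": ["otlp/primary"]},
            "traces/failover": {"receivers": ["failover"], "exporters": ["otlp/failover"]},
        },
    },
}


def check_wiring(cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of wiring problems (empty if none found)."""
    problems: List[str] = []
    pipelines = cfg["service"]["pipelines"]
    levels = cfg["connectors"]["failover"]["priority_levels"]

    # Every prioritized pipeline must exist and receive from the connector.
    for level in levels:
        for name in level:
            if "failover" not in pipelines.get(name, {}).get("receivers", []):
                problems.append(f"{name} does not receive from the failover connector")

    # Some ingress pipeline must export into the connector.
    if not any("failover" in p.get("exporters", []) for p in pipelines.values()):
        problems.append("no pipeline exports to the failover connector")

    # Any storage referenced by a sending_queue must be an enabled extension.
    for exp_name, exp in cfg["exporters"].items():
        storage = exp.get("sending_queue", {}).get("storage")
        if storage and storage not in cfg["service"]["extensions"]:
            problems.append(f"{exp_name} uses storage {storage!r}, which is not enabled")

    # Both downstream exporters should target the same Aggregator endpoint.
    endpoints = {exp["endpoint"] for exp in cfg["exporters"].values()}
    if len(endpoints) != 1:
        problems.append(f"exporters target different endpoints: {sorted(endpoints)}")
    return problems


print(check_wiring(CONFIG))  # -> []
```

In this layout, only `otlp/failover` references `file_storage` through its `sending_queue.storage` key. That is what makes the failover path durable and keeps the primary path memory-only.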
1 NORMAL OPERATION (Aggregator is Healthy)
  • Connector routes new incoming telemetry to the Primary Pipeline (highest priority, healthy).
  • Primary exporter sends data directly to the Otel Aggregator.
  • Failover pipeline is idle but ready.
Flow: Ingress → Failover Connector → Primary (active) | Failover (idle)
Result: Low latency, normal flow.
2 BACKEND OUTAGE (2 HOURS) (Aggregator Unavailable)
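  • Primary exporter's sending_queue starts buffering in memory. Once the queue is full (or the exporter otherwise returns an error), the pipeline returns an error to the connector. Batches already in Primary's in-memory queue stay there. The connector cannot move them to Failover, so they remain subject to Primary's retry settings and are lost if the collector restarts.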
  • Connector marks Primary as unhealthy and switches traffic to the Failover pipeline.
  • Failover exporter's sending_queue + file_storage buffer and persist telemetry safely for the duration of the outage.
Flow (! Aggregator down): Ingress → Failover Connector → Primary (unhealthy) | Failover (active · buffering)
Result: No loss of the data routed to Failover, provided its queue and disk have room for the whole outage. Live traffic goes to Failover; data is durable on disk.
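The capacity condition in the result above can be checked with a back-of-the-envelope calculation. Everything routed to Failover during the outage must fit both on disk and within the failover exporter's queue_size. The sketch below is hypothetical. The ingest rates are example numbers and the 20% margin is an arbitrary safety factor. The queue-slot figure assumes queue_size counts requests (batches), so check which unit the sending_queue uses in your collector version.

```python
from dataclasses import dataclass


@dataclass
class OutageBudget:
    disk_bytes: int    # bytes the failover persistent queue must be able to hold
    queue_slots: int   # minimum queue_size, assuming the queue counts requests


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def outage_budget(ingest_bytes_per_s: int, requests_per_s: int,
                  outage_s: int, margin_pct: int = 20) -> OutageBudget:
    """Capacity Failover needs to absorb an outage of outage_s seconds (hypothetical model)."""
    scale = 100 + margin_pct
    return OutageBudget(
        disk_bytes=ceil_div(ingest_bytes_per_s * outage_s * scale, 100),
        queue_slots=ceil_div(requests_per_s * outage_s * scale, 100),
    )


# Example numbers only: 1 MiB/s of telemetry in 20 batches/s during a 2-hour outage.
budget = outage_budget(ingest_bytes_per_s=1 << 20, requests_per_s=20, outage_s=2 * 3600)
print(f"{budget.disk_bytes / 2**30:.2f} GiB on disk, queue_size >= {budget.queue_slots}")
# -> 8.44 GiB on disk, queue_size >= 172800
```

If the failover queue fills before the outage ends, that pipeline also starts returning errors, and the connector has no healthy level left to route to.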
3 BACKEND RECOVERY — BOTH SEND (Aggregator is Healthy Again)
  • Connector retries Primary every retry_interval (e.g., 30s).
  • When Primary is healthy again, connector routes new incoming telemetry back to Primary.
  • Failover pipeline continues sending its buffered backlog (from memory and disk) until fully drained.
  • Both pipelines may send simultaneously for a period:
    – Primary sends fresh telemetry.
    – Failover sends older, buffered telemetry.
Flow: Ingress → Failover Connector → Primary (new data) | Failover (draining backlog)
Result: System returns to normal while ensuring no data loss during the outage window.
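The length of the period during which both pipelines send can be estimated in the same way. The sketch rests on three modelling assumptions:

- telemetry arrives at a constant rate r;
- once the Aggregator is back, the Failover exporter drains its backlog at a constant rate e_f;
- new data returns to Primary after a switch-back delay d of at most retry_interval.

`drain_seconds` is a hypothetical name. While Failover drains, the Aggregator receives roughly r + e_f, which is worth checking against its capacity.

```python
def drain_seconds(r: float, outage_s: float, d: float, e_f: float) -> float:
    """Seconds until Failover's backlog is empty after the Aggregator recovers.

    r:        ingest rate routed by the connector (e.g. MiB/s)
    outage_s: outage length; the backlog at recovery is r * outage_s
    d:        delay until the connector switches new data back to Primary
    e_f:      rate at which the Failover exporter drains its backlog
    """
    if e_f <= 0:
        raise ValueError("drain rate must be positive")
    backlog = r * outage_s
    # Case 1: the backlog empties before the switch-back, while Failover
    # is still receiving new data at rate r.
    if e_f > r and backlog / (e_f - r) <= d:
        return backlog / (e_f - r)
    # Case 2: Failover still holds data at the switch-back; after that it only
    # drains the backlog plus whatever arrived during the delay d.
    return (backlog + r * d) / e_f


# Example numbers only: 1 MiB/s ingest, 2-hour outage, 30 s switch-back, 4 MiB/s drain.
t = drain_seconds(r=1.0, outage_s=7200.0, d=30.0, e_f=4.0)
print(f"backlog drained after {t:.1f} s (~{t / 60:.0f} min)")
```

With these example numbers the backlog takes about half an hour to clear. During that time the Aggregator ingests about 5 MiB/s instead of 1 MiB/s.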
Key Components
Failover Connector
Health-based routing between pipelines (no storage).
Primary Pipeline
Fast path to backend with in-memory queue only.
Failover Pipeline
Buffering path with in-memory queue + persistent file_storage.
Otel Aggregator
Central ingest and routing to Mimir (metrics), Loki (logs), Tempo (traces).