OpenTelemetry

OpenTelemetry is an open-source observability framework created when the Cloud Native Computing Foundation (CNCF) merged the OpenTracing and OpenCensus projects.[65] OpenTracing offers “consistent, expressive, vendor-neutral APIs for popular platforms”[66], while the Google-created OpenCensus project acts as a “collection of language-specific libraries for instrumenting an application, collecting stats (metrics), and exporting data to a supported backend.”[67]

Combined under OpenTelemetry, the two projects form a “complete telemetry system [that is] suitable for monitoring microservices and other types of modern, distributed systems — and [is] compatible with most major OSS and commercial backends.”[68] It is the “second most active” CNCF project.[69] In October 2020, AWS announced the public preview of the AWS Distro for OpenTelemetry.[70]

Source: Wikipedia

https://opentelemetry.io describes OpenTelemetry as “a collection of APIs, SDKs, and tools. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.”

OpenTelemetry is generally available across several languages and is suitable for production use.

You can follow OpenTelemetry’s blog here: https://opentelemetry.io/blog/

OpenTelemetry enables comprehensive observability by integrating distributed tracing, metrics, and logs across various application layers and environments. Examples include:

Distributed Tracing

  • Tracking a request as it flows through multiple microservices, capturing spans for each service interaction, and visualizing the end-to-end latency and bottlenecks in systems like Jaeger or Zipkin.
  • Instrumenting HTTP handlers, database queries, and RPC calls to create trace data, which helps diagnose where failures or performance issues occur.
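To show how a trace follows a request across services, here is a minimal, stdlib-only Python sketch of context propagation with the W3C Trace Context `traceparent` header, which OpenTelemetry SDKs propagate by default. The `Span` class and the `start_root_span`, `inject`, and `extract_child` helpers are hypothetical simplifications, not the OpenTelemetry API; a real service would use the SDK's tracer and propagators.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import Optional

# W3C Trace Context "traceparent": version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass
class Span:
    """Hypothetical, simplified span (not the OpenTelemetry SDK's Span type)."""
    name: str
    trace_id: str                         # 128-bit ID shared by every span in the trace
    parent_span_id: Optional[str] = None  # None marks a root span
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    sampled: bool = True


def start_root_span(name: str) -> Span:
    # A root span starts a new trace with a fresh random trace ID.
    return Span(name=name, trace_id=secrets.token_hex(16))


def inject(span: Span) -> dict:
    # Caller side: put the current span's context into outgoing request headers.
    flags = "01" if span.sampled else "00"
    return {"traceparent": f"00-{span.trace_id}-{span.span_id}-{flags}"}


def extract_child(name: str, headers: dict) -> Span:
    # Callee side: continue the caller's trace as a child span.
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return start_root_span(name)  # no usable context: start a new trace
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return start_root_span(name)  # all-zero IDs are invalid per the spec
    return Span(name=name, trace_id=trace_id, parent_span_id=parent_id,
                sampled=bool(int(flags, 16) & 0x01))


# Usage: one request crossing two services.
frontend = start_root_span("GET /checkout")
payment = extract_child("charge-card", inject(frontend))
print(payment.trace_id == frontend.trace_id)       # True: same trace
print(payment.parent_span_id == frontend.span_id)  # True: linked parent/child
```

Because every span carries the same trace ID plus a pointer to its parent, a backend such as Jaeger or Zipkin can reassemble the per-service spans into one end-to-end timeline.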

Metrics Collection

  • Gathering infrastructure metrics such as CPU usage, memory usage, and network I/O, as well as custom application metrics such as request duration, error counts, and throughput.
  • Exporting metrics to platforms like Prometheus for real-time monitoring and alerting, enabling fast response to anomalies.
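As a rough illustration of the metrics side, the sketch below aggregates request durations into an explicit-bucket histogram and renders it in the Prometheus text exposition format. `DurationHistogram` and the metric name are hypothetical; in practice an OpenTelemetry SDK histogram instrument does the aggregation, and a Prometheus exporter or the Collector exposes the result.

```python
import bisect
from typing import List, Sequence


class DurationHistogram:
    """Hypothetical minimal histogram (not the OpenTelemetry Metrics SDK)."""

    def __init__(self, bounds: Sequence[float]):
        self.bounds: List[float] = sorted(bounds)
        # One counter per explicit bucket plus a final overflow bucket.
        self.counts: List[int] = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.count = 0

    def record(self, seconds: float) -> None:
        # bisect_left puts a value equal to a bound into that bound's bucket,
        # matching Prometheus' "less than or equal" (le) semantics.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.total += seconds
        self.count += 1

    def to_prometheus(self, name: str) -> str:
        # Prometheus buckets are cumulative: each line counts all samples <= le.
        lines, running = [], 0
        for bound, n in zip(self.bounds, self.counts):
            running += n
            lines.append(f'{name}_bucket{{le="{bound}"}} {running}')
        lines.append(f'{name}_bucket{{le="+Inf"}} {self.count}')
        lines.append(f"{name}_sum {self.total}")
        lines.append(f"{name}_count {self.count}")
        return "\n".join(lines)


h = DurationHistogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.1, 0.3, 2.0):
    h.record(seconds)
print(h.to_prometheus("http_request_duration_seconds"))
```

Cumulative buckets are what let Prometheus estimate latency percentiles (for example with `histogram_quantile`) and alert on them.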

Logging Correlation

  • Enriching application logs with trace and span IDs so that developers can link logs directly to traces, making it easier to contextually analyze incidents.
  • Sending logs to log management systems like Loki or Elasticsearch, alongside metrics and trace data, for unified querying and troubleshooting.
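Log correlation boils down to stamping each log record with the IDs of the active span. This Python sketch uses only the standard `logging` and `contextvars` modules; `current_trace` is a hypothetical stand-in for the span context an OpenTelemetry SDK would track, and OpenTelemetry's language-specific logging integrations can add equivalent fields for you.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for the active span context an OpenTelemetry SDK tracks.
current_trace = contextvars.ContextVar("current_trace", default=("-", "-"))


class TraceContextFilter(logging.Filter):
    """Copy the active trace and span IDs onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id, record.span_id = current_trace.get()
        return True  # enrich only; never drop records


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("checkout")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
    handler.addFilter(TraceContextFilter())
    logger.handlers = [handler]  # replace any handlers from earlier calls
    return logger


buf = io.StringIO()
log = make_logger(buf)
token = current_trace.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
log.info("payment authorized")
current_trace.reset(token)
log.info("no active span")
print(buf.getvalue())
```

A log backend can then join on `trace_id` to jump from a log line to the matching trace.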

Kubernetes Example

  • In a Kubernetes-based microservice architecture, OpenTelemetry is used to instrument all services. Traces track requests between services, metrics capture latency and error rates, and logs carry trace context. Because all three signals share that context, teams can visualize SLA compliance, investigate outages quickly, and correlate issues across signals for faster root cause analysis.
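Part of what makes the Kubernetes example work is that every signal from a pod carries the same resource attributes (service name, namespace, pod name), so traces, metrics, and logs can be joined. Deployments often set these through the standard `OTEL_SERVICE_NAME` and `OTEL_RESOURCE_ATTRIBUTES` environment variables, sometimes filled from the pod spec via the Downward API. The function below is a simplified, hypothetical sketch of how an SDK might turn those variables into attributes, not an SDK's actual resource detector; the pod values are made up.

```python
import os
from typing import Dict, Mapping
from urllib.parse import unquote


def resource_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Simplified sketch: build resource attributes from the standard env vars."""
    attrs: Dict[str, str] = {}
    # OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs,
    # with percent-encoded values.
    for pair in env.get("OTEL_RESOURCE_ATTRIBUTES", "").split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            continue  # this sketch skips malformed entries; real SDKs may be stricter
        attrs[key.strip()] = unquote(value.strip())
    # OTEL_SERVICE_NAME takes precedence over a service.name given above.
    service = env.get("OTEL_SERVICE_NAME")
    if service:
        attrs["service.name"] = service
    return attrs


# Hypothetical pod environment.
pod_env = {
    "OTEL_SERVICE_NAME": "checkout",
    "OTEL_RESOURCE_ATTRIBUTES": "service.name=legacy,k8s.namespace.name=shop,"
                                "k8s.pod.name=checkout-7d9f,k8s.deployment.name=checkout",
}
print(resource_from_env(pod_env))
```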

These patterns demonstrate how OpenTelemetry provides a holistic observability solution beyond siloed tracing, metrics, or logging, improving visibility and accelerating issue resolution in distributed architectures.

OpenTelemetry Timeline (Key Developments)

Pre-history (2010–2018): Foundations of Distributed Tracing

  • 2010 – Google publishes Dapper, introducing large-scale distributed tracing concepts
  • 2012–2015 – Zipkin (open-sourced by Twitter in 2012) and Jaeger (built at Uber from 2015) emerge
  • 2016 – OpenTracing launched (vendor-neutral tracing APIs)
  • 2018 – OpenCensus announced by Google (metrics + tracing libraries)

Problem: Two competing standards created fragmentation and adoption friction


2019: Birth of OpenTelemetry

  • May 2019
    • OpenTracing + OpenCensus officially merge into OpenTelemetry
    • Accepted into CNCF as a Sandbox project

Strategic shift:

  • One unified standard for telemetry (traces, metrics, logs)
  • Vendor-neutral instrumentation layer

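2020–2021: First Production Readiness

  • Tracing API and SDKs reach beta (March 2020)
  • Tracing specification v1.0 released (February 2021)
    • Stable tracing specification, followed by 1.0 SDKs in several languages
    • Signals confidence for production adoption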

Impact:

  • Tracing becomes the first mature pillar
  • Vendors (Datadog, New Relic, etc.) begin aligning their agents and ingestion with OpenTelemetry

2021: CNCF Incubation & Ecosystem Growth

  • Aug 2021
    • OpenTelemetry moves to CNCF Incubating

Key developments:

  • Multi-language SDK expansion (10+ languages)
  • Formalisation of OTLP (OpenTelemetry Protocol)
  • Strong adoption across vendors and enterprises

2022: Metrics Maturity & “Three Pillars” Alignment

  • Metrics API & SDK reach stability
  • Logs integration matures (still evolving)

Outcome:

  • A single vendor-neutral framework now covers all three signals (logs still maturing):
    • Traces
    • Metrics
    • Logs

2023–2024: Industry Standardisation Phase

  • Widespread adoption across:
    • Cloud providers (AWS, Azure, GCP)
    • Observability vendors (Datadog, Splunk, Grafana)

Key trends:

  • Auto-instrumentation becomes mainstream
  • OpenTelemetry Collector becomes the de facto telemetry pipeline layer
  • Deep integration with CNCF stack (Prometheus, Jaeger, etc.)

2025: Scaling Challenges & Maturity Work

  • Focus on:
    • Configuration standardisation (YAML/JSON)
    • SDK self-observability
    • Reducing operational complexity

Reality:

  • OTel becomes powerful but operationally complex at scale

2026: CNCF Graduation 🎓

  • May 11, 2026
    • OpenTelemetry reaches CNCF Graduated status

This signals:

  • Enterprise-grade maturity
  • Strong governance and ecosystem stability
  • Long-term industry standard for observability

🧠 Summary (Condensed View)

Phase | Focus | Outcome
2010–2018 | Fragmented tracing ecosystem | Competing standards
2019 | Merge into OpenTelemetry | Unified vision
2020 | Tracing beta (spec v1.0 in Feb 2021) | Production adoption
2021 | CNCF incubation | Rapid ecosystem growth
2022 | Metrics stabilised | Full observability stack
2023–24 | Industry adoption | De facto standard
2025 | Scaling & complexity | Maturity refinement
2026 | CNCF graduation | Enterprise standard

Unified Timeline: OpenTelemetry vs Commercial Platforms

Phase 1: Pre-OpenTelemetry (2010–2018)

Vendor-controlled instrumentation era

Vendors

  • Datadog (founded 2010)
    • Agent-based metrics + infra monitoring
    • Later adds APM (tracing)
  • New Relic
    • Strong APM-first model
    • Proprietary agents and SDKs
  • Splunk
    • Log-centric (machine data platform)
    • Later moves into APM via acquisition (SignalFx, Omnition)

Key Characteristics

  • Fully proprietary instrumentation
  • Vendor lock-in at the agent + SDK layer
  • Scaling issue:
    • Each service tightly coupled to a vendor agent
    • Difficult multi-vendor strategy
    • High operational friction in polyglot environments

This is the problem OpenTelemetry was created to solve.


Phase 2: 2019–2020 (OpenTelemetry Emerges)

Vendors react cautiously

OpenTelemetry

  • Merge of OpenTracing + OpenCensus
  • CNCF sandbox
  • Tracing reaches beta (2020); the v1.0 specification follows in February 2021

Vendor Positioning

  • Datadog
    • Initially resistant (protect proprietary APM agents)
    • Begins adding OTLP ingestion endpoints
  • New Relic
    • Early strategic pivot:
      • Promotes “open instrumentation” narrative
      • Starts aligning SDKs with OTel
  • Splunk
    • Acquires SignalFx + Omnition (OTel-native tracing company)
    • Becomes one of the biggest OTel contributors early

Scaling Context

  • Microservices explode (Kubernetes, service meshes)
  • Tracing becomes critical, but:
    • Instrumentation complexity grows with every new service and language

Vendors realise:

They cannot scale proprietary instrumentation across cloud-native ecosystems.


Phase 3: 2021–2022 (Adoption & Standardisation)

OTel becomes real in production

OpenTelemetry

  • CNCF Incubation
  • OTLP stabilised
  • Metrics reach maturity (2022)

Vendor Adoption Patterns

🟣 Datadog

  • Adds:
    • OTLP ingest (traces + metrics)
    • OTel Collector support
  • Still pushes:
    • Datadog Agent as primary path

Strategy:

“Support OTel, but keep users inside Datadog ecosystem”


🔵 New Relic

  • Fully embraces OTel:
    • OTel-native ingestion
    • Promotes agentless / open instrumentation
  • Drops pricing barriers (usage-based model shift)

Strategy:

“Win by being the most OpenTelemetry-friendly vendor”


🟢 Splunk

  • Deep integration:
    • Splunk Distribution of OpenTelemetry Collector
    • Native OTLP pipelines
  • Heavy contributor to OTel project

Strategy:

“Own the pipeline layer via OTel”


Scaling Problem (Critical Insight)

At this stage, OTel solves instrumentation, but creates:

  • Pipeline sprawl (Collectors everywhere)
  • Config complexity (YAML explosion)
  • Cardinality + cost issues
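A quick worked example shows why cardinality drives cost: the worst-case number of time series for one metric is the product of the number of distinct values of each of its labels. The label names and counts below are hypothetical.

```python
from math import prod
from typing import Mapping


def worst_case_series(label_cardinalities: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label cardinalities for a request-duration metric.
bounded = {"service": 50, "route": 40, "status_code": 5, "method": 4}
with_user_id = {**bounded, "user_id": 10_000}  # one unbounded label added

print(worst_case_series(bounded))       # 40000
print(worst_case_series(with_user_id))  # 400000000
```

Not every combination occurs in practice, so this is an upper bound, but a single unbounded label such as a user ID multiplies it by orders of magnitude. In Prometheus-style backends each histogram series also expands into one series per bucket plus `_sum` and `_count`.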

Observability shifts from “Can I collect telemetry?” to “Can I control cost and cardinality at scale?”

Phase 4: 2023–2024 (Mainstream Adoption)

OTel becomes the default

Market Reality

  • OpenTelemetry becomes:
    • Default instrumentation standard
    • Expected in enterprise architectures

Vendor Differentiation Shifts

Datadog

  • Focus:
    • UX, correlation, AI features
  • Still optimised for:
    • Datadog-native pipelines

Key move:

  • “OTel in → Datadog internal model”

New Relic

  • Positions itself as:
    • “Best backend for OTel data”
  • Strength:
    • Unified data store and schema (NRDB)

Splunk

  • Focus:
    • Enterprise-scale ingestion + analytics
  • OTel Collector becomes:
    • First-class ingestion layer

Scaling Complexity (Now the Core Problem)

At scale (what you’d discuss in an SRE interview):

  • Cardinality explosion
    • Metrics labels blow up costs
  • Sampling strategies
    • Head vs tail sampling in traces
  • Pipeline engineering
    • Filtering, enrichment, routing
  • Storage tiering
    • Hot vs cold observability data
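The difference between head and tail sampling is easiest to see side by side. In this hypothetical Python sketch, the head decision uses only the trace ID (loosely modelled on ratio-based samplers such as OpenTelemetry's TraceIdRatioBased, but not a copy of its algorithm), while the tail decision runs after a trace's spans have been buffered and keeps every error or slow trace. The `FinishedSpan` type, thresholds, and IDs are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


def head_sample(trace_id: str, ratio: float) -> bool:
    """Decide at trace start from the trace ID alone, before the outcome is known.

    Deterministic: the same trace ID always yields the same decision, so
    services can agree without coordinating.
    """
    # Treat the low 64 bits of the 128-bit trace ID as a uniform random number.
    value = int(trace_id[-16:], 16)
    return value < int(ratio * 2**64)


@dataclass
class FinishedSpan:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False


def tail_sample(spans: List[FinishedSpan], slow_ms: float,
                baseline_ratio: float) -> Dict[str, bool]:
    """Decide after buffering spans per trace, using what actually happened."""
    traces: Dict[str, List[FinishedSpan]] = {}
    for span in spans:
        traces.setdefault(span.trace_id, []).append(span)
    decisions = {}
    for trace_id, trace_spans in traces.items():
        interesting = any(s.error or s.duration_ms >= slow_ms for s in trace_spans)
        # Always keep errors and slow traces; keep a small baseline of the rest.
        decisions[trace_id] = interesting or head_sample(trace_id, baseline_ratio)
    return decisions


spans = [
    FinishedSpan("1" * 16 + "f" * 16, "GET /search", 120.0),
    FinishedSpan("2" * 16 + "f" * 16, "charge-card", 80.0, error=True),
    FinishedSpan("3" * 16 + "f" * 16, "GET /checkout", 2500.0),
]
print(tail_sample(spans, slow_ms=1000.0, baseline_ratio=0.1))
```

Tail sampling keeps the traces worth investigating, but it costs memory for buffering and needs all of a trace's spans in one place, which is why it usually runs in the Collector with spans routed by trace ID.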

Vendors now compete on:

“Who helps you manage observability complexity best?”


Phase 5: 2025–2026 (Maturity & Control Plane Thinking)

OpenTelemetry

  • CNCF Graduation (2026)
  • Focus:
    • Stability
    • Configuration standardisation
    • Pipeline governance

Vendor Convergence

All three now:

  • Support OTLP natively
  • Support OpenTelemetry Collector
  • Provide their own Collector distributions or custom builds
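To make “support OTLP natively” concrete, below is a hedged sketch of a minimal OTLP/HTTP request with JSON encoding, built with the Python standard library rather than an SDK exporter. The service name, IDs, scope name, and the local endpoint (`http://localhost:4318/v1/traces`, the conventional OTLP/HTTP default) are assumptions for illustration; vendor endpoints and authentication headers differ, and the request is only constructed, not sent.

```python
import json
import time
import urllib.request


def otlp_json_span(service: str, trace_id: str, span_id: str, name: str,
                   start_ns: int, end_ns: int) -> dict:
    """Build a minimal OTLP/JSON traces payload containing one server span."""
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "example.manual"},  # hypothetical scope name
                "spans": [{
                    "traceId": trace_id,  # hex-encoded in OTLP/JSON, not base64
                    "spanId": span_id,
                    "name": name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),  # 64-bit ints as strings
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


def build_request(payload: dict,
                  endpoint: str = "http://localhost:4318/v1/traces") -> urllib.request.Request:
    # OTLP/HTTP with JSON encoding: POST the payload as application/json.
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


now = time.time_ns()
req = build_request(otlp_json_span("checkout", "5b8efff798038103d269b633813fc60c",
                                   "eee19b7ec3c1b174", "GET /checkout",
                                   now, now + 25_000_000))
print(req.get_method(), req.full_url)
```

Because the payload shape is the same whichever backend receives it, switching vendors becomes mostly a matter of changing the endpoint and credentials.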

Strategic Shift

Observability architecture becomes:

[Instrumentation: OpenTelemetry]
        ↓
[Pipeline: OTel Collector / Vendor distros]
        ↓
[Backend: Datadog / New Relic / Splunk]
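The pipeline layer is where filtering, enrichment, and routing happen. The Python sketch below models that flow; it is a hypothetical illustration of the receiver → processor → exporter idea, not the Collector's configuration format or API, and the attribute names, cluster name, team names, and exporter names (`vendor_a`, `vendor_b`) are assumptions.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
Processor = Callable[[Record], Optional[Record]]  # return None to drop a record


def drop_health_checks(rec: Record) -> Optional[Record]:
    # Filtering: discard noisy telemetry before it costs anything downstream.
    return None if rec.get("http.route") == "/healthz" else rec


def add_cluster(cluster: str) -> Processor:
    # Enrichment: stamp infrastructure metadata once, in the pipeline.
    def process(rec: Record) -> Record:
        return {**rec, "k8s.cluster.name": cluster}
    return process


def run_pipeline(records: List[Record], processors: List[Processor],
                 routes: Dict[str, str], default: str) -> Dict[str, List[Record]]:
    """Apply processors in order, then route each surviving record to an exporter."""
    out: Dict[str, List[Record]] = {}
    for rec in records:
        current: Optional[Record] = rec
        for proc in processors:
            current = proc(current)
            if current is None:
                break
        if current is None:
            continue
        # Routing: pick a backend by team, falling back to a default exporter.
        exporter = routes.get(current.get("team", ""), default)
        out.setdefault(exporter, []).append(current)
    return out


records = [
    {"http.route": "/checkout", "team": "payments"},
    {"http.route": "/healthz", "team": "payments"},
    {"http.route": "/search", "team": "discovery"},
]
result = run_pipeline(records, [drop_health_checks, add_cluster("prod-eu")],
                      routes={"payments": "vendor_a"}, default="vendor_b")
print(result)
```

Keeping these steps in a shared pipeline rather than in each service is what lets one team change filtering or backends without redeploying instrumented applications.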

The battleground moves to:

Layer | Who owns it
Instrumentation | OpenTelemetry
Pipeline | Shared (OTel + vendors)
Storage + UX | Vendors

Key Takeaways (What Actually Matters)

1. OpenTelemetry commoditised instrumentation

  • Vendors lost control of the data generation layer

2. Vendors adapted differently

Vendor | Strategy
Datadog | Controlled openness (OTel-compatible, but agent-first)
New Relic | Full embrace of OTel
Splunk | Deep integration + pipeline ownership

3. The real problem shifted

Before OTel:

  • How do I instrument services?

After OTel:

  • How do I:
    • Control cost?
    • Manage cardinality?
    • Design pipelines?
    • Sample intelligently?

4. Modern Observability = Data Engineering Problem

At scale, you’re effectively building:

  • A real-time telemetry data platform
  • With:
    • Streaming pipelines (Collectors)
    • Schema governance
    • Cost optimisation