Distributed systems and microservice architectures generate complex network traffic. When an end user clicks on a website, many web requests go from the client's browser to the web server. On the backend, each of those requests fans out into further calls between microservices, databases, proxies, and more. With so much traffic in flight, it is often difficult for engineers to find out where the bottlenecks are and which components requests spend the most time in. For example, a particular request might spend most of its time at the database layer fetching data, waiting on a third-party API, or processing business logic across several microservices. To answer that question, it’s important to have distributed tracing set up.
What is Distributed Tracing?
As applications become more complex and more services are involved in serving user traffic, it becomes critical to understand how requests traverse services and how each service contributes to overall latency. This is what distributed tracing does. It captures telemetry about each user request, including how long each microservice in the path takes to return a response.
When a user request comes in, a trace is created: a representation of the request's journey as it moves through all the services of a distributed system. Traces are composed of spans, where each span represents a specific request and response pair involved in serving the user request. The parent span describes the latency as observed by the end user. Each child span describes how a particular service in the distributed system was called and how it responded, with latency information captured for each.
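The relationship between a trace, its parent span, and its child spans can be sketched in a few lines of Python. This is a minimal illustration only: the `Span` class, its field names, and the sample service names and timings are all hypothetical and are not part of the OpenTelemetry API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed operation in a trace (hypothetical model, not the OpenTelemetry SDK)."""
    trace_id: str             # shared by every span in the same trace
    span_id: str              # unique within the trace
    parent_id: Optional[str]  # None marks the parent (root) span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def slowest_child(spans: list) -> Span:
    """Return the direct child of the root span that took the longest."""
    root = next(s for s in spans if s.parent_id is None)
    children = [s for s in spans if s.parent_id == root.span_id]
    return max(children, key=lambda s: s.duration_ms)


# Made-up sample data: one user request that touches three backends.
trace = [
    Span("t1", "a", None, "GET /checkout", 0, 480),
    Span("t1", "b", "a", "auth-service", 5, 45),
    Span("t1", "c", "a", "orders-db query", 50, 400),
    Span("t1", "d", "a", "payment-api call", 405, 470),
]

root = trace[0]
bottleneck = slowest_child(trace)
print(f"{root.name}: {root.duration_ms} ms total")
print(f"slowest step: {bottleneck.name} ({bottleneck.duration_ms} ms)")
```

With this sample data, the parent span reports the 480 ms the end user experienced, and the child spans show that the database query accounts for most of it.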
What is OpenTelemetry?
OpenTelemetry is a framework that provides a collection of APIs, libraries, agents, and tools to capture traces, logs, and metrics from an application. OpenTelemetry aims to standardize the way in which telemetry data is collected and sent to different backend platforms. This provides a vendor-neutral path for instrumentation and delivers the flexibility to change the observability backend platform (for example, to move from New Relic to Datadog or vice versa) without having to instrument the application code again. To be clear, OpenTelemetry does not replace observability backends such as Jaeger or Prometheus. Instead, the project helps export data to both open-source and commercial backends.
Below are the features that OpenTelemetry provides:
- Standardization that companies can follow, making it easy to move between vendors
- A single collector binary for deploying in multiple ways, including as an agent or gateway
- Complete control over your data, including sending it to multiple destinations in parallel
- Open-standard, vendor-agnostic semantic conventions for data collection
- Support for multiple context propagation formats, which eases migration
- A full-stack implementation to generate, emit, collect, process, and export telemetry data
OpenTelemetry Components
There are multiple components of OpenTelemetry:
- Proto: These are the language-independent interface types for OpenTelemetry, defined for use by collectors, instrumentation libraries, etc.
- Specification: This consists of the API, SDK, and data sections, which describe the requirements and expectations for implementations in different languages. The API generates the telemetry data, the SDK implements the API and adds processing and exporting capabilities, and the data section holds the semantic conventions that support all kinds of vendors without code changes.
- Collector: This component is responsible for receiving, processing, and exporting telemetry data. Its implementation has to be vendor agnostic. By default, instrumentation libraries export their telemetry data to it.
- Instrumentation Libraries: These are a part of the OpenTelemetry project and are available in multiple languages. They add observability to other libraries by making calls to the OpenTelemetry API, so that applications built on those libraries can be observed.
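The Collector's receive, process, and export stages can be pictured as a small pipeline. The sketch below is a toy model under stated assumptions: the `Pipeline` class, the two processor functions, and the in-memory "backends" are hypothetical names invented for illustration, and the real Collector is a separate service configured through its own configuration file rather than Python code.

```python
from typing import Callable, Optional

Record = dict  # one telemetry record, e.g. a span or a metric point
Processor = Callable[[Record], Optional[Record]]
Exporter = Callable[[Record], None]


class Pipeline:
    """Toy receive -> process -> export pipeline (illustrative, not the Collector's code)."""

    def __init__(self, processors: list, exporters: list):
        self.processors = processors
        self.exporters = exporters

    def receive(self, record: Record) -> None:
        for process in self.processors:
            record = process(record)
            if record is None:  # a processor may drop the record entirely
                return
        for export in self.exporters:  # fan out a copy to every backend
            export(dict(record))


def drop_health_checks(record: Record) -> Optional[Record]:
    """Hypothetical processor: discard noisy health-check requests."""
    return None if record.get("name") == "GET /healthz" else record


def add_environment(record: Record) -> Record:
    """Hypothetical processor: tag each record with an environment label."""
    return {**record, "environment": "staging"}


# Two in-memory lists stand in for two different observability backends.
backend_a: list = []
backend_b: list = []

pipeline = Pipeline(
    processors=[drop_health_checks, add_environment],
    exporters=[backend_a.append, backend_b.append],
)
pipeline.receive({"name": "GET /checkout", "duration_ms": 480})
pipeline.receive({"name": "GET /healthz", "duration_ms": 2})
print(backend_a)
```

This toy version exports to each backend in turn rather than in parallel, but it shows the key property: the application emits each record once, and the pipeline decides what is dropped, enriched, and delivered to which backends.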
OpenTelemetry Architecture
At a high level, OpenTelemetry consists of three main pieces:
- A set of APIs to instrument the code.
- An SDK that implements those APIs.
- The collector, which can ingest data from various sources and export it to several open-source and commercial telemetry systems.
The purpose of the API is to enable instrumentation of libraries and application code. The API can be divided into four distinct sections: tracing, metrics, shared context, and semantic conventions.
- The Tracer API is responsible for creating and annotating spans. A tracer is identified by a name and version, and each span it creates carries identifying information such as a traceId.
- The Meter API provides access to a wide variety of metric instruments. Examples of these instruments are counters, value recorders, and observers (called histograms and observable instruments in current releases).
- The context API adds context to spans and traces (for example, headers such as x-datadog-trace-id), which enables tracking a span as it propagates through the system.
- The semantic conventions contain guidelines and rules, mainly for naming: how to name spans, attributes, labels, and the metric instruments themselves. The goal of the conventions is to ensure consistency across the various language implementations and for external instrumentation that might be built with those APIs.
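To make context propagation concrete, here is a minimal sketch of carrying a trace identifier between two services in an HTTP header. It uses the `traceparent` header from the W3C Trace Context format, which is one of the propagation formats OpenTelemetry supports. The `inject` and `extract` functions are hand-written for illustration and are not the OpenTelemetry propagator API.

```python
import re
import secrets

# W3C Trace Context header: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Write the current trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict):
    """Return (trace_id, parent_span_id, sampled) from incoming headers, or None."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    return trace_id, parent_id, (int(flags, 16) & 1) == 1


# Service A starts a trace and attaches its context to an outgoing call.
trace_id = secrets.token_hex(16)  # 32 hex characters
span_a = secrets.token_hex(8)     # 16 hex characters
outgoing = {}
inject(outgoing, trace_id, span_a)

# Service B reads the header and continues the same trace with its own span.
received_trace_id, parent_id, sampled = extract(outgoing)
span_b = secrets.token_hex(8)
print(f"service B span {span_b} belongs to trace {received_trace_id}, parent {parent_id}")
```

Because service B reuses the incoming trace ID and records service A's span as its parent, a backend can later stitch both spans into a single trace.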
The OpenTelemetry SDK is the implementation of the OpenTelemetry API.
The SDK is implemented in the most popular languages, including JavaScript, Java, Python, Ruby, Go, .NET, and C++, giving developers broad support for their language of choice.
The Collector is a standalone service that can ingest metrics and spans from a wide variety of sources, including Zipkin, Jaeger, OpenCensus, and of course, the OpenTelemetry protocol itself. You can configure the collector to do tail-based sampling of those spans. It also enables exporting spans and metrics to a large number of vendor and open-source telemetry systems.
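Tail-based sampling means the keep-or-drop decision is made after a trace has finished, once its outcome is known, rather than when the request starts. The sketch below illustrates the idea under simplifying assumptions: the `TailSampler` class, its policy (keep traces that contain an error or whose root span is slow), and the sample spans are all hypothetical, and it treats a trace as complete as soon as the root span arrives, whereas a real deployment typically waits for a configured period.

```python
from collections import defaultdict


class TailSampler:
    """Buffer spans per trace and decide once the whole trace has been seen.

    Hypothetical policy: keep a trace if any span errored or if the root
    span was slower than `latency_threshold_ms`.
    """

    def __init__(self, latency_threshold_ms: int):
        self.latency_threshold_ms = latency_threshold_ms
        self.buffer = defaultdict(list)  # trace_id -> spans seen so far
        self.kept = []                   # spans from traces we decided to keep

    def on_span(self, span: dict) -> None:
        self.buffer[span["trace_id"]].append(span)
        if span["parent_id"] is None:  # root span ended: treat trace as complete
            self._decide(span["trace_id"], root=span)

    def _decide(self, trace_id: str, root: dict) -> None:
        spans = self.buffer.pop(trace_id)
        has_error = any(s["error"] for s in spans)
        is_slow = root["duration_ms"] > self.latency_threshold_ms
        if has_error or is_slow:
            self.kept.extend(spans)


sampler = TailSampler(latency_threshold_ms=300)
for span in [
    {"trace_id": "t1", "parent_id": "a", "name": "db", "duration_ms": 20, "error": False},
    {"trace_id": "t1", "parent_id": None, "name": "GET /home", "duration_ms": 35, "error": False},
    {"trace_id": "t2", "parent_id": "x", "name": "payment", "duration_ms": 90, "error": True},
    {"trace_id": "t2", "parent_id": None, "name": "POST /pay", "duration_ms": 120, "error": False},
]:
    sampler.on_span(span)

kept_traces = sorted({s["trace_id"] for s in sampler.kept})
print(kept_traces)
```

Here the fast, healthy trace is dropped and the trace containing a failed payment call is kept in full, which is the kind of decision that cannot be made when the request first arrives.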
It is important to note that although the OpenTelemetry architecture intends to be a complete telemetry solution out of the box, there are many extension points that allow for customization as needed.
If you do not want to be locked into a vendor contract, OpenTelemetry is a great vendor-neutral solution that supports open standards.