Observability is essential for understanding how modern applications perform and behave in production. OpenTelemetry has emerged as the industry standard for collecting, processing, and exporting telemetry data—traces, metrics, and logs—without vendor lock-in. This guide will walk you through OpenTelemetry’s core components, how it works, and why it’s a game-changer for observability.
What is OpenTelemetry?
OpenTelemetry is an open-source observability framework and toolkit designed to generate, collect, manage, and export telemetry data including – but not limited to – traces, metrics, and logs.
It is vendor- and tool-agnostic, meaning it seamlessly integrates with various components, from open-source tools like Jaeger and Prometheus to commercial solutions like Sematext.
Why Should You Care About OpenTelemetry?
- Vendor-Agnostic & Open-Source: OpenTelemetry is not tied to any specific vendor; in other words, there is no vendor lock-in. It aims to remove the need for proprietary agents that collect vendor-specific observability data in a vendor-specific fashion. You can use, and easily switch between, Jaeger, Prometheus, Datadog, New Relic, or Sematext, to name a few. Of course, each vendor still offers its own features, so you’ll still want to compare vendors and choose the one whose features, costs, and so on suit you best.
- Standardized Instrumentation: Before OpenTelemetry, each monitoring tool had its own instrumentation method, leading to vendor lock-in. OpenTelemetry eliminates this fragmentation by providing a universal standard for instrumentation across different languages and libraries.
- Auto-Instrumentation for Faster Adoption: Manually adding instrumentation is tedious and error-prone. OpenTelemetry supports auto-instrumentation for many popular libraries and frameworks (e.g., Flask, FastAPI, Django, PostgreSQL), reducing the time and effort needed to get started.
- Improved Debugging & Faster Issue Resolution: Metrics, logs, and traces were once known as the Three Pillars of Observability. Correlating them helps you troubleshoot issues and, hopefully, find the root cause faster. However, each of these signals used to be collected separately, with no clear connector between them, so the correlation was never seamless. OpenTelemetry fixes that: by correlating traces, logs, and metrics, it enables teams to pinpoint root causes faster, reducing MTTR (Mean Time to Resolution) and improving system reliability.
Basic Architecture
At a high level, OpenTelemetry collects telemetry data from applications via SDKs, processes it with an optional Collector, and exports it to observability backends.
- Application SDKs – Libraries that instrument your code to collect traces, metrics and logs
- Optional Collector – A standalone service that can receive, process and export telemetry data
- Observability Backends – Systems that store and visualize your telemetry data
This simple pipeline provides flexibility in how you deploy OpenTelemetry. For detailed implementation options, see the “Architecture Approaches” section later in this guide, where we’ll explore different deployment models in depth.
Telemetry Signals
A signal refers to a stream of observability data. OpenTelemetry captures multiple types of telemetry data to give a complete picture of an application’s health and performance.
1. Traces: Traces track the journey of a request as it moves through different services and components in a system. They show how requests flow across services, helping developers identify bottlenecks and latency issues.
A trace consists of multiple spans, where each span represents a single operation or step in the request flow.
Traces help detect slow queries, network delays, and failures, making it easier to optimize performance and improve system reliability.
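To make the trace/span relationship concrete, here is a minimal Python sketch using the OpenTelemetry SDK with a console exporter; the span names are illustrative, not part of any standard.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Print finished spans to stdout so the trace structure is easy to inspect
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer(__name__)

# One trace: a parent span ("handle_request") with two child spans
with tracer.start_as_current_span("handle_request"):
    with tracer.start_as_current_span("query_database"):
        pass  # e.g., run a SQL query
    with tracer.start_as_current_span("render_response"):
        pass  # e.g., serialize the result

All three spans share the same trace ID, which is what lets a backend reassemble them into a single request timeline.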
The image below shows a trace with spans representing different operations and their durations.
2. Metrics: Metrics provide numerical measurements of system and application performance over time. They reveal trends like CPU usage, memory consumption, request latency, and error rates.
Unlike traces, which follow individual requests, metrics aggregate data to show system-wide patterns. OpenTelemetry supports counters, gauges, histograms, and other metric types that can be exported to monitoring platforms like Prometheus, Sematext, and Datadog.
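As a quick illustration, here is a minimal Python sketch of recording metrics with the OpenTelemetry SDK; the instrument names, attributes, and the console exporter are assumptions made for the example, not required values.

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# Export aggregated metrics to stdout every 5 seconds (a backend exporter would go here instead)
reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=5000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter(__name__)

# A counter and a histogram: two of the metric types mentioned above
request_counter = meter.create_counter("http.server.requests", description="Completed requests")
latency_histogram = meter.create_histogram("http.server.duration", unit="ms", description="Request latency")

request_counter.add(1, {"route": "/hello", "status_code": 200})
latency_histogram.record(42.0, {"route": "/hello"})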
The image below shows CPU usage on the Sematext dashboard, displaying trends over time, process-specific usage, and resource consumption insights.
3. Logs: Logs record events happening within an application, either in a structured or unstructured format. They capture important details such as errors, warnings, and system activities, making them essential for debugging.
OpenTelemetry enables logs to be correlated with traces, providing deeper context when troubleshooting issues. This correlation helps developers understand how specific events impact request flows.
Logs are also valuable for forensic analysis and long-term monitoring, allowing teams to track historical data and detect patterns over time.
Example of a structured log event:
{ "timestamp": "2025-02-14T15:30:00Z", "level": "INFO", "message": "User login successful", "service": "auth-service", "user_id": "12345", "ip_address": "192.168.1.10", "request_id": "abc123-def456-ghi789" }
4. Profiling (Experimental Feature): Profiling enhances observability by capturing detailed performance data at the code level. It helps developers analyze CPU usage, memory allocation, and execution time to identify inefficiencies – down to the line of code. OpenTelemetry’s continuous profiling runs with minimal overhead, making it suitable for production environments. By correlating profiles with traces and metrics, teams can connect high-level performance issues to specific code blocks, significantly accelerating troubleshooting and optimization.
OpenTelemetry is still expanding its support for profiling across different programming languages, making this an evolving and exciting space in observability.
Getting Started
SDKs
Role of SDKs in Instrumenting Applications
OpenTelemetry SDKs provide the APIs and tools needed to:
- Generate telemetry data (traces, metrics, logs)
- Auto-instrument applications
- Configure exporters, described shortly, to send data to observability tools
Each supported language has its own SDK, making it easy to integrate OpenTelemetry with different frameworks.
| Language | Auto-Instrumentation | Manual Instrumentation | Supported Libraries & Frameworks |
| --- | --- | --- | --- |
| Java | Traces, Metrics, Logs | Traces, Metrics, Logs | Spring, Quarkus, Micronaut, Jakarta EE, JDBC, Hibernate, gRPC, Kafka, Tomcat, Jetty |
| Node.js | Traces, Metrics | Traces, Metrics | Express, Koa, Fastify, NestJS, GraphQL, MongoDB, Redis, PostgreSQL, MySQL, AWS SDK |
| Python | Traces, Metrics, Logs | Traces, Metrics | Django, Flask, FastAPI, SQLAlchemy, Requests, aiohttp, Celery, PyMongo, Tornado |
| .NET | Traces, Metrics, Logs | Traces, Metrics, Logs | ASP.NET Core, Entity Framework, gRPC, HttpClient, WCF |
| PHP | Traces | Traces, Metrics, Logs | Laravel, Symfony, Guzzle, PDO, Slim, Laminas, Doctrine |
| Ruby | – | Traces | Elasticsearch Client, GraphQL, Koala, LMDB |
| Go | WIP | Traces, Metrics | Gin-gonic, Echo, Fiber, Go-redis, Gorilla mux, Zap |
| C++ | – | Traces, Metrics, Logs | httpd (Apache), Nginx, gRPC |
| Rust | – | Traces (not stable yet) | Actix Web, Axum, Tide, Trillium |
| Erlang | – | Traces | Cowboy, Ecto, Elli, grpcbox, Oban |
| Swift | – | Traces | URLSession, NautilusTelemetry |
OTLP Protocol
The OpenTelemetry Protocol (OTLP) is the default transport mechanism for OpenTelemetry. It standardizes how telemetry data is transmitted between applications, collectors, and observability platforms.
OTLP supports traces, metrics, and logs in a unified format and uses either gRPC or HTTP for data transfer. This ensures low latency and high throughput, making it suitable for large-scale distributed systems.
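As a concrete sketch, the Python SDK lets you choose either transport when constructing an OTLP span exporter; the endpoints below assume a local Collector listening on the default gRPC (4317) and HTTP (4318) ports without TLS.

# gRPC transport (package: opentelemetry-exporter-otlp-proto-grpc)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter as GrpcSpanExporter
# HTTP/protobuf transport (package: opentelemetry-exporter-otlp-proto-http)
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter as HttpSpanExporter

# insecure=True because the assumed local Collector has no TLS configured
grpc_exporter = GrpcSpanExporter(endpoint="localhost:4317", insecure=True)
http_exporter = HttpSpanExporter(endpoint="http://localhost:4318/v1/traces")

Either exporter can then be attached to a BatchSpanProcessor, as shown in the manual instrumentation example later in this guide.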
Instrumentation
To collect telemetry data, applications need to be instrumented with OpenTelemetry SDKs or agents. Instrumentation can be done in two ways:
- Manual Instrumentation
- Auto-Instrumentation
Manual vs. Auto-Instrumentation
- Auto-Instrumentation: OpenTelemetry provides automatic instrumentation for many frameworks and libraries, requiring minimal or no code changes. Examples include:
- Java
- Python
- Node.js
- .NET
- PHP
- Manual Instrumentation: Developers can use OpenTelemetry SDKs to manually define custom traces, metrics, or logs in their code.
Manual and Auto-Instrumentation Examples
As described above, instrumentation can be done either automatically or manually. We’ll use Python to illustrate both approaches.
Auto-Instrumentation (Python)
Auto-instrumentation collects telemetry data without code changes.
1. Install dependencies
pip install opentelemetry-distro opentelemetry-exporter-otlp flask
opentelemetry-bootstrap -a install
2. Create a simple application (hello.py)
from flask import Flask

app = Flask(__name__)

@app.route("/hello")
def hello():
    return "Hello World"

if __name__ == "__main__":
    app.run(port=8080)
3. Configure OpenTelemetry Collector (config.yaml)
# Configuration for OpenTelemetry Collector with Sematext Exporter
# For more details, see:
# https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/sematextexporter
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  sematext:
    timeout: 500ms
    region: US
    sending_queue:
      enabled: true
      num_consumers: 5
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 3s
      max_elapsed_time: 10s
    metrics:
      app_token: <METRICS_APP_TOKEN>
      payload_max_lines: 10000
      payload_max_bytes: 100000
    logs:
      app_token: <LOGS_APP_TOKEN>
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [sematext]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [sematext]
4. Run with instrumentation
# Start collector
./otelcol-contrib --config=config.yaml

# Run instrumented application
opentelemetry-instrument \
  --traces_exporter otlp \
  --metrics_exporter otlp \
  --logs_exporter otlp \
  --service_name my_service \
  python hello.py
Manual Instrumentation (Python)
Manual instrumentation gives you precise control over what's traced.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Set up the tracer
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

# Configure exporter (insecure=True assumes a local collector without TLS)
otlp_exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

# Example function with manual instrumentation
def process_order(order_id):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        print(f"Processing order {order_id}")
        # Your business logic here

# Usage
process_order(123)
Both approaches send telemetry data to an observability platform such as Sematext Cloud, where you can view traces, metrics, and logs.
Architecture Approaches
Collector-Based Architecture
The Collector-Based Architecture introduces an additional OpenTelemetry Collector component between the application and the observability backend. This collector plays a crucial role in handling telemetry data before it reaches its final destination.
How It Works
- User Application (written in any language): The application is instrumented with the OpenTelemetry SDK. Auto-instrumentation collects telemetry data without requiring code modifications and transmits it via OTLP.
- OpenTelemetry Collector: A centralized service that processes incoming telemetry data. It consists of:
- Receivers – Accept telemetry data from different sources.
- Processors – Handle data enrichment, filtering, batching, and sampling.
- Exporters – Transform and send data to one or more observability backends.
- Vendor Backends: Telemetry data is forwarded to various backends such as Prometheus, Loki, Jaeger, Sematext, or Datadog.
Benefits
- Protocol Translation – The collector can receive data in one format and export it in another, allowing integration with various systems.
- Data Enrichment – It can add additional metadata, such as labels or resource attributes, before sending data to a backend.
- Filtering & Sampling – Helps reduce data volume by discarding unnecessary logs, traces, or metrics.
- Multiple Export Targets – Can send telemetry data to multiple destinations simultaneously.
Considerations
- Additional Component to Manage – Requires deploying and maintaining an extra service.
- More Complex Configuration – Needs proper setup to ensure optimal performance.
- Higher Resource Usage – The collector itself consumes CPU and memory, adding overhead.
Direct Integration Architecture
The Direct Integration Architecture eliminates the OpenTelemetry Collector, allowing the application to send telemetry data directly to an observability backend. This results in a more lightweight setup with fewer moving parts.
How It Works
- User Application (Any Language): The application is instrumented with the OpenTelemetry SDK. Auto-instrumentation collects telemetry data (traces, logs, and metrics) without requiring code modifications and transmits it via OTLP.
- Agent (with built-in OTLP support): Acts as an intermediary, receiving OTLP data directly from the SDK.
- Vendor Backends: Telemetry data is sent directly to a backend like Prometheus (metrics), Loki (logs), Jaeger (traces), Sematext, or Datadog.
Benefits
- Simpler Deployment – No need for an additional collector, reducing setup complexity.
- Lower Resource Footprint – Uses fewer CPU and memory resources.
- Direct Communication – Reduces latency since data is sent straight to the backend.
- Single Component to Manage – The agent is lightweight and easier to maintain.
Considerations
- Backend-Specific Implementation – Requires an observability backend that supports OTLP.
- No Sampling or Further Processing – Without an intermediary like the OpenTelemetry Collector, telemetry cannot be filtered, enriched, or tail-sampled in transit before it reaches the backend; any sampling has to happen in the SDK itself, as shown in the sketch below.
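For completeness, here is a minimal, hedged sketch of what SDK-level (head-based) sampling looks like in Python, which remains available even without a Collector; the 10% ratio is an arbitrary example value, not a recommendation.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Keep roughly 10% of root traces; child spans follow their parent's sampling decision
sampler = ParentBased(root=TraceIdRatioBased(0.1))
trace.set_tracer_provider(TracerProvider(sampler=sampler))

Tail-based sampling and other in-transit processing, by contrast, still require a Collector (or an agent that performs the same role).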