
Metrics

OpenTelemetry metrics can be collected in two main ways: zero-code instrumentation and manual instrumentation. Zero-code instrumentation automatically collects common metrics, such as request rate, latency, and error counts, from supported frameworks and libraries without any changes to your application code. It is quick to set up and gives you standard visibility out of the box. Manual instrumentation, on the other hand, gives developers full control over which metrics are collected and how they are labeled. It involves adding OpenTelemetry API calls directly to your code to emit custom metrics that reflect specific business or application logic. In short, zero-code instrumentation provides convenience and standardization, while manual instrumentation offers flexibility and precision.

Zero-Code Instrumentation

  • What it is: Automatic metric collection with no code changes required
  • What you get: Pre-built dashboards, alerts, and reports that work out-of-the-box
  • Maintenance: Fully supported and maintained by our platform team

Manual Instrumentation - Custom Implementation Required

  • What it is: Hand-coded metric collection and custom business logic metrics
  • What you get: Complete control over what metrics are collected
  • Important: You need to ensure that manually shipped OpenTelemetry metric names match the ones expected by zero-code instrumentation to take full advantage of built-in reports and default alert rules. Otherwise, you will need to create custom reports and alerts based on your custom metrics.

Metrics collected through zero-code instrumentation and used in out-of-the-box reports and alerts are listed below. If your SDK doesn’t support OpenTelemetry zero-code instrumentation, or if you choose manual instrumentation for full control over your metrics, we recommend shipping metrics from your application code that match those in the list below so you can take advantage of the out-of-the-box reports and alerts. If you also ship additional metrics, you can always build custom reports with the Chart Builder or define your own alert rules.

| Metric Name | Key | Type | Unit | Description |
|---|---|---|---|---|
| HTTP Server Request Duration Count | otel.http_server_request_duration.count | long counter | | Count of HTTP server request durations |
| HTTP Server Request Duration Sum | otel.http_server_request_duration.sum | double counter | milliseconds | Sum of HTTP server request durations |
| HTTP Server Request Duration Max | otel.http_server_request_duration.max | double gauge | milliseconds | Maximum HTTP server request duration |
| HTTP Server Request Duration Bucket | otel.http_server_request_duration.bucket | long counter | | Histogram bucket for HTTP server request durations |
| HTTP Client Request Duration Max | otel.http_client_request_duration.max | double gauge | milliseconds | Maximum HTTP client request duration |
| HTTP Client Request Duration Sum | otel.http_client_request_duration.sum | double counter | milliseconds | Sum of HTTP client request durations |
| HTTP Client Request Duration Count | otel.http_client_request_duration.count | long counter | | Count of HTTP client request durations |
| HTTP Client Request Duration Bucket | otel.http_client_request_duration.bucket | long counter | | Histogram bucket for HTTP client request durations |
| Database Client Connection Count | otel.db_client_connection_count | long counter | | Count of database client connections |
| Database Client Connection Timeouts | otel.db_client_connection_timeouts | long counter | | Count of database client connection timeouts |
| Database Client Connection Wait Time Sum | otel.db_client_connection_wait_time.sum | double counter | milliseconds | Sum of database client connection wait times |
| Database Client Connection Wait Time Count | otel.db_client_connection_wait_time.count | long counter | | Count of database client connection wait time measurements |
| Database Client Connection Max | otel.db_client_connection_max | long gauge | | Maximum database client connections |
| Database Client Connection Pending Requests | otel.db_client_connection_pending_requests | long gauge | | Number of pending database client connection requests |
| Database Client Connection Create Time Sum | otel.db_client_connection_create_time.sum | double counter | milliseconds | Sum of database client connection creation times |
| Database Client Connection Create Time Count | otel.db_client_connection_create_time.count | long counter | | Count of database client connection creation time measurements |
| Database Client Connection Use Time Sum | otel.db_client_connection_use_time.sum | double counter | milliseconds | Sum of database client connection use times |
| Database Client Connection Use Time Count | otel.db_client_connection_use_time.count | long counter | | Count of database client connection use time measurements |
| JVM Memory Used | otel.jvm_memory_used | long gauge | bytes | JVM memory currently used |
| JVM Memory Committed | otel.jvm_memory_committed | long gauge | bytes | JVM memory committed |
| JVM Thread Count | otel.jvm_thread_count | long gauge | | Number of JVM threads |
| JVM Class Loaded | otel.jvm_class_loaded | long counter | | Number of JVM classes loaded |
| JVM Class Count | otel.jvm_class_count | long gauge | | Current number of JVM classes |
| JVM GC Duration Sum | otel.jvm_gc_duration.sum | double counter | milliseconds | Sum of JVM garbage collection durations |
| JVM GC Duration Max | otel.jvm_gc_duration.max | double gauge | milliseconds | Maximum JVM garbage collection duration |
| JVM Memory Used After Last GC | otel.jvm_memory_used_after_last_gc | long gauge | bytes | JVM memory used after last garbage collection |
| JVM CPU Recent Utilization | otel.jvm_cpu_recent_utilization | double gauge | ratio | Recent JVM CPU utilization |
| Process CPU Count | otel.process_cpu_count | long gauge | | Number of process CPUs |
| Process CPU Time | otel.process_cpu_time | double counter | seconds | Process CPU time |
| Process Memory Usage | otel.process_memory_usage | long gauge | bytes | Process memory usage |
| Process Thread Count | otel.process_thread_count | long gauge | | Number of process threads |
| Process Runtime .NET GC Collections Count | otel.process_runtime_dotnet_gc_collections_count | long counter | | .NET garbage collection count |
| Process Runtime .NET GC Objects Size | otel.process_runtime_dotnet_gc_objects_size | long gauge | bytes | .NET garbage collection objects size |
| Process Runtime .NET Assemblies Count | otel.process_runtime_dotnet_assemblies_count | long gauge | | Number of .NET assemblies loaded |
| Process Runtime .NET Exceptions Count | otel.process_runtime_dotnet_exceptions_count | long counter | | .NET exceptions count |
| Process Runtime CPython CPU Utilization | otel.process_runtime_cpython_cpu_utilization | double gauge | ratio | CPython CPU utilization |
| Process Runtime CPython CPU Time | otel.process_runtime_cpython_cpu_time | double counter | seconds | CPython CPU time |
| Process Runtime CPython Thread Count | otel.process_runtime_cpython_thread_count | long gauge | | Number of CPython threads |
| Process Runtime CPython Context Switches | otel.process_runtime_cpython_context_switches | long counter | | CPython context switches count |
| Process Runtime CPython Memory | otel.process_runtime_cpython_memory | long gauge | bytes | CPython memory usage |
| System Memory Usage | otel.system_memory_usage | long gauge | bytes | System memory usage |
| System Memory Utilization | otel.system_memory_utilization | double gauge | ratio | System memory utilization |
| System CPU Utilization | otel.system_cpu_utilization | double gauge | ratio | System CPU utilization |
| System Disk IO | otel.system_disk_io | long counter | bytes | System disk I/O |
| System Disk Operations | otel.system_disk_operations | long counter | | System disk operations count |
| System Network IO | otel.system_network_io | long counter | bytes | System network I/O |
| System Network Packets | otel.system_network_packets | long counter | | System network packets count |
| System Network Errors | otel.system_network_errors | long counter | | System network errors count |
| System Network Dropped Packets | otel.system_network_dropped_packets | long counter | | System network dropped packets count |
| System Network Connections | otel.system_network_connections | long gauge | | Number of system network connections |
| System Thread Count | otel.system_thread_count | long gauge | | Number of system threads |
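
The keys above appear to follow a regular pattern: the OpenTelemetry instrument name (for example http.server.request.duration or jvm.memory.used_after_last_gc from the OpenTelemetry semantic conventions) with dots replaced by underscores, an "otel." prefix, and, for histograms, a .count, .sum, .max, or .bucket suffix. That pattern is inferred from this list, not taken from a documented rule. If it holds, a small helper like the hypothetical platform_key below can help you check that manually instrumented metric names will line up with the keys the built-in reports expect.

```python
from typing import Optional

# Histogram suffixes that appear in the metrics table.
HISTOGRAM_SUFFIXES = {"count", "sum", "max", "bucket"}


def platform_key(otel_name: str, suffix: Optional[str] = None) -> str:
    """Hypothetical helper: build a platform metric key from an OpenTelemetry
    instrument name, using the naming pattern inferred from the table
    ("otel." prefix, dots replaced by underscores, optional histogram suffix)."""
    base = "otel." + otel_name.replace(".", "_")
    if suffix is None:
        return base
    if suffix not in HISTOGRAM_SUFFIXES:
        raise ValueError(f"unexpected histogram suffix: {suffix!r}")
    return f"{base}.{suffix}"


print(platform_key("http.server.request.duration", "sum"))
# otel.http_server_request_duration.sum
print(platform_key("jvm.memory.used_after_last_gc"))
# otel.jvm_memory_used_after_last_gc
```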

Reports

OpenTelemetry Monitoring integration reports help you monitor your services and understand how they behave. They fall into two sets. The first consists of generic reports under the main category, which support metrics from multiple SDKs and provide a high-level view of service performance; they help you quickly identify potential issues and can be grouped or filtered by service for more detailed investigation. The second consists of SDK-specific reports, which capture runtime metrics unique to each SDK and provide insights tailored to the behavior and characteristics of that programming language. Together, these reports offer both a broad overview and SDK-specific visibility, helping you understand and optimize your applications.

Service Health Report

Provides HTTP service monitoring focused on request performance, reliability, and traffic patterns using OpenTelemetry metrics.

Operational Health Metrics
  • Total Requests - Volume of incoming traffic to track usage patterns
  • Average Response Time - Calculated from duration sum/count to identify performance trends
  • Success Rate - Ratio of 2XX responses to total requests, measuring reliability
Performance Analysis
  • Hourly Request Count - Traffic patterns over time to identify peak hours and unusual spikes
  • Max Durations - Outlier detection for slowest requests
  • Duration Analysis - Compare max vs average response times to spot performance degradation
Response Time Distribution
  • Categorizes requests into fast (<750ms), moderate (1-7.5s), and slow (>7.5s) buckets
  • Helps identify if slowdowns affect all requests or specific segments
  • Enables capacity planning by understanding response time patterns
Error Tracking
  • HTTP Status Distribution pie chart breaks down 1XX, 2XX, 3XX, 4XX, and 5XX responses
  • Quickly spot error rate increases or unusual redirect patterns
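
The Operational Health figures are simple ratios. As a minimal sketch, assuming you have already read the duration sum and count for the selected window, plus request counts keyed by HTTP status code (the numbers and the status_counts shape below are hypothetical), Average Response Time and Success Rate could be computed like this:

```python
from typing import Mapping


def average_response_time(duration_sum: float, duration_count: int) -> float:
    """Average = duration sum / duration count (same unit as the sum).
    Returns 0.0 for an empty window instead of dividing by zero."""
    return duration_sum / duration_count if duration_count else 0.0


def success_rate(status_counts: Mapping[int, int]) -> float:
    """Fraction of requests with a 2XX status code out of all requests."""
    total = sum(status_counts.values())
    ok = sum(n for code, n in status_counts.items() if 200 <= code < 300)
    return ok / total if total else 0.0


# Hypothetical window: 400 requests whose durations add up to 50,000 ms.
print(average_response_time(50_000.0, 400))  # 125.0
print(f"{success_rate({200: 370, 201: 10, 404: 15, 500: 5}):.1%}")  # 95.0%
```

Because the histogram already carries the running sum and count, the individual request durations are not needed to get the average.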

OTEL Monitoring Service Health

Performance Summary Report

Request duration and latency analysis across all services

Performance Comparison Metrics
  • Avg Server Response Time - Average server-side request processing time in milliseconds
  • Avg Client Call Duration - Outbound HTTP request duration from client perspective in milliseconds
  • Average Response Time Comparison - Side-by-side trend chart comparing server vs client average response times over time
Response Time Distribution Analysis
  • Response Time Comparison (server vs client requests) - Categorizes both server and client requests into three performance tiers:
    • Fast (<750ms) - Optimal performance range displayed in green tones
    • Moderate (1-7.5s) - Acceptable performance range in orange tones
    • Slow (>7.5s) - Concerning performance requiring attention in red/brown tones
  • Enables identification of whether latency originates from server processing or client-side calls
Error Analysis
  • Error Rate Comparison - Tracks total errors (4XX + 5XX) for both server and client requests
  • Server 5XX vs Client 5XX - Isolates server errors to distinguish infrastructure issues from client-side problems
  • Helps pinpoint whether errors stem from internal services or external dependencies
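
A minimal sketch of the Error Analysis comparison, assuming you have request counts keyed by HTTP status code for the server side and the client side (the input shape and the numbers are hypothetical):

```python
from typing import Dict, Mapping


def status_class(code: int) -> str:
    """Map an HTTP status code to its class label, e.g. 404 -> '4XX'."""
    return f"{code // 100}XX"


def error_summary(status_counts: Mapping[int, int]) -> Dict[str, int]:
    """Total errors (4XX + 5XX) and the 5XX-only subset for one side."""
    by_class: Dict[str, int] = {}
    for code, count in status_counts.items():
        label = status_class(code)
        by_class[label] = by_class.get(label, 0) + count
    status_4xx = by_class.get("4XX", 0)
    status_5xx = by_class.get("5XX", 0)
    return {"errors": status_4xx + status_5xx, "5XX": status_5xx}


# Hypothetical status-code counts for one window.
server_side = {200: 900, 404: 30, 500: 12, 503: 3}
client_side = {200: 480, 401: 2, 502: 18}
print(error_summary(server_side))  # {'errors': 45, '5XX': 15}
print(error_summary(client_side))  # {'errors': 20, '5XX': 18}
```

In this made-up window the client side sees more 5XX responses than the server side returns, a pattern that would point toward an external dependency rather than your own service.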

OTEL Monitoring Performance Summary

Cross-Service Report

Compare HTTP performance between different services

Service Performance Metrics

  • Avg Response Time per Service - Big number display showing average response time grouped by service name
  • Calculated from duration sum/count metrics, converted to milliseconds
  • Enables quick identification of slowest services

Service Comparison Chart

  • Avg Response Time per Service - Time-series chart displaying response time trends for each service
  • Grouped by service.name tag to track individual service performance
  • Helps identify service degradation patterns and compare relative performance
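
A single service usually reports several data points (for example one per route or method), so the per-service average is best computed from totals. A sketch, assuming each data point already carries its service.name value; the Point tuple shape and the service names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One data point: (service.name, duration sum in ms, duration count).
Point = Tuple[str, float, int]


def avg_response_time_per_service(points: Iterable[Point]) -> Dict[str, float]:
    """Add up sums and counts per service, then divide once per service."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for service, duration_sum, duration_count in points:
        sums[service] += duration_sum
        counts[service] += duration_count
    return {svc: sums[svc] / counts[svc] for svc in sums if counts[svc]}


points = [
    ("checkout", 12_000.0, 100),  # e.g. a slow route
    ("checkout", 1_000.0, 100),   # e.g. a fast route
    ("catalog", 4_500.0, 150),
]
print(avg_response_time_per_service(points))
# {'checkout': 65.0, 'catalog': 30.0}
```

Averaging the two checkout routes' own averages (120 ms and 10 ms) happens to give the same 65 ms here only because both routes saw 100 requests; with uneven traffic, dividing totals is what weights busy routes correctly.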

OTEL Monitoring Cross Service

Client Performance Report

Monitor outbound HTTP requests made by your services

Request Duration by Method

  • HTTP Client Request Duration - Tracks average response time for each HTTP method
  • Calculated as (sum/count) × 1000 to express the result in milliseconds

Request Volume Analysis

  • Client Request bar chart - Shows request count distribution across HTTP methods
  • Helps understand which operations dominate client-side traffic

Status Code Tracking

  • Response Status Distribution pie chart - Breaks down client responses by status category

Error Rate Monitoring

  • Client Error Rate - Displays both error count (4XX + 5XX) and total requests
  • Enables calculation of error percentage for client-side calls
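
The × 1000 factor implies the client duration sum is stored in seconds, which is the unit the OpenTelemetry semantic conventions specify for http.client.request.duration. A sketch under that assumption, covering the per-method average and the client error percentage (method names and numbers are hypothetical):

```python
def avg_duration_ms(duration_sum_s: float, duration_count: int) -> float:
    """(sum / count) x 1000, assuming the duration sum is in seconds."""
    return (duration_sum_s / duration_count) * 1000 if duration_count else 0.0


def error_percentage(error_count: int, total_requests: int) -> float:
    """Client errors (4XX + 5XX) as a percentage of all client requests."""
    return 100.0 * error_count / total_requests if total_requests else 0.0


# Hypothetical per-method (sum in seconds, count) pairs for one window.
per_method = {"GET": (36.0, 300), "POST": (9.0, 45)}
for method, (duration_sum_s, duration_count) in per_method.items():
    print(method, round(avg_duration_ms(duration_sum_s, duration_count), 1))
# GET 120.0
# POST 200.0
print(error_percentage(69, 345))  # 20.0
```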

OTEL Monitoring Client Performance

Database Performance Report

Monitor database connection pool health and query performance

Connection Pool Overview

  • Total Active Connections - Current number of active database connections
  • Connection Timeouts - Count of connection timeout events
  • Avg Response Time - Average wait time for connection acquisition in milliseconds

Pool Efficiency Metrics

  • Pool Efficiency - Displays used, idle, and total connections
  • Connection Pool Status pie chart - Visual breakdown of used vs idle connections

Capacity Management

  • Pool Capacity vs Usage - Stacked bar chart comparing used connections against max pool size
  • Grouped by connection pool name for multi-database monitoring
  • Helps identify pools approaching capacity limits

Performance Bottlenecks

  • Pending Requests Over Time - Area chart tracking queued connection requests by pool
  • Connection Timeouts by Pool - Bar chart highlighting which pools experience timeout issues

Connection Lifecycle Metrics

  • Connection Performance Metrics - Tracks three key timing phases:
    • Create Time - Time to establish new database connections
    • Wait Time - Time spent waiting in queue for available connection
    • Use Time - Active connection usage duration
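
A sketch of the pool figures, assuming the connection count is broken down by a state attribute with used and idle values (as db.client.connection.state is in recent versions of the OpenTelemetry semantic conventions) and that the maximum pool size comes from otel.db_client_connection_max. The dictionary shape is an assumption, and grouping by pool name is left out for brevity:

```python
from typing import Dict, Mapping


def pool_efficiency(connections_by_state: Mapping[str, int],
                    max_connections: int) -> Dict[str, float]:
    """Used, idle, and total connections, plus used connections as a
    percentage of the configured maximum pool size."""
    used = connections_by_state.get("used", 0)
    idle = connections_by_state.get("idle", 0)
    capacity_pct = 100.0 * used / max_connections if max_connections else 0.0
    return {"used": used, "idle": idle, "total": used + idle,
            "capacity_pct": capacity_pct}


def avg_wait_time_ms(wait_sum_ms: float, wait_count: int) -> float:
    """Average connection wait time = wait time sum / wait time count."""
    return wait_sum_ms / wait_count if wait_count else 0.0


# Hypothetical snapshot of one pool.
print(pool_efficiency({"used": 8, "idle": 2}, max_connections=10))
# {'used': 8, 'idle': 2, 'total': 10, 'capacity_pct': 80.0}
print(avg_wait_time_ms(450.0, 30))  # 15.0
```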

OTEL Monitoring Database Performance

Java-specific JVM Runtime Report

Complete JVM runtime monitoring

Memory Metrics

  • Memory Used - Current JVM memory consumption in bytes
  • Memory Utilization - Percentage of committed memory being used
  • Color-coded thresholds:
    • Critical: >85% utilization
    • Warning: >70% utilization
    • Healthy: ≤70% utilization
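
A minimal sketch of the Memory Utilization tile, assuming utilization is used memory divided by committed memory (values from otel.jvm_memory_used and otel.jvm_memory_committed) and applying the thresholds listed above:

```python
from typing import Tuple


def memory_status(used_bytes: int, committed_bytes: int) -> Tuple[float, str]:
    """Utilization = used / committed, classified with the report's thresholds."""
    if committed_bytes <= 0:
        raise ValueError("committed memory must be positive")
    utilization = 100.0 * used_bytes / committed_bytes
    if utilization > 85:
        level = "Critical"
    elif utilization > 70:
        level = "Warning"
    else:
        level = "Healthy"
    return round(utilization, 1), level


MiB = 1024 * 1024
print(memory_status(450 * MiB, 512 * MiB))  # (87.9, 'Critical')
print(memory_status(384 * MiB, 512 * MiB))  # (75.0, 'Warning')
```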

Resource Tracking

  • Thread Count - Maximum number of active JVM threads
  • Loaded Classes - Current count of loaded classes in JVM

Detailed Charts

  • Memory Utilization - Stacked area chart showing used vs committed memory
  • GC Duration - Bar chart displaying maximum garbage collection pause times
  • Thread Count - Line chart tracking thread count over time
  • Class Count - Line chart monitoring total loaded classes

OTEL Monitoring Java JVM Runtime

Java-specific Memory Analysis Report

JVM memory usage and garbage collection analysis

Current State Metrics

  • Current Memory - Real-time memory usage in bytes (average aggregation)
  • Avg GC Duration - Average garbage collection duration in milliseconds
  • Max GC Pause - Longest garbage collection pause time (critical for latency-sensitive applications)

Memory & GC Correlation

  • Memory Usage & GC Events - Dual-metric chart displaying average memory used vs memory after last GC
  • Helps identify memory leak patterns and GC efficiency

GC Performance Analysis

  • GC Duration Analysis - Compares average vs maximum GC duration
  • Identifies GC pause outliers affecting application performance

OTEL Monitoring Java Memory Analysis

Java-specific System Resource Report

System-level resource consumption tracking

CPU Monitoring

  • System CPU Usage - Displays both average and max CPU utilization percentages
  • System CPU Utilization - Area chart with gradient showing average CPU usage trends

Memory Tracking

  • Memory Usage - Dual metric displaying memory usage vs memory utilization as percentage of committed memory
  • System CPU Utilization (Memory view) - Stacked area chart showing used vs free memory

OTEL Monitoring Java System Resource

.NET-specific CPU & Memory Report

Track process CPU and memory utilization for .NET applications

Resource Usage Metrics

  • CPU Usage - Calculated as (cpu_time / cpu_count) × 100 for percentage utilization
  • Physical Memory Usage - RSS (Resident Set Size) in bytes showing actual RAM consumption
  • Thread Count - Displays average and maximum thread counts
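
otel.process_cpu_time is a cumulative counter in seconds, so the cpu_time term in the CPU Usage formula only yields a percentage if it is read as CPU seconds consumed per second of wall-clock time, that is, its rate over the window. A sketch under that assumption (the sample values are hypothetical):

```python
def cpu_usage_pct(cpu_time_start_s: float, cpu_time_end_s: float,
                  wall_seconds: float, cpu_count: int) -> float:
    """CPU usage as a percentage of all available CPUs over a time window.

    Assumes cpu_time is a cumulative counter of CPU seconds, so its increase
    divided by elapsed wall-clock time gives CPU seconds used per second."""
    if wall_seconds <= 0 or cpu_count <= 0:
        raise ValueError("wall_seconds and cpu_count must be positive")
    cpu_seconds_per_second = (cpu_time_end_s - cpu_time_start_s) / wall_seconds
    return cpu_seconds_per_second / cpu_count * 100


# Hypothetical: 30 CPU seconds consumed in 60 s of wall time on 4 CPUs.
print(cpu_usage_pct(120.0, 150.0, wall_seconds=60.0, cpu_count=4))  # 12.5
```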

CPU Utilization Breakdown

  • CPU Usage - Stacked area chart separating user CPU time and system CPU time
  • Helps identify whether CPU is spent in application code vs system calls

OTEL Monitoring Dotnet CPU & Memory

.NET-specific Garbage Collection Report

Monitor .NET garbage collection patterns and heap management

Generation Collection Rates

  • Gen 0 Collections - Frequent, fast collections for short-lived objects (example: 15.9/min)
  • Gen 1 Collections - Medium-lived objects (example: 5.4/min)
  • Gen 2 Collections - Expensive collections for long-lived objects (example: 1.0/min)
  • High Gen 2 rates indicate potential memory issues or large object heap problems
  • Total Heap Occupied - Current heap size in bytes
  • Cumulative GC Collections Over Time - Stacked area chart showing total collections by generation
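
Since otel.process_runtime_dotnet_gc_collections_count is a cumulative counter, per-minute rates like the examples above come from the difference between two readings. A sketch, assuming the counter is reported separately per generation (labelled gen0, gen1, and gen2 here) and does not reset between readings; the readings are hypothetical and chosen to reproduce the example rates:

```python
from typing import Dict, Mapping


def collections_per_minute(start: Mapping[str, int], end: Mapping[str, int],
                           elapsed_seconds: float) -> Dict[str, float]:
    """Per-generation collection rate from two cumulative counter readings.
    Assumes the counter did not reset between the two readings."""
    return {gen: (end[gen] - start.get(gen, 0)) * 60.0 / elapsed_seconds
            for gen in end}


# Hypothetical readings taken 10 minutes apart.
start = {"gen0": 1200, "gen1": 400, "gen2": 80}
end = {"gen0": 1359, "gen1": 454, "gen2": 90}
print(collections_per_minute(start, end, elapsed_seconds=600))
# {'gen0': 15.9, 'gen1': 5.4, 'gen2': 1.0}
```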

Collection Rate Analysis

  • GC Collection Rate by Generation - Line chart with points tracking collection frequency
  • GC Collection Distribution pie chart - Proportional breakdown of collections across generations

OTEL Monitoring Dotnet Garbage Collection

.NET-specific Assembly & Exceptions Report

Monitor assembly loading and exception patterns

Assembly Management

  • Loaded Assemblies - Current count of loaded assemblies
  • Useful for detecting assembly leak issues
  • Assembly Growth - Line chart tracking assembly count over time
  • Sudden increases may indicate dynamic loading issues

Exception Monitoring

  • Total Exceptions - Cumulative exception count
  • Exception Rate - Average exception rate over time
  • Helps identify error hotspots and application stability issues

OTEL Monitoring Dotnet Assembly & Exceptions

Python-specific CPython Runtime Report

CPython runtime performance monitoring

CPU Metrics

  • CPU Utilization - Process CPU usage as percentage
  • CPU Time (Cumulative) - Tracks total CPU time consumed

Thread & Context Management

  • Thread Count - Average number of active Python threads
  • Context Switches - Cumulative context switches, split into voluntary vs involuntary switches

OTEL Monitoring Python Runtime

Python-specific Memory Management Report

Python process memory usage patterns

Memory Overview

  • Memory Usage - Dual display showing physical RAM usage vs virtual memory size

System Memory Analysis

  • System Memory - Stacked area chart with gradient showing used memory vs free memory
  • System Memory Utilization - Percentage view of memory usage

Process Memory Tracking

  • Process Memory Usage - Stacked area chart comparing virtual memory size (VMS) vs resident set size (RSS)
  • Helps identify memory leaks and allocation patterns

OTEL Monitoring Python Memory Management

Python-specific System Performance Report

System-level performance and context switching

CPU Monitoring

  • CPU User - System CPU usage for user processes as percentage
  • System CPU Utilization by State - Area chart breaking down CPU usage by state (user, system, idle, etc.)

Disk Performance

  • Disk - Tracks disk I/O: Read vs Write
  • Disk Operations - Stacked area chart with gradient for write/read operations

Network Monitoring

  • Net Errors - Count of network errors
  • Network I/O - Line chart tracking: Receive vs Transmit
  • Network Packets - Line chart showing packet counts: Receive vs Transmit

OTEL Monitoring Python System Performance

Python-specific Process Analysis Report

Process thread management and CPU analysis

Process Metrics

  • Process Threads - Average thread count
  • Process CPU - CPU utilization percentage
  • System User CPU - System-wide user CPU percentage
  • RSS Memory - Physical memory usage
  • Active Connections - Number of established network connections

Comparative Analysis

  • Thread Count Comparison - Compares: Process vs System Threads
  • CPU Utilization Comparison - Overlays: Process CPU vs System CPU

CPU Percentiles

  • CPU Utilization Percentiles - Multi-percentile view:
    • P50 (Median)
    • P95
    • P99
  • Helps identify CPU usage distribution and outliers
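
How the report computes its percentiles is not specified here; nearest-rank is one common definition. A sketch using it over CPU utilization samples from a time window (the samples are hypothetical):

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it (0 < p <= 100)."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical CPU utilization samples (%) with three short spikes.
cpu = [12, 15, 14, 13, 80, 16, 12, 11, 14, 95,
       13, 12, 15, 14, 13, 12, 16, 14, 13, 90]
print([percentile(cpu, p) for p in (50, 95, 99)])  # [14, 90, 95]
```

The median stays at 14% while P95 and P99 expose the short spikes, which is the kind of outlier a multi-percentile view is meant to surface.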

Network Connections

  • Network Connections by State - Tracks connections:
    • Established
    • Time Wait
    • Close Wait
  • Useful for identifying connection leak or timeout issues

OTEL Monitoring Python Process Analysis

Default Metric Alerts

Pre-configured alert rules will notify you about:

  • HTTP Server Error Rate Anomaly Alert: Alerts when services exhibit abnormal spikes in 4xx/5xx HTTP response codes, indicating application issues, bad requests, or service degradation.
  • HTTP Client Error Rate Anomaly Alert: Alerts when services experience abnormal spikes in 4xx/5xx errors from outbound HTTP calls to dependencies. Indicates issues with downstream services, API misconfigurations, or authentication problems affecting external integrations.
  • HTTP Client Latency Anomaly Alert: Alerts on unusual increases in outbound HTTP request response times, grouped by service and route. Detects performance degradation in downstream dependencies that often precedes complete failures and impacts user experience.
  • HTTP Server Latency Anomaly Alert: Alerts on unusual increases in server-side request processing time, grouped by service and HTTP route. Detects performance degradation from application code issues, resource contention, or downstream dependency slowness that impacts user-facing response times.
  • Database Connection Pool Timeout Alert: Alerts when database connection timeout events exceed 5 occurrences within a 10-minute period, indicating the database layer is struggling to handle connection requests. Early warning sign of database saturation or connectivity issues.
  • Database Connection Pool Saturation Alert: Alerts when pending database connection requests exceed 10 within a 10-minute window, signaling that the connection pool is at or near capacity. Indicates insufficient database connection resources or slow query execution blocking connections.
  • JVM Memory Pressure Alert: Alerts when JVM memory utilization exceeds 85% of committed memory for 5 minutes, indicating potential memory exhaustion. Warns of approaching out-of-memory conditions that could cause application crashes or severe performance degradation.
  • JVM Long Garbage Collection Pauses Alert: Alerts when JVM garbage collection pauses exceed 1 second within a 5-minute window. Long GC pauses cause application freezes, request timeouts, and degraded user experience, often indicating memory pressure or inefficient memory usage patterns.
  • High CPU Usage Alert: Alerts when process CPU utilization exceeds 70% over a 10-minute period. Indicates compute resource saturation that can lead to request queueing, increased latency, and potential service instability.
  • Database Query Performance Degradation Alert: Alerts on anomalous increases in average database query execution time over a 10-minute window. Detects performance degradation from inefficient queries, missing indexes, lock contention, or database resource constraints before they cause widespread application slowdowns.
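
The exact evaluation semantics of these alerts (window alignment, evaluation frequency) are not described here. As an illustration only, the Database Connection Pool Timeout Alert condition, more than 5 timeout events within 10 minutes, can be sketched as a trailing sliding window over timeout timestamps; the function and its inputs are hypothetical:

```python
from collections import deque
from typing import Deque, Iterable, List


def timeout_alert_times(event_times_s: Iterable[float], window_s: float = 600.0,
                        threshold: int = 5) -> List[float]:
    """Return the event timestamps at which the number of timeout events in
    the trailing window exceeds the threshold (more than 5 in 10 minutes)."""
    window: Deque[float] = deque()
    fired: List[float] = []
    for t in sorted(event_times_s):
        window.append(t)
        # Drop events that fell out of the trailing window (t - window_s, t].
        while window and window[0] <= t - window_s:
            window.popleft()
        if len(window) > threshold:
            fired.append(t)
    return fired


# Hypothetical timeout timestamps, in seconds.
print(timeout_alert_times([0, 60, 120, 180, 240, 300, 1500]))  # [300]
print(timeout_alert_times([0, 100, 200, 300, 400, 700]))  # []
```

The same shape, with a different window and threshold, fits the other fixed-threshold rules above, such as pending connection requests or long GC pauses.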