Tracing Overview

The Tracing Overview provides a summary of your distributed traces, showing service performance, error rates, and trace volume for the selected time period. This report helps you understand how your application is performing and identify issues that need attention.

Tracing Overview Dashboard

Key Metrics¶

Trace Volume¶

Total Traces: Number of traces received in the selected time period
Traces per Minute: Average trace ingestion rate
Visual Timeline: Histogram showing trace distribution over time

Error Analysis¶

Error Rate: Percentage of traces containing errors
Total Errors: Count of traces with errors
Error Types: Breakdown of different error categories

Performance Metrics¶

Average Latency: Mean response time across all traces
P50 Latency: 50th percentile response time (median)
P95 Latency: 95th percentile response time
P99 Latency: 99th percentile response time

Service Health¶

Active Services: Number of services sending traces
Service Status: Health indicator for each service
Throughput per Service: Request volume for each service

Service List¶

The report shows each service with:

Service Name: As configured in your OpenTelemetry SDK
Request Count: Total number of traces for this service
Error Rate: Percentage of traces with errors
Average Latency: Mean response time
P95 Latency: 95th percentile response time
Status Indicator: Green (healthy), yellow (warning), red (error)

Clicking on a service name takes you to the Traces Explorer with results filtered for that service.

Timeline Visualizations¶

Trace Volume Over Time¶

Shows trace volume distribution across the selected time range with peak detection for unusual traffic spikes.

Latency Trends¶

Line graph showing P50, P95, and P99 latency trends over time with visual indicators for performance zones.

Error Rate Trends¶

Shows error rate changes over time, highlighting periods of elevated error rates.

Understanding the Data¶

Latency Percentiles¶

P50 (Median): Half of your requests are faster than this value
P95: 95% of requests are faster than this value (good for SLA monitoring)
P99: 99% of requests are faster than this value (captures worst-case performance)

Error Classification¶

HTTP Errors: 4xx and 5xx status codes
Exception Errors: Unhandled exceptions in application code
Timeout Errors: Requests that exceeded configured timeouts
Dependency Errors: Errors from external services or databases

Service Health Indicators¶

Green: Service operating normally, low error rate, acceptable latency
Yellow: Service showing warning signs (elevated latency or moderate error rate)
Red: Service experiencing issues (high error rate or severe latency problems)

Common Use Cases¶

Performance Monitoring¶

Track application response times across all services and identify performance degradation before it affects users.

Error Tracking¶

Monitor error rates across all services and identify which services are most error-prone.

Capacity Planning¶

Analyze traffic patterns to predict capacity needs and monitor service load distribution.

Deployment Impact Analysis¶

Compare performance before and after deployments and monitor error rates during releases.