Sampling

Trace Sampling¶

Sampling helps control the volume of traces sent to Sematext Cloud, reducing costs and performance overhead while maintaining visibility into your application's behavior.

Why Sampling?¶

Cost Management: Reduce the amount of trace data stored and processed
Performance: Lower overhead on your application
Network Traffic: Reduce bandwidth usage between your app and Sematext
Focus on Important Data: Sample strategically to capture relevant traces

Sampling Strategies¶

Always On (Development)¶

Sample all traces - useful for development and debugging:

export OTEL_TRACES_SAMPLER=always_on

Use when:

Developing and testing
Debugging specific issues
Low traffic environments

Always Off¶

Disable all tracing:

export OTEL_TRACES_SAMPLER=always_off

Use when:

Temporarily disabling tracing
Gating tracing behind a feature flag

Trace ID Ratio (Production)¶

Sample a percentage of traces randomly:

# Sample 10% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1

# Sample 1% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.01

# Sample 50% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.5

Use when:

Production environments
High-traffic applications
Predictable sampling needed
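
Ratio sampling is predictable because the decision can be derived from the trace ID alone. The following is a minimal sketch, assuming the sampler compares the low 64 bits of a random 128-bit trace ID against a threshold (one common approach; your SDK may differ in detail). `should_sample_trace` is a hypothetical helper, not an SDK function.

```python
import random

TRACE_ID_MASK = (1 << 64) - 1  # keep only the low 64 bits of the trace ID


def should_sample_trace(trace_id: int, ratio: float) -> bool:
    """Return True if the trace falls inside the sampled fraction."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Trace IDs below this bound are kept; the bound grows with the ratio.
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_MASK) < bound


if __name__ == "__main__":
    # Simulate random trace IDs and report the fraction that was kept.
    rng = random.Random(42)
    ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(should_sample_trace(t, 0.1) for t in ids)
    print(f"kept {kept / len(ids):.1%} of traces")
```

Because the result depends only on the trace ID and the ratio, two services configured with the same ratio would reach the same decision for a given trace.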

Parent-Based Sampling (Default)¶

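Follows the parent span's sampling decision, so a trace is kept or dropped as a whole. Root spans, which have no parent, fall back to the named sampler (always_on, always_off, or traceidratio):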

export OTEL_TRACES_SAMPLER=parentbased_always_on
# or
export OTEL_TRACES_SAMPLER=parentbased_always_off
# or
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1

Use when:

Microservices architectures
Maintaining trace continuity across services
Keeping the default behavior of most SDKs (parentbased_always_on)
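
The parent-based rule can be sketched in a few lines. This is an illustration only, assuming a simplified model in which the parent's decision is passed in as a boolean (or `None` for a root span); `parent_based_decision` and `ten_percent` are hypothetical names, not SDK functions.

```python
from typing import Callable, Optional


def parent_based_decision(
    parent_sampled: Optional[bool],
    trace_id: int,
    root_sampler: Callable[[int], bool],
) -> bool:
    """Decide whether to sample a span under parent-based sampling.

    parent_sampled is None for a root span, otherwise the parent's decision.
    """
    if parent_sampled is None:
        # Root span: there is no parent to follow, so ask the root sampler.
        return root_sampler(trace_id)
    # Child span: inherit the parent's decision unchanged.
    return parent_sampled


def ten_percent(trace_id: int) -> bool:
    # Hypothetical root sampler: keep IDs in the lowest 10% of the 64-bit range.
    return (trace_id & ((1 << 64) - 1)) < round(0.1 * (1 << 64))
```

With `ten_percent` as the root sampler, this corresponds roughly to parentbased_traceidratio with an argument of 0.1: only the service that starts the trace applies the ratio, and every downstream service follows that decision.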

SDK-Specific Configuration¶

Java¶

# Via environment variables
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1

# Via system properties
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.traces.sampler=traceidratio \
  -Dotel.traces.sampler.arg=0.1 \
  -jar your-app.jar

Python¶

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Configure 10% sampling
tracer_provider = TracerProvider(
    sampler=TraceIdRatioBased(0.1)
)
trace.set_tracer_provider(tracer_provider)

Node.js¶

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1), // 10% sampling
  // ... other configuration
});

Go¶

import (
    "go.opentelemetry.io/otel/sdk/trace"
)

tp := trace.NewTracerProvider(
    trace.WithSampler(trace.TraceIDRatioBased(0.1)), // 10% sampling
    // ... other options
)

.NET¶

using OpenTelemetry.Trace;

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .SetSampler(new TraceIdRatioBasedSampler(0.1)) // 10% sampling
        // ... other configuration
    );

Ruby¶

require 'opentelemetry/sdk'

OpenTelemetry::SDK.configure do |c|
  c.use_all
end

# The sampler is set on the tracer provider, not on the span processor
OpenTelemetry.tracer_provider.sampler =
  OpenTelemetry::SDK::Trace::Samplers.trace_id_ratio_based(0.1) # 10% sampling

Custom Sampling Logic¶

You can implement custom sampling based on specific criteria:

Sample by Endpoint¶

Sample different endpoints at different rates:

// Node.js example
const { SamplingDecision } = require('@opentelemetry/sdk-trace-base');

class CustomSampler {
  shouldSample(context, traceId, spanName, spanKind, attributes) {
    // Drop all /health checks (0% sampling)
    if (attributes['http.target'] === '/health') {
      return { decision: SamplingDecision.NOT_RECORD };
    }

    // Sample 50% of /api/critical endpoints
    if (attributes['http.target']?.startsWith('/api/critical')) {
      return { decision: Math.random() < 0.5 ?
        SamplingDecision.RECORD_AND_SAMPLED :
        SamplingDecision.NOT_RECORD };
    }

    // Sample 1% of everything else
    return { decision: Math.random() < 0.01 ?
      SamplingDecision.RECORD_AND_SAMPLED :
      SamplingDecision.NOT_RECORD };
  }

  // Required by the Sampler interface; used in diagnostics
  toString() {
    return 'CustomSampler';
  }
}
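
The same per-endpoint idea can be expressed so that the decision depends on the trace ID rather than on a fresh random number, which keeps repeated decisions for one trace consistent. This is a minimal sketch under that assumption; `rate_for` and `should_sample` are hypothetical helpers, and the rates simply mirror the example above.

```python
from typing import Optional

DEFAULT_RATE = 0.01  # sample 1% of everything else


def rate_for(target: Optional[str]) -> float:
    """Return the sampling rate configured for a request path."""
    if target == "/health":
        return 0.0  # never sample health checks
    if target and target.startswith("/api/critical"):
        return 0.5  # sample half of the critical API calls
    return DEFAULT_RATE


def should_sample(trace_id: int, target: Optional[str]) -> bool:
    """Decide from the trace ID so repeated calls agree for the same trace."""
    bound = round(rate_for(target) * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound
```

Keeping the rates in one function makes it easier to review which endpoints are over- or under-sampled.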

Sample Errors at Higher Rate¶

Always capture traces with errors:

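# Python example
from opentelemetry.sdk.trace.sampling import Decision, Sampler, SamplingResult, TraceIdRatioBased

class ErrorAwareSampler(Sampler):
    def __init__(self, ratio=0.1):
        self._fallback = TraceIdRatioBased(ratio)

    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        # Always sample if an error attribute is present at span start
        if attributes and attributes.get("error", False):
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes)

        # Otherwise use 10% sampling
        return self._fallback.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )

    def get_description(self):
        return "ErrorAwareSampler"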

Sampling Best Practices¶

Development Environment¶

Use always_on for complete visibility
No sampling during debugging
Enable all trace levels

Staging Environment¶

Use moderate sampling (10-50%)
Test sampling configuration
Validate sampling decisions

Production Environment¶

Start with conservative sampling (1-10%)
Adjust based on traffic volume, cost constraints, and performance impact
Monitor sampling effectiveness

High-Traffic Services¶

Very low sampling rates (0.1-1%)
Focus on error traces
Sample critical operations at higher rates

Sampling Recommendations by Traffic¶

Requests/sec	Recommended Sampling	Rationale
< 10	100% (always_on)	Capture everything
10-100	50-100%	High visibility needed
100-1,000	10-50%	Balance visibility and volume
1,000-10,000	1-10%	Reduce overhead
> 10,000	0.1-1%	Minimize impact
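
To turn the table into a volume estimate, multiply the request rate by the sampling rate and the number of seconds in a day. The sketch below assumes steady traffic and one trace per request; `sampled_traces_per_day` and `recommended_range` are hypothetical helpers, and because the table's tiers share their boundary values, this sketch assigns a boundary (for example, exactly 100 requests/sec) to the lower tier.

```python
SECONDS_PER_DAY = 86_400


def sampled_traces_per_day(requests_per_sec: float, sampling_rate: float) -> float:
    """Expected number of traces kept per day at a steady request rate."""
    return requests_per_sec * sampling_rate * SECONDS_PER_DAY


def recommended_range(requests_per_sec: float) -> tuple:
    """Return the (low, high) sampling rates from the table above."""
    if requests_per_sec < 10:
        return (1.0, 1.0)
    if requests_per_sec <= 100:
        return (0.5, 1.0)
    if requests_per_sec <= 1_000:
        return (0.1, 0.5)
    if requests_per_sec <= 10_000:
        return (0.01, 0.1)
    return (0.001, 0.01)
```

For instance, 2,000 requests/sec at a 5% rate works out to roughly 8.64 million traces per day, which can then be compared against your plan's limits.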

Head vs Tail Sampling¶

Head Sampling (Current)¶

Decision made at trace start
Configured in your application
Lower overhead
May miss interesting traces

Tail Sampling (Future)¶

Decision made after trace completion
Can sample based on trace characteristics
Requires more infrastructure
Better for capturing anomalies

Note: Tail sampling support is planned for future releases.
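
To make the contrast with head sampling concrete, here is a conceptual sketch of a tail decision. It is not a description of any planned Sematext feature: the `Span` type, the `keep_trace` function, and the 1000 ms threshold are all hypothetical, chosen only to show that the decision runs after every span of the trace is known.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    name: str
    duration_ms: float
    is_error: bool = False


def keep_trace(spans: List[Span], slow_ms: float = 1000.0) -> bool:
    """Tail decision: keep a completed trace if any span failed or was slow."""
    return any(s.is_error or s.duration_ms >= slow_ms for s in spans)
```

A head sampler cannot apply this rule, because errors and durations are not known when the first span starts. The cost is that all spans of a trace must be buffered until the trace completes.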

Monitoring Sampling Effectiveness¶

Check Sampling Rate¶

Monitor actual sampling rate in your application:

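// Track sampling decisions
const { trace } = require('@opentelemetry/api');

let totalRequests = 0;
let sampledRequests = 0;

// Call once per request from your instrumentation (e.g. HTTP middleware)
function recordSamplingDecision() {
  const span = trace.getActiveSpan();
  totalRequests++;
  if (span && span.isRecording()) {
    sampledRequests++;
  }
}

// Log sampling rate periodically
setInterval(() => {
  if (totalRequests === 0) return; // nothing to report yet
  console.log(`Sampling rate: ${(sampledRequests / totalRequests * 100).toFixed(2)}%`);
}, 60000);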

Validate Coverage¶

Ensure important operations are being sampled:

Error traces captured
Critical business transactions included
Performance outliers detected

Troubleshooting Sampling¶

No Traces Appearing¶

Check sampling isn't set to always_off
Verify sampling rate isn't too low
Ensure parent-based sampling isn't dropping child spans because an upstream service marked the trace as not sampled

Too Many Traces¶

Reduce sampling rate
Check for sampling configuration in multiple places
Verify environment variables are being applied
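
One way to verify that the environment variables are being applied is to print the sampler settings the process actually sees. The sketch below only reads OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG; `describe_sampler` is a hypothetical helper, and it does not detect samplers configured in code, which can override these variables.

```python
import os
from typing import Mapping, Optional

RATIO_SAMPLERS = {"traceidratio", "parentbased_traceidratio"}


def describe_sampler(env: Optional[Mapping[str, str]] = None) -> str:
    """Summarize the sampler settings found in the environment."""
    env = os.environ if env is None else env
    sampler = env.get("OTEL_TRACES_SAMPLER", "").strip().lower()
    arg = env.get("OTEL_TRACES_SAMPLER_ARG", "").strip()
    if not sampler:
        return "OTEL_TRACES_SAMPLER is not set; the SDK default applies"
    if sampler in RATIO_SAMPLERS:
        try:
            ratio = float(arg)
        except ValueError:
            return f"{sampler}: OTEL_TRACES_SAMPLER_ARG is missing or not a number"
        if not 0.0 <= ratio <= 1.0:
            return f"{sampler}: ratio {ratio} is outside the range 0 to 1"
        return f"{sampler}: sampling {ratio:.2%} of traces"
    return f"{sampler}: no ratio argument needed"


if __name__ == "__main__":
    print(describe_sampler())
```

Running this inside the same container or shell session as the application helps confirm that the values you exported are the ones the process inherits.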

Inconsistent Sampling¶

Use parent-based sampling for distributed systems
Ensure all services use compatible sampling strategies
Check for sampling overrides in code
