Trace Sampling¶
Sampling helps control the volume of traces sent to Sematext Cloud, reducing costs and performance overhead while maintaining visibility into your application's behavior.
Why Sampling?¶
- Cost Management: Reduce the amount of trace data stored and processed
- Performance: Lower overhead on your application
- Network Traffic: Reduce bandwidth usage between your app and Sematext
- Focus on Important Data: Sample strategically to capture relevant traces
Sampling Strategies¶
Always On (Development)¶
Sample all traces - useful for development and debugging:
export OTEL_TRACES_SAMPLER=always_on
Use when:
- Developing and testing
- Debugging specific issues
- Low traffic environments
Always Off¶
Disable all tracing:
export OTEL_TRACES_SAMPLER=always_off
Use when:
- Temporarily disabling tracing
- Feature flags for tracing control
Trace ID Ratio (Production)¶
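Sample a fixed percentage of traces. The decision is derived from the trace ID, so it is effectively random across traces but repeatable for any given trace: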
# Sample 10% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
# Sample 1% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.01
# Sample 50% of traces
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.5
Use when:
- Production environments
- High-traffic applications
- Predictable sampling needed
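To make the ratio setting more concrete, here is a minimal sketch of how a trace-ID-ratio decision can work. It follows the general idea of comparing part of the trace ID against a bound derived from the rate; the helper names (`ratio_bound`, `is_sampled`, `observed_rate`) are illustrative assumptions, and real SDK implementations may differ in detail.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def ratio_bound(rate: float) -> int:
    """Convert a sampling rate in [0, 1] to an upper bound on the masked trace ID."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return round(rate * (1 << 64))


def is_sampled(trace_id: int, rate: float) -> bool:
    """Keep the trace when its low 64 bits fall below the bound for this rate."""
    return (trace_id & TRACE_ID_LIMIT) < ratio_bound(rate)


def observed_rate(rate: float, n: int = 100_000, seed: int = 42) -> float:
    """Fraction of n random 128-bit trace IDs kept at the given rate (illustration only)."""
    rng = random.Random(seed)
    kept = sum(is_sampled(rng.getrandbits(128), rate) for _ in range(n))
    return kept / n
```

Because the decision depends only on the trace ID and the rate, two services configured with the same rate reach the same decision for the same trace in this sketch.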
Parent-Based Sampling (Default)¶
Follows the sampling decision of the parent span. For root spans, which have no parent, the sampler named after the parentbased_ prefix decides:
export OTEL_TRACES_SAMPLER=parentbased_always_on
# or
export OTEL_TRACES_SAMPLER=parentbased_always_off
# or
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
Use when:
- Microservices architectures
- Maintaining trace continuity across services
- Default behavior for most SDKs
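The parent-based rule can be sketched in a few lines. This is a simplified model, not SDK code: `ParentInfo` and `parent_based` are hypothetical names, and real samplers also distinguish remote from local parents.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ParentInfo:
    """Hypothetical stand-in for the parent span context propagated between services."""
    trace_id: int
    sampled: bool


def parent_based(
    root_sampler: Callable[[int], bool],
) -> Callable[[int, Optional[ParentInfo]], bool]:
    """Wrap a root sampler so that child spans inherit the parent's decision."""
    def decide(trace_id: int, parent: Optional[ParentInfo]) -> bool:
        if parent is not None:
            return parent.sampled      # child span: follow the parent
        return root_sampler(trace_id)  # root span: the wrapped sampler decides
    return decide


# Rough equivalents of parentbased_always_on and parentbased_always_off
always_on = parent_based(lambda trace_id: True)
always_off = parent_based(lambda trace_id: False)
```

Note that under this rule `always_on(1, ParentInfo(1, sampled=False))` is `False`: a child of an unsampled parent is dropped even when the root sampler would keep everything, which is what keeps traces complete across services.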
SDK-Specific Configuration¶
Java¶
# Via environment variables
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
# Via system properties
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.traces.sampler=traceidratio \
  -Dotel.traces.sampler.arg=0.1 \
  -jar your-app.jar
Python¶
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
# Configure 10% sampling
tracer_provider = TracerProvider(
    sampler=TraceIdRatioBased(0.1)
)
trace.set_tracer_provider(tracer_provider)
Node.js¶
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');
const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1), // 10% sampling
  // ... other configuration
});
Go¶
import (
    "go.opentelemetry.io/otel/sdk/trace"
)
tp := trace.NewTracerProvider(
    trace.WithSampler(trace.TraceIDRatioBased(0.1)), // 10% sampling
    // ... other options
)
.NET¶
using OpenTelemetry.Trace;
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .SetSampler(new TraceIdRatioBasedSampler(0.1)) // 10% sampling
        // ... other configuration
    );
Ruby¶
require 'opentelemetry/sdk'
OpenTelemetry::SDK.configure do |c|
  c.use_all
end
# The sampler is set on the tracer provider, not on the span processor
OpenTelemetry.tracer_provider.sampler =
  OpenTelemetry::SDK::Trace::Samplers.trace_id_ratio_based(0.1) # 10% sampling
Custom Sampling Logic¶
You can implement custom sampling based on specific criteria:
Sample by Endpoint¶
Sample different endpoints at different rates:
// Node.js example
const { SamplingDecision } = require('@opentelemetry/sdk-trace-base');
class CustomSampler {
  shouldSample(context, traceId, spanName, spanKind, attributes) {
    // Drop all /health checks
    if (attributes['http.target'] === '/health') {
      return { decision: SamplingDecision.NOT_RECORD };
    }
    
    // Sample 50% of /api/critical endpoints
    if (attributes['http.target']?.startsWith('/api/critical')) {
      return { decision: Math.random() < 0.5 ? 
        SamplingDecision.RECORD_AND_SAMPLED : 
        SamplingDecision.NOT_RECORD };
    }
    
    // Sample 1% of everything else
    return { decision: Math.random() < 0.01 ? 
      SamplingDecision.RECORD_AND_SAMPLED : 
      SamplingDecision.NOT_RECORD };
  }
  // The Sampler interface also expects a description
  toString() {
    return 'CustomSampler';
  }
}
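A variation worth considering is to key the per-endpoint decision on the trace ID instead of a random number, so that repeated decisions for the same trace agree. The sketch below is an assumption-laden illustration: `ENDPOINT_RATES`, `rate_for`, and `should_keep` are hypothetical names, and it matches by path prefix, which is slightly broader than an exact match on /health.

```python
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of the trace ID

# Hypothetical rate table: (path prefix, rate). The first matching prefix wins.
# Prefix matching means "/healthz" would also match "/health".
ENDPOINT_RATES = [
    ("/health", 0.0),        # never keep health checks
    ("/api/critical", 0.5),  # keep half of critical API traces
]
DEFAULT_RATE = 0.01          # keep 1% of everything else


def rate_for(target: Optional[str]) -> float:
    """Look up the sampling rate for a request path."""
    if target:
        for prefix, rate in ENDPOINT_RATES:
            if target.startswith(prefix):
                return rate
    return DEFAULT_RATE


def should_keep(trace_id: int, target: Optional[str]) -> bool:
    """Deterministic decision: same trace ID and path always give the same answer."""
    bound = round(rate_for(target) * (1 << 64))
    return (trace_id & TRACE_ID_LIMIT) < bound
```

Keeping the rates in a table rather than in branching code also makes them easier to review and adjust.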
Sample Errors at Higher Rate¶
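Always capture traces with errors. Because head sampling decides when a span starts, the sampler only sees attributes supplied at span creation:
# Python example
from opentelemetry.sdk.trace.sampling import (
    Decision, Sampler, SamplingResult, TraceIdRatioBased,
)
class ErrorAwareSampler(Sampler):
    def __init__(self):
        # Build the fallback sampler once: 10% of non-error traces
        self._fallback = TraceIdRatioBased(0.1)
    def should_sample(self, parent_context, trace_id, name, kind=None,
                      attributes=None, links=None, trace_state=None):
        # Always sample if error attribute is present
        if attributes and attributes.get("error", False):
            return SamplingResult(Decision.RECORD_AND_SAMPLE, attributes)
        
        # Otherwise use 10% sampling
        return self._fallback.should_sample(
            parent_context, trace_id, name, kind, attributes, links, trace_state
        )
    def get_description(self):
        return "ErrorAwareSampler"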
Sampling Best Practices¶
Development Environment¶
- Use always_on for complete visibility
- No sampling during debugging
- Enable all trace levels
Staging Environment¶
- Use moderate sampling (10-50%)
- Test sampling configuration
- Validate sampling decisions
Production Environment¶
- Start with conservative sampling (1-10%)
- Adjust based on:
    - Traffic volume
    - Cost constraints
    - Performance impact
- Monitor sampling effectiveness
High-Traffic Services¶
- Very low sampling rates (0.1-1%)
- Focus on error traces
- Sample critical operations at higher rates
Sampling Recommendations by Traffic¶
| Requests/sec | Recommended Sampling | Rationale | 
|---|---|---|
| < 10 | 100% (always_on) | Capture everything | 
| 10-100 | 50-100% | High visibility needed | 
| 100-1,000 | 10-50% | Balance visibility and volume | 
| 1,000-10,000 | 1-10% | Reduce overhead | 
| > 10,000 | 0.1-1% | Minimize impact | 
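The table can also be read as a lookup plus a multiplication: pick the band for your traffic, then multiply requests per second by the rate to estimate trace volume. The sketch below encodes that arithmetic; the function names are illustrative, and assigning boundary values (for example exactly 100 requests/sec) to the higher-traffic band is an assumption, since the table leaves that open.

```python
from typing import Tuple

# (exclusive upper bound on requests/sec, min rate, max rate), from the table above.
# Assumption: a boundary value such as exactly 100 req/s falls into the next band.
RECOMMENDATIONS = [
    (10, 1.0, 1.0),
    (100, 0.5, 1.0),
    (1_000, 0.1, 0.5),
    (10_000, 0.01, 0.1),
    (float("inf"), 0.001, 0.01),
]


def recommended_range(requests_per_sec: float) -> Tuple[float, float]:
    """Return the (min, max) recommended sampling rate for a traffic level."""
    for upper, low, high in RECOMMENDATIONS:
        if requests_per_sec < upper:
            return (low, high)
    return RECOMMENDATIONS[-1][1:]


def sampled_per_sec(requests_per_sec: float, rate: float) -> float:
    """Expected number of sampled traces per second at a given rate."""
    return requests_per_sec * rate
```

For example, a service handling 5,000 requests/sec lands in the 1-10% band, which works out to roughly 50 to 500 sampled traces per second.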
Head vs Tail Sampling¶
Head Sampling (Current)¶
- Decision made at trace start
- Configured in your application
- Lower overhead
- May miss interesting traces
Tail Sampling (Future)¶
- Decision made after trace completion
- Can sample based on trace characteristics
- Requires more infrastructure
- Better for capturing anomalies
Note: Tail sampling support is planned for future releases.
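To illustrate the difference in principle, a tail-sampling decision can inspect the finished trace, which a head sampler cannot do. The following is a generic sketch and does not describe any Sematext feature or API; `FinishedSpan`, `keep_trace`, and the 1000 ms threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FinishedSpan:
    """Hypothetical record of a completed span."""
    name: str
    duration_ms: float
    is_error: bool


def keep_trace(spans: List[FinishedSpan], slow_ms: float = 1000.0) -> bool:
    """Tail decision: keep the whole trace if any span failed or was slow."""
    return any(s.is_error or s.duration_ms >= slow_ms for s in spans)
```

The cost of this flexibility is that every span must be buffered until the trace completes, which is where the extra infrastructure comes from.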
Monitoring Sampling Effectiveness¶
Check Sampling Rate¶
Monitor actual sampling rate in your application:
// Track sampling decisions
let totalRequests = 0;
let sampledRequests = 0;
// In your instrumentation
totalRequests++;
if (span.isRecording()) {
  sampledRequests++;
}
// Log sampling rate periodically
console.log(`Sampling rate: ${(sampledRequests/totalRequests * 100).toFixed(2)}%`);
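The same bookkeeping can be sketched in Python with two additions that matter in practice: a lock so concurrent request handlers do not lose updates, and a guard against dividing by zero before any request has been seen. `SamplingStats` is a hypothetical helper, not part of any SDK.

```python
import threading


class SamplingStats:
    """Thread-safe counter for total versus sampled requests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._total = 0
        self._sampled = 0

    def record(self, sampled: bool) -> None:
        """Call once per request with the span's recording status."""
        with self._lock:
            self._total += 1
            if sampled:
                self._sampled += 1

    def rate_percent(self) -> float:
        """Observed sampling rate as a percentage; 0.0 before any request."""
        with self._lock:
            if self._total == 0:
                return 0.0
            return 100.0 * self._sampled / self._total
```

In an instrumented handler this would typically be called as `stats.record(span.is_recording())`, assuming a span object that exposes its recording status.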
Validate Coverage¶
Ensure important operations are being sampled:
- Error traces captured
- Critical business transactions included
- Performance outliers detected
Troubleshooting Sampling¶
No Traces Appearing¶
- Check sampling isn't set to always_off
- Verify sampling rate isn't too low
- Ensure parent-based sampling isn't blocking child spans
Too Many Traces¶
- Reduce sampling rate
- Check for sampling configuration in multiple places
- Verify environment variables are being applied
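One way to check that the environment variables are actually visible to the process is to inspect them at startup. The sketch below is a hypothetical helper (`check_sampler_env`) that only knows the sampler names used on this page; SDKs may accept additional names, and the note about an unset ratio defaulting to 1.0 reflects the usual OpenTelemetry default rather than anything verified for a specific SDK version.

```python
import os
from typing import List, Mapping, Optional

# Sampler names used on this page; SDKs may accept others.
KNOWN_SAMPLERS = {
    "always_on", "always_off", "traceidratio",
    "parentbased_always_on", "parentbased_always_off", "parentbased_traceidratio",
}
RATIO_SAMPLERS = {"traceidratio", "parentbased_traceidratio"}


def check_sampler_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems with the sampler settings."""
    env = os.environ if env is None else env
    problems: List[str] = []
    name = env.get("OTEL_TRACES_SAMPLER")
    arg = env.get("OTEL_TRACES_SAMPLER_ARG")
    if name is None:
        problems.append("OTEL_TRACES_SAMPLER is not set; the SDK default applies")
    elif name not in KNOWN_SAMPLERS:
        problems.append(f"unrecognized sampler name: {name!r}")
    elif name in RATIO_SAMPLERS:
        if arg is None:
            problems.append(
                "OTEL_TRACES_SAMPLER_ARG is not set; ratio samplers usually default to 1.0"
            )
        else:
            try:
                ratio = float(arg)
            except ValueError:
                ratio = float("nan")
            if not 0.0 <= ratio <= 1.0:  # also rejects NaN
                problems.append(
                    f"OTEL_TRACES_SAMPLER_ARG must be a number in [0, 1], got {arg!r}"
                )
    return problems
```

Calling `check_sampler_env()` with no arguments reads the real process environment; passing a dictionary makes it easy to test a configuration before deploying it.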
Inconsistent Sampling¶
- Use parent-based sampling for distributed systems
- Ensure all services use compatible sampling strategies
- Check for sampling overrides in code
Next Steps¶
- Configure your SDKs
- Monitor trace volume
- Optimize costs with advanced strategies
- Set up alerts
- Troubleshooting Guide