
Tracing Cost Optimization¶

Distributed tracing can generate significant data volumes, especially in high-traffic applications. This guide provides strategies to optimize your tracing costs while maintaining essential observability.

Note: Tracing costs are driven primarily by the volume of trace data ingested and stored, so any data you filter out during collection or drop through sampling directly lowers your costs. Sampling must be configured in your applications, because the Sematext Agent forwards every trace it receives.

Sampling Strategies¶

Sampling is the most effective way to reduce tracing costs while still keeping a statistically representative set of traces. For detailed configuration instructions, see the Sampling Configuration Guide.

Head-Based Sampling (Recommended Start)¶

Sample traces at the application level before they're sent:

Production Environments:

High-traffic services: Keep 1-5% of traces (0.01-0.05 sampling rate)
Medium-traffic services: Keep 5-10% of traces (0.05-0.1 sampling rate)
Low-traffic services: Keep 10-50% of traces (0.1-0.5 sampling rate)

Development/Staging:

Use 100% sampling for complete visibility during testing
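To make the idea of a head-based decision concrete, here is a minimal sketch of a ratio-based sampler that decides once, at trace start, from the trace ID alone. It is an illustration, not the OpenTelemetry SDK's implementation; the function name `shouldSample` and the trace IDs are hypothetical.

```javascript
// Illustrative head-based sampler: the decision depends only on the trace ID,
// so every service that sees the same trace ID reaches the same decision.
function shouldSample(traceId, ratio) {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Treat the last 8 hex characters (32 bits) of the trace ID as a uniform value.
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 0x100000000;
}

// Hypothetical 32-character trace IDs.
console.log(shouldSample('4bf92f3577b34da6a3ce929d00000000', 0.05)); // true
console.log(shouldSample('4bf92f3577b34da6a3ce929dffffffff', 0.05)); // false
```

In a real application you would normally rely on the ratio-based sampler that ships with your OpenTelemetry SDK rather than writing your own.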

Service-Specific Sampling¶

Apply different sampling rates based on service importance:

Critical Services (Higher Sampling):

Payment processing: 50-100%
Authentication: 20-50%
Core APIs: 10-20%

Support Services (Lower Sampling):

Health checks: 100% (the exception in this group: always sample, since volume is low and they are critical for error detection)
Static content: 0.1-1%
Internal utilities: 1-5%
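One way to keep service-specific rates in a single place is a small lookup that produces the standard OpenTelemetry sampler environment variables for each service. The sketch below assumes your SDK reads `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG`; the service names, the chosen ratios, and the fallback value are hypothetical.

```javascript
// Hypothetical per-service ratios picked from the ranges above.
const SAMPLING_RATIOS = new Map([
  ['payment-service', 0.5],
  ['auth-service', 0.2],
  ['core-api', 0.1],
  ['static-content', 0.01],
  ['internal-utils', 0.05],
]);
const DEFAULT_RATIO = 0.05; // assumed fallback for services not listed

// Build the sampler-related environment variables for one service.
function samplerEnvFor(serviceName) {
  const ratio = SAMPLING_RATIOS.get(serviceName) ?? DEFAULT_RATIO;
  return {
    OTEL_SERVICE_NAME: serviceName,
    OTEL_TRACES_SAMPLER: 'parentbased_traceidratio',
    OTEL_TRACES_SAMPLER_ARG: String(ratio),
  };
}

console.log(samplerEnvFor('payment-service'));
```

The parent-based variant is used here so that downstream services follow the decision made by the service that started the trace, which keeps traces complete.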

Attribute and Data Optimization¶

Remove Unnecessary Attributes¶

Filter out attributes that don't provide actionable insights:

High-Volume, Low-Value Attributes:

Large request/response bodies
Detailed stack traces for successful operations
Verbose debug information
Redundant metadata

OpenTelemetry Attribute Filtering:

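// Example: Record a summary of a large payload instead of the payload itself
const { trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('example-service'); // tracer name is illustrative
const largeBody = JSON.stringify({ items: new Array(1000).fill('x') }); // stand-in for a real request body

const span = tracer.startSpan('operation');
// Don't attach large request bodies:
// span.setAttributes({ 'http.request.body': largeBody }); // Avoid this

// Instead, add size or summary information
span.setAttributes({
  'http.request.size': Buffer.byteLength(largeBody), // size in bytes, not characters
  'http.request.type': 'json'
});
span.end(); // end the span so it can be exported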
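When you cannot avoid attaching free-form values, a size cap applied before the attributes reach the span keeps payloads bounded. The helper below is a minimal sketch; `trimAttributes`, the 256-character limit, and the `.truncated` marker key are assumptions made for illustration.

```javascript
// Hypothetical limit; choose one that fits your data.
const MAX_ATTRIBUTE_LENGTH = 256;

// Return a copy of `attributes` with long string values truncated.
function trimAttributes(attributes, maxLength = MAX_ATTRIBUTE_LENGTH) {
  const trimmed = {};
  for (const [key, value] of Object.entries(attributes)) {
    if (typeof value === 'string' && value.length > maxLength) {
      trimmed[key] = value.slice(0, maxLength);
      trimmed[`${key}.truncated`] = true; // marker so readers know the value was cut
    } else {
      trimmed[key] = value;
    }
  }
  return trimmed;
}

// Usage with a span: span.setAttributes(trimAttributes({ 'db.statement': longSql }));
console.log(trimAttributes({ note: 'x'.repeat(300), retries: 2 }).note.length);
```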

Optimize Span Names and Operations¶

Use consistent, concise span names:

Good Examples:

GET /api/users/{id}
database.query
payment.process

Avoid:

GET /api/users/12345 (high cardinality)
Very long descriptive operation name with lots of details
Dynamic span names with timestamps or IDs
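If your framework does not expose the route template, you can derive a low-cardinality span name from the raw path. The following is a minimal sketch; `spanNameFor` and the two patterns are illustrative assumptions, and the route template from your framework is preferable when it is available.

```javascript
// Path segments that look like identifiers are replaced with a placeholder.
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC = /^\d+$/;

function spanNameFor(method, path) {
  const normalized = path
    .split('?')[0] // drop the query string
    .split('/')
    .map((segment) => (NUMERIC.test(segment) || UUID.test(segment) ? '{id}' : segment))
    .join('/');
  return `${method.toUpperCase()} ${normalized}`;
}

console.log(spanNameFor('get', '/api/users/12345')); // GET /api/users/{id}
```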

Infrastructure and Agent Optimization¶

Agent Configuration¶

For agent-specific optimizations, see the Agent OpenTelemetry Configuration guide.

Service and Environment Strategies¶

Environment Separation¶

Production:

Lower sampling rates (1-10%)
Focus on error and performance traces
Longer retention for critical paths

Staging:

Medium sampling rates (10-50%)
Comprehensive error tracking
Shorter retention periods

Development:

Higher sampling rates or 100% sampling
Full attribute collection for debugging
Shortest retention periods

Service Prioritization¶

Tier 1 (Business Critical):

Customer-facing APIs
Payment processing
Authentication services
Sampling: Keep 10-50% of traces

Tier 2 (Important):

Internal APIs
Data processing
Integration services
Sampling: Keep 5-20% of traces

Tier 3 (Supporting):

Health checks
Metrics collection
Background tasks
Sampling: Keep 1-5% of traces (except health checks - keep 100%)
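A quick estimate of how many traces each tier keeps per day helps when choosing a rate within these ranges. The sketch below uses made-up services and traffic figures; only the arithmetic (requests per second × seconds per day × sampling ratio) is the point.

```javascript
// Estimate traces kept per day for a given request rate and sampling ratio.
function tracesKeptPerDay(requestsPerSecond, samplingRatio) {
  const SECONDS_PER_DAY = 86400;
  return Math.round(requestsPerSecond * SECONDS_PER_DAY * samplingRatio);
}

// Hypothetical services and traffic figures.
const services = [
  { name: 'checkout-api', tier: 1, rps: 200, ratio: 0.25 },
  { name: 'report-builder', tier: 2, rps: 50, ratio: 0.1 },
  { name: 'cache-warmer', tier: 3, rps: 500, ratio: 0.02 },
];

for (const s of services) {
  console.log(`tier ${s.tier} ${s.name}: ${tracesKeptPerDay(s.rps, s.ratio)} traces/day`);
}
```

With these figures the Tier 3 service still keeps more traces per day than the Tier 2 service, which shows why traffic volume matters as much as the tier when picking a rate.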

Retention and Plan Optimization¶

Data Retention Strategy¶

Short-Term (1-7 days):

High-volume, low-priority traces
Development environment traces
Automated testing traces

Medium-Term (7-30 days):

Production error traces
Performance baseline traces
Critical business flows

Long-Term (30+ days):

Compliance-required traces
Security audit trails
Business intelligence traces

Plan Selection¶

Choose the right Sematext plan based on your needs:

Basic Plan:

Small applications
Low to medium traffic
Basic retention needs

Standard Plan:

Growing applications
Medium to high traffic
Enhanced analytics needs

Pro Plan:

Large-scale applications
Enterprise requirements
Advanced analytics and longer retention

See detailed features and pricing at sematext.com/pricing

Monitoring and Alerting Cost Impact¶

Cost-Aware Alerting¶

Set up alerts that balance coverage with cost:

High-Value Alerts (Always Monitor):

Error rate spikes
Critical service failures
SLA violations

Medium-Value Alerts (Sampled Monitoring):

Performance degradation
Unusual traffic patterns
Service dependencies

Cost Monitoring¶

Track Key Metrics:

Daily trace volume by service
Sampling effectiveness
Storage utilization
Plan usage trends

Cost Optimization Alerts:

Alert when trace volume increases unexpectedly
Monitor sampling rate effectiveness
Track storage growth trends
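The "unexpected increase" check can be as simple as comparing today's trace count with a trailing average. This is a minimal sketch with a hypothetical function name and a 50% threshold chosen only for illustration; where the daily counts come from is left to your own reporting.

```javascript
// Flag a day whose trace count exceeds the trailing average by more than
// `threshold` (0.5 means 50% above the average).
function isVolumeSpike(previousDailyCounts, todayCount, threshold = 0.5) {
  if (previousDailyCounts.length === 0) return false; // no baseline yet
  const sum = previousDailyCounts.reduce((a, b) => a + b, 0);
  const average = sum / previousDailyCounts.length;
  return todayCount > average * (1 + threshold);
}

console.log(isVolumeSpike([100000, 110000, 90000], 160000)); // true
```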

Implementation Checklist¶

Phase 1: Assessment¶

[ ] Analyze current trace volume by service
[ ] Identify high-volume, low-value traces
[ ] Evaluate service criticality tiers
[ ] Review current sampling configuration

Phase 2: Quick Wins¶

[ ] Implement basic sampling (for example, 1-5% for high-traffic services)
[ ] Remove unnecessary attributes from spans
[ ] Consider filtering static content traces (keep health checks for error detection)
[ ] Optimize span naming conventions

Phase 3: Advanced Optimization¶

[ ] Set up service-specific sampling rates
[ ] Configure attribute filtering rules
[ ] Optimize retention policies

Phase 4: Monitoring and Tuning¶

[ ] Set up cost monitoring dashboards
[ ] Implement cost-aware alerting
[ ] Review and adjust sampling rates regularly
[ ] Monitor for sampling bias in critical paths

Best Practices¶

Sampling Best Practices¶

Start conservative: Begin with lower sampling rates and increase as needed
Preserve errors: Always sample error traces at higher rates
Monitor bias: Ensure sampling doesn't hide important patterns
Service correlation: Consider trace propagation when setting service-specific rates
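One rough way to check for sampling bias is to compare the error share among sampled traces with the error rate reported by your request metrics. The sketch below assumes you can obtain both sets of counts; the function name and input shape are hypothetical.

```javascript
// `sampled`: counts from stored traces; `metrics`: counts from unsampled request metrics.
function samplingBias(sampled, metrics) {
  const sampledErrorRate = sampled.errors / sampled.total;
  const actualErrorRate = metrics.errors / metrics.total;
  return {
    effectiveRatio: sampled.total / metrics.total, // fraction of requests actually traced
    sampledErrorRate,
    actualErrorRate,
    errorRateGap: sampledErrorRate - actualErrorRate, // negative: errors are under-represented
  };
}

console.log(samplingBias({ total: 500, errors: 5 }, { total: 10000, errors: 200 }));
```

A gap that stays clearly negative over time suggests that error traces are being dropped more often than you intend.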

Attribute Management¶

Business value: Only collect attributes that provide actionable insights
Cardinality control: Avoid high-cardinality attributes (user IDs, timestamps)
Size limits: Set reasonable limits on attribute value sizes
Sensitive data: Never include secrets, passwords, or PII in traces
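As a safety net for the sensitive-data rule, attributes can be passed through a deny-list before they are set on a span. This is a minimal sketch; the key pattern and the function name `dropSensitiveAttributes` are assumptions, and a deny-list does not replace reviewing what your code records.

```javascript
// Hypothetical deny-list; extend it to match your own attribute naming.
const SENSITIVE_KEY = /(password|secret|token|authorization|api[_-]?key)/i;

// Return a copy of `attributes` without keys that look sensitive.
function dropSensitiveAttributes(attributes) {
  const safe = {};
  for (const [key, value] of Object.entries(attributes)) {
    if (!SENSITIVE_KEY.test(key)) {
      safe[key] = value;
    }
  }
  return safe;
}

console.log(dropSensitiveAttributes({ 'http.method': 'GET', 'user.password': 'hunter2' }));
```

Dropping the key entirely, rather than storing a masked value, also saves a little data on every span.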

Regular Reviews¶

Monthly: Review trace volumes and costs
Quarterly: Adjust sampling rates based on traffic patterns
Annually: Evaluate plan needs and retention requirements

Troubleshooting Cost Issues¶

High Costs Checklist¶

Check sampling rates - Are they too high for your traffic volume?
Review service distribution - Are non-critical services generating most traces?
Analyze trace sizes - Are spans containing unnecessary large attributes?
Examine retention - Are you storing traces longer than needed?
Look for service proliferation - Are test or temporary services generating traces?
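For the service distribution check, ranking services by their share of total trace volume usually shows quickly where the cost comes from. The sketch below assumes you already have per-service trace counts; the service names and numbers are made up.

```javascript
// Rank services by their share of the total trace volume, largest first.
function volumeShareByService(countsByService) {
  const total = Object.values(countsByService).reduce((a, b) => a + b, 0);
  return Object.entries(countsByService)
    .map(([service, count]) => ({ service, count, share: total === 0 ? 0 : count / total }))
    .sort((a, b) => b.count - a.count);
}

// Hypothetical daily counts.
const ranked = volumeShareByService({ 'checkout-api': 100, 'static-assets': 700, 'batch-jobs': 200 });
console.log(ranked[0].service, ranked[0].share);
```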

Common Cost Drivers¶

No sampling in production environments
Static content traces not filtered out
Large request/response bodies in span attributes
High-cardinality span names (with IDs or timestamps)
Development traces not properly separated