OpenTelemetry for AI/ML Observability: Instrumenting the Intelligent Enterprise
How OpenTelemetry enables comprehensive observability for AI/ML workloads with GenAI semantic conventions, distributed tracing, and unified telemetry collection
The AI Observability Challenge
As organizations deploy increasingly complex AI/ML systems - from simple model inference to sophisticated multi-agent architectures - traditional observability tools fall short. AI workloads present unique challenges: non-deterministic outputs, token-based costs, prompt engineering iterations, and complex chain-of-thought reasoning that spans multiple services.
OpenTelemetry (OTel) has emerged as the industry standard for vendor-neutral observability, and with the introduction of GenAI semantic conventions, it now provides first-class support for AI/ML workloads.
Why OpenTelemetry for AI?
The Vendor Lock-in Problem
Every AI observability vendor - LangSmith, Weights & Biases, Arize - uses proprietary instrumentation. This creates:
- Vendor lock-in making it expensive to switch providers
- Fragmented telemetry across different backends
- Inconsistent data models that prevent unified analysis
- Integration overhead with each new tool
The OTel Solution
OpenTelemetry provides:
- Vendor-neutral instrumentation that works with any backend
- Unified data model for traces, metrics, and logs
- Automatic instrumentation for popular AI frameworks
- Standardized semantic conventions for GenAI systems
OpenTelemetry Architecture
Core Components
1. OpenTelemetry API
The API provides the interface for instrumentation without implementation. This is the only OTel dependency your application code should import directly.
2. OpenTelemetry SDK
The SDK implements the API with configurable providers, exporters, and processors for each signal type (traces, metrics, logs).
3. OTel Collector
A vendor-agnostic proxy that receives, processes, and exports telemetry data. Can run as an agent (sidecar) or gateway (centralized).
4. GenAI Instrumentation
Specialized instrumentation libraries for AI frameworks like OpenAI, Anthropic, LangChain, and LlamaIndex.
GenAI Semantic Conventions
OpenTelemetry has standardized attributes for AI/ML systems:
| Attribute | Description | Example |
|---|---|---|
| gen_ai.system | The AI vendor | openai, anthropic |
| gen_ai.request.model | Model requested | gpt-4-turbo |
| gen_ai.operation.name | Operation type | chat, embedding |
| gen_ai.request.temperature | Temperature param | 0.7 |
| gen_ai.usage.input_tokens | Input token count | 150 |
| gen_ai.usage.output_tokens | Output token count | 500 |
| gen_ai.response.finish_reasons | Stop reason | stop, length |
Implementation: Python
Basic Setup
```python
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Initialize tracing with an OTLP exporter
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317"))
)
tracer = trace.get_tracer("ai-service")

# Initialize metrics -- the SDK MeterProvider must be registered,
# otherwise get_meter() returns a no-op meter
metrics.set_meter_provider(MeterProvider(
    metric_readers=[
        PeriodicExportingMetricReader(OTLPMetricExporter(endpoint="localhost:4317"))
    ]
))
meter = metrics.get_meter("ai-service")

# Create AI-specific metrics
token_counter = meter.create_counter(
    name="gen_ai.tokens.total",
    description="Total tokens processed",
    unit="tokens",
)

latency_histogram = meter.create_histogram(
    name="gen_ai.request.duration",
    description="Request latency distribution",
    unit="ms",
)

cost_counter = meter.create_counter(
    name="gen_ai.cost.total",
    description="Estimated inference cost",
    unit="USD",
)
```
Instrumenting LLM Calls
```python
import time

import openai
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("llm-service")

def call_llm(prompt: str, model: str = "gpt-4"):
    with tracer.start_as_current_span(
        f"chat {model}",
        kind=trace.SpanKind.CLIENT,
    ) as span:
        # Set GenAI semantic convention attributes
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.temperature", 0.7)

        start_time = time.time()
        try:
            client = openai.OpenAI()
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )

            # Record response attributes
            span.set_attribute("gen_ai.response.model", response.model)
            span.set_attribute("gen_ai.usage.input_tokens",
                               response.usage.prompt_tokens)
            span.set_attribute("gen_ai.usage.output_tokens",
                               response.usage.completion_tokens)

            # Record metrics
            token_counter.add(response.usage.total_tokens, {"model": model})
            latency_ms = (time.time() - start_time) * 1000
            latency_histogram.record(latency_ms, {"model": model})

            # Estimate cost (GPT-4 per-token pricing)
            cost = (response.usage.prompt_tokens * 0.00003 +
                    response.usage.completion_tokens * 0.00006)
            cost_counter.add(cost, {"model": model})

            span.set_status(Status(StatusCode.OK))
            return response
        except Exception as e:
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.record_exception(e)
            raise
```
Auto-Instrumentation
For zero-code instrumentation, use the OpenLLMetry library:
```shell
# opentelemetry-distro provides the opentelemetry-instrument CLI
pip install opentelemetry-distro opentelemetry-instrumentation-openai

# Set environment variables
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=my-ai-service
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true

# Run with auto-instrumentation
opentelemetry-instrument python main.py
```
Collector Configuration
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  # Filter sensitive prompt data
  attributes:
    actions:
      - key: gen_ai.input.messages
        action: delete
  # Add deployment context
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes, resource]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [loki]
```
Deployment Patterns
Collector Deployment Topology
Agent Mode (DaemonSet)
Deploy as a DaemonSet for per-node collection:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/otel-config.yaml"]
          ports:
            - containerPort: 4317
            - containerPort: 4318
```
Gateway Mode (Deployment)
Deploy as a centralized gateway for aggregation:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-collector-gateway
  template:
    metadata:
      labels:
        app: otel-collector-gateway
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
```
Best Practices
1. Use Semantic Conventions
Always follow GenAI semantic conventions for interoperability across tools and vendors.
2. Protect Privacy
Never capture prompts and completions by default. Make content capture opt-in and implement PII redaction.
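One lightweight in-process pattern is to strip content-bearing attributes before they are ever attached to a span, and pair that with collector-side filtering for defense in depth. A minimal sketch -- the `safe_attributes` helper and the exact key set are illustrative, not part of any official API:

```python
# Content-bearing GenAI attributes that should not leave the process by default.
# The key set is an assumption; adjust it to the conventions version you target.
SENSITIVE_KEYS = {
    "gen_ai.input.messages",
    "gen_ai.output.messages",
    "gen_ai.prompt",
    "gen_ai.completion",
}

def safe_attributes(attrs: dict) -> dict:
    """Return a copy of span attributes with prompt/completion content removed."""
    return {k: v for k, v in attrs.items() if k not in SENSITIVE_KEYS}
```

The filtered dict can then be passed to `span.set_attributes(...)` so token counts and model names are kept while raw content never reaches an exporter.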
3. Deploy Collectors
Always route telemetry through collectors rather than direct export. This enables batching, filtering, and multi-destination routing.
4. Implement Tail-Based Sampling
For high-volume inference, use tail-based sampling to capture interesting traces (errors, slow requests) while reducing volume.
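With the collector-contrib `tail_sampling` processor, a policy set along these lines keeps every error and slow trace while probabilistically sampling the rest (the thresholds and percentages are illustrative, to be tuned per workload):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow-requests
        type: latency
        latency: {threshold_ms: 5000}
      - name: sample-baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
```

Note that tail sampling must run where complete traces are visible, which usually means the gateway tier rather than per-node agents.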
5. Track Costs
Include token counts and estimated costs as metrics for budget monitoring and optimization.
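Cost-per-request can be derived directly from the usage fields already recorded on the span. A sketch using a static price table -- the per-1K-token figures are illustrative assumptions, not live vendor pricing:

```python
# Illustrative per-1K-token prices in USD; assumptions, not current pricing.
PRICING = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate inference cost in USD from token counts for a priced model."""
    price = PRICING[model]
    return (input_tokens / 1000) * price["input"] + \
           (output_tokens / 1000) * price["output"]
```

The result can feed a counter metric (e.g. `cost_counter.add(cost, {"model": model})`) so budgets can be monitored per model, team, or feature via metric attributes.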
Business Impact
| Metric | Improvement |
|---|---|
| Mean Time to Detection | 70% reduction |
| Cost Visibility | Complete token-level tracking |
| Vendor Flexibility | Zero lock-in with standard protocols |
| Debug Time | 60% faster root cause analysis |
| Compliance | Full audit trail for AI operations |
Key Takeaways
- OpenTelemetry is the standard for AI/ML observability with GenAI semantic conventions
- Vendor-neutral instrumentation prevents lock-in and enables best-of-breed backends
- Unified telemetry correlates traces, metrics, and logs for complete visibility
- Privacy-first design protects sensitive prompt data while maintaining observability
- Collector architecture enables flexible routing, sampling, and processing