OpenTelemetry for AI/ML Observability: Instrumenting the Intelligent Enterprise
How OpenTelemetry enables comprehensive observability for AI/ML workloads with GenAI semantic conventions, distributed tracing, and unified telemetry collection
The AI Observability Challenge
As organizations deploy increasingly complex AI/ML systems - from simple model inference to sophisticated multi-agent architectures - traditional observability tools fall short. AI workloads present unique challenges: non-deterministic outputs, token-based costs, prompt engineering iterations, and complex chain-of-thought reasoning that spans multiple services.
OpenTelemetry (OTel) has emerged as the industry standard for vendor-neutral observability, and with the introduction of GenAI semantic conventions, it now provides first-class support for AI/ML workloads.
Why OpenTelemetry for AI?
The Vendor Lock-in Problem
Every AI observability vendor - LangSmith, Weights & Biases, Arize - uses proprietary instrumentation. This creates:
- Vendor lock-in making it expensive to switch providers
- Fragmented telemetry across different backends
- Inconsistent data models that prevent unified analysis
- Integration overhead with each new tool
The OTel Solution
OpenTelemetry provides:
- Vendor-neutral instrumentation that works with any backend
- Unified data model for traces, metrics, and logs
- Automatic instrumentation for popular AI frameworks
- Standardized semantic conventions for GenAI systems
OpenTelemetry Architecture
Core Components
1. OpenTelemetry API
The API provides the interface for instrumentation without implementation. This is the only OTel dependency your application code should import directly.
2. OpenTelemetry SDK
The SDK implements the API with configurable providers, exporters, and processors for each signal type (traces, metrics, logs).
3. OTel Collector
A vendor-agnostic proxy that receives, processes, and exports telemetry data. Can run as an agent (sidecar) or gateway (centralized).
4. GenAI Instrumentation
Specialized instrumentation libraries for AI frameworks like OpenAI, Anthropic, LangChain, and LlamaIndex.
GenAI Semantic Conventions
OpenTelemetry has standardized attributes for AI/ML systems:
| Attribute | Description | Example |
|---|---|---|
| gen_ai.system | The AI vendor | openai, anthropic |
| gen_ai.request.model | Model requested | gpt-4-turbo |
| gen_ai.operation.name | Operation type | chat, embedding |
| gen_ai.request.temperature | Temperature param | 0.7 |
| gen_ai.usage.input_tokens | Input token count | 150 |
| gen_ai.usage.output_tokens | Output token count | 500 |
| gen_ai.response.finish_reasons | Stop reason | stop, length |
Implementation: Python
Basic Setup
```python
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Initialize tracing with an OTLP exporter
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317"))
)
tracer = trace.get_tracer("ai-service")

# Initialize metrics -- the SDK MeterProvider must be registered,
# otherwise get_meter() returns a no-op meter
metrics.set_meter_provider(MeterProvider(
    metric_readers=[
        PeriodicExportingMetricReader(OTLPMetricExporter(endpoint="localhost:4317"))
    ]
))
meter = metrics.get_meter("ai-service")

# Create AI-specific metrics
token_counter = meter.create_counter(
    name="gen_ai.tokens.total",
    description="Total tokens processed",
    unit="tokens",
)

latency_histogram = meter.create_histogram(
    name="gen_ai.request.duration",
    description="Request latency distribution",
    unit="ms",
)

cost_counter = meter.create_counter(
    name="gen_ai.cost.total",
    description="Estimated inference cost",
    unit="USD",
)
```
Instrumenting LLM Calls
```python
import time

import openai
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("llm-service")

def call_llm(prompt: str, model: str = "gpt-4"):
    with tracer.start_as_current_span(
        f"chat {model}",
        kind=trace.SpanKind.CLIENT,
    ) as span:
        # Set GenAI semantic convention attributes
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.temperature", 0.7)

        start_time = time.time()
        try:
            client = openai.OpenAI()
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )

            # Record response attributes
            span.set_attribute("gen_ai.response.model", response.model)
            span.set_attribute("gen_ai.usage.input_tokens",
                               response.usage.prompt_tokens)
            span.set_attribute("gen_ai.usage.output_tokens",
                               response.usage.completion_tokens)

            # Record metrics
            token_counter.add(response.usage.total_tokens, {"model": model})
            latency_ms = (time.time() - start_time) * 1000
            latency_histogram.record(latency_ms, {"model": model})

            # Estimate cost (GPT-4 per-token pricing)
            cost = (response.usage.prompt_tokens * 0.00003 +
                    response.usage.completion_tokens * 0.00006)
            cost_counter.add(cost, {"model": model})

            span.set_status(Status(StatusCode.OK))
            return response
        except Exception as e:
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.record_exception(e)
            raise
```
Auto-Instrumentation
For zero-code instrumentation, use the OpenLLMetry library:
```shell
# opentelemetry-distro provides the opentelemetry-instrument CLI
pip install opentelemetry-distro opentelemetry-instrumentation-openai

# Set environment variables
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=my-ai-service
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true

# Run with auto-instrumentation
opentelemetry-instrument python main.py
```
Collector Configuration
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  # Filter sensitive prompt data
  attributes:
    actions:
      - key: gen_ai.input.messages
        action: delete
  # Add deployment context
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes, resource]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [loki]
```
Deployment Patterns
Collector Deployment Topology
Agent Mode (DaemonSet)
Deploy as a DaemonSet for per-node collection:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/otel-config.yaml"]
          ports:
            - containerPort: 4317
            - containerPort: 4318
```
Gateway Mode (Deployment)
Deploy as a centralized gateway for aggregation:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-collector-gateway
  template:
    metadata:
      labels:
        app: otel-collector-gateway
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:latest
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
```
Best Practices
1. Use Semantic Conventions
Always follow GenAI semantic conventions for interoperability across tools and vendors.
2. Protect Privacy
Never capture prompts and completions by default. Make content capture opt-in and implement PII redaction.
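One lightweight in-process pattern is to strip content-bearing attributes before they are ever attached to a span, and pair that with collector-side filtering for defense in depth. A minimal sketch -- the `safe_attributes` helper and the exact key set are illustrative, not part of any official API:

```python
# Content-bearing GenAI attributes that should not leave the process by default.
# The key set is an assumption; adjust it to the conventions version you target.
SENSITIVE_KEYS = {
    "gen_ai.input.messages",
    "gen_ai.output.messages",
    "gen_ai.prompt",
    "gen_ai.completion",
}

def safe_attributes(attrs: dict) -> dict:
    """Return a copy of span attributes with prompt/completion content removed."""
    return {k: v for k, v in attrs.items() if k not in SENSITIVE_KEYS}
```

The filtered dict can then be passed to `span.set_attributes(...)` so token counts and model names are kept while raw content never reaches an exporter.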
3. Deploy Collectors
Always route telemetry through collectors rather than direct export. This enables batching, filtering, and multi-destination routing.
4. Implement Tail-Based Sampling
For high-volume inference, use tail-based sampling to capture interesting traces (errors, slow requests) while reducing volume.
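With the collector-contrib `tail_sampling` processor, a policy set along these lines keeps every error and slow trace while probabilistically sampling the rest (the thresholds and percentages are illustrative, to be tuned per workload):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow-requests
        type: latency
        latency: {threshold_ms: 5000}
      - name: sample-baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
```

Note that tail sampling must run where complete traces are visible, which usually means the gateway tier rather than per-node agents.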
5. Track Costs
Include token counts and estimated costs as metrics for budget monitoring and optimization.
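Cost-per-request can be derived directly from the usage fields already recorded on the span. A sketch using a static price table -- the per-1K-token figures are illustrative assumptions, not live vendor pricing:

```python
# Illustrative per-1K-token prices in USD; assumptions, not current pricing.
PRICING = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate inference cost in USD from token counts for a priced model."""
    price = PRICING[model]
    return (input_tokens / 1000) * price["input"] + \
           (output_tokens / 1000) * price["output"]
```

The result can feed a counter metric (e.g. `cost_counter.add(cost, {"model": model})`) so budgets can be monitored per model, team, or feature via metric attributes.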
Business Impact
| Metric | Improvement |
|---|---|
| Mean Time to Detection | 70% reduction |
| Cost Visibility | Complete token-level tracking |
| Vendor Flexibility | Zero lock-in with standard protocols |
| Debug Time | 60% faster root cause analysis |
| Compliance | Full audit trail for AI operations |
Key Takeaways
- OpenTelemetry is the standard for AI/ML observability with GenAI semantic conventions
- Vendor-neutral instrumentation prevents lock-in and enables best-of-breed backends
- Unified telemetry correlates traces, metrics, and logs for complete visibility
- Privacy-first design protects sensitive prompt data while maintaining observability
- Collector architecture enables flexible routing, sampling, and processing