OpenTelemetry for AI/ML Observability: Instrumenting the Intelligent Enterprise

How OpenTelemetry enables comprehensive observability for AI/ML workloads with GenAI semantic conventions, distributed tracing, and unified telemetry collection

Gonnect Team
January 16, 2025 · 14 min read
OpenTelemetry · Python · AI/ML · Observability · Distributed Tracing

The AI Observability Challenge

As organizations deploy increasingly complex AI/ML systems - from simple model inference to sophisticated multi-agent architectures - traditional observability tools fall short. AI workloads present unique challenges: non-deterministic outputs, token-based costs, prompt engineering iterations, and complex chain-of-thought reasoning that spans multiple services.

OpenTelemetry (OTel) has emerged as the industry standard for vendor-neutral observability, and with the introduction of GenAI semantic conventions, it now provides first-class support for AI/ML workloads.

Why OpenTelemetry for AI?

The Vendor Lock-in Problem

Every AI observability vendor - LangSmith, Weights & Biases, Arize - uses proprietary instrumentation. This creates:

  • Vendor lock-in making it expensive to switch providers
  • Fragmented telemetry across different backends
  • Inconsistent data models that prevent unified analysis
  • Integration overhead with each new tool

The OTel Solution

OpenTelemetry provides:

  • Vendor-neutral instrumentation that works with any backend
  • Unified data model for traces, metrics, and logs
  • Automatic instrumentation for popular AI frameworks
  • Standardized semantic conventions for GenAI systems

OpenTelemetry Architecture

OpenTelemetry for AI/ML


Core Components

1. OpenTelemetry API

The API provides the interface for instrumentation without implementation. This is the only OTel dependency your application code should import directly.

2. OpenTelemetry SDK

The SDK implements the API with configurable providers, exporters, and processors for each signal type (traces, metrics, logs).

3. OTel Collector

A vendor-agnostic proxy that receives, processes, and exports telemetry data. It can run as an agent (a sidecar or per-node DaemonSet) or as a centralized gateway.

4. GenAI Instrumentation

Specialized instrumentation libraries for AI frameworks like OpenAI, Anthropic, LangChain, and LlamaIndex.

GenAI Semantic Conventions

OpenTelemetry has standardized attributes for AI/ML systems:

| Attribute | Description | Example |
| --- | --- | --- |
| gen_ai.system | The AI vendor | openai, anthropic |
| gen_ai.request.model | Model requested | gpt-4-turbo |
| gen_ai.operation.name | Operation type | chat, embedding |
| gen_ai.request.temperature | Temperature param | 0.7 |
| gen_ai.usage.input_tokens | Input token count | 150 |
| gen_ai.usage.output_tokens | Output token count | 500 |
| gen_ai.response.finish_reasons | Stop reason | stop, length |

Implementation: Python

Basic Setup

from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

# Initialize tracing with a batching OTLP exporter
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317"))
)
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("ai-service")

# Initialize metrics with a periodic OTLP reader
metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="localhost:4317")
)
metrics.set_meter_provider(MeterProvider(metric_readers=[metric_reader]))
meter = metrics.get_meter("ai-service")

# Create AI-specific metrics
token_counter = meter.create_counter(
    name="gen_ai.tokens.total",
    description="Total tokens processed",
    unit="tokens"
)

latency_histogram = meter.create_histogram(
    name="gen_ai.request.duration",
    description="Request latency distribution",
    unit="ms"
)

cost_counter = meter.create_counter(
    name="gen_ai.cost.total",
    description="Estimated inference cost",
    unit="USD"
)

Instrumenting LLM Calls

import time
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
import openai

tracer = trace.get_tracer("llm-service")

def call_llm(prompt: str, model: str = "gpt-4"):
    with tracer.start_as_current_span(
        f"chat {model}",
        kind=trace.SpanKind.CLIENT
    ) as span:
        # Set GenAI semantic convention attributes
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.temperature", 0.7)

        start_time = time.time()

        try:
            client = openai.OpenAI()
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}]
            )

            # Record response attributes
            span.set_attribute("gen_ai.response.model", response.model)
            span.set_attribute("gen_ai.usage.input_tokens",
                             response.usage.prompt_tokens)
            span.set_attribute("gen_ai.usage.output_tokens",
                             response.usage.completion_tokens)

            # Record metrics
            total_tokens = response.usage.total_tokens
            token_counter.add(total_tokens, {"model": model})

            latency_ms = (time.time() - start_time) * 1000
            latency_histogram.record(latency_ms, {"model": model})

            # Estimate cost (illustrative GPT-4 per-token rates; check current pricing)
            cost = (response.usage.prompt_tokens * 0.00003 +
                   response.usage.completion_tokens * 0.00006)
            cost_counter.add(cost, {"model": model})

            span.set_status(Status(StatusCode.OK))
            return response

        except Exception as e:
            span.set_status(Status(StatusCode.ERROR, str(e)))
            span.record_exception(e)
            raise

Auto-Instrumentation

For zero-code instrumentation, use the OpenLLMetry library:

pip install opentelemetry-instrumentation-openai

# Set environment variables
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=my-ai-service
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true

# Run with auto-instrumentation
opentelemetry-instrument python main.py

Collector Configuration

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

  # Filter sensitive prompt data
  attributes:
    actions:
      - key: gen_ai.input.messages
        action: delete

  # Add deployment context
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  prometheus:
    endpoint: 0.0.0.0:8889

  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes, resource]
      exporters: [otlp/jaeger]

    metrics:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [prometheus]

    logs:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [loki]

Deployment Patterns

Collector Deployment Topology


Agent Mode (DaemonSet)

Deploy as a DaemonSet for per-node collection:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: collector
        image: otel/opentelemetry-collector-contrib:latest
        args: ["--config=/conf/otel-config.yaml"]
        ports:
        - containerPort: 4317
        - containerPort: 4318

Gateway Mode (Deployment)

Deploy as a centralized gateway for aggregation:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-collector-gateway
  template:
    metadata:
      labels:
        app: otel-collector-gateway
    spec:
      containers:
      - name: collector
        image: otel/opentelemetry-collector-contrib:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"

Best Practices

1. Use Semantic Conventions

Always follow GenAI semantic conventions for interoperability across tools and vendors.
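One lightweight way to keep attribute names consistent is to centralize them in a single helper; a sketch (the helper name is ours, not part of any SDK):

```python
def genai_attributes(system: str, model: str, operation: str = "chat") -> dict:
    """Build a GenAI semantic-convention attribute dict for span.set_attributes()."""
    return {
        "gen_ai.system": system,
        "gen_ai.request.model": model,
        "gen_ai.operation.name": operation,
    }

print(genai_attributes("openai", "gpt-4-turbo"))
```

With every service routing attribute creation through one function, a renamed convention is a one-line change instead of a cross-repo search.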

2. Protect Privacy

Never capture prompts and completions by default. Make content capture opt-in and implement PII redaction.
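As a sketch, prompt content can be scrubbed before it is ever attached to a span; the regex patterns below are illustrative and not a complete PII solution:

```python
import re

# Illustrative patterns only; production systems need a proper PII pipeline.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text: str) -> str:
    """Replace known PII patterns before recording content on telemetry."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Contact jane.doe@example.com, SSN 123-45-6789"))
# -> Contact [EMAIL], SSN [SSN]
```

Calling `redact()` on any value destined for a `gen_ai.*` content attribute keeps raw PII out of the telemetry pipeline even when content capture is enabled.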

3. Deploy Collectors

Always route telemetry through collectors rather than direct export. This enables batching, filtering, and multi-destination routing.

4. Implement Tail-Based Sampling

For high-volume inference, use tail-based sampling to capture interesting traces (errors, slow requests) while reducing volume.
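A collector-side sketch using the contrib `tail_sampling` processor; the wait time, latency threshold, and sampling percentage are illustrative and should be tuned to your traffic:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-requests
        type: latency
        latency: {threshold_ms: 5000}
      - name: baseline
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
```

This keeps every error and every slow request while sampling only a small baseline of healthy traffic.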

5. Track Costs

Include token counts and estimated costs as metrics for budget monitoring and optimization.
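A sketch of per-request cost estimation from token counts; the price table below is illustrative and will drift from real vendor pricing:

```python
# Illustrative USD prices per 1K tokens; check current vendor pricing.
PRICING = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return estimated USD cost for one request, suitable for a cost counter metric."""
    prices = PRICING[model]
    return (input_tokens / 1000) * prices["input"] + (output_tokens / 1000) * prices["output"]

print(round(estimate_cost("gpt-4", 150, 500), 4))  # 0.0345
```

Recording this value with a `{"model": model}` attribute, as in the `cost_counter` example earlier, lets dashboards break spend down per model.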

Business Impact

| Metric | Improvement |
| --- | --- |
| Mean Time to Detection | 70% reduction |
| Cost Visibility | Complete token-level tracking |
| Vendor Flexibility | Zero lock-in with standard protocols |
| Debug Time | 60% faster root cause analysis |
| Compliance | Full audit trail for AI operations |

Key Takeaways

  1. OpenTelemetry is the standard for AI/ML observability with GenAI semantic conventions
  2. Vendor-neutral instrumentation prevents lock-in and enables best-of-breed backends
  3. Unified telemetry correlates traces, metrics, and logs for complete visibility
  4. Privacy-first design protects sensitive prompt data while maintaining observability
  5. Collector architecture enables flexible routing, sampling, and processing

Further Reading