Skip to content

feat(observability): wire Phoenix OTLP tracing for local LLM routes [OMN-8697]#1289

Merged
jonahgabriel merged 2 commits into
mainfrom
jonahgabriel/omn-8697-phoenix-llm-tracing
Apr 14, 2026
Merged

feat(observability): wire Phoenix OTLP tracing for local LLM routes [OMN-8697]#1289
jonahgabriel merged 2 commits into
mainfrom
jonahgabriel/omn-8697-phoenix-llm-tracing

Conversation

@jonahgabriel

@jonahgabriel jonahgabriel commented Apr 14, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Wires OTEL env vars to runtime containers: OTEL_EXPORTER_OTLP_ENDPOINT=http://phoenix:6006 and OTEL_TRACES_EXPORTER=otlp added to x-runtime-env in docker-compose.infra.yml and via the tracing bundle (now included by runtime in bundles.yaml)
  • Instruments _execute_llm_http_call in MixinLlmHttpTransport with a gen_ai.{chat,text_completion} OpenTelemetry span that captures: gen_ai.request.model, gen_ai.system, gen_ai.operation.name, server.address, omninode.correlation_id, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens (from the response usage field)
  • 4 new unit tests (TestOtelLlmSpan) verify span creation, token count extraction, operation name detection, and span-on-error behavior using InMemorySpanExporter

The existing configure_tracing() call in service_kernel.py (OMN-3811) already initializes the OTLP exporter at startup — this PR closes the gap by (1) setting the endpoint env var in runtime containers and (2) adding the LLM-specific span around the actual inference calls.

Test plan

  • All 98 existing test_mixin_llm_http_transport.py tests pass
  • 4 new TestOtelLlmSpan tests pass (test_span_created_on_success, test_span_records_token_counts, test_span_uses_text_completion_op_without_messages, test_span_emitted_even_on_error)
  • mypy clean on the modified mixin
  • After deploy via deploy agent: Phoenix UI at http://192.168.86.201:6006 shows gen_ai.chat spans within 5 minutes of an LLM routing event

Summary by CodeRabbit

  • New Features

    • Added OpenTelemetry tracing for LLM HTTP calls, capturing request metadata and token usage metrics throughout retry attempts.
  • Infrastructure

    • Enabled OpenTelemetry tracing in runtime environment with OTLP exporter configuration.
  • Tests

    • Added comprehensive unit tests validating span emission and attribute recording for LLM HTTP operations.

…OMN-8697]

- Add OTEL_EXPORTER_OTLP_ENDPOINT=http://phoenix:6006 and
  OTEL_TRACES_EXPORTER=otlp to x-runtime-env in docker-compose.infra.yml
  so all runtime containers export spans to Phoenix without needing the
  tracing bundle to be explicitly selected.
- Include the tracing bundle in runtime bundle includes (bundles.yaml) so
  catalog-driven deploys also wire Phoenix and inject OTEL env vars.
- Instrument _execute_llm_http_call in MixinLlmHttpTransport with a
  gen_ai.{chat,text_completion} OpenTelemetry span that records:
  gen_ai.request.model, gen_ai.system, gen_ai.operation.name,
  server.address, omninode.correlation_id, gen_ai.usage.input_tokens,
  gen_ai.usage.output_tokens (from response usage field).
- Add TestOtelLlmSpan (4 tests) to verify span creation, token count
  extraction, text_completion vs chat detection, and span-on-error.
@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR integrates OpenTelemetry tracing into the LLM HTTP transport layer. Changes include: adding the tracing bundle to runtime configuration, setting OTLP endpoint variables in Docker Compose, wrapping the retry loop with span emission in the transport mixin, and adding unit tests validating span creation and token usage attributes.

Changes

Cohort / File(s) Summary
Infrastructure Configuration
docker/catalog/bundles.yaml, docker/docker-compose.infra.yml
Extended runtime bundle to include tracing. Added concrete OpenTelemetry environment variables (OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_TRACES_EXPORTER) to Docker Compose shared template, enabling OTLP export to phoenix service.
LLM Transport Instrumentation
src/omnibase_infra/mixins/mixin_llm_http_transport.py
Wrapped retry loop with OpenTelemetry span context to emit gen_ai.chat or gen_ai.text_completion spans. Added request/model/server/correlation attributes to spans. Captured token usage from JSON response and recorded as span attributes. Refactored retry condition checks to use retry_state.is_retriable() and classification.record_circuit_failure instead of classification.should_retry in certain branches.
Span Validation Tests
tests/unit/mixins/test_mixin_llm_http_transport.py
Added TestOtelLlmSpan test class with in-memory tracer fixture and four test methods validating span creation/completion on success, token count recording, text_completion operation detection, and error cases.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Transport as LLM HTTP Transport
    participant OTel as OpenTelemetry Tracer
    participant OTLP as OTLP Exporter
    participant LLM as LLM Service

    Client->>Transport: _execute_llm_http_call(request)
    Transport->>OTel: start_as_current_span("gen_ai.chat")
    OTel->>OTel: Set span attributes<br/>(model, server, correlation)
    Transport->>Transport: Enter retry loop
    loop Retry Logic
        Transport->>LLM: POST request
        LLM-->>Transport: Response
        alt Retry needed
            Transport->>Transport: Check is_retriable()
        else Success (2xx)
            Transport->>Transport: Parse JSON
            Transport->>OTel: Set token usage attributes<br/>(input_tokens, output_tokens)
        else Error
            Transport->>Transport: Raise InfraProtocolError
        end
    end
    Transport->>OTel: End span
    OTel->>OTLP: Export span
    OTLP-->>Transport: Ack
    Transport-->>Client: Return result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hops with glee through traces bright,
Spans now wrap the retry flight,
Tokens counted, flows made clear,
Phoenix glows—observability's here!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: wiring Phoenix OTLP tracing for local LLM routes, which is the primary objective across all file modifications (bundles.yaml, docker-compose.infra.yml, and the instrumentation of MixinLlmHttpTransport).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jonahgabriel/omn-8697-phoenix-llm-tracing

Comment @coderabbitai help to get the list of available commands and usage tips.

@jonahgabriel jonahgabriel enabled auto-merge April 14, 2026 10:42
@jonahgabriel jonahgabriel disabled auto-merge April 14, 2026 10:43
@jonahgabriel jonahgabriel enabled auto-merge April 14, 2026 10:43
@jonahgabriel jonahgabriel added this pull request to the merge queue Apr 14, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unit/mixins/test_mixin_llm_http_transport.py (1)

1886-2022: Add one retry-path span test.

These cases only cover single-attempt success/failure. They will not catch the easiest regression for this change: moving span creation inside the retry loop and exporting one span per attempt. A fail-once / succeed-once case that still expects exactly one finished span would pin down the intended “one span around the whole retry lifecycle” behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/mixins/test_mixin_llm_http_transport.py` around lines 1886 - 2022,
Add a new async unit test that verifies only one gen_ai span is finished across
retry attempts: implement a handler used by _make_mock_client that returns an
error response on the first call and a successful response on the second,
instantiate LlmTransportHarness and call _execute_llm_http_call with max_retries
set to allow one retry, then assert self._exporter.get_finished_spans() has
length 1 (and optionally assert the span name/attributes like "gen_ai.chat" and
the model) to ensure span creation is around the whole retry lifecycle rather
than per attempt; reference the existing test pattern using
LlmTransportHarness._execute_llm_http_call, _make_mock_client, and
self._exporter to locate where to add the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker/docker-compose.infra.yml`:
- Around line 214-218: The comment and x-runtime-env don't cover all runtime
services: add the OTEL vars to the service blocks that currently omit
x-runtime-env (agent-actions-consumer, skill-lifecycle-consumer,
context-audit-consumer, intelligence-api) or refactor those service stanzas to
inherit x-runtime-env so OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_TRACES_EXPORTER
are present for local compose; update the service definitions
(agent-actions-consumer, skill-lifecycle-consumer, context-audit-consumer,
intelligence-api) to include the OTEL env vars or reference x-runtime-env to
ensure local tracing parity with bundle-based deployments.

In `@src/omnibase_infra/mixins/mixin_llm_http_transport.py`:
- Around line 665-674: The span created in mixin_llm_http_transport.py using
_otel_trace.get_tracer(...).start_as_current_span should be updated to use
kind=SpanKind.CLIENT and to replace the deprecated "gen_ai.system" attribute
with "gen_ai.provider.name"; also stop emitting the full URL as "server.address"
— parse the URL (in the same scope where url is available) and emit
"server.name" (hostname/IP) and "server.port" (numeric port, if present)
instead, while keeping other attributes like "gen_ai.operation.name",
"gen_ai.request.model", "omninode.correlation_id", and "omninode.target" the
same to avoid leaking path/query/credentials.

---

Nitpick comments:
In `@tests/unit/mixins/test_mixin_llm_http_transport.py`:
- Around line 1886-2022: Add a new async unit test that verifies only one gen_ai
span is finished across retry attempts: implement a handler used by
_make_mock_client that returns an error response on the first call and a
successful response on the second, instantiate LlmTransportHarness and call
_execute_llm_http_call with max_retries set to allow one retry, then assert
self._exporter.get_finished_spans() has length 1 (and optionally assert the span
name/attributes like "gen_ai.chat" and the model) to ensure span creation is
around the whole retry lifecycle rather than per attempt; reference the existing
test pattern using LlmTransportHarness._execute_llm_http_call,
_make_mock_client, and self._exporter to locate where to add the test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 915c7ed2-a69f-422a-aa5a-0904096c702f

📥 Commits

Reviewing files that changed from the base of the PR and between 70562dc and b311443.

📒 Files selected for processing (4)
  • docker/catalog/bundles.yaml
  • docker/docker-compose.infra.yml
  • src/omnibase_infra/mixins/mixin_llm_http_transport.py
  • tests/unit/mixins/test_mixin_llm_http_transport.py

Comment on lines +214 to +218
# --- OpenTelemetry (OMN-8697: wired to Phoenix for LLM span visibility) ---
# Phoenix runs alongside the runtime; all containers export spans via OTLP HTTP.
# OTEL_SERVICE_NAME is set per-service (see individual service blocks below).
OTEL_EXPORTER_OTLP_ENDPOINT: "http://phoenix:6006"
OTEL_TRACES_EXPORTER: "otlp"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

x-runtime-env only covers a subset of the runtime profile.

agent-actions-consumer, skill-lifecycle-consumer, context-audit-consumer, and intelligence-api do not inherit *runtime-env, so they will not see these OTEL vars under local compose. That makes the Line 215 comment inaccurate today, and it also means local compose tracing can diverge from bundle-based runtime deployments if any of those entrypoints initialize tracing.
Based on learnings, "ensure any comments/docs about env-var behavior match the real activation logic."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/docker-compose.infra.yml` around lines 214 - 218, The comment and
x-runtime-env don't cover all runtime services: add the OTEL vars to the service
blocks that currently omit x-runtime-env (agent-actions-consumer,
skill-lifecycle-consumer, context-audit-consumer, intelligence-api) or refactor
those service stanzas to inherit x-runtime-env so OTEL_EXPORTER_OTLP_ENDPOINT
and OTEL_TRACES_EXPORTER are present for local compose; update the service
definitions (agent-actions-consumer, skill-lifecycle-consumer,
context-audit-consumer, intelligence-api) to include the OTEL env vars or
reference x-runtime-env to ensure local tracing parity with bundle-based
deployments.

Comment on lines +665 to +674
with _otel_trace.get_tracer("omnibase_infra.llm").start_as_current_span(
f"gen_ai.{_gen_ai_op}",
attributes={
"gen_ai.system": "openai",
"gen_ai.operation.name": _gen_ai_op,
"gen_ai.request.model": str(payload.get("model", "unknown")),
"server.address": url,
"omninode.correlation_id": str(correlation_id),
"omninode.target": self._llm_target_name,
},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

OpenTelemetry semantic conventions gen_ai span attributes 2025 2026

💡 Result:

OpenTelemetry semantic conventions for Generative AI (GenAI) span attributes are defined in the official documentation at https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ and related pages. These conventions are currently in Development status (not yet Stable) as of 2026-04-14. Key span types include: - Inference spans (client calls to GenAI models): Span name {gen_ai.operation.name} {gen_ai.request.model}, span kind CLIENT (or INTERNAL if same process). - Embeddings spans. - Retrieval spans. - Execute tool spans: Span name execute_tool {gen_ai.tool.name}, span kind INTERNAL. Core required attributes (all strings unless noted): - gen_ai.operation.name (Required): e.g., "chat", "generate_content", "text_completion", "embeddings", "retrieve", "execute_tool", "create_agent", "invoke_agent". - gen_ai.provider.name (Required): e.g., "openai", "anthropic", "aws.bedrock", "gcp.vertex_ai". - gen_ai.request.model (Required for inference/embeddings): Model name, e.g., "gpt-4". Conditionally required/Recommended: - error.type (if error): Matches provider error code. - gen_ai.conversation.id (when available). - gen_ai.request.choice.count (int, if !=1). - gen_ai.output.type: e.g., "text", "json", "image". - gen_ai.response.finish_reasons (string[]): Reasons model stopped. - gen_ai.usage.input_tokens/output_tokens (int). Opt-in/Structured (large/sensitive, opt-in): - gen_ai.input.messages, gen_ai.output.messages (JSON arrays per schema). - gen_ai.system_instructions. - gen_ai.tool.definitions (any). Provider-specific overrides exist (e.g., OpenAI sets gen_ai.provider.name="openai"). Recent updates (2025): Renamed gen_ai.system to gen_ai.provider.name; added structured input/output attributes; migration via OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental (existing instrumentations default to v1.36.0 or prior). No major 2026 changes noted; still transitioning to Stable. Full attribute registry: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/. For agent spans: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/ (create_agent, invoke_agent).

Citations:


🌐 Web query:

OpenTelemetry Python start_as_current_span SpanKind default

💡 Result:

In OpenTelemetry Python, the start_as_current_span method of the Tracer class has a default value for the kind parameter of SpanKind.INTERNAL. This is explicitly shown in the OpenTelemetry Python API source code where the method signature is defined as: python def start_as_current_span( self, name: str, context: Optional[Context] = None, kind: SpanKind = SpanKind.INTERNAL, # ... other parameters ) SpanKind.INTERNAL (value 0) indicates that the span is used internally by the application code.

Citations:


🌐 Web query:

OpenTelemetry GenAI semantic conventions gen_ai.provider.name gen_ai.system

💡 Result:

In OpenTelemetry GenAI semantic conventions, gen_ai.provider.name is a required string attribute on GenAI spans (e.g., inference, embeddings, agent operations). It identifies the Generative AI provider as known by the client or server instrumentation, acting as a discriminator for provider-specific telemetry format. Examples: "openai", "aws.bedrock", "azure.ai.openai", "gcp.vertex_ai". It was renamed from the deprecated gen_ai.system in v1.37.0 (August 2025) to follow system-specific naming policy. gen_ai.system is a deprecated attribute from earlier versions (e.g., v1.27.0, v1.36.0), previously used to identify the GenAI product/system (e.g., "openai"). Existing instrumentations using v1.36.0 or prior should not switch by default but can opt-in to latest via OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental. Note: gen_ai.system_instructions is a separate attribute for system messages/instructions provided to the model, not related to provider.

Citations:


🌐 Web query:

OpenTelemetry semantic conventions server.address attribute definition

💡 Result:

The OpenTelemetry semantic conventions define the server.address attribute as a string representing the server domain name if available without reverse DNS lookup; otherwise, the IP address or Unix domain socket name. Examples include example.com, 10.1.2.80, or /tmp/my.sock. This attribute is part of the Server attributes section, applicable to connection-based network interactions (e.g., TCP, UDP like QUIC/DNS) where one side initiates the connection. Key notes: - Stability: Stable (per registry) or as specified in contexts like general attributes (Recommended level). - From the client side, when communicating through intermediaries (e.g., proxies), it SHOULD represent the server address behind the intermediaries if available. - For IP-based communication, prefer DNS hostname; use IP if hostname unavailable (no reverse DNS lookup). - On client: remote service name (e.g., from URL host); on server: local service name as seen by clients. - For Unix domain sockets: remote endpoint (client side), local endpoint (server side). - For pipes (network.transport="pipe"): absolute path to the pipe file, or empty string if anonymous. - Used in HTTP spans/metrics (e.g., http.server.request.duration), but HTTP-specific guidance may reference general definition or warn about cardinality risks when opt-in. Note: An open GitHub issue (#2443, Jun 2025) discusses minor definition variations across HTTP client metrics/spans vs. general server attributes, but the core definition remains consistent across official docs.

Citations:


🌐 Web query:

OpenTelemetry server attributes host port URL semantics

💡 Result:

In OpenTelemetry semantic conventions, server attributes server.address and server.port describe the server in connection-based network interactions (TCP, UDP like QUIC/DNS) where one side initiates the connection. - server.address (string, Stable, Recommended): Server domain name if available without reverse DNS lookup; otherwise IP address or Unix domain socket name (e.g., example.com, 10.1.2.80, /tmp/my.sock). From client side, prefers logical address behind proxies/intermediaries if available. Represents logical server name: DNS hostname for IP, path for pipes/Unix sockets. - server.port (int, Stable, Recommended): Server port number (e.g., 80, 8080, 443). From client side, prefers port behind proxies. These are general attributes but critical in HTTP spans: HTTP Client spans (SpanKind.CLIENT): server.address and server.port are Required, derived from request URL/Host header/:authority (e.g., https://example.com:8080 sets server.address=example.com, server.port=8080). Use logical/original destination, not physical peer after load balancing. HTTP Server spans (SpanKind.SERVER): server.address Recommended, server.port Conditionally Required if server.address set. Set from Host header/:authority, X-Forwarded-Host, Forwarded#host (original client-perceived server), not local bind address. They enable consistent client/server correlation (same values ideally, ignoring proxies/multi-IP), low-cardinality grouping, and sampling decisions. Prefer over deprecated net.peer.* / net.host.*. No direct 'server URL' attribute; use url.full (client) or url.scheme + server.address + url.path + url.query (server).

Citations:


Fix OpenTelemetry GenAI span attributes and span kind to match current semantic conventions.

For inference spans calling LLMs, set kind=SpanKind.CLIENT (not the INTERNAL default); replace deprecated gen_ai.system with gen_ai.provider.name; and split server.address into hostname/IP and port attributes (do not embed the full URL in server.address as it risks leaking credentials or path/query parameters into telemetry).

🛠️ Suggested fix
+        _parsed_url = urlparse(url)
         with _otel_trace.get_tracer("omnibase_infra.llm").start_as_current_span(
             f"gen_ai.{_gen_ai_op}",
+            kind=_otel_trace.SpanKind.CLIENT,
             attributes={
-                "gen_ai.system": "openai",
+                "gen_ai.provider.name": "openai",
                 "gen_ai.operation.name": _gen_ai_op,
                 "gen_ai.request.model": str(payload.get("model", "unknown")),
-                "server.address": url,
+                "server.address": _parsed_url.hostname or "",
+                **(
+                    {"server.port": _parsed_url.port}
+                    if _parsed_url.port is not None
+                    else {}
+                ),
                 "omninode.correlation_id": str(correlation_id),
                 "omninode.target": self._llm_target_name,
             },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with _otel_trace.get_tracer("omnibase_infra.llm").start_as_current_span(
f"gen_ai.{_gen_ai_op}",
attributes={
"gen_ai.system": "openai",
"gen_ai.operation.name": _gen_ai_op,
"gen_ai.request.model": str(payload.get("model", "unknown")),
"server.address": url,
"omninode.correlation_id": str(correlation_id),
"omninode.target": self._llm_target_name,
},
_parsed_url = urlparse(url)
with _otel_trace.get_tracer("omnibase_infra.llm").start_as_current_span(
f"gen_ai.{_gen_ai_op}",
kind=_otel_trace.SpanKind.CLIENT,
attributes={
"gen_ai.provider.name": "openai",
"gen_ai.operation.name": _gen_ai_op,
"gen_ai.request.model": str(payload.get("model", "unknown")),
"server.address": _parsed_url.hostname or "",
**(
{"server.port": _parsed_url.port}
if _parsed_url.port is not None
else {}
),
"omninode.correlation_id": str(correlation_id),
"omninode.target": self._llm_target_name,
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/omnibase_infra/mixins/mixin_llm_http_transport.py` around lines 665 -
674, The span created in mixin_llm_http_transport.py using
_otel_trace.get_tracer(...).start_as_current_span should be updated to use
kind=SpanKind.CLIENT and to replace the deprecated "gen_ai.system" attribute
with "gen_ai.provider.name"; also stop emitting the full URL as "server.address"
— parse the URL (in the same scope where url is available) and emit
"server.name" (hostname/IP) and "server.port" (numeric port, if present)
instead, while keeping other attributes like "gen_ai.operation.name",
"gen_ai.request.model", "omninode.correlation_id", and "omninode.target" the
same to avoid leaking path/query/credentials.

Merged via the queue into main with commit 5c98585 Apr 14, 2026
60 of 61 checks passed
@jonahgabriel jonahgabriel deleted the jonahgabriel/omn-8697-phoenix-llm-tracing branch April 14, 2026 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant