feat(observability): wire Phoenix OTLP tracing for local LLM routes [OMN-8697]#1289
Conversation
…OMN-8697]
- Add OTEL_EXPORTER_OTLP_ENDPOINT=http://phoenix:6006 and
OTEL_TRACES_EXPORTER=otlp to x-runtime-env in docker-compose.infra.yml
so all runtime containers export spans to Phoenix without needing the
tracing bundle to be explicitly selected.
- Include the tracing bundle in runtime bundle includes (bundles.yaml) so
catalog-driven deploys also wire Phoenix and inject OTEL env vars.
- Instrument _execute_llm_http_call in MixinLlmHttpTransport with a
gen_ai.{chat,text_completion} OpenTelemetry span that records:
gen_ai.request.model, gen_ai.system, gen_ai.operation.name,
server.address, omninode.correlation_id, gen_ai.usage.input_tokens,
gen_ai.usage.output_tokens (from response usage field).
- Add TestOtelLlmSpan (4 tests) to verify span creation, token count
extraction, text_completion vs chat detection, and span-on-error.
📝 WalkthroughWalkthroughThis PR integrates OpenTelemetry tracing into the LLM HTTP transport layer. Changes include: adding the tracing bundle to runtime configuration, setting OTLP endpoint variables in Docker Compose, wrapping the retry loop with span emission in the transport mixin, and adding unit tests validating span creation and token usage attributes. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Transport as LLM HTTP Transport
participant OTel as OpenTelemetry Tracer
participant OTLP as OTLP Exporter
participant LLM as LLM Service
Client->>Transport: _execute_llm_http_call(request)
Transport->>OTel: start_as_current_span("gen_ai.chat")
OTel->>OTel: Set span attributes<br/>(model, server, correlation)
Transport->>Transport: Enter retry loop
loop Retry Logic
Transport->>LLM: POST request
LLM-->>Transport: Response
alt Retry needed
Transport->>Transport: Check is_retriable()
else Success (2xx)
Transport->>Transport: Parse JSON
Transport->>OTel: Set token usage attributes<br/>(input_tokens, output_tokens)
else Error
Transport->>Transport: Raise InfraProtocolError
end
end
Transport->>OTel: End span
OTel->>OTLP: Export span
OTLP-->>Transport: Ack
Transport-->>Client: Return result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unit/mixins/test_mixin_llm_http_transport.py (1)
1886-2022: Add one retry-path span test.These cases only cover single-attempt success/failure. They will not catch the easiest regression for this change: moving span creation inside the retry loop and exporting one span per attempt. A fail-once / succeed-once case that still expects exactly one finished span would pin down the intended “one span around the whole retry lifecycle” behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/mixins/test_mixin_llm_http_transport.py` around lines 1886 - 2022, Add a new async unit test that verifies only one gen_ai span is finished across retry attempts: implement a handler used by _make_mock_client that returns an error response on the first call and a successful response on the second, instantiate LlmTransportHarness and call _execute_llm_http_call with max_retries set to allow one retry, then assert self._exporter.get_finished_spans() has length 1 (and optionally assert the span name/attributes like "gen_ai.chat" and the model) to ensure span creation is around the whole retry lifecycle rather than per attempt; reference the existing test pattern using LlmTransportHarness._execute_llm_http_call, _make_mock_client, and self._exporter to locate where to add the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker/docker-compose.infra.yml`:
- Around line 214-218: The comment and x-runtime-env don't cover all runtime
services: add the OTEL vars to the service blocks that currently omit
x-runtime-env (agent-actions-consumer, skill-lifecycle-consumer,
context-audit-consumer, intelligence-api) or refactor those service stanzas to
inherit x-runtime-env so OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_TRACES_EXPORTER
are present for local compose; update the service definitions
(agent-actions-consumer, skill-lifecycle-consumer, context-audit-consumer,
intelligence-api) to include the OTEL env vars or reference x-runtime-env to
ensure local tracing parity with bundle-based deployments.
In `@src/omnibase_infra/mixins/mixin_llm_http_transport.py`:
- Around line 665-674: The span created in mixin_llm_http_transport.py using
_otel_trace.get_tracer(...).start_as_current_span should be updated to use
kind=SpanKind.CLIENT and to replace the deprecated "gen_ai.system" attribute
with "gen_ai.provider.name"; also stop emitting the full URL as "server.address"
— parse the URL (in the same scope where url is available) and emit
"server.name" (hostname/IP) and "server.port" (numeric port, if present)
instead, while keeping other attributes like "gen_ai.operation.name",
"gen_ai.request.model", "omninode.correlation_id", and "omninode.target" the
same to avoid leaking path/query/credentials.
---
Nitpick comments:
In `@tests/unit/mixins/test_mixin_llm_http_transport.py`:
- Around line 1886-2022: Add a new async unit test that verifies only one gen_ai
span is finished across retry attempts: implement a handler used by
_make_mock_client that returns an error response on the first call and a
successful response on the second, instantiate LlmTransportHarness and call
_execute_llm_http_call with max_retries set to allow one retry, then assert
self._exporter.get_finished_spans() has length 1 (and optionally assert the span
name/attributes like "gen_ai.chat" and the model) to ensure span creation is
around the whole retry lifecycle rather than per attempt; reference the existing
test pattern using LlmTransportHarness._execute_llm_http_call,
_make_mock_client, and self._exporter to locate where to add the test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 915c7ed2-a69f-422a-aa5a-0904096c702f
📒 Files selected for processing (4)
docker/catalog/bundles.yamldocker/docker-compose.infra.ymlsrc/omnibase_infra/mixins/mixin_llm_http_transport.pytests/unit/mixins/test_mixin_llm_http_transport.py
| # --- OpenTelemetry (OMN-8697: wired to Phoenix for LLM span visibility) --- | ||
| # Phoenix runs alongside the runtime; all containers export spans via OTLP HTTP. | ||
| # OTEL_SERVICE_NAME is set per-service (see individual service blocks below). | ||
| OTEL_EXPORTER_OTLP_ENDPOINT: "http://phoenix:6006" | ||
| OTEL_TRACES_EXPORTER: "otlp" |
There was a problem hiding this comment.
x-runtime-env only covers a subset of the runtime profile.
agent-actions-consumer, skill-lifecycle-consumer, context-audit-consumer, and intelligence-api do not inherit *runtime-env, so they will not see these OTEL vars under local compose. That makes the Line 215 comment inaccurate today, and it also means local compose tracing can diverge from bundle-based runtime deployments if any of those entrypoints initialize tracing.
Based on learnings, "ensure any comments/docs about env-var behavior match the real activation logic."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/docker-compose.infra.yml` around lines 214 - 218, The comment and
x-runtime-env don't cover all runtime services: add the OTEL vars to the service
blocks that currently omit x-runtime-env (agent-actions-consumer,
skill-lifecycle-consumer, context-audit-consumer, intelligence-api) or refactor
those service stanzas to inherit x-runtime-env so OTEL_EXPORTER_OTLP_ENDPOINT
and OTEL_TRACES_EXPORTER are present for local compose; update the service
definitions (agent-actions-consumer, skill-lifecycle-consumer,
context-audit-consumer, intelligence-api) to include the OTEL env vars or
reference x-runtime-env to ensure local tracing parity with bundle-based
deployments.
| with _otel_trace.get_tracer("omnibase_infra.llm").start_as_current_span( | ||
| f"gen_ai.{_gen_ai_op}", | ||
| attributes={ | ||
| "gen_ai.system": "openai", | ||
| "gen_ai.operation.name": _gen_ai_op, | ||
| "gen_ai.request.model": str(payload.get("model", "unknown")), | ||
| "server.address": url, | ||
| "omninode.correlation_id": str(correlation_id), | ||
| "omninode.target": self._llm_target_name, | ||
| }, |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
OpenTelemetry semantic conventions gen_ai span attributes 2025 2026
💡 Result:
OpenTelemetry semantic conventions for Generative AI (GenAI) span attributes are defined in the official documentation at https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/ and related pages. These conventions are currently in Development status (not yet Stable) as of 2026-04-14. Key span types include: - Inference spans (client calls to GenAI models): Span name {gen_ai.operation.name} {gen_ai.request.model}, span kind CLIENT (or INTERNAL if same process). - Embeddings spans. - Retrieval spans. - Execute tool spans: Span name execute_tool {gen_ai.tool.name}, span kind INTERNAL. Core required attributes (all strings unless noted): - gen_ai.operation.name (Required): e.g., "chat", "generate_content", "text_completion", "embeddings", "retrieve", "execute_tool", "create_agent", "invoke_agent". - gen_ai.provider.name (Required): e.g., "openai", "anthropic", "aws.bedrock", "gcp.vertex_ai". - gen_ai.request.model (Required for inference/embeddings): Model name, e.g., "gpt-4". Conditionally required/Recommended: - error.type (if error): Matches provider error code. - gen_ai.conversation.id (when available). - gen_ai.request.choice.count (int, if !=1). - gen_ai.output.type: e.g., "text", "json", "image". - gen_ai.response.finish_reasons (string[]): Reasons model stopped. - gen_ai.usage.input_tokens/output_tokens (int). Opt-in/Structured (large/sensitive, opt-in): - gen_ai.input.messages, gen_ai.output.messages (JSON arrays per schema). - gen_ai.system_instructions. - gen_ai.tool.definitions (any). Provider-specific overrides exist (e.g., OpenAI sets gen_ai.provider.name="openai"). Recent updates (2025): Renamed gen_ai.system to gen_ai.provider.name; added structured input/output attributes; migration via OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental (existing instrumentations default to v1.36.0 or prior). No major 2026 changes noted; still transitioning to Stable. Full attribute registry: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/. For agent spans: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/ (create_agent, invoke_agent).
Citations:
- 1: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
- 2: https://opentelemetry.io/docs/specs/semconv/gen-ai/
- 3: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/openai.md
- 4: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-agent-spans.md
- 5: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
- 6: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/
- 7: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans
- 8: Update GenAI conventions based on system-specific naming guidance open-telemetry/semantic-conventions#2046
- 9: Using attributes for chat history on gen_ai spans and events open-telemetry/semantic-conventions#2179
🌐 Web query:
OpenTelemetry Python start_as_current_span SpanKind default
💡 Result:
In OpenTelemetry Python, the start_as_current_span method of the Tracer class has a default value for the kind parameter of SpanKind.INTERNAL. This is explicitly shown in the OpenTelemetry Python API source code where the method signature is defined as: python def start_as_current_span( self, name: str, context: Optional[Context] = None, kind: SpanKind = SpanKind.INTERNAL, # ... other parameters ) SpanKind.INTERNAL (value 0) indicates that the span is used internally by the application code.
Citations:
- 1: https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-api/src/opentelemetry/trace/__init__.py
- 2: https://github.com/open-telemetry/opentelemetry-python/blob/eed100c3/opentelemetry-api/src/opentelemetry/trace/__init__.py
🌐 Web query:
OpenTelemetry GenAI semantic conventions gen_ai.provider.name gen_ai.system
💡 Result:
In OpenTelemetry GenAI semantic conventions, gen_ai.provider.name is a required string attribute on GenAI spans (e.g., inference, embeddings, agent operations). It identifies the Generative AI provider as known by the client or server instrumentation, acting as a discriminator for provider-specific telemetry format. Examples: "openai", "aws.bedrock", "azure.ai.openai", "gcp.vertex_ai". It was renamed from the deprecated gen_ai.system in v1.37.0 (August 2025) to follow system-specific naming policy. gen_ai.system is a deprecated attribute from earlier versions (e.g., v1.27.0, v1.36.0), previously used to identify the GenAI product/system (e.g., "openai"). Existing instrumentations using v1.36.0 or prior should not switch by default but can opt-in to latest via OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental. Note: gen_ai.system_instructions is a separate attribute for system messages/instructions provided to the model, not related to provider.
Citations:
- 1: https://opentelemetry.io/docs/specs/semconv/gen-ai/
- 2: http://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
- 3: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/openai.md
- 4: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-spans.md
- 5: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/
- 6: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
- 7: Update GenAI conventions based on system-specific naming guidance open-telemetry/semantic-conventions#2046
- 8: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
- 9: https://github.com/open-telemetry/semantic-conventions/releases/tag/v1.37.0
🌐 Web query:
OpenTelemetry semantic conventions server.address attribute definition
💡 Result:
The OpenTelemetry semantic conventions define the server.address attribute as a string representing the server domain name if available without reverse DNS lookup; otherwise, the IP address or Unix domain socket name. Examples include example.com, 10.1.2.80, or /tmp/my.sock. This attribute is part of the Server attributes section, applicable to connection-based network interactions (e.g., TCP, UDP like QUIC/DNS) where one side initiates the connection. Key notes: - Stability: Stable (per registry) or as specified in contexts like general attributes (Recommended level). - From the client side, when communicating through intermediaries (e.g., proxies), it SHOULD represent the server address behind the intermediaries if available. - For IP-based communication, prefer DNS hostname; use IP if hostname unavailable (no reverse DNS lookup). - On client: remote service name (e.g., from URL host); on server: local service name as seen by clients. - For Unix domain sockets: remote endpoint (client side), local endpoint (server side). - For pipes (network.transport="pipe"): absolute path to the pipe file, or empty string if anonymous. - Used in HTTP spans/metrics (e.g., http.server.request.duration), but HTTP-specific guidance may reference general definition or warn about cardinality risks when opt-in. Note: An open GitHub issue (#2443, Jun 2025) discusses minor definition variations across HTTP client metrics/spans vs. general server attributes, but the core definition remains consistent across official docs.
Citations:
- 1: http://opentelemetry.io/docs/specs/semconv/registry/attributes/server/
- 2: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md
- 3: https://opentelemetry.io/docs/specs/semconv/registry/attributes/server/
- 4: https://opentelemetry.io/docs/specs/semconv/attributes-registry/server/
- 5: https://opentelemetry.io/docs/specs/semconv/general/attributes/
🌐 Web query:
OpenTelemetry server attributes host port URL semantics
💡 Result:
In OpenTelemetry semantic conventions, server attributes server.address and server.port describe the server in connection-based network interactions (TCP, UDP like QUIC/DNS) where one side initiates the connection. - server.address (string, Stable, Recommended): Server domain name if available without reverse DNS lookup; otherwise IP address or Unix domain socket name (e.g., example.com, 10.1.2.80, /tmp/my.sock). From client side, prefers logical address behind proxies/intermediaries if available. Represents logical server name: DNS hostname for IP, path for pipes/Unix sockets. - server.port (int, Stable, Recommended): Server port number (e.g., 80, 8080, 443). From client side, prefers port behind proxies. These are general attributes but critical in HTTP spans: HTTP Client spans (SpanKind.CLIENT): server.address and server.port are Required, derived from request URL/Host header/:authority (e.g., https://example.com:8080 sets server.address=example.com, server.port=8080). Use logical/original destination, not physical peer after load balancing. HTTP Server spans (SpanKind.SERVER): server.address Recommended, server.port Conditionally Required if server.address set. Set from Host header/:authority, X-Forwarded-Host, Forwarded#host (original client-perceived server), not local bind address. They enable consistent client/server correlation (same values ideally, ignoring proxies/multi-IP), low-cardinality grouping, and sampling decisions. Prefer over deprecated net.peer.* / net.host.*. No direct 'server URL' attribute; use url.full (client) or url.scheme + server.address + url.path + url.query (server).
Citations:
- 1: https://opentelemetry.io/docs/specs/semconv/attributes-registry/server/
- 2: https://opentelemetry.io/docs/specs/semconv/registry/attributes/server/
- 3: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md
- 4: https://opentelemetry.io/docs/specs/semconv/general/attributes/
- 5: https://opentelemetry.io/docs/specs/semconv/http/http-spans/
- 6: http://opentelemetry.io/docs/specs/semconv/registry/attributes/server/
- 7: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/http/
Fix OpenTelemetry GenAI span attributes and span kind to match current semantic conventions.
For inference spans calling LLMs, set kind=SpanKind.CLIENT (not the INTERNAL default); replace deprecated gen_ai.system with gen_ai.provider.name; and split server.address into hostname/IP and port attributes (do not embed the full URL in server.address as it risks leaking credentials or path/query parameters into telemetry).
🛠️ Suggested fix
+ _parsed_url = urlparse(url)
with _otel_trace.get_tracer("omnibase_infra.llm").start_as_current_span(
f"gen_ai.{_gen_ai_op}",
+ kind=_otel_trace.SpanKind.CLIENT,
attributes={
- "gen_ai.system": "openai",
+ "gen_ai.provider.name": "openai",
"gen_ai.operation.name": _gen_ai_op,
"gen_ai.request.model": str(payload.get("model", "unknown")),
- "server.address": url,
+ "server.address": _parsed_url.hostname or "",
+ **(
+ {"server.port": _parsed_url.port}
+ if _parsed_url.port is not None
+ else {}
+ ),
"omninode.correlation_id": str(correlation_id),
"omninode.target": self._llm_target_name,
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| with _otel_trace.get_tracer("omnibase_infra.llm").start_as_current_span( | |
| f"gen_ai.{_gen_ai_op}", | |
| attributes={ | |
| "gen_ai.system": "openai", | |
| "gen_ai.operation.name": _gen_ai_op, | |
| "gen_ai.request.model": str(payload.get("model", "unknown")), | |
| "server.address": url, | |
| "omninode.correlation_id": str(correlation_id), | |
| "omninode.target": self._llm_target_name, | |
| }, | |
| _parsed_url = urlparse(url) | |
| with _otel_trace.get_tracer("omnibase_infra.llm").start_as_current_span( | |
| f"gen_ai.{_gen_ai_op}", | |
| kind=_otel_trace.SpanKind.CLIENT, | |
| attributes={ | |
| "gen_ai.provider.name": "openai", | |
| "gen_ai.operation.name": _gen_ai_op, | |
| "gen_ai.request.model": str(payload.get("model", "unknown")), | |
| "server.address": _parsed_url.hostname or "", | |
| **( | |
| {"server.port": _parsed_url.port} | |
| if _parsed_url.port is not None | |
| else {} | |
| ), | |
| "omninode.correlation_id": str(correlation_id), | |
| "omninode.target": self._llm_target_name, | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/omnibase_infra/mixins/mixin_llm_http_transport.py` around lines 665 -
674, The span created in mixin_llm_http_transport.py using
_otel_trace.get_tracer(...).start_as_current_span should be updated to use
kind=SpanKind.CLIENT and to replace the deprecated "gen_ai.system" attribute
with "gen_ai.provider.name"; also stop emitting the full URL as "server.address"
— parse the URL (in the same scope where url is available) and emit
"server.name" (hostname/IP) and "server.port" (numeric port, if present)
instead, while keeping other attributes like "gen_ai.operation.name",
"gen_ai.request.model", "omninode.correlation_id", and "omninode.target" the
same to avoid leaking path/query/credentials.
Summary
OTEL_EXPORTER_OTLP_ENDPOINT=http://phoenix:6006andOTEL_TRACES_EXPORTER=otlpadded tox-runtime-envindocker-compose.infra.ymland via thetracingbundle (now included byruntimeinbundles.yaml)_execute_llm_http_callinMixinLlmHttpTransportwith agen_ai.{chat,text_completion}OpenTelemetry span that captures:gen_ai.request.model,gen_ai.system,gen_ai.operation.name,server.address,omninode.correlation_id,gen_ai.usage.input_tokens,gen_ai.usage.output_tokens(from the responseusagefield)TestOtelLlmSpan) verify span creation, token count extraction, operation name detection, and span-on-error behavior usingInMemorySpanExporterThe existing
configure_tracing()call inservice_kernel.py(OMN-3811) already initializes the OTLP exporter at startup — this PR closes the gap by (1) setting the endpoint env var in runtime containers and (2) adding the LLM-specific span around the actual inference calls.Test plan
test_mixin_llm_http_transport.pytests passTestOtelLlmSpantests pass (test_span_created_on_success,test_span_records_token_counts,test_span_uses_text_completion_op_without_messages,test_span_emitted_even_on_error)http://192.168.86.201:6006showsgen_ai.chatspans within 5 minutes of an LLM routing eventSummary by CodeRabbit
New Features
Infrastructure
Tests