feat(openai-agents): GenAI semconv compliance #3837
max-deygin-traceloop merged 23 commits into main from
Conversation
📝 Walkthrough
Refactors OpenAI Agents instrumentation to adopt OTel GenAI semantic conventions: replaces indexed prompt/completion attributes with parts-based JSON messages.
Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant TracingProcessor
participant SpanStore
participant Exporter
Client->>TracingProcessor: on_span_start(event)
TracingProcessor->>TracingProcessor: _start_* helper (agent/handoff/tool/generation/realtime)
TracingProcessor->>SpanStore: create span (provider/operation/kind attrs)
Client->>TracingProcessor: on_span_end(event, data)
TracingProcessor->>TracingProcessor: _end_* helper (extract inputs, outputs, tools, finish reasons)
TracingProcessor->>SpanStore: set gen_ai.input.messages / gen_ai.output.messages / tool definitions / finish reasons
TracingProcessor->>Exporter: export finalized spans
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py (1)
1-8: Consider extending _testing.py to cover the new message/tool attributes.
The shared compliance tests in opentelemetry.semconv_ai._testing (per the context snippet at packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py:1-56) currently validate the renamed GEN_AI_* constants but do not include tests for:
- GenAIAttributes.GEN_AI_INPUT_MESSAGES
- GenAIAttributes.GEN_AI_OUTPUT_MESSAGES
- GenAIAttributes.GEN_AI_TOOL_DEFINITIONS
These constants are imported from the upstream opentelemetry.semconv._incubating.attributes.gen_ai_attributes module and used heavily throughout _hooks.py and _realtime_wrappers.py. While the import will fail if the constants don't exist, explicit compliance tests would provide better validation and documentation.
Would you like me to draft compliance tests for these new upstream constants in _testing.py?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py` around lines 1 - 8, Add explicit compliance assertions in the shared test module _testing.py to verify the presence and expected values of the three new GenAI constants: GenAIAttributes.GEN_AI_INPUT_MESSAGES, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, and GenAIAttributes.GEN_AI_TOOL_DEFINITIONS; import these from opentelemetry.semconv._incubating.attributes.gen_ai_attributes (same source used by _hooks.py and _realtime_wrappers.py) and add simple equality/assertion checks that the exported constants in opentelemetry.semconv_ai match the upstream constants so the tests will fail if those names or values are missing or renamed.
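A self-contained sketch of what those compliance assertions could check. The attribute string values below are assumptions inferred from the constant names, not verified against a specific semconv release, and find_constant_problems is an illustrative helper, not code from the PR:

```python
# Hypothetical expected values for the new upstream constants; adjust if the
# installed opentelemetry-semconv release defines different strings.
EXPECTED_GEN_AI_CONSTANTS = {
    "GEN_AI_INPUT_MESSAGES": "gen_ai.input.messages",
    "GEN_AI_OUTPUT_MESSAGES": "gen_ai.output.messages",
    "GEN_AI_TOOL_DEFINITIONS": "gen_ai.tool.definitions",
}


def find_constant_problems(module) -> list:
    """Return (name, problem) pairs for constants that are missing or renamed."""
    problems = []
    for name, expected in EXPECTED_GEN_AI_CONSTANTS.items():
        actual = getattr(module, name, None)
        if actual is None:
            problems.append((name, "missing"))
        elif actual != expected:
            problems.append((name, f"got {actual!r}, expected {expected!r}"))
    return problems
```

A test in _testing.py would then import gen_ai_attributes and assert find_constant_problems(gen_ai_attributes) == [], so an upstream rename fails one explicit test instead of surfacing as an ImportError elsewhere.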
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py`:
- Around line 1-8: Add explicit compliance assertions in the shared test module
_testing.py to verify the presence and expected values of the three new GenAI
constants: GenAIAttributes.GEN_AI_INPUT_MESSAGES,
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, and
GenAIAttributes.GEN_AI_TOOL_DEFINITIONS; import these from
opentelemetry.semconv._incubating.attributes.gen_ai_attributes (same source used
by _hooks.py and _realtime_wrappers.py) and add simple equality/assertion checks
that the exported constants in opentelemetry.semconv_ai match the upstream
constants so the tests will fail if those names or values are missing or
renamed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 55f0bea4-7b6a-434a-89a3-58e2c26ee263
⛔ Files ignored due to path filters (1)
packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
💤 Files with no reviewable changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
doronkopit5
left a comment
did you succeed testing sending traces with this instrumentation using a sample-app?
fe44b61 to aadaa1a
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
491-563: Consider standardizing or documenting non-standard operation names.
The gen_ai.operation.name values "speech", "transcription", and "speech_group" are not standard OpenTelemetry GenAI semantic convention values (which include "chat", "text_completion", "embeddings"). While these may be intentional extensions for OpenAI realtime operations, observability tools may not recognize them. If these are intentional extensions, consider documenting them. Alternatively, consider whether "chat" could be used for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 491 - 563, The GenAI operation names used in _hooks.py (GenAIAttributes.GEN_AI_OPERATION_NAME values "speech", "transcription", "speech_group" set in the speech_attributes, transcription_attributes, and speech_group_attributes objects and span names "openai.realtime.speech", "openai.realtime.transcription", "openai.realtime.speech_group" in tracer.start_span) are non-standard; either map them to standard GenAI semantic convention values (e.g., use "chat" or another accepted value) or explicitly document these as OpenAI realtime extensions. Update the attribute assignments and span naming to use the chosen standardized token(s) (or add constants documenting the extension), and add a concise comment/docstring near the tracer.start_span calls explaining that these are intentional realtime extensions if you keep them.
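As a sketch of the "standardize or document" choice, the decision could live in one hypothetical helper so the deviation from the spec is explicit and greppable. The function and flag names here are illustrative, not from the PR:

```python
# Standard gen_ai.operation.name values per the GenAI semantic conventions.
STANDARD_OPERATIONS = {"chat", "text_completion", "embeddings"}

# Intentional OpenAI-realtime extensions; listing them here documents the
# deviation instead of scattering raw strings across span handlers.
REALTIME_EXTENSIONS = {"speech", "transcription", "speech_group"}


def realtime_operation_name(raw: str, *, collapse_to_chat: bool = False) -> str:
    """Pick the gen_ai.operation.name value to emit for a realtime span.

    collapse_to_chat=True maps the extensions onto the standard "chat" token
    so observability tools that only know the spec values still group them.
    """
    if raw in REALTIME_EXTENSIONS and collapse_to_chat:
        return "chat"
    return raw
```

Either policy then changes in one place rather than in three attribute dicts and three span names.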
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 491-563: The GenAI operation names used in _hooks.py
(GenAIAttributes.GEN_AI_OPERATION_NAME values "speech", "transcription",
"speech_group" set in the speech_attributes, transcription_attributes, and
speech_group_attributes objects and span names "openai.realtime.speech",
"openai.realtime.transcription", "openai.realtime.speech_group" in
tracer.start_span) are non-standard; either map them to standard GenAI semantic
convention values (e.g., use "chat" or another accepted value) or explicitly
document these as OpenAI realtime extensions. Update the attribute assignments
and span naming to use the chosen standardized token(s) (or add constants
documenting the extension), and add a concise comment/docstring near the
tracer.start_span calls explaining that these are intentional realtime
extensions if you keep them.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e0cf5a2d-869d-40a2-9e91-3ccda106c0df
⛔ Files ignored due to path filters (1)
packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
💤 Files with no reviewable changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
✅ Files skipped from review due to trivial changes (2)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
aadaa1a to a3bed2a
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
460-463: ⚠️ Potential issue | 🔴 Critical
Replace deprecated gen_ai.system with gen_ai.provider.name for semantic conventions compliance.
The OpenTelemetry GenAI semantic conventions (post-2025) deprecated gen_ai.system in favor of gen_ai.provider.name. Line 462 and all other response/generation span attributes in this instrumentation currently use the deprecated GEN_AI_SYSTEM constant with the value "openai". Since GEN_AI_PROVIDER_NAME is already available in the imported GenAIAttributes and the project targets a 2026 codebase, update to use GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai" instead.
This applies to all response/generation/realtime operation spans (lines 462, 480, 502, 528 in _hooks.py and lines 263, 356 in _realtime_wrappers.py). Agent-level spans correctly use gen_ai.system: "openai_agents" (per the learning, framework-level spans should distinguish themselves), but API response spans should reflect the underlying provider, not the framework.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 460 - 463, Replace uses of the deprecated GenAIAttributes.GEN_AI_SYSTEM with GenAIAttributes.GEN_AI_PROVIDER_NAME for all provider-level response/generation/realtime span attributes: update the dictionary named response_attributes (and analogous generation_/realtime_ attribute dicts) to set GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai" instead of GenAIAttributes.GEN_AI_SYSTEM: "openai"; keep agent-level spans that intentionally use GEN_AI_SYSTEM as-is. Locate occurrences by searching for GenAIAttributes.GEN_AI_SYSTEM and the attribute dicts (e.g., response_attributes, generation_attributes, realtime_*_attributes) in the instrumentation hooks and realtime wrappers and swap the constant to GenAIAttributes.GEN_AI_PROVIDER_NAME with the same "openai" value.
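The shape of that swap, sketched with plain attribute-key strings in place of the GenAIAttributes constants so the example stands alone. The function names are illustrative, not the PR's actual builders:

```python
def response_attributes(model: str) -> dict:
    """Build provider-level span attributes for an API response span."""
    return {
        # Deprecated key would have been "gen_ai.system": "openai".
        "gen_ai.provider.name": "openai",  # provider of the underlying API
        "gen_ai.request.model": model,
    }


def agent_attributes(agent_name: str) -> dict:
    """Framework-level agent spans keep distinguishing themselves."""
    return {
        "gen_ai.system": "openai_agents",  # intentionally the framework name
        "gen_ai.agent.name": agent_name,
    }
```

The asymmetry is deliberate: response spans name the provider, agent spans name the framework, so a backend can attribute cost to OpenAI while still grouping traces by the agents SDK.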
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
636-645: Consider extracting message JSON construction to a helper.
The pattern json.dumps([{"role": ..., "content": ...}]) for single-element message arrays is repeated across multiple realtime span handlers (lines 636-638, 642-645, 655-658, 662-665, 676-678). While acceptable given the package independence requirement, consider a small local helper if this pattern grows further.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 636 - 645, The repeated pattern json.dumps([{"role": ..., "content": ...}]) for constructing single-element message arrays should be moved into a small local helper function (e.g., to_message_json(role: str, content: Any) -> str) and used where the current code sets GenAIAttributes.GEN_AI_INPUT_MESSAGES and GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in the realtime span handlers; update the calls that build the input JSON (using input_text) and the output JSON (using getattr(span_data, "output", None) / output_audio) to call the helper before passing the string to otel_span.set_attribute so the logic is centralized and easier to maintain.
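The helper the prompt describes could be as small as this sketch (the name to_message_json comes from the suggestion above, not the codebase):

```python
import json
from typing import Any


def to_message_json(role: str, content: Any) -> str:
    """Serialize a single-element message array, replacing the repeated
    json.dumps([{"role": ..., "content": ...}]) pattern in the span handlers."""
    return json.dumps([{"role": role, "content": content}])
```

Call sites would then read e.g. otel_span.set_attribute(GenAIAttributes.GEN_AI_INPUT_MESSAGES, to_message_json("user", input_text)), centralizing the serialization format in one place.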
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 260-265: The span attributes currently set
GenAIAttributes.GEN_AI_OPERATION_NAME to "realtime"; update the attribute value
to the canonical "chat" to comply with OpenTelemetry GenAI semantic conventions.
Locate the span creation in the realtime wrappers (the code that calls
start_span or creates attributes with GenAIAttributes.GEN_AI_OPERATION_NAME in
_realtime_wrappers.py) and replace the string "realtime" with "chat", keeping
the surrounding attributes and parent_context intact so the span continues to be
created the same way.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 460-463: Replace uses of the deprecated
GenAIAttributes.GEN_AI_SYSTEM with GenAIAttributes.GEN_AI_PROVIDER_NAME for all
provider-level response/generation/realtime span attributes: update the
dictionary named response_attributes (and analogous generation_/realtime_
attribute dicts) to set GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai" instead
of GenAIAttributes.GEN_AI_SYSTEM: "openai"; keep agent-level spans that
intentionally use GEN_AI_SYSTEM as-is. Locate occurrences by searching for
GenAIAttributes.GEN_AI_SYSTEM and the attribute dicts (e.g.,
response_attributes, generation_attributes, realtime_*_attributes) in the
instrumentation hooks and realtime wrappers and swap the constant to
GenAIAttributes.GEN_AI_PROVIDER_NAME with the same "openai" value.
---
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 636-645: The repeated pattern json.dumps([{"role": ..., "content":
...}]) for constructing single-element message arrays should be moved into a
small local helper function (e.g., to_message_json(role: str, content: Any) ->
str) and used where the current code sets GenAIAttributes.GEN_AI_INPUT_MESSAGES
and GenAIAttributes.GEN_AI_OUTPUT_MESSAGES in the realtime span handlers; update
the calls that build the input JSON (using input_text) and the output JSON
(using getattr(span_data, "output", None) / output_audio) to call the helper
before passing the string to otel_span.set_attribute so the logic is centralized
and easier to maintain.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 652d6dab-a4a7-466e-8849-b8f91aeb4ef9
📒 Files selected for processing (7)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
💤 Files with no reviewable changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
…LM_ → GEN_AI_
- Replace SpanAttributes.LLM_REQUEST_TYPE with GenAIAttributes.GEN_AI_OPERATION_NAME
- Replace SpanAttributes.LLM_REQUEST_FUNCTIONS with GenAIAttributes.GEN_AI_TOOL_DEFINITIONS
- Replace SpanAttributes.LLM_SYSTEM with GenAIAttributes.GEN_AI_SYSTEM
- Replace SpanAttributes.LLM_USAGE_TOTAL_TOKENS with SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS
- Add test_semconv_compliance.py and [tool.uv.sources] for local semconv_ai
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l definitions to JSON array Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…en test constants, remove duplicate GEN_AI_SYSTEM key Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssage format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ve orphaned completion.tool.* attrs
- _realtime_wrappers.py: replace flat gen_ai.prompt.*/gen_ai.completion.* in
create_llm_span() with GEN_AI_INPUT_MESSAGES/GEN_AI_OUTPUT_MESSAGES JSON arrays
- _hooks.py: remove gen_ai.completion.tool.{name,type,strict_json_schema} sub-attributes
from FunctionSpanData handler (tool name already captured via GEN_AI_TOOL_NAME)
- tests: update test_realtime_session.py assertions to parse JSON array attributes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, remove AgentSpanData misfeature
- _hooks.py: remove misplaced catch-all 'elif span_data:' that was shadowing SpeechSpanData, TranscriptionSpanData, SpeechGroupSpanData, and AgentSpanData branches
- _hooks.py: remove AgentSpanData handler that incorrectly propagated model settings to agent spans (test spec: agent spans must NOT carry gen_ai.request.* params)
- _hooks.py: replace hardcoded "openai.agent.model.frequency_penalty" with GenAIAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY constant
- tests: fix dead "llm.usage.*" prefix check, fix vestigial "gen_ai.prompt" scan, fix hardcoded frequency_penalty string, fix long line in test_realtime_session.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
a3bed2a to 741bdce
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 255-260: The current branch stringifies any assistant `content`
before adding tool-call parts, which flattens block-list structures; update the
logic in the block that handles `content` and `tool_calls` (variables/functions:
parts, content, tool_calls, _stringify_content, _tool_call_to_part) to preserve
structured content: if `content` is a list (block list) extend `parts` with its
elements, if it is a dict append it as-is, otherwise fall back to calling
`_stringify_content` and appending a {"type": "text", "content": text} part;
after preserving/appending the structured content, then extend with
`_tool_call_to_part(tc)` for each tool call.
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py`:
- Around line 900-906: Tests currently mutate MagicMock.__name__ to impersonate
ResponseSpanData which affects global state; instead create a lightweight stub
(e.g., a small class or types.SimpleNamespace) named ResponseSpanData and
instantiate it for response_data, set input and response on that instance, and
pass it to MockAgentSpan so you do not modify MagicMock.__name__ or global mock
behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc14880d-9335-4d6f-a5a4-fcdc890ef878
📒 Files selected for processing (8)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
5fa66cb to 826a2a7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
53-70: ⚠️ Potential issue | 🟡 Minor
Actually parse these message attributes as JSON.
This test now targets gen_ai.input.messages / gen_ai.output.messages, but it still swallows JSONDecodeError and only attempts parsing for object-looking payloads. Because the new attrs are JSON arrays, malformed message payloads will still pass here.
Suggested fix
  if prompt_content_check:
      # All content attributes should be strings, not dicts
      error_msg = f"Attribute {attr_name} should be a string, got {type(attr_value)}: {attr_value}"
      assert isinstance(attr_value, str), error_msg
-     # If it looks like JSON, verify it can be parsed
-     if attr_value.startswith("{") and attr_value.endswith("}"):
-         try:
-             json.loads(attr_value)
-         except json.JSONDecodeError:
-             # If it fails to parse, that's still fine - just not JSON
-             pass
+     json.loads(attr_value)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py` around lines 53 - 70, The test currently checks gen_ai.input.messages / gen_ai.output.messages only superficially and swallows JSONDecodeError; update the loop that iterates spans → span.attributes (using variables span, attr_name, attr_value and the keys "gen_ai.input.messages" and "gen_ai.output.messages") so that after asserting attr_value is a str you always attempt to parse it with json.loads (do not limit parsing to strings that start/end with "{" / "}") and fail the test on json.JSONDecodeError (e.g., let the exception propagate or assert False with the error), ensuring malformed JSON arrays do not pass silently.
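The strict form of that loop could look like the following sketch. The span shape (an object with an attributes mapping) follows the references in the prompt; nothing here is copied from the actual test file:

```python
import json

MESSAGE_ATTRS = ("gen_ai.input.messages", "gen_ai.output.messages")


def assert_message_attrs_are_json(spans) -> None:
    """Fail if any message attribute is not a string holding a JSON array."""
    for span in spans:
        for attr_name in MESSAGE_ATTRS:
            attr_value = span.attributes.get(attr_name)
            if attr_value is None:
                continue  # attribute absent on this span; presence is a separate check
            assert isinstance(attr_value, str), (
                f"Attribute {attr_name} should be a string, got {type(attr_value)}"
            )
            # Always parse; a json.JSONDecodeError propagates and fails the test
            # instead of being swallowed.
            parsed = json.loads(attr_value)
            assert isinstance(parsed, list), f"{attr_name} should be a JSON array"
```

Because parsing is unconditional, a payload like "[{broken" now raises rather than slipping past the old startswith("{") guard.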
♻️ Duplicate comments (3)
packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py (1)
1047-1056: ⚠️ Potential issue | 🟠 Major
Remove the if raw_* guards so these semconv tests can actually fail.
A missing gen_ai.tool.definitions, gen_ai.input.messages, or gen_ai.output.messages currently turns into a pass. That makes the suite false-green on exactly the attributes this PR is supposed to lock down.
Suggested fix
  raw_defs = resp_span.attributes.get(GenAIAttributes.GEN_AI_TOOL_DEFINITIONS)
- if raw_defs:
-     defs = json.loads(raw_defs)
-     assert len(defs) >= 1
-     tool_def = defs[0]
-     # Per spec: preserve source system's representation
-     assert "type" in tool_def, "Tool definition should preserve 'type' field"
-     assert tool_def["type"] == "function"
-     assert "function" in tool_def, "Tool definition should preserve 'function' wrapper"
+ assert raw_defs is not None
+ defs = json.loads(raw_defs)
+ assert len(defs) >= 1
+ tool_def = defs[0]
+ # Per spec: preserve source system's representation
+ assert "type" in tool_def, "Tool definition should preserve 'type' field"
+ assert tool_def["type"] == "function"
+ assert "function" in tool_def, "Tool definition should preserve 'function' wrapper"
Apply the same pattern to the realtime raw_input and raw_output checks below.
Also applies to: 1095-1100, 1131-1135, 1167-1175
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py` around lines 1047 - 1056, The tests currently short-circuit when attributes are missing because of conditional guards like "if raw_defs:"; remove those guards and instead assert presence of the raw attribute before parsing (e.g., assert raw_defs is not None or use assert raw_defs, then call json.loads(raw_defs) and continue validating tool_def fields), and apply the same change for raw_input and raw_output checks (references: raw_defs, raw_input, raw_output, GenAIAttributes.GEN_AI_TOOL_DEFINITIONS, and the message assertions) so the tests fail when the semconv attributes are absent; also make the same replacement for the realtime raw_input/raw_output checks mentioned.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
253-261: ⚠️ Potential issue | 🟠 Major
Preserve structured content when assistant messages also include tool_calls.
This branch stringifies content before appending tool-call parts. If an assistant message carries a block list here, the multimodal structure is flattened into one text blob and gen_ai.input.messages no longer matches the parts schema this PR is migrating to.
Suggested fix
  parts = []
  if tool_calls:
      if content is not None:
-         text = _stringify_content(content)
-         if text:
-             parts.append({"type": "text", "content": text})
+         if isinstance(content, list):
+             parts.extend(_content_to_parts(content))
+         elif isinstance(content, dict):
+             parts.append(_content_block_to_part(content))
+         else:
+             text = _stringify_content(content)
+             if text:
+                 parts.append({"type": "text", "content": text})
      parts.extend(_tool_call_to_part(tc) for tc in tool_calls)
Based on learnings: In the traceloop/openllmetry repository, the OTel GenAI semantic convention formal JSON schema for gen_ai.input.messages and gen_ai.output.messages uses {role, parts} messages.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 253 - 261, The code flattens structured assistant `content` into a single text part when `tool_calls` exist by using _stringify_content, breaking the intended {role, parts} schema; change the branch that handles tool_calls so that if `content` is not None you call _content_to_parts(content) (or fall back to _stringify_content only if that returns empty) and extend `parts` with those parts before extending with _tool_call_to_part for each `tc`; update the block around parts, tool_calls, content to use _content_to_parts, preserve original structured parts, and keep _tool_call_to_part usage unchanged.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
349-358: ⚠️ Potential issue | 🟠 Major
Use "chat" for the realtime prompt/completion span.
create_llm_span() is the actual conversational model invocation. Emitting gen_ai.operation.name="realtime" here makes these spans semconv-incompatible and inconsistent with the rest of the package's response spans.
Suggested fix
  span = self.tracer.start_span(
      "openai.realtime",
      kind=SpanKind.CLIENT,
      context=parent_context,
      start_time=start_time,
      attributes={
-         GenAIAttributes.GEN_AI_OPERATION_NAME: "realtime",
+         GenAIAttributes.GEN_AI_OPERATION_NAME: "chat",
          GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai",
          GenAIAttributes.GEN_AI_REQUEST_MODEL: model_name_str,
      },
  )
Based on learnings: In the OpenTelemetry GenAI semantic conventions, gen_ai.operation.name uses standard predefined values such as chat, text_completion, and embeddings.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py` around lines 349 - 358, The span started in _realtime_wrappers.py is emitting gen_ai.operation.name="realtime" which is semconv-incompatible; update the attributes passed to self.tracer.start_span (the span variable created in that block) to set GenAIAttributes.GEN_AI_OPERATION_NAME to "chat" instead of "realtime" so the realtime prompt/completion span matches the package’s response spans and OpenTelemetry GenAI conventions (the conversational invocation remains handled by create_llm_span()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 53-70: The test currently checks gen_ai.input.messages /
gen_ai.output.messages only superficially and swallows JSONDecodeError; update
the loop that iterates spans → span.attributes (using variables span, attr_name,
attr_value and the keys "gen_ai.input.messages" and "gen_ai.output.messages") so
that after asserting attr_value is a str you always attempt to parse it with
json.loads (do not limit parsing to strings that start/end with "{" / "}") and
fail the test on json.JSONDecodeError (e.g., let the exception propagate or
assert False with the error), ensuring malformed JSON arrays do not pass
silently.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 253-261: The code flattens structured assistant `content` into a
single text part when `tool_calls` exist by using _stringify_content, breaking
the intended {role, parts} schema; change the branch that handles tool_calls so
that if `content` is not None you call _content_to_parts(content) (or fall back
to _stringify_content only if that returns empty) and extend `parts` with those
parts before extending with _tool_call_to_part for each `tc`; update the block
around parts, tool_calls, content to use _content_to_parts, preserve original
structured parts, and keep _tool_call_to_part usage unchanged.
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 349-358: The span started in _realtime_wrappers.py is emitting
gen_ai.operation.name="realtime" which is semconv-incompatible; update the
attributes passed to self.tracer.start_span (the span variable created in that
block) to set GenAIAttributes.GEN_AI_OPERATION_NAME to "chat" instead of
"realtime" so the realtime prompt/completion span matches the package’s response
spans and OpenTelemetry GenAI conventions (the conversational invocation remains
handled by create_llm_span()).
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py`:
- Around line 1047-1056: The tests currently short-circuit when attributes are
missing because of conditional guards like "if raw_defs:"; remove those guards
and instead assert presence of the raw attribute before parsing (e.g., assert
raw_defs is not None or use assert raw_defs, then call json.loads(raw_defs) and
continue validating tool_def fields), and apply the same change for raw_input
and raw_output checks (references: raw_defs, raw_input, raw_output,
GenAIAttributes.GEN_AI_TOOL_DEFINITIONS, and the message assertions) so the
tests fail when the semconv attributes are absent; also make the same
replacement for the realtime raw_input/raw_output checks mentioned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e89a76c-8bd4-4afb-a6d1-3ee1010f47ab
📒 Files selected for processing (9)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_processor_refactor.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 61-66: The mapper _map_finish_reason currently returns the raw
unknown value (return _FINISH_REASON_MAP.get(raw, raw)); change the fallback to
an empty string so unknown/non-enum finish reasons do not leak raw values (e.g.,
use _FINISH_REASON_MAP.get(raw, "") instead), keep the existing None -> None
behavior if desired, and apply the same change to the other similar mappers
referenced in the file (the ones at the regions around lines 373-382 and
432-433) so all finish-reason mapping consistently defaults unknowns to "".
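A minimal sketch of the suggested mapper behavior (the mapping table below is an illustrative subset; the real `_FINISH_REASON_MAP` lives in `_hooks.py` and may differ):

```python
# Illustrative subset of a finish-reason table; not the real map from _hooks.py.
_FINISH_REASON_MAP = {
    "stop": "stop",
    "length": "length",
    "max_output_tokens": "length",
    "tool_calls": "tool_call",
    "content_filter": "content_filter",
}

def map_finish_reason(raw):
    """Map a provider finish reason, defaulting unknowns to ""."""
    if raw is None:
        return None  # keep the existing None -> None behavior
    # .get(raw, "") instead of .get(raw, raw): unknown raw values never leak.
    return _FINISH_REASON_MAP.get(raw, "")
```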
Review info: run ID 5b79bc03-1f47-4dee-a82b-e8e5b386a6c6 (configuration: defaults, review profile: CHILL, plan: Pro)
⛔ Files ignored due to path filters (1)
- packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
✅ Files skipped from review due to trivial changes (3)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
Actionable comments posted: 3
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
355-356: ⚠️ Potential issue | 🟠 Major: Use the canonical chat operation for the prompt/completion span.
This span is emitting `gen_ai.input.messages` and `gen_ai.output.messages` for a conversational exchange, so "realtime" is the wrong `gen_ai.operation.name` here. It should use the well-known chat operation value instead.
Based on learnings: in the OpenTelemetry GenAI semantic conventions, `gen_ai.operation.name` uses standard predefined values such as "chat", "text_completion", "embeddings", etc. — NOT method-specific names.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py` around lines 355 - 356, The span in _realtime_wrappers.py is using a non-canonical operation name ("realtime") for a prompt/completion conversational exchange; update the mapping for GenAIAttributes.GEN_AI_OPERATION_NAME to use the OpenTelemetry GenAI canonical chat value ("chat") wherever the span emits gen_ai.input.messages and gen_ai.output.messages (look for the dict setting that includes GenAIAttributes.GEN_AI_OPERATION_NAME and GenAIAttributes.GEN_AI_PROVIDER_NAME) so the span correctly identifies the operation as a chat prompt/completion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 218-219: The code in _realtime_wrappers.py sets GenAI tool call
results using str(output), producing Python reprs; change this to serialize
output as JSON (matching _end_function_span in _hooks.py) by JSON-encoding the
output before calling span.set_attribute (handle non-serializable objects safely
or fall back to a string). Locate the
span.set_attribute(GenAIAttributes.GEN_AI_TOOL_CALL_RESULT, str(output)) call
and replace str(output) with a json.dumps-based serialization (import json) that
mirrors the serialization approach used in _end_function_span, ensuring
consistent downstream parsing.
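The intended serialization can be sketched like this (the helper name is hypothetical; it mirrors the JSON-first approach the comment attributes to `_end_function_span`):

```python
import json

def serialize_tool_result(output):
    """Serialize a tool result as JSON rather than a Python repr."""
    if isinstance(output, str):
        return output  # already a string; emit as-is
    try:
        # default=str covers objects json cannot encode natively
        return json.dumps(output, default=str)
    except (TypeError, ValueError):
        return str(output)  # last-resort fallback

# str(output) would yield "{'city': 'Paris', ...}" with single quotes;
# JSON keeps double quotes so downstream consumers can parse it.
result = serialize_tool_result({"city": "Paris", "temp_c": 21})
```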
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_processor_refactor.py`:
- Around line 154-229: The tests expect _start_handoff_span to create a span
named "{from} → {to}.handoff" with SpanKind.INTERNAL and to register reverse
handoffs in _reverse_handoffs_dict using the key "{to}:{trace_id}"; update the
implementation in _hooks.py (function _start_handoff_span) to (1) build the span
name using the " → " arrow and suffix ".handoff" (falling back to "unknown" for
None names), (2) create the span with kind SpanKind.INTERNAL instead of
SpanKind.CLIENT, (3) set the handoff-specific attributes
(SpanAttributes.TRACELOOP_SPAN_KIND="handoff", GEN_AI_HANDOFF_FROM_AGENT and
GEN_AI_HANDOFF_TO_AGENT, and GenAIAttributes.GEN_AI_PROVIDER_NAME="openai"), and
(4) register the reverse mapping into _reverse_handoffs_dict as
"{to}:{trace_id}" -> from_agent.
- Around line 69-143: Tests expect _start_agent_span (in _hooks.py) to create
spans named "{agent_name}.agent" (default "unknown_agent.agent"), set
SpanKind.INTERNAL, set GenAIAttributes.GEN_AI_AGENT_NAME to the agent name and
GenAIAttributes.GEN_AI_PROVIDER_NAME to "openai", serialize each handoff into
attributes "openai.agent.handoff{n}" (JSON), and if
processor._reverse_handoffs_dict contains "AgentName:trace_id" add and then
remove attribute "gen_ai.agent.handoff_parent" with that value; update
_start_agent_span to implement these behaviors and remove the old behavior that
created "agent_{agent_name}" spans, used GEN_AI_SYSTEM for agent name, and used
GenAIAttributes.GEN_AI_HANDOFF_FROM_AGENT for reverse handoffs so the function
matches the new test contract.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 355-356: The span in _realtime_wrappers.py is using a
non-canonical operation name ("realtime") for a prompt/completion conversational
exchange; update the mapping for GenAIAttributes.GEN_AI_OPERATION_NAME to use
the OpenTelemetry GenAI canonical chat value ("chat") wherever the span emits
gen_ai.input.messages and gen_ai.output.messages (look for the dict setting that
includes GenAIAttributes.GEN_AI_OPERATION_NAME and
GenAIAttributes.GEN_AI_PROVIDER_NAME) so the span correctly identifies the
operation as a chat prompt/completion.
Review info: run ID d70f79c0-b9c4-4f36-9b5f-a4bfbb081843 (configuration: defaults, review profile: CHILL, plan: Pro)
📒 Files selected for processing (5)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_processor_refactor.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
Force-pushed from 8b617a0 to aa8707c.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 501-504: The code currently sets the non-upstream attribute
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS using usage.total_tokens in _hooks.py;
remove or stop emitting that upstream key and instead either (a) emit total
tokens under a custom/traceloop namespace (e.g. use a custom attribute name like
"traceloop.gen_ai.usage.total_tokens") when calling otel_span.set_attribute, or
(b) omit setting total_tokens entirely and rely on existing upstream attributes
SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS and
SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS so total can be derived downstream;
update the usage.total_tokens check and the otel_span.set_attribute call (the
lines referencing usage.total_tokens and
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS) accordingly.
- Around line 518-540: The _extract_tool_definitions function currently drops
tools that lack .function or .name; update it to first detect and preserve dicts
and Pydantic models (use isinstance(tool, dict) and check for pydantic.BaseModel
as in utils.py serialization helper) by appending their raw dict form (for
models call .dict() or the same serializer used in utils.py) into tool_defs
before doing attribute fallbacks; also accept params_json_schema as an
alternative to parameters (map params_json_schema -> parameters if present) and
ensure both function-form tools (function.parameters /
function.params_json_schema) and direct tools (parameters / params_json_schema)
are handled so no tool definitions are silently dropped in
_extract_tool_definitions.
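One possible shape for that fix (a sketch under stated assumptions: the attribute names `parameters` and `params_json_schema` follow the review comment, and the Pydantic `model_dump` handling is an assumption about the tool objects):

```python
def extract_tool_definitions(tools):
    """Collect tool definitions without dropping dict/model-shaped tools."""
    tool_defs = []
    for tool in tools:
        if isinstance(tool, dict):
            tool_defs.append(tool)  # already structured: keep as-is
            continue
        if hasattr(tool, "model_dump"):  # Pydantic v2 model
            tool_defs.append(tool.model_dump(exclude_none=True))
            continue
        if hasattr(tool, "name"):
            entry = {"name": getattr(tool, "name", "")}
            # Accept params_json_schema as an alternative to parameters.
            params = getattr(tool, "parameters", None) or getattr(
                tool, "params_json_schema", None
            )
            if params is not None:
                entry["parameters"] = params
            tool_defs.append(entry)
    return tool_defs
```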
- Around line 297-320: In _extract_prompt_attributes, handle the case where
ResponseSpanData.input is a string by converting string input into a single chat
message before iterating: detect if input_data is an instance of str and replace
it with a list containing one element representing a user message (so subsequent
_msg_to_dict/_convert_chat_message paths work); keep existing behavior for list
and None. Update references in _extract_prompt_attributes (and ensure
compatibility with _msg_to_dict, _convert_chat_message, and
_convert_agents_sdk_message) so gen_ai.input.messages is emitted for string
inputs the same way realtime spans do.
- Around line 151-155: The helper _stringify_content can raise via json.dumps
for arbitrary objects, causing on_span_end to abort before otel_span.end(); make
_stringify_content exception-safe by wrapping json.dumps(content) in a
try/except that catches Exception and returns a safe fallback (e.g.,
repr(content) or a short "<unserializable: ...>" string), and ensure the
function always returns a str and never raises; update references to
_stringify_content in on_span_end to rely on this non-throwing behavior so spans
are always closed.
Review info: run ID c066b7d6-fb7a-4100-acd8-948f0dc7701f (configuration: defaults, review profile: CHILL, plan: Pro)
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
Actionable comments posted: 2
♻️ Duplicate comments (5)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (4)
310-323: ⚠️ Potential issue | 🟠 Major: Handle string `ResponseSpanData.input` before iterating.
A plain string input is iterated character-by-character here, so no `gen_ai.input.messages` is emitted for common `Runner.run(..., "prompt")` inputs. Normalize strings into a single user message first.
🐛 Proposed fix

```diff
+def _iter_input_messages(input_data):
+    if isinstance(input_data, str):
+        return [{"role": "user", "content": input_data}]
+    if isinstance(input_data, dict):
+        return [input_data]
+    return input_data
+
+
 def _extract_prompt_attributes(otel_span, input_data, trace_content: bool):
@@
     messages = []
-    for message in input_data:
+    for message in _iter_input_messages(input_data):
         msg = _msg_to_dict(message)
```

```bash
#!/bin/bash
set -euo pipefail
python - <<'PY'
input_data = "What is AI?"
print("Current loop items:", list(input_data[:5]))
print("Expected normalized message:", [{"role": "user", "content": input_data}])
PY
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 310 - 323, _input_data may be a plain string (e.g. ResponseSpanData.input) and the current loop iterates characters; update _extract_prompt_attributes to detect when input_data is a str and normalize it into a list containing a single message dict (e.g. {"role": "user", "content": input_data"}) before the for message in input_data loop so _msg_to_dict and subsequent logic operate on message objects rather than characters; preserve the existing early return when not input_data or not trace_content and ensure other callers of _extract_prompt_attributes still receive the same attribute output shape.
535-565: ⚠️ Potential issue | 🟠 Major: Preserve dict/Pydantic tool definitions instead of dropping them.
`_extract_tool_definitions()` only handles `.function` and `.name` attributes, so source-system dict or model-shaped tool definitions are silently omitted from `gen_ai.tool.definitions`. Preserve already-structured tool definitions before falling back to attribute extraction. Based on learnings, instrumentation packages should leverage semantic conventions to generate spans and tracing data compliant with OpenTelemetry semantic conventions.
🐛 Proposed fix

```diff
 def _extract_tool_definitions(tools):
@@
     tool_defs = []
     for tool in tools:
-        if hasattr(tool, "function"):
+        if isinstance(tool, dict):
+            tool_defs.append(tool)
+            continue
+        if hasattr(tool, "model_dump"):
+            tool_defs.append(tool.model_dump(exclude_none=True))
+            continue
+        if hasattr(tool, "dict"):
+            tool_defs.append(tool.dict(exclude_none=True))
+            continue
+
+        if hasattr(tool, "function"):
             function = tool.function
             func_def = {
                 "name": getattr(function, "name", ""),
                 "description": getattr(function, "description", ""),
             }
             if hasattr(function, "parameters"):
                 func_def["parameters"] = function.parameters
+            elif hasattr(function, "params_json_schema"):
+                func_def["parameters"] = function.params_json_schema
@@
             if hasattr(tool, "parameters"):
                 func_def["parameters"] = tool.parameters
+            elif hasattr(tool, "params_json_schema"):
+                func_def["parameters"] = tool.params_json_schema
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 535 - 565, The _extract_tool_definitions function currently only extracts tools via .function or .name and thus drops tools that are already dicts or Pydantic/model-shaped objects; update _extract_tool_definitions to first detect and preserve already-structured definitions (e.g., if tool is instance of dict or has a .dict() / .model_dump() / .model_dump_json() method) and append that serialized representation directly to tool_defs before falling back to attribute-based extraction (the code paths referencing tool.function, tool.name, function.parameters, and building func_def/tool_def should remain for non-structured tools).
152-156: ⚠️ Potential issue | 🟠 Major: Make JSON serialization exception-safe before ending spans.
`json.dumps()` can raise for arbitrary tool inputs/outputs or content objects; in `on_span_end`, that skips `otel_span.end()` and leaks the span because `@dont_throw` catches the method-level exception. Use a safe serializer everywhere these content attributes are built.
🐛 Proposed fix

```diff
+def _safe_json_dumps(value) -> str:
+    try:
+        return json.dumps(value, default=str)
+    except (TypeError, ValueError):
+        return str(value)
+
+
 def _stringify_content(content) -> str:
     """Coerce non-string content to a string for simple text parts."""
     if isinstance(content, str):
         return content
-    return json.dumps(content)
+    return _safe_json_dumps(content)
@@
     if tool_input is not None:
         otel_span.set_attribute(
             GenAIAttributes.GEN_AI_TOOL_CALL_ARGUMENTS,
-            tool_input if isinstance(tool_input, str) else json.dumps(tool_input),
+            tool_input if isinstance(tool_input, str) else _safe_json_dumps(tool_input),
         )
@@
     if tool_output is not None:
         otel_span.set_attribute(
             GenAIAttributes.GEN_AI_TOOL_CALL_RESULT,
-            tool_output if isinstance(tool_output, str) else json.dumps(tool_output),
+            tool_output if isinstance(tool_output, str) else _safe_json_dumps(tool_output),
         )
```

```bash
#!/bin/bash
set -euo pipefail
python - <<'PY'
import json

class NonSerializable:
    pass

for value in [NonSerializable(), {"bad": NonSerializable()}, {1, 2, 3}]:
    try:
        json.dumps(value)
    except Exception as exc:
        print(type(value).__name__, "raises", type(exc).__name__)
PY
```

Also applies to: 912-924
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 152 - 156, The _stringify_content helper currently calls json.dumps directly which can raise and prevent otel spans from ending; create a small safe serializer (e.g., safe_json_serialize(value) or _safe_stringify) that wraps json.dumps in a try/except and falls back to repr(value) or a fixed "<unserializable>" string on error, then replace json.dumps calls in _stringify_content and all places that build content for spans (notably on_span_end) to use this safe serializer so serialization failures never propagate and spans are always ended.
67-71: ⚠️ Potential issue | 🟠 Major: Do not emit raw unknown finish reasons.
`_map_finish_reason()` still returns unmapped provider values as-is, so unknown values can leak into `gen_ai.response.finish_reasons` and output-message `finish_reason`. Default unknowns to `None`/empty string instead. Based on learnings, follow the OpenTelemetry GenAI semantic specification at https://opentelemetry.io/docs/specs/semconv/gen-ai/.
🐛 Proposed fix

```diff
 def _map_finish_reason(raw):
     """Map a provider-specific finish reason to the OTel enum value."""
     if raw is None:
         return None
-    return _FINISH_REASON_MAP.get(raw, raw)
+    return _FINISH_REASON_MAP.get(raw)
```

Also applies to: 390-501
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 67 - 71, The _map_finish_reason function currently returns unmapped provider values as-is; change it so that if raw is not in _FINISH_REASON_MAP it returns None (or an empty string where callers expect strings) instead of returning raw, ensuring unknown provider finish reasons are not emitted into gen_ai.response.finish_reasons or output-message finish_reason; update the _map_finish_reason(raw) implementation to use _FINISH_REASON_MAP.get(raw) and default to None, and audit the other uses referenced (the same mapping logic around the sections noted ~390-501) to apply the same defaulting behavior so only known enum values from _FINISH_REASON_MAP are emitted.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
266: ⚠️ Potential issue | 🟠 Major: Use a well-known GenAI operation name for realtime LLM spans.
These spans serialize chat-style input/output messages, so "chat" is the closest well-known `gen_ai.operation.name`; "realtime" remains a custom value and was already flagged previously. Based on learnings, `gen_ai.operation.name` uses standard predefined values such as "chat", "text_completion", "embeddings", etc., and similar chat-like operations should use "chat".
Also applies to: 365
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py` at line 266, Replace the custom operation name "realtime" with the well-known GenAI semantic convention value "chat" for realtime LLM spans: update the assignment setting GenAIAttributes.GEN_AI_OPERATION_NAME in _realtime_wrappers.py (the occurrences that currently set it to "realtime") to "chat" so spans use the standard gen_ai.operation.name; apply the same change to the other occurrence in the file (the second GenAIAttributes.GEN_AI_OPERATION_NAME instance).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 761-770: The handoff instructions are being added unconditionally
to attributes["openai.agent.handoffs"], which leaks prompt content; update the
loop that builds handoffs_list so it only includes the "instructions" field when
should_send_prompts() returns true—otherwise omit that key entirely (i.e., add
"instructions" only if should_send_prompts() is enabled for each handoff_agent).
Ensure you call should_send_prompts() (import or reference the existing helper)
and then json.dumps the resulting handoffs_list as before.
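The gated attribute construction could look like this (a sketch; the `send_prompts` flag stands in for the real `should_send_prompts()` helper, and the input shape is assumed):

```python
import json

def build_handoffs_attribute(handoff_agents, send_prompts):
    """Serialize handoff metadata, including instructions only when allowed."""
    handoffs = []
    for agent in handoff_agents:
        entry = {"name": agent.get("name", "unknown")}
        # Omit the key entirely when prompt capture is disabled, so
        # instruction text never leaks into span attributes.
        if send_prompts and agent.get("instructions") is not None:
            entry["instructions"] = agent["instructions"]
        handoffs.append(entry)
    return json.dumps(handoffs)
```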
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 325-330: The dedup logic currently hashes only content[:100],
causing false positives; update the logic in the RealTime wrapper that sets
content_hash (refer to the variables content, content_hash,
self._seen_completions and self._seen_completions_max) to compute a stable
digest of the entire content instead of the 100-char prefix (e.g., use
hashlib.sha256 on content.encode("utf-8") and take hexdigest() or a truncated
stable digest) and use that digest as the key when checking/adding to
self._seen_completions and evicting oldest entries.
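The full-content digest can be sketched as follows (the function name is illustrative; only the hashing strategy is the point):

```python
import hashlib

def content_digest(content):
    """Stable digest of the entire content, not a 100-char prefix."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

# Two completions sharing a long common prefix no longer collide,
# which is the false positive the prefix-based key produced.
prefix = "x" * 100
a = content_digest(prefix + "first completion")
b = content_digest(prefix + "second completion")
```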
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 310-323: _input_data may be a plain string (e.g.
ResponseSpanData.input) and the current loop iterates characters; update
_extract_prompt_attributes to detect when input_data is a str and normalize it
into a list containing a single message dict (e.g. {"role": "user", "content":
input_data"}) before the for message in input_data loop so _msg_to_dict and
subsequent logic operate on message objects rather than characters; preserve the
existing early return when not input_data or not trace_content and ensure other
callers of _extract_prompt_attributes still receive the same attribute output
shape.
- Around line 535-565: The _extract_tool_definitions function currently only
extracts tools via .function or .name and thus drops tools that are already
dicts or Pydantic/model-shaped objects; update _extract_tool_definitions to
first detect and preserve already-structured definitions (e.g., if tool is
instance of dict or has a .dict() / .model_dump() / .model_dump_json() method)
and append that serialized representation directly to tool_defs before falling
back to attribute-based extraction (the code paths referencing tool.function,
tool.name, function.parameters, and building func_def/tool_def should remain for
non-structured tools).
- Around line 152-156: The _stringify_content helper currently calls json.dumps
directly which can raise and prevent otel spans from ending; create a small safe
serializer (e.g., safe_json_serialize(value) or _safe_stringify) that wraps
json.dumps in a try/except and falls back to repr(value) or a fixed
"<unserializable>" string on error, then replace json.dumps calls in
_stringify_content and all places that build content for spans (notably
on_span_end) to use this safe serializer so serialization failures never
propagate and spans are always ended.
- Around line 67-71: The _map_finish_reason function currently returns unmapped
provider values as-is; change it so that if raw is not in _FINISH_REASON_MAP it
returns None (or an empty string where callers expect strings) instead of
returning raw, ensuring unknown provider finish reasons are not emitted into
gen_ai.response.finish_reasons or output-message finish_reason; update the
_map_finish_reason(raw) implementation to use _FINISH_REASON_MAP.get(raw) and
default to None, and audit the other uses referenced (the same mapping logic
around the sections noted ~390-501) to apply the same defaulting behavior so
only known enum values from _FINISH_REASON_MAP are emitted.
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Line 266: Replace the custom operation name "realtime" with the well-known
GenAI semantic convention value "chat" for realtime LLM spans: update the
assignment setting GenAIAttributes.GEN_AI_OPERATION_NAME in
_realtime_wrappers.py (the occurrences that currently set it to "realtime") to
"chat" so spans use the standard gen_ai.operation.name; apply the same change to
the other occurrence in the file (the second
GenAIAttributes.GEN_AI_OPERATION_NAME instance).
Review info: run ID 8a8c4e28-e177-4197-82c7-5b09688d908c (configuration: defaults, review profile: CHILL, plan: Pro)
⛔ Files ignored due to path filters (1)
- packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_tracing_processor.py
✅ Files skipped from review due to trivial changes (1)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
693-723: ⚠️ Potential issue | 🟠 Major: Guarantee span cleanup with `finally`.
Any exception from `_end_generation_span`, `_end_function_span`, realtime attribute extraction, or serialization exits before Line 719 and Line 722. Since `@dont_throw` swallows it, the OTel span remains open and the context token stays attached.
Proposed structure

```diff
 if span in self._otel_spans:
     otel_span = self._otel_spans[span]
-    span_data = getattr(span, "span_data", None)
-    trace_content = should_send_prompts()
-    if span_data and (
-        type(span_data).__name__ == "ResponseSpanData"
-        or isinstance(span_data, GenerationSpanData)
-    ):
-        self._end_generation_span(otel_span, span_data, trace_content)
-
-    elif span_data and isinstance(span_data, FunctionSpanData):
-        self._end_function_span(otel_span, span_data, trace_content)
-
-    elif trace_content and span_data and _has_realtime_spans:
-        ...
-
-    if hasattr(span, "error") and span.error:
-        otel_span.set_status(Status(StatusCode.ERROR, str(span.error)))
-    else:
-        otel_span.set_status(Status(StatusCode.OK))
-
-    otel_span.end()
-    del self._otel_spans[span]
-    if span in self._span_contexts:
-        context.detach(self._span_contexts[span])
-        del self._span_contexts[span]
+    try:
+        span_data = getattr(span, "span_data", None)
+        trace_content = should_send_prompts()
+        if span_data and (
+            type(span_data).__name__ == "ResponseSpanData"
+            or isinstance(span_data, GenerationSpanData)
+        ):
+            self._end_generation_span(otel_span, span_data, trace_content)
+        elif span_data and isinstance(span_data, FunctionSpanData):
+            self._end_function_span(otel_span, span_data, trace_content)
+        elif trace_content and span_data and _has_realtime_spans:
+            ...
+
+        if hasattr(span, "error") and span.error:
+            otel_span.set_status(Status(StatusCode.ERROR, str(span.error)))
+        else:
+            otel_span.set_status(Status(StatusCode.OK))
+    except Exception as exc:
+        otel_span.set_status(Status(StatusCode.ERROR, str(exc)))
+    finally:
+        otel_span.end()
+        del self._otel_spans[span]
+        if span in self._span_contexts:
+            context.detach(self._span_contexts[span])
+            del self._span_contexts[span]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 693 - 723, The span finalization and context cleanup must run in a finally block to guarantee otel_span.end() and removal from self._otel_spans / self._span_contexts even if _end_generation_span, _end_function_span, _set_realtime_io_attributes, serialization, or other calls throw; refactor the logic inside the if span in self._otel_spans: block so that all processing (inspecting span_data, calling _end_generation_span/_end_function_span/_set_realtime_io_attributes and setting otel_span status) happens in try and the otel_span.set_status(...), otel_span.end(), del self._otel_spans[span], and context.detach(...) / del self._span_contexts[span] happen in finally, using the existing symbols (self._otel_spans, self._span_contexts, _end_generation_span, _end_function_span, _set_realtime_io_attributes, should_send_prompts, span_data, otel_span) to locate and modify the code.
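The try/finally guarantee the comment asks for can be sketched without OTel dependencies. This is a minimal illustration, not the package's real code: `FakeSpan`, the string statuses, and the `on_span_end` signature are stand-ins for the actual OTel span type, `Status`/`StatusCode`, and the processor method.

```python
class FakeSpan:
    """Stand-in for an OTel span (illustrative only)."""
    def __init__(self):
        self.ended = False
        self.status = None

    def set_status(self, status):
        self.status = status

    def end(self):
        self.ended = True


def on_span_end(span_key, otel_spans, process):
    """Finalize the span for span_key, guaranteeing end() and cleanup."""
    otel_span = otel_spans.get(span_key)
    if otel_span is None:
        return
    try:
        process(otel_span)          # may raise (attribute extraction, serialization, ...)
        otel_span.set_status("OK")
    except Exception as exc:        # mirror @dont_throw: record the error, don't propagate
        otel_span.set_status(f"ERROR: {exc}")
    finally:
        otel_span.end()             # always runs, even when process() throws
        del otel_spans[span_key]    # the bookkeeping dict can no longer leak entries


spans = {"s1": FakeSpan()}

def boom(_):
    raise ValueError("bad payload")

on_span_end("s1", spans, boom)
# the span was ended and removed from the dict despite the exception
```

The design point is that attribute extraction failures degrade to an errored-but-closed span instead of a leaked open span and a stale context token.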
♻️ Duplicate comments (7)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (5)
67-71: ⚠️ Potential issue | 🟠 Major

Do not emit raw unknown finish reasons.

Line 71 still returns unmapped provider values as-is, which can put non-enum values into `gen_ai.response.finish_reasons` and the output-message `finish_reason`. Return `None`/empty instead so unknowns are omitted or serialized as `""`.

Proposed fix
```diff
 def _map_finish_reason(raw):
     """Map a provider-specific finish reason to the OTel enum value."""
     if raw is None:
         return None
-    return _FINISH_REASON_MAP.get(raw, raw)
+    return _FINISH_REASON_MAP.get(raw)
```

The OpenTelemetry registry describes `gen_ai.response.finish_reasons` as the array of reasons received, and its examples use the defined finish-reason values rather than provider-specific fallbacks: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/. Based on learnings, follow the OpenTelemetry GenAI semantic specification at https://opentelemetry.io/docs/specs/semconv/gen-ai/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 67 - 71, The _map_finish_reason function currently returns unmapped provider-specific values (returning raw) which can violate the GenAI enum for gen_ai.response.finish_reasons; update _map_finish_reason to only return mapped values from _FINISH_REASON_MAP and return None (or empty string) when raw is not found or is unknown so unknown finish reasons are omitted/serialized as empty, ensuring only defined enum values are emitted for gen_ai.response.finish_reasons and related finish_reason output fields.
310-323: ⚠️ Potential issue | 🟠 Major

Handle string `input_data` before iterating.

If `ResponseSpanData.input` is a string, Line 322 iterates characters and emits no `gen_ai.input.messages`. Normalize strings to a single user message first, matching the realtime path.

Proposed fix
```diff
+def _iter_input_messages(input_data):
+    if isinstance(input_data, str):
+        return [{"role": "user", "content": input_data}]
+    if isinstance(input_data, dict):
+        return [input_data]
+    return input_data
+
+
 def _extract_prompt_attributes(otel_span, input_data, trace_content: bool):
@@
     messages = []
-    for message in input_data:
+    for message in _iter_input_messages(input_data):
         msg = _msg_to_dict(message)
```

Based on learnings, the OTel GenAI semantic convention formal JSON schema for `gen_ai.input.messages` uses the `{"role": "...", "parts": [{"type": "text", "content": "..."}]}` format.
527-530: ⚠️ Potential issue | 🟡 Minor

Avoid emitting non-upstream `gen_ai.usage.total_tokens`.

The migration goal is upstream GenAI semconv compliance, but Line 529 still emits `SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS`. The current OpenTelemetry GenAI registry lists input/output usage attributes (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`) and no total-token span attribute; the total can be derived downstream or emitted under a custom namespace.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 527 - 530, The code currently sets a non-upstream attribute SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS using usage.total_tokens in the otel_span.set_attribute call; remove the emission of GEN_AI_USAGE_TOTAL_TOKENS and instead emit only the upstream-compliant attributes (e.g., gen_ai.usage.input_tokens and gen_ai.usage.output_tokens) when available via usage.input_tokens / usage.output_tokens, or omit emitting total tokens entirely (or emit it under a custom namespace) to comply with the OpenTelemetry GenAI semconv; update the logic around usage.total_tokens, otel_span.set_attribute, and SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS accordingly.
152-156: ⚠️ Potential issue | 🟠 Major

Make content/tool serialization non-throwing.

`json.dumps()` can raise for arbitrary SDK/tool objects. Because these paths run inside `on_span_end`, a serialization failure can skip attributes and, without stronger cleanup, interrupt span lifecycle handling.

Proposed fix
```diff
+def _safe_json_dumps(value) -> str:
+    try:
+        return json.dumps(value, default=str)
+    except (TypeError, ValueError):
+        return str(value)
+
+
 def _stringify_content(content) -> str:
     """Coerce non-string content to a string for simple text parts."""
     if isinstance(content, str):
         return content
-    return json.dumps(content)
+    return _safe_json_dumps(content)
@@
     if tool_input is not None:
         otel_span.set_attribute(
             GenAIAttributes.GEN_AI_TOOL_CALL_ARGUMENTS,
-            tool_input if isinstance(tool_input, str) else json.dumps(tool_input),
+            tool_input if isinstance(tool_input, str) else _safe_json_dumps(tool_input),
         )
@@
     if tool_output is not None:
         otel_span.set_attribute(
             GenAIAttributes.GEN_AI_TOOL_CALL_RESULT,
-            tool_output if isinstance(tool_output, str) else json.dumps(tool_output),
+            tool_output if isinstance(tool_output, str) else _safe_json_dumps(tool_output),
         )
```

Also applies to: 913-925
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 152 - 156, The _stringify_content function currently uses json.dumps(content) which can raise for arbitrary SDK/tool objects and thereby disrupt on_span_end handling; update _stringify_content (and the analogous serialization in the other block around the 913-925 range) to catch exceptions from json.dumps and fall back to a safe, non-throwing representation (e.g., try json.dumps with default=str inside a try/except, on exception return repr(content) or str(content) and if that also fails return a fixed placeholder like "<unserializable>"); ensure the function returns a string in all cases so span attribute setting cannot throw.
535-565: ⚠️ Potential issue | 🟠 Major

Preserve dict/Pydantic tool definitions.

This only handles objects with `.function` or `.name`; dict/Pydantic tool definitions are silently dropped. `gen_ai.tool.definitions` should preserve the source-system tool definition objects when available.

Also handle `params_json_schema` as an alias for `parameters`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py` around lines 535 - 565, _extract_tool_definitions currently only handles objects with .function or .name and drops dict/Pydantic tool definitions; update that function to also detect and preserve dict-like or Pydantic definitions by returning them as-is (or their .dict() for Pydantic models) into the tool_defs list instead of ignoring them, and treat a params_json_schema key as an alias for parameters (copy params_json_schema into parameters when parameters is missing). Locate the code in _extract_tool_definitions, add an early branch like "if isinstance(tool, dict) or has attr 'dict' (Pydantic): append the original dict/pydantic dict to tool_defs", and when building func_def handle both "parameters" and "params_json_schema" keys (prefer parameters, fallback to params_json_schema) so gen_ai.tool.definitions preserves source-system definitions.packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)
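A sketch of the broadened extraction under the comment's assumptions: dicts pass through, Pydantic-style objects are dumped via `model_dump()`/`dict()` when present, and `params_json_schema` is honored as an alias for `parameters`. The attribute names mirror the review text; the real SDK tool objects may differ.

```python
def _tool_to_definition(tool):
    """Best-effort conversion of one tool object to a plain definition dict."""
    if isinstance(tool, dict):
        d = dict(tool)
    elif hasattr(tool, "model_dump"):          # Pydantic v2 models
        d = tool.model_dump()
    elif hasattr(tool, "dict"):                # Pydantic v1 models
        d = tool.dict()
    elif hasattr(tool, "name"):                # plain object with attributes
        d = {
            "name": tool.name,
            "description": getattr(tool, "description", None),
            "parameters": getattr(
                tool, "parameters", getattr(tool, "params_json_schema", None)
            ),
        }
    else:
        return None                            # unrecognized shape: skip, don't raise
    if "parameters" not in d and "params_json_schema" in d:
        d["parameters"] = d.pop("params_json_schema")   # alias handling
    return d

def extract_tool_definitions(tools):
    """Build a gen_ai.tool.definitions-style list, dropping unrecognized entries."""
    return [d for d in (_tool_to_definition(t) for t in tools) if d is not None]
```

Preserving the source dict verbatim (minus the alias rename) keeps vendor-specific fields that a strict attribute-by-attribute copy would lose.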
386-390: ⚠️ Potential issue | 🟡 Minor

Avoid emitting non-upstream `gen_ai.usage.total_tokens` here too.

Realtime spans still set `SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS`. For upstream GenAI semconv compliance, prefer only `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens`, or move total tokens under a custom Traceloop namespace.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py` around lines 386 - 390, The realtime wrapper currently sets the non-upstream attribute SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS via span.set_attribute; remove that call and instead emit only upstream GenAI semconv attributes (SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS and SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) when those values exist in self.pending_usage, or if you must keep a total, write it under a Traceloop custom namespace (e.g., "traceloop.gen_ai.usage.total_tokens") rather than using SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS; update the code around the span.set_attribute call where self.pending_usage["total_tokens"] is referenced to perform these replacements and guard emissions behind existence checks.
256-269: ⚠️ Potential issue | 🟠 Major

Use the canonical operation name for realtime LLM calls.

Both realtime spans still emit `gen_ai.operation.name = "realtime"`. The OpenTelemetry GenAI registry lists predefined values and says applicable predefined values should be used; for OpenAI conversational realtime LLM calls, use `"chat"` unless this package documents a system-specific operation.

Proposed fix

```diff
-    GenAIAttributes.GEN_AI_OPERATION_NAME: "realtime",
+    GenAIAttributes.GEN_AI_OPERATION_NAME: "chat",
```

Reference: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/. Based on learnings, in the OpenTelemetry GenAI semantic conventions, `gen_ai.operation.name` uses standard predefined values such as "chat", "text_completion", "embeddings", etc.

Also applies to: 359-369
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py` around lines 256 - 269, The span attribute GenAIAttributes.GEN_AI_OPERATION_NAME is set to the non-canonical value "realtime"; update it to the canonical OpenTelemetry GenAI operation name "chat" for OpenAI conversational realtime LLM calls. Locate the span creation in start_audio_span (and the analogous realtime span creation around lines 359-369) and change the attribute value from "realtime" to "chat" so the attributes dict uses GenAIAttributes.GEN_AI_OPERATION_NAME: "chat" while keeping other attributes and span configuration unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 321-331: The record_completion method currently deduplicates
completions solely by hash(content) using _seen_completions, which can drop
legitimate later replies and cause prompt/completion misalignment; change the
dedupe key to include a unique response or turn identifier (if available from
the event) or include the turn/session id so duplicates are scoped per turn.
Update record_completion to build a composite key (e.g., (response_id or
turn_id, hash(content))) when checking and inserting into _seen_completions,
keep the eviction logic with _seen_completions_max, and ensure
create_llm_span(content) is still called for non-duplicate composite keys.
Reference: record_completion, _seen_completions, _seen_completions_max,
create_llm_span.
---
Outside diff comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 693-723: The span finalization and context cleanup must run in a
finally block to guarantee otel_span.end() and removal from self._otel_spans /
self._span_contexts even if _end_generation_span, _end_function_span,
_set_realtime_io_attributes, serialization, or other calls throw; refactor the
logic inside the if span in self._otel_spans: block so that all processing
(inspecting span_data, calling
_end_generation_span/_end_function_span/_set_realtime_io_attributes and setting
otel_span status) happens in try and the otel_span.set_status(...),
otel_span.end(), del self._otel_spans[span], and context.detach(...) / del
self._span_contexts[span] happen in finally, using the existing symbols
(self._otel_spans, self._span_contexts, _end_generation_span,
_end_function_span, _set_realtime_io_attributes, should_send_prompts, span_data,
otel_span) to locate and modify the code.
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 67-71: The _map_finish_reason function currently returns unmapped
provider-specific values (returning raw) which can violate the GenAI enum for
gen_ai.response.finish_reasons; update _map_finish_reason to only return mapped
values from _FINISH_REASON_MAP and return None (or empty string) when raw is not
found or is unknown so unknown finish reasons are omitted/serialized as empty,
ensuring only defined enum values are emitted for gen_ai.response.finish_reasons
and related finish_reason output fields.
- Around line 310-323: In _extract_prompt_attributes, handle the case where
input_data is a plain string before the loop: if isinstance(input_data, str)
convert it into a single-message structure matching the GenAI schema (e.g., a
user message with role "user" and parts = [{"type":"text","content":
input_data}]) so that the subsequent for message in input_data and calls to
_msg_to_dict produce a single gen_ai.input.messages entry; add this
normalization immediately before the messages = [] / for message in input_data
block.
- Around line 527-530: The code currently sets a non-upstream attribute
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS using usage.total_tokens in the
otel_span.set_attribute call; remove the emission of GEN_AI_USAGE_TOTAL_TOKENS
and instead emit only the upstream-compliant attributes (e.g.,
gen_ai.usage.input_tokens and gen_ai.usage.output_tokens) when available via
usage.input_tokens / usage.output_tokens, or omit emitting total tokens entirely
(or emit it under a custom namespace) to comply with the OpenTelemetry GenAI
semconv; update the logic around usage.total_tokens, otel_span.set_attribute,
and SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS accordingly.
- Around line 152-156: The _stringify_content function currently uses
json.dumps(content) which can raise for arbitrary SDK/tool objects and thereby
disrupt on_span_end handling; update _stringify_content (and the analogous
serialization in the other block around the 913-925 range) to catch exceptions
from json.dumps and fall back to a safe, non-throwing representation (e.g., try
json.dumps with default=str inside a try/except, on exception return
repr(content) or str(content) and if that also fails return a fixed placeholder
like "<unserializable>"); ensure the function returns a string in all cases so
span attribute setting cannot throw.
- Around line 535-565: _extract_tool_definitions currently only handles objects
with .function or .name and drops dict/Pydantic tool definitions; update that
function to also detect and preserve dict-like or Pydantic definitions by
returning them as-is (or their .dict() for Pydantic models) into the tool_defs
list instead of ignoring them, and treat a params_json_schema key as an alias
for parameters (copy params_json_schema into parameters when parameters is
missing). Locate the code in _extract_tool_definitions, add an early branch like
"if isinstance(tool, dict) or has attr 'dict' (Pydantic): append the original
dict/pydantic dict to tool_defs", and when building func_def handle both
"parameters" and "params_json_schema" keys (prefer parameters, fallback to
params_json_schema) so gen_ai.tool.definitions preserves source-system
definitions.
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 386-390: The realtime wrapper currently sets the non-upstream
attribute SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS via span.set_attribute;
remove that call and instead emit only upstream GenAI semconv attributes
(SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS and
SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) when those values exist in
self.pending_usage, or if you must keep a total, write it under a Traceloop
custom namespace (e.g., "traceloop.gen_ai.usage.total_tokens") rather than using
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS; update the code around the
span.set_attribute call where self.pending_usage["total_tokens"] is referenced
to perform these replacements and guard emissions behind existence checks.
- Around line 256-269: The span attribute GenAIAttributes.GEN_AI_OPERATION_NAME
is set to the non-canonical value "realtime"; update it to the canonical
OpenTelemetry GenAI operation name "chat" for OpenAI conversational realtime LLM
calls. Locate the span creation in start_audio_span (and the analogous realtime
span creation around lines 359-369) and change the attribute value from
"realtime" to "chat" so the attributes dict uses
GenAIAttributes.GEN_AI_OPERATION_NAME: "chat" while keeping other attributes and
span configuration unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b1239d1f-34d5-4765-b850-b8d0c166fc83
📒 Files selected for processing (3)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py
ac4d39d to 6b78202
Actionable comments posted: 1
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)
323-333: ⚠️ Potential issue | 🟠 Major

Deduping by content text alone can drop legitimate later turns.

Switching to `hash(content)` over the full string (Line 327) fixes the earlier prefix-collision bug, but the dedupe key is still text-only. A valid later turn returning identical content (e.g., "OK", "Got it", "Yes") is silently dropped, and `create_llm_span` is never called — leaving the next unique completion to pair with a stale pending prompt and producing misaligned input/output spans.

Scope the dedupe by a turn/response identifier (e.g., `response_id` or `item_id` from the raw model event) so identical content across separate turns isn't conflated.

🔧 Proposed approach

```diff
-    def record_completion(self, role: str, content: str):
+    def record_completion(self, role: str, content: str, response_id: Optional[str] = None):
         """Record a completion message - creates an LLM span with prompt and completion."""
         if not content:
             return
-        content_hash = hash(content)
-        if content_hash in self._seen_completions:
+        dedupe_key = (response_id, hash(content)) if response_id else hash(content)
+        if dedupe_key in self._seen_completions:
             return
-        self._seen_completions[content_hash] = None
+        self._seen_completions[dedupe_key] = None
         if len(self._seen_completions) > self._seen_completions_max:
             self._seen_completions.popitem(last=False)
         self.create_llm_span(content)
```

Then thread the response/item id through the call sites in `traced_put_event` (e.g., `response.done` provides a `response.id`).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py` around lines 323 - 333, The dedupe currently uses only hash(content) in record_completion which will drop legitimate later turns with identical text; change the dedupe key to include the response/item identifier so identical content from different turns aren’t conflated (e.g., use a tuple like (response_id, content_hash) or (response_id, content) as the key in self._seen_completions and keep the existing LRU eviction logic), and thread that response/item id from traced_put_event (response.done -> response.id or item_id) into the record_completion call so create_llm_span is invoked per response instance.
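The turn-scoped dedupe suggested above can be isolated into a small self-contained class. `CompletionDeduper`, its method names, and the default capacity are illustrative stand-ins for the wrapper's real state, not the package's API:

```python
from collections import OrderedDict

class CompletionDeduper:
    """Dedupe completions per (response_id, content) with bounded FIFO memory."""

    def __init__(self, max_entries=256):
        self._seen = OrderedDict()
        self._max = max_entries

    def should_record(self, content, response_id=None):
        if not content:
            return False
        # Scope the key to the turn when an id is available, so identical
        # text in two different turns is not conflated.
        key = (response_id, hash(content)) if response_id else hash(content)
        if key in self._seen:
            return False
        self._seen[key] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)   # evict the oldest entry
        return True

d = CompletionDeduper()
assert d.should_record("OK", response_id="resp_1")       # first time: record
assert not d.should_record("OK", response_id="resp_1")   # same turn: deduped
assert d.should_record("OK", response_id="resp_2")       # new turn: record again
```

The text-only fallback keeps current behavior when no id is available, while ids restore one span per response instance.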
268-269: ⚠️ Potential issue | 🟠 Major

`gen_ai.operation.name = "realtime"` is not a valid well-known value.

Both the audio span (Line 268) and the realtime LLM span (Line 367) still emit `"realtime"`. The OTel GenAI semconv defines well-known values: `chat`, `create_agent`, `embeddings`, `execute_tool`, `generate_content`, `invoke_agent`, `retrieval`, `text_completion`. Since the OpenAI Realtime API is a multi-turn conversational surface, `"chat"` is the correct value (mirroring how `_start_generation_span` uses `"chat"` for `ResponseSpanData` per the test on Line 425/444 in `test_tracing_processor.py`).

Based on learnings: in the OpenTelemetry GenAI semantic conventions, `gen_ai.operation.name` uses standard predefined values such as "chat", "text_completion", "embeddings", etc. — NOT method-specific names.

🔧 Proposed fix

```diff
@@ Line 267-270 (start_audio_span)
     attributes={
-        GenAIAttributes.GEN_AI_OPERATION_NAME: "realtime",
+        GenAIAttributes.GEN_AI_OPERATION_NAME: "chat",
         GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai",
     },
@@ Line 366-370 (create_llm_span)
     attributes={
-        GenAIAttributes.GEN_AI_OPERATION_NAME: "realtime",
+        GenAIAttributes.GEN_AI_OPERATION_NAME: "chat",
         GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai",
         GenAIAttributes.GEN_AI_REQUEST_MODEL: model_name_str,
     },
```

Note: corresponding assertions in `test_realtime_session.py`/`test_realtime.py` will need updating accordingly.

Also applies to: 367-368
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py` around lines 268 - 269, Replace the invalid gen_ai.operation.name value "realtime" with the well-known value "chat" where GenAIAttributes.GEN_AI_OPERATION_NAME is set for the realtime audio span and the realtime LLM span; update the assignments in the functions that create those spans (e.g., the methods that start the audio span and the realtime-LLM span where GenAIAttributes.GEN_AI_OPERATION_NAME is currently "realtime") to use "chat" and adjust the corresponding assertions in test_realtime_session.py / test_realtime.py to expect "chat".
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)
416-419: Stale comment block — consider removing.

These three comment lines describe behavior that's no longer present in the surrounding code (no top-level `gen_ai.response.finish_reasons` is being conditionally set here). Either drop the comment or move it to where the omission actually happens, to avoid confusing future readers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py` around lines 416 - 419, Remove the stale three-line comment block in _realtime_wrappers.py (the lines referencing "Realtime API does not provide finish reasons" and top-level gen_ai.response.finish_reasons) because it no longer matches the surrounding code; either delete these lines or move them to the actual location where finish_reasons is conditionally omitted (refer to the handling in _hooks.py) so comments match the implementation.packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py (1)
304-306: Assertion logic could be tightened for clarity.
`assert "id" not in part or part["id"]` is correct but a bit indirect — it allows both "key missing" and "key present with truthy value". Given the implementation (`if tool_call_id: part["id"] = tool_call_id`), the stronger and clearer assertion is simply `assert "id" not in part`, which directly encodes the spec intent ("id must be omitted when call_id is absent").

🔧 Proposed change

```diff
-    assert "id" not in part or part["id"], (
-        f"id must be absent or non-empty when call_id not provided, got: {part}"
-    )
+    assert "id" not in part, (
+        f"id must be omitted when call_id not provided, got: {part}"
+    )
```
Verify each finding against the current code and only fix it if needed. In `@packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py` around lines 304 - 306, The assertion in the test uses a permissive condition (`assert "id" not in part or part["id"]`) which is indirect; update the test to assert the stricter intended behavior by replacing that condition with a direct check that the "id" key is absent (i.e., assert "id" not in part) so it matches the implementation where `part["id"]` is only set when `tool_call_id` is truthy and must be omitted otherwise; locate the assertion referencing `part` in the test and change it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_content_parts.py`:
- Around line 40-46: The test
test_data_url_no_mime_produces_blob_without_mime_type currently uses
"data:base64,abc123" which yields a non-empty mime extracted by _url_to_part;
update the test to use a data URL that produces an empty mime_part (e.g.
"data:,abc123" or "data:;base64,abc123") so the code path that omits mime_type
is exercised, and add an assertion that "mime_type" not in part (referencing
_dict_block and _url_to_part to locate the logic).
---
Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 323-333: The dedupe currently uses only hash(content) in
record_completion which will drop legitimate later turns with identical text;
change the dedupe key to include the response/item identifier so identical
content from different turns aren’t conflated (e.g., use a tuple like
(response_id, content_hash) or (response_id, content) as the key in
self._seen_completions and keep the existing LRU eviction logic), and thread
that response/item id from traced_put_event (response.done -> response.id or
item_id) into the record_completion call so create_llm_span is invoked per
response instance.
- Around line 268-269: Replace the invalid gen_ai.operation.name value
"realtime" with the well-known value "chat" where
GenAIAttributes.GEN_AI_OPERATION_NAME is set for the realtime audio span and the
realtime LLM span; update the assignments in the functions that create those
spans (e.g., the methods that start the audio span and the realtime-LLM span
where GenAIAttributes.GEN_AI_OPERATION_NAME is currently "realtime") to use
"chat" and adjust the corresponding assertions in test_realtime_session.py /
test_realtime.py to expect "chat".
---
Nitpick comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 416-419: Remove the stale three-line comment block in
_realtime_wrappers.py (the lines referencing "Realtime API does not provide
finish reasons" and top-level gen_ai.response.finish_reasons) because it no
longer matches the surrounding code; either delete these lines or move them to
the actual location where finish_reasons is conditionally omitted (refer to the
handling in _hooks.py) so comments match the implementation.
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py`:
- Around line 304-306: The assertion in the test uses a permissive condition
(`assert "id" not in part or part["id"]`) which is indirect; update the test to
assert the stricter intended behavior by replacing that condition with a direct
check that the "id" key is absent (i.e., assert "id" not in part) so it matches
the implementation where `part["id"]` is only set when `tool_call_id` is truthy
and must be omitted otherwise; locate the assertion referencing `part` in the
test and change it accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2c3f06f6-66f0-4361-acaf-c99cdbd033ef
⛔ Files ignored due to path filters (1)
`packages/opentelemetry-instrumentation-openai-agents/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (11)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_handoff_span_operation_name.yaml
- packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_span_operation_name.yaml
- packages/opentelemetry-instrumentation-openai-agents/tests/test_content_parts.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
- packages/opentelemetry-instrumentation-openai-agents/tests/test_tracing_processor.py
✅ Files skipped from review due to trivial changes (5)
- packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
- packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_handoff_span_operation_name.yaml
- packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_span_operation_name.yaml
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
Migrate openai-agents instrumentation to OTel GenAI semconv
Aligns the OpenAI Agents SDK instrumentation with the upstream OpenTelemetry GenAI semantic conventions.
Changes:
- Replace `SpanAttributes.LLM_*` constants with upstream `GenAIAttributes.GEN_AI_*` equivalents
- Migrate indexed `gen_ai.prompt.{i}.*`/`gen_ai.completion.{i}.*` attributes to `gen_ai.input.messages`/`gen_ai.output.messages` JSON arrays
- Migrate indexed `gen_ai.tool.definitions.{i}.*` attributes to a `gen_ai.tool.definitions` JSON array
- Update `gen_ai.operation.name` dict keys in `_hooks.py`
- Update the `gen_ai.system` key in `_realtime_wrappers.py`
New Features
Improvements
Tests