
feat(openai-agents): GenAI semconv compliance#3837

Merged
max-deygin-traceloop merged 23 commits into main from max/tlp-1928-openai-agents-insturmentation
Apr 30, 2026
Conversation

Contributor

@max-deygin-traceloop max-deygin-traceloop commented Mar 22, 2026

Migrate openai-agents instrumentation to OTel GenAI semconv

Aligns the OpenAI Agents SDK instrumentation with the upstream OpenTelemetry GenAI semantic conventions.

Changes:

  • Replace all SpanAttributes.LLM_* constants with upstream GenAIAttributes.GEN_AI_* equivalents
  • Migrate flat gen_ai.prompt.{i}.* / gen_ai.completion.{i}.* attributes to gen_ai.input.messages / gen_ai.output.messages JSON arrays
  • Migrate flat gen_ai.tool.definitions.{i}.* attributes to gen_ai.tool.definitions JSON array
  • Fix 5 duplicate gen_ai.operation.name dict keys in _hooks.py
  • Fix duplicate gen_ai.system key in _realtime_wrappers.py
  • Update all tests to use new attribute names and JSON array format

Note: This is a recreation of #3823, which was auto-closed when its target branch was merged into main. No code changes were made.
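
As a rough illustration of the message-attribute migration described above, the flat indexed attributes collapse into a single JSON attribute roughly like this (the helper name and exact parts schema here are assumptions for illustration, not the PR's actual code):

```python
import json

def flat_prompts_to_input_messages(attrs: dict) -> str:
    # Collect legacy gen_ai.prompt.{i}.role / .content pairs into one
    # gen_ai.input.messages JSON array (hypothetical helper; the real
    # conversion lives inside the instrumentation hooks).
    messages = []
    i = 0
    while f"gen_ai.prompt.{i}.role" in attrs:
        messages.append({
            "role": attrs[f"gen_ai.prompt.{i}.role"],
            "parts": [{
                "type": "text",
                "content": attrs.get(f"gen_ai.prompt.{i}.content", ""),
            }],
        })
        i += 1
    return json.dumps(messages)

flat = {"gen_ai.prompt.0.role": "user", "gen_ai.prompt.0.content": "Hello"}
# The single gen_ai.input.messages attribute replaces both flat keys:
assert json.loads(flat_prompts_to_input_messages(flat)) == [
    {"role": "user", "parts": [{"type": "text", "content": "Hello"}]}
]
```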

Summary by CodeRabbit

  • New Features

    • Bounded seen_completions property for realtime tracing state.
  • Improvements

    • Consolidated input/output into JSON parts-based message attributes and unified tool definitions/results into single JSON attributes.
    • Standardized provider/operation semantic attributes; normalized finish-reason reporting and token-usage attribution.
    • Changed agent/workflow span kinds to INTERNAL and harmonized realtime span semantics.
  • Tests

    • Added extensive semantic-convention and behavior tests for messaging, finish reasons, tooling, realtime, and span hierarchy.


coderabbitai Bot commented Mar 22, 2026

📝 Walkthrough

Refactors OpenAI Agents instrumentation to adopt OTel GenAI semantic conventions: replaces indexed prompt/completion attributes with parts-based JSON gen_ai.input.messages/gen_ai.output.messages, normalizes finish reasons, consolidates tool definitions, restructures span start/end helpers, adjusts provider/operation/kind attributes, and bounds realtime deduplication.

Changes

Cohort / File(s) | Summary

  • Instrumentation core
    packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
    Large refactor: extracts span start/end helpers, updates span naming/kinds, removes legacy model-settings propagation, emits parts-based gen_ai.input.messages/gen_ai.output.messages, aggregates gen_ai.response.finish_reasons, and serializes gen_ai.tool.definitions.
  • Realtime wrappers
    packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
    Replaces the unbounded dedupe with an LRU-capped map (1000 entries), adds a seen_completions property, switches span kinds to INTERNAL, replaces legacy semconv attributes with gen_ai.provider.name/gen_ai.operation.name, and emits consolidated input/output messages and a normalized usage attribute.
  • Tracing processor & helpers
    packages/opentelemetry-instrumentation-openai-agents/.../tests/test_tracing_processor.py
    New comprehensive tests for extracted helpers and end-to-end dispatch validating span creation, handoff registration, tool arg/result gating, JSON serialization, and error propagation.
  • Semconv/message serialization tests
    packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py, .../tests/test_semconv_compliance.py, packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
    New/extended tests verifying the parts-based message schema, part type mappings (multimodal), tool_call/tool_call_response normalization, finish-reason mapping/aggregation, and upstream semconv constants.
  • Finish-reason tests
    packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py
    New unit tests asserting per-message finish_reason mapping, the deduped top-level finish-reason tuple, status-to-finish_reason mapping, and trace_content gating.
  • Realtime/session tests
    packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py, .../tests/test_realtime_session.py
    Updated assertions to expect gen_ai.provider.name == "openai" and JSON-parsed gen_ai.input.messages/gen_ai.output.messages; added seen_completions bounding checks and operation-name expectations.
  • Agent tests & hierarchy updates
    packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py, .../tests/test_recipe_agents_hierarchy.py
    Updated span-kind and attribute expectations to the new GenAI schema (provider/operation attrs, GEN_AI_USAGE_TOTAL_TOKENS, parts-based message attributes).
  • Content parts tests
    packages/opentelemetry-instrumentation-openai-agents/tests/test_content_parts.py
    New multimodal conversion tests covering image/audio data-URI to blob parts, URI parts, and MIME-type mappings.
  • Utils/constants
    packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
    Added GEN_AI_HANDOFF_PARENT_AGENT and OPENAI_AGENT_HANDOFFS constants.
  • Cassettes & pyproject
    packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/*, packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
    Added new test cassettes for handoff/tool flows; raised the minimum version of the openai-agents test dependency.
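
The bounded realtime deduplication described in the Realtime wrappers row can be sketched as an LRU-capped map (a minimal illustration of the technique, assuming a 1000-entry cap; the actual class and method names in _realtime_wrappers.py may differ):

```python
from collections import OrderedDict

class BoundedSeen:
    """LRU-capped dedupe map: remembers at most `capacity` completion ids,
    evicting the least recently seen one when full (sketch only)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def add(self, item_id: str) -> bool:
        """Return True if item_id was not seen before; refresh recency either way."""
        is_new = item_id not in self._seen
        self._seen[item_id] = None
        self._seen.move_to_end(item_id)
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest entry
        return is_new

seen = BoundedSeen(capacity=2)
assert seen.add("a") and seen.add("b")
assert not seen.add("a")   # duplicate detected, "a" refreshed
seen.add("c")              # capacity exceeded: evicts "b"
assert seen.add("b")       # "b" was evicted, so it counts as new again
```

Unlike an unbounded set, memory stays constant for long-running realtime sessions, at the cost of occasionally re-processing an item that was evicted.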

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TracingProcessor
    participant SpanStore
    participant Exporter
    Client->>TracingProcessor: on_span_start(event)
    TracingProcessor->>TracingProcessor: _start_* helper (agent/handoff/tool/generation/realtime)
    TracingProcessor->>SpanStore: create span (provider/operation/kind attrs)
    Client->>TracingProcessor: on_span_end(event, data)
    TracingProcessor->>TracingProcessor: _end_* helper (extract inputs, outputs, tools, finish reasons)
    TracingProcessor->>SpanStore: set gen_ai.input.messages / gen_ai.output.messages / tool definitions / finish reasons
    TracingProcessor->>Exporter: export finalized spans

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes


Suggested labels

Python SDK

Suggested reviewers

  • OzBenSimhonTraceloop

Poem

🐇 I hopped through traces, parts in tow,
JSON messages in tidy rows.
Handoffs traced and reasons true,
Tools and outputs wrapped anew —
A rabbit cheers for semconv's glow!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed (check skipped; CodeRabbit’s high-level summary is enabled).
  • Title check: ✅ Passed. The title accurately reflects the main objective of the PR: migrating the OpenAI Agents instrumentation to comply with upstream OpenTelemetry GenAI semantic conventions.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 82.05%, which meets the required threshold of 80.00%.
  • Linked Issues check: ✅ Passed (check skipped; no linked issues were found for this pull request).
  • Out of Scope Changes check: ✅ Passed (check skipped; no linked issues were found for this pull request).



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py (1)

1-8: Consider extending _testing.py to cover new message/tool attributes.

The shared compliance tests in opentelemetry.semconv_ai._testing (per the context snippet at packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py:1-56) currently validate renamed GEN_AI_* constants but do not include tests for:

  • GenAIAttributes.GEN_AI_INPUT_MESSAGES
  • GenAIAttributes.GEN_AI_OUTPUT_MESSAGES
  • GenAIAttributes.GEN_AI_TOOL_DEFINITIONS

These constants are imported from the upstream opentelemetry.semconv._incubating.attributes.gen_ai_attributes module and used heavily throughout _hooks.py and _realtime_wrappers.py. While the import will fail if the constants don't exist, explicit compliance tests would provide better validation and documentation.

Would you like me to draft compliance tests for these new upstream constants in _testing.py?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py`
around lines 1 - 8, Add explicit compliance assertions in the shared test module
_testing.py to verify the presence and expected values of the three new GenAI
constants: GenAIAttributes.GEN_AI_INPUT_MESSAGES,
GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, and
GenAIAttributes.GEN_AI_TOOL_DEFINITIONS; import these from
opentelemetry.semconv._incubating.attributes.gen_ai_attributes (same source used
by _hooks.py and _realtime_wrappers.py) and add simple equality/assertion checks
that the exported constants in opentelemetry.semconv_ai match the upstream
constants so the tests will fail if those names or values are missing or
renamed.
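
The compliance check the bot proposes amounts to asserting that locally re-exported constants match their upstream counterparts. A minimal, self-contained sketch (the SimpleNamespace stand-ins below are hypothetical; the real tests would import the actual upstream module and the local semconv_ai package):

```python
from types import SimpleNamespace

# Stand-ins for the upstream incubating gen_ai_attributes module and the
# local re-exports; attribute values shown here are the ones used in this PR.
upstream = SimpleNamespace(
    GEN_AI_INPUT_MESSAGES="gen_ai.input.messages",
    GEN_AI_OUTPUT_MESSAGES="gen_ai.output.messages",
    GEN_AI_TOOL_DEFINITIONS="gen_ai.tool.definitions",
)
local = SimpleNamespace(**vars(upstream))

def assert_constants_match(local_mod, upstream_mod, names):
    """Fail loudly if any constant is missing locally or drifts from upstream."""
    for name in names:
        assert getattr(local_mod, name) == getattr(upstream_mod, name), name

assert_constants_match(
    local, upstream,
    ["GEN_AI_INPUT_MESSAGES", "GEN_AI_OUTPUT_MESSAGES", "GEN_AI_TOOL_DEFINITIONS"],
)
```

Because getattr raises AttributeError on a missing name, a rename on either side fails the test immediately rather than passing silently.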

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 55f0bea4-7b6a-434a-89a3-58e2c26ee263

📥 Commits

Reviewing files that changed from the base of the PR and between 3f2418b and af07493.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
💤 Files with no reviewable changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py

Comment thread: packages/opentelemetry-instrumentation-openai-agents/pyproject.toml (Outdated)
Member

@doronkopit5 doronkopit5 left a comment


Did you succeed in sending traces with this instrumentation using a sample app?

@doronkopit5 doronkopit5 changed the title feat(openai-agents) GenAI semconv compliance feat(openai-agents): GenAI semconv compliance Mar 23, 2026
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1928-openai-agents-insturmentation branch from fe44b61 to aadaa1a Compare March 24, 2026 14:56

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

491-563: Consider standardizing or documenting non-standard operation names.

The gen_ai.operation.name values "speech", "transcription", and "speech_group" are not standard OpenTelemetry GenAI semantic convention values (which include "chat", "text_completion", "embeddings"). While these may be intentional extensions for OpenAI realtime operations, observability tools may not recognize them.

If these are intentional extensions, consider documenting them. Alternatively, consider whether "chat" could be used for consistency.
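
One lightweight way to address this nitpick is to centralize the non-standard names in documented constants. The mapping below is illustrative only, using the operation and span names quoted in the comment (the constant names are assumptions, not code from the repo):

```python
# Non-standard gen_ai.operation.name values used for OpenAI realtime spans.
# Standard GenAI operation names include "chat", "text_completion", and
# "embeddings"; the values below are intentional OpenAI-realtime extensions,
# kept so speech/transcription spans remain distinguishable from chat spans.
REALTIME_OPERATION_SPAN_NAMES = {
    "speech": "openai.realtime.speech",                # TTS output span
    "transcription": "openai.realtime.transcription",  # STT input span
    "speech_group": "openai.realtime.speech_group",    # grouped audio turns
}

def realtime_span_name(operation: str) -> str:
    # Raise early on unexpected operations instead of emitting ad-hoc names.
    return REALTIME_OPERATION_SPAN_NAMES[operation]

assert realtime_span_name("speech") == "openai.realtime.speech"
```

Documenting the extensions in one place makes the trade-off explicit: observability backends may not recognize these values, but collapsing everything into "chat" would lose the audio-specific semantics.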

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 491 - 563, The GenAI operation names used in _hooks.py
(GenAIAttributes.GEN_AI_OPERATION_NAME values "speech", "transcription",
"speech_group" set in the speech_attributes, transcription_attributes, and
speech_group_attributes objects and span names "openai.realtime.speech",
"openai.realtime.transcription", "openai.realtime.speech_group" in
tracer.start_span) are non-standard; either map them to standard GenAI semantic
convention values (e.g., use "chat" or another accepted value) or explicitly
document these as OpenAI realtime extensions. Update the attribute assignments
and span naming to use the chosen standardized token(s) (or add constants
documenting the extension), and add a concise comment/docstring near the
tracer.start_span calls explaining that these are intentional realtime
extensions if you keep them.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e0cf5a2d-869d-40a2-9e91-3ccda106c0df

📥 Commits

Reviewing files that changed from the base of the PR and between fe44b61 and aadaa1a.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
💤 Files with no reviewable changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
✅ Files skipped from review due to trivial changes (2)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1928-openai-agents-insturmentation branch from aadaa1a to a3bed2a Compare March 30, 2026 13:28

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

460-463: ⚠️ Potential issue | 🔴 Critical

Replace deprecated gen_ai.system with gen_ai.provider.name for semantic conventions compliance.

The OpenTelemetry GenAI semantic conventions (post-2025) deprecated gen_ai.system in favor of gen_ai.provider.name. Line 462 and all other response/generation span attributes in this instrumentation currently use the deprecated GEN_AI_SYSTEM constant with value "openai". Since GEN_AI_PROVIDER_NAME is already available in the imported GenAIAttributes and the project targets a 2026 codebase, update to use GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai" instead.

This applies to all response/generation/realtime operation spans (lines 462, 480, 502, 528 in _hooks.py and lines 263, 356 in _realtime_wrappers.py). Agent-level spans correctly use gen_ai.system: "openai_agents" (per the learning, framework-level spans should distinguish themselves), but API response spans should reflect the underlying provider, not the framework.
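
The requested swap, shown as plain attribute dictionaries (dict names are illustrative; the real code uses the GenAIAttributes constants):

```python
# Before: deprecated key on provider-level response/generation spans.
response_attributes_old = {
    "gen_ai.system": "openai",          # deprecated post-2025
    "gen_ai.operation.name": "chat",
}

# After: current semconv key, same value.
response_attributes_new = {
    "gen_ai.provider.name": "openai",   # replaces gen_ai.system
    "gen_ai.operation.name": "chat",
}

# Framework-level agent spans are the exception: they keep
# gen_ai.system = "openai_agents" to distinguish the agent framework
# from the underlying model provider.
agent_attributes = {"gen_ai.system": "openai_agents"}
```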

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 460 - 463, Replace uses of the deprecated
GenAIAttributes.GEN_AI_SYSTEM with GenAIAttributes.GEN_AI_PROVIDER_NAME for all
provider-level response/generation/realtime span attributes: update the
dictionary named response_attributes (and analogous generation_/realtime_
attribute dicts) to set GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai" instead
of GenAIAttributes.GEN_AI_SYSTEM: "openai"; keep agent-level spans that
intentionally use GEN_AI_SYSTEM as-is. Locate occurrences by searching for
GenAIAttributes.GEN_AI_SYSTEM and the attribute dicts (e.g.,
response_attributes, generation_attributes, realtime_*_attributes) in the
instrumentation hooks and realtime wrappers and swap the constant to
GenAIAttributes.GEN_AI_PROVIDER_NAME with the same "openai" value.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

636-645: Consider extracting message JSON construction to a helper.

The pattern json.dumps([{"role": ..., "content": ...}]) for single-element message arrays is repeated across multiple realtime span handlers (lines 636-638, 642-645, 655-658, 662-665, 676-678). While acceptable given the package independence requirement, consider a small local helper if this pattern grows further.
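
The suggested helper is small; a sketch under the comment's assumptions (the name to_message_json is the reviewer's proposal, not existing code):

```python
import json
from typing import Any

def to_message_json(role: str, content: Any) -> str:
    """Serialize a single-element message array, centralizing the repeated
    json.dumps([{"role": ..., "content": ...}]) pattern from the realtime
    span handlers (hypothetical helper suggested in review)."""
    return json.dumps([{"role": role, "content": content}])

assert to_message_json("user", "hello") == '[{"role": "user", "content": "hello"}]'
```

Each realtime handler would then call, for example, otel_span.set_attribute(GEN_AI_INPUT_MESSAGES, to_message_json("user", input_text)) instead of inlining the json.dumps call.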

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 636 - 645, The repeated pattern json.dumps([{"role": ...,
"content": ...}]) for constructing single-element message arrays should be moved
into a small local helper function (e.g., to_message_json(role: str, content:
Any) -> str) and used where the current code sets
GenAIAttributes.GEN_AI_INPUT_MESSAGES and GenAIAttributes.GEN_AI_OUTPUT_MESSAGES
in the realtime span handlers; update the calls that build the input JSON (using
input_text) and the output JSON (using getattr(span_data, "output", None) /
output_audio) to call the helper before passing the string to
otel_span.set_attribute so the logic is centralized and easier to maintain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 260-265: The span attributes currently set
GenAIAttributes.GEN_AI_OPERATION_NAME to "realtime"; update the attribute value
to the canonical "chat" to comply with OpenTelemetry GenAI semantic conventions.
Locate the span creation in the realtime wrappers (the code that calls
start_span or creates attributes with GenAIAttributes.GEN_AI_OPERATION_NAME in
_realtime_wrappers.py) and replace the string "realtime" with "chat", keeping
the surrounding attributes and parent_context intact so the span continues to be
created the same way.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 652d6dab-a4a7-466e-8849-b8f91aeb4ef9

📥 Commits

Reviewing files that changed from the base of the PR and between aadaa1a and a3bed2a.

📒 Files selected for processing (7)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
💤 Files with no reviewable changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

max-deygin-traceloop and others added 7 commits March 31, 2026 11:28
…LM_ → GEN_AI_

- Replace SpanAttributes.LLM_REQUEST_TYPE with GenAIAttributes.GEN_AI_OPERATION_NAME
- Replace SpanAttributes.LLM_REQUEST_FUNCTIONS with GenAIAttributes.GEN_AI_TOOL_DEFINITIONS
- Replace SpanAttributes.LLM_SYSTEM with GenAIAttributes.GEN_AI_SYSTEM
- Replace SpanAttributes.LLM_USAGE_TOTAL_TOKENS with SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS
- Add test_semconv_compliance.py and [tool.uv.sources] for local semconv_ai

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l definitions to JSON array

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…en test constants, remove duplicate GEN_AI_SYSTEM key

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssage format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ve orphaned completion.tool.* attrs

- _realtime_wrappers.py: replace flat gen_ai.prompt.*/gen_ai.completion.* in
  create_llm_span() with GEN_AI_INPUT_MESSAGES/GEN_AI_OUTPUT_MESSAGES JSON arrays
- _hooks.py: remove gen_ai.completion.tool.{name,type,strict_json_schema} sub-attributes
  from FunctionSpanData handler (tool name already captured via GEN_AI_TOOL_NAME)
- tests: update test_realtime_session.py assertions to parse JSON array attributes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, remove AgentSpanData misfeature

- _hooks.py: remove misplaced catch-all 'elif span_data:' that was shadowing
  SpeechSpanData, TranscriptionSpanData, SpeechGroupSpanData, and AgentSpanData branches
- _hooks.py: remove AgentSpanData handler that incorrectly propagated model settings
  to agent spans (test spec: agent spans must NOT carry gen_ai.request.* params)
- _hooks.py: replace hardcoded "openai.agent.model.frequency_penalty" with
  GenAIAttributes.GEN_AI_REQUEST_FREQUENCY_PENALTY constant
- tests: fix dead "llm.usage.*" prefix check, fix vestigial "gen_ai.prompt" scan,
  fix hardcoded frequency_penalty string, fix long line in test_realtime_session.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1928-openai-agents-insturmentation branch from a3bed2a to 741bdce Compare March 31, 2026 08:29

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 255-260: The current branch stringifies any assistant `content`
before adding tool-call parts, which flattens block-list structures; update the
logic in the block that handles `content` and `tool_calls` (variables/functions:
parts, content, tool_calls, _stringify_content, _tool_call_to_part) to preserve
structured content: if `content` is a list (block list) extend `parts` with its
elements, if it is a dict append it as-is, otherwise fall back to calling
`_stringify_content` and appending a {"type": "text", "content": text} part;
after preserving/appending the structured content, then extend with
`_tool_call_to_part(tc)` for each tool call.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py`:
- Around line 900-906: Tests currently mutate MagicMock.__name__ to impersonate
ResponseSpanData which affects global state; instead create a lightweight stub
(e.g., a small class or types.SimpleNamespace) named ResponseSpanData and
instantiate it for response_data, set input and response on that instance, and
pass it to MockAgentSpan so you do not modify MagicMock.__name__ or global mock
behavior.
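
The stub-over-mock suggestion can look like this (class and field names are taken from the review text; the real test fixture may differ):

```python
from types import SimpleNamespace

class ResponseSpanData(SimpleNamespace):
    """Lightweight stand-in whose class name matches the real span-data type,
    avoiding the global side effect of mutating MagicMock.__name__."""
    pass

response_data = ResponseSpanData(
    input=[{"role": "user", "content": "hi"}],
    response=None,
)

# Type-name-based dispatch in the instrumentation sees the expected name:
assert type(response_data).__name__ == "ResponseSpanData"
assert response_data.input[0]["role"] == "user"
```

Because each test builds its own instance, there is no shared mock state to reset between tests.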

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc14880d-9335-4d6f-a5a4-fcdc890ef878

📥 Commits

Reviewing files that changed from the base of the PR and between a3bed2a and 741bdce.

📒 Files selected for processing (8)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py

@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1928-openai-agents-insturmentation branch from 5fa66cb to 826a2a7 Compare March 31, 2026 10:18

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

53-70: ⚠️ Potential issue | 🟡 Minor

Actually parse these message attributes as JSON.

This test now targets gen_ai.input.messages / gen_ai.output.messages, but it still swallows JSONDecodeError and only attempts parsing for object-looking payloads. Because the new attrs are JSON arrays, malformed message payloads will still pass here.

Suggested fix
             if prompt_content_check:
                 # All content attributes should be strings, not dicts
                 error_msg = f"Attribute {attr_name} should be a string, got {type(attr_value)}: {attr_value}"
                 assert isinstance(attr_value, str), error_msg
 
-                # If it looks like JSON, verify it can be parsed
-                if attr_value.startswith("{") and attr_value.endswith("}"):
-                    try:
-                        json.loads(attr_value)
-                    except json.JSONDecodeError:
-                        # If it fails to parse, that's still fine - just not JSON
-                        pass
+                json.loads(attr_value)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`
around lines 53 - 70, The test currently checks gen_ai.input.messages /
gen_ai.output.messages only superficially and swallows JSONDecodeError; update
the loop that iterates spans → span.attributes (using variables span, attr_name,
attr_value and the keys "gen_ai.input.messages" and "gen_ai.output.messages") so
that after asserting attr_value is a str you always attempt to parse it with
json.loads (do not limit parsing to strings that start/end with "{" / "}") and
fail the test on json.JSONDecodeError (e.g., let the exception propagate or
assert False with the error), ensuring malformed JSON arrays do not pass
silently.
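The stricter pattern can be sketched as a small standalone helper (names hypothetical, not part of the actual test file):

```python
import json

def assert_valid_message_json(attr_name, attr_value):
    """Assert a span attribute holds a parseable JSON array of messages."""
    # The attribute must be a string, never a dict or list object.
    assert isinstance(attr_value, str), (
        f"Attribute {attr_name} should be a string, got {type(attr_value)}"
    )
    # Always parse; a json.JSONDecodeError propagates and fails the test.
    parsed = json.loads(attr_value)
    assert isinstance(parsed, list), f"{attr_name} should be a JSON array"
    return parsed

messages = assert_valid_message_json(
    "gen_ai.input.messages",
    '[{"role": "user", "parts": [{"type": "text", "content": "hi"}]}]',
)
```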
♻️ Duplicate comments (3)
packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py (1)

1047-1056: ⚠️ Potential issue | 🟠 Major

Remove the if raw_* guards so these semconv tests can actually fail.

A missing gen_ai.tool.definitions, gen_ai.input.messages, or gen_ai.output.messages currently turns into a pass. That makes the suite false-green on exactly the attributes this PR is supposed to lock down.

Suggested fix
         raw_defs = resp_span.attributes.get(GenAIAttributes.GEN_AI_TOOL_DEFINITIONS)
-        if raw_defs:
-            defs = json.loads(raw_defs)
-            assert len(defs) >= 1
-            tool_def = defs[0]
-            # Per spec: preserve source system's representation
-            assert "type" in tool_def, "Tool definition should preserve 'type' field"
-            assert tool_def["type"] == "function"
-            assert "function" in tool_def, "Tool definition should preserve 'function' wrapper"
+        assert raw_defs is not None
+        defs = json.loads(raw_defs)
+        assert len(defs) >= 1
+        tool_def = defs[0]
+        # Per spec: preserve source system's representation
+        assert "type" in tool_def, "Tool definition should preserve 'type' field"
+        assert tool_def["type"] == "function"
+        assert "function" in tool_def, "Tool definition should preserve 'function' wrapper"

Apply the same pattern to the realtime raw_input and raw_output checks below.

Also applies to: 1095-1100, 1131-1135, 1167-1175

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py`
around lines 1047 - 1056, The tests currently short-circuit when attributes are
missing because of conditional guards like "if raw_defs:"; remove those guards
and instead assert presence of the raw attribute before parsing (e.g., assert
raw_defs is not None or use assert raw_defs, then call json.loads(raw_defs) and
continue validating tool_def fields), and apply the same change for raw_input
and raw_output checks (references: raw_defs, raw_input, raw_output,
GenAIAttributes.GEN_AI_TOOL_DEFINITIONS, and the message assertions) so the
tests fail when the semconv attributes are absent; also make the same
replacement for the realtime raw_input/raw_output checks mentioned.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

253-261: ⚠️ Potential issue | 🟠 Major

Preserve structured content when assistant messages also include tool_calls.

This branch stringifies content before appending tool-call parts. If an assistant message carries a block list here, the multimodal structure is flattened into one text blob and gen_ai.input.messages no longer matches the parts schema this PR is migrating to.

Suggested fix
     parts = []
     if tool_calls:
         if content is not None:
-            text = _stringify_content(content)
-            if text:
-                parts.append({"type": "text", "content": text})
+            if isinstance(content, list):
+                parts.extend(_content_to_parts(content))
+            elif isinstance(content, dict):
+                parts.append(_content_block_to_part(content))
+            else:
+                text = _stringify_content(content)
+                if text:
+                    parts.append({"type": "text", "content": text})
         parts.extend(_tool_call_to_part(tc) for tc in tool_calls)

Based on learnings: In the traceloop/openllmetry repository, the OTel GenAI semantic convention formal JSON schema for gen_ai.input.messages and gen_ai.output.messages uses {role, parts} messages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 253 - 261, The code flattens structured assistant `content` into a
single text part when `tool_calls` exist by using _stringify_content, breaking
the intended {role, parts} schema; change the branch that handles tool_calls so
that if `content` is not None you call _content_to_parts(content) (or fall back
to _stringify_content only if that returns empty) and extend `parts` with those
parts before extending with _tool_call_to_part for each `tc`; update the block
around parts, tool_calls, content to use _content_to_parts, preserve original
structured parts, and keep _tool_call_to_part usage unchanged.
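A self-contained sketch of the preserving branch, with minimal stand-ins for the `_content_to_parts` and `_tool_call_to_part` helpers (their real signatures in `_hooks.py` may differ):

```python
import json

def _content_to_parts(content):
    """Map a list of content blocks to parts, keeping each block separate."""
    parts = []
    for block in content:
        if isinstance(block, dict) and block.get("type") in ("text", "input_text"):
            parts.append({"type": "text", "content": block.get("text", "")})
        else:
            parts.append({"type": "text", "content": json.dumps(block, default=str)})
    return parts

def _tool_call_to_part(tc):
    """Represent one tool call as a part in the {role, parts} schema."""
    return {
        "type": "tool_call",
        "id": tc.get("id"),
        "name": tc.get("function", {}).get("name"),
        "arguments": tc.get("function", {}).get("arguments"),
    }

def assistant_parts(content, tool_calls):
    parts = []
    if content is not None:
        if isinstance(content, list):
            parts.extend(_content_to_parts(content))  # keep structure intact
        else:
            text = content if isinstance(content, str) else json.dumps(content, default=str)
            if text:
                parts.append({"type": "text", "content": text})
    parts.extend(_tool_call_to_part(tc) for tc in tool_calls or [])
    return parts
```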
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

349-358: ⚠️ Potential issue | 🟠 Major

Use "chat" for the realtime prompt/completion span.

create_llm_span() is the actual conversational model invocation. Emitting gen_ai.operation.name="realtime" here makes these spans semconv-incompatible and inconsistent with the rest of the package’s response spans.

Suggested fix
         span = self.tracer.start_span(
             "openai.realtime",
             kind=SpanKind.CLIENT,
             context=parent_context,
             start_time=start_time,
             attributes={
-                GenAIAttributes.GEN_AI_OPERATION_NAME: "realtime",
+                GenAIAttributes.GEN_AI_OPERATION_NAME: "chat",
                 GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai",
                 GenAIAttributes.GEN_AI_REQUEST_MODEL: model_name_str,
             },
         )

Based on learnings: In the OpenTelemetry GenAI semantic conventions, gen_ai.operation.name uses standard predefined values such as chat, text_completion, and embeddings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
around lines 349 - 358, The span started in _realtime_wrappers.py is emitting
gen_ai.operation.name="realtime" which is semconv-incompatible; update the
attributes passed to self.tracer.start_span (the span variable created in that
block) to set GenAIAttributes.GEN_AI_OPERATION_NAME to "chat" instead of
"realtime" so the realtime prompt/completion span matches the package’s response
spans and OpenTelemetry GenAI conventions (the conversational invocation remains
handled by create_llm_span()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py`:
- Around line 53-70: The test currently checks gen_ai.input.messages /
gen_ai.output.messages only superficially and swallows JSONDecodeError; update
the loop that iterates spans → span.attributes (using variables span, attr_name,
attr_value and the keys "gen_ai.input.messages" and "gen_ai.output.messages") so
that after asserting attr_value is a str you always attempt to parse it with
json.loads (do not limit parsing to strings that start/end with "{" / "}") and
fail the test on json.JSONDecodeError (e.g., let the exception propagate or
assert False with the error), ensuring malformed JSON arrays do not pass
silently.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 253-261: The code flattens structured assistant `content` into a
single text part when `tool_calls` exist by using _stringify_content, breaking
the intended {role, parts} schema; change the branch that handles tool_calls so
that if `content` is not None you call _content_to_parts(content) (or fall back
to _stringify_content only if that returns empty) and extend `parts` with those
parts before extending with _tool_call_to_part for each `tc`; update the block
around parts, tool_calls, content to use _content_to_parts, preserve original
structured parts, and keep _tool_call_to_part usage unchanged.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 349-358: The span started in _realtime_wrappers.py is emitting
gen_ai.operation.name="realtime" which is semconv-incompatible; update the
attributes passed to self.tracer.start_span (the span variable created in that
block) to set GenAIAttributes.GEN_AI_OPERATION_NAME to "chat" instead of
"realtime" so the realtime prompt/completion span matches the package’s response
spans and OpenTelemetry GenAI conventions (the conversational invocation remains
handled by create_llm_span()).

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py`:
- Around line 1047-1056: The tests currently short-circuit when attributes are
missing because of conditional guards like "if raw_defs:"; remove those guards
and instead assert presence of the raw attribute before parsing (e.g., assert
raw_defs is not None or use assert raw_defs, then call json.loads(raw_defs) and
continue validating tool_def fields), and apply the same change for raw_input
and raw_output checks (references: raw_defs, raw_input, raw_output,
GenAIAttributes.GEN_AI_TOOL_DEFINITIONS, and the message assertions) so the
tests fail when the semconv attributes are absent; also make the same
replacement for the realtime raw_input/raw_output checks mentioned.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e89a76c-8bd4-4afb-a6d1-3ee1010f47ab

📥 Commits

Reviewing files that changed from the base of the PR and between a3bed2a and 6ea5fc0.

📒 Files selected for processing (9)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_processor_refactor.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_compliance.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_recipe_agents_hierarchy.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 61-66: The mapper _map_finish_reason currently returns the raw
unknown value (return _FINISH_REASON_MAP.get(raw, raw)); change the fallback to
an empty string so unknown/non-enum finish reasons do not leak raw values (e.g.,
use _FINISH_REASON_MAP.get(raw, "") instead), keep the existing None -> None
behavior if desired, and apply the same change to the other similar mappers
referenced in the file (the ones at the regions around lines 373-382 and
432-433) so all finish-reason mapping consistently defaults unknowns to "".
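The intended fallback behavior, sketched with an illustrative mapping table (the real `_FINISH_REASON_MAP` in `_hooks.py` may carry more entries):

```python
# Illustrative subset; the real map may differ.
_FINISH_REASON_MAP = {
    "stop": "stop",
    "length": "length",
    "max_tokens": "length",
    "tool_calls": "tool_call",
    "content_filter": "content_filter",
}

def map_finish_reason(raw):
    """Map a provider finish reason to the OTel enum; unknowns become ""."""
    if raw is None:
        return None
    # Unknown provider values no longer leak through as-is.
    return _FINISH_REASON_MAP.get(raw, "")
```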
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5b79bc03-1f47-4dee-a82b-e8e5b386a6c6

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea5fc0 and 4f95dfe.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
✅ Files skipped from review due to trivial changes (3)
  • packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/_testing.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

355-356: ⚠️ Potential issue | 🟠 Major

Use the canonical chat operation for the prompt/completion span.

This span is emitting gen_ai.input.messages and gen_ai.output.messages for a conversational exchange, so "realtime" is the wrong gen_ai.operation.name here. It should use the well-known chat operation value instead.

OpenTelemetry GenAI semantic conventions: what are the valid values for `gen_ai.operation.name`, and should a span representing a prompt/completion conversational exchange use `chat` rather than `realtime`?

Based on learnings: In the OpenTelemetry GenAI semantic conventions, gen_ai.operation.name uses standard predefined values such as "chat", "text_completion", "embeddings", etc. — NOT method-specific names.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
around lines 355 - 356, The span in _realtime_wrappers.py is using a
non-canonical operation name ("realtime") for a prompt/completion conversational
exchange; update the mapping for GenAIAttributes.GEN_AI_OPERATION_NAME to use
the OpenTelemetry GenAI canonical chat value ("chat") wherever the span emits
gen_ai.input.messages and gen_ai.output.messages (look for the dict setting that
includes GenAIAttributes.GEN_AI_OPERATION_NAME and
GenAIAttributes.GEN_AI_PROVIDER_NAME) so the span correctly identifies the
operation as a chat prompt/completion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 218-219: The code in _realtime_wrappers.py sets GenAI tool call
results using str(output), producing Python reprs; change this to serialize
output as JSON (matching _end_function_span in _hooks.py) by JSON-encoding the
output before calling span.set_attribute (handle non-serializable objects safely
or fall back to a string). Locate the
span.set_attribute(GenAIAttributes.GEN_AI_TOOL_CALL_RESULT, str(output)) call
and replace str(output) with a json.dumps-based serialization (import json) that
mirrors the serialization approach used in _end_function_span, ensuring
consistent downstream parsing.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_processor_refactor.py`:
- Around line 154-229: The tests expect _start_handoff_span to create a span
named "{from} → {to}.handoff" with SpanKind.INTERNAL and to register reverse
handoffs in _reverse_handoffs_dict using the key "{to}:{trace_id}"; update the
implementation in _hooks.py (function _start_handoff_span) to (1) build the span
name using the " → " arrow and suffix ".handoff" (falling back to "unknown" for
None names), (2) create the span with kind SpanKind.INTERNAL instead of
SpanKind.CLIENT, (3) set the handoff-specific attributes
(SpanAttributes.TRACELOOP_SPAN_KIND="handoff", GEN_AI_HANDOFF_FROM_AGENT and
GEN_AI_HANDOFF_TO_AGENT, and GenAIAttributes.GEN_AI_PROVIDER_NAME="openai"), and
(4) register the reverse mapping into _reverse_handoffs_dict as
"{to}:{trace_id}" -> from_agent.
- Around line 69-143: Tests expect _start_agent_span (in _hooks.py) to create
spans named "{agent_name}.agent" (default "unknown_agent.agent"), set
SpanKind.INTERNAL, set GenAIAttributes.GEN_AI_AGENT_NAME to the agent name and
GenAIAttributes.GEN_AI_PROVIDER_NAME to "openai", serialize each handoff into
attributes "openai.agent.handoff{n}" (JSON), and if
processor._reverse_handoffs_dict contains "AgentName:trace_id" add and then
remove attribute "gen_ai.agent.handoff_parent" with that value; update
_start_agent_span to implement these behaviors and remove the old behavior that
created "agent_{agent_name}" spans, used GEN_AI_SYSTEM for agent name, and used
GenAIAttributes.GEN_AI_HANDOFF_FROM_AGENT for reverse handoffs so the function
matches the new test contract.
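Under those expectations, the naming and registry helpers reduce to roughly the following (a sketch of the test contract, not the actual `_hooks.py` code):

```python
def handoff_span_name(from_agent, to_agent):
    """Build the '{from} → {to}.handoff' span name, defaulting to 'unknown'."""
    return f"{from_agent or 'unknown'} → {to_agent or 'unknown'}.handoff"

def reverse_handoff_key(to_agent, trace_id):
    """Key used to register the reverse handoff mapping."""
    return f"{to_agent}:{trace_id}"
```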

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 355-356: The span in _realtime_wrappers.py is using a
non-canonical operation name ("realtime") for a prompt/completion conversational
exchange; update the mapping for GenAIAttributes.GEN_AI_OPERATION_NAME to use
the OpenTelemetry GenAI canonical chat value ("chat") wherever the span emits
gen_ai.input.messages and gen_ai.output.messages (look for the dict setting that
includes GenAIAttributes.GEN_AI_OPERATION_NAME and
GenAIAttributes.GEN_AI_PROVIDER_NAME) so the span correctly identifies the
operation as a chat prompt/completion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d70f79c0-b9c4-4f36-9b5f-a4bfbb081843

📥 Commits

Reviewing files that changed from the base of the PR and between 4f95dfe and 1a0bc38.

📒 Files selected for processing (5)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_processor_refactor.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1928-openai-agents-insturmentation branch from 8b617a0 to aa8707c Compare April 19, 2026 07:45

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 501-504: The code currently sets the non-upstream attribute
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS using usage.total_tokens in _hooks.py;
remove or stop emitting that upstream key and instead either (a) emit total
tokens under a custom/traceloop namespace (e.g. use a custom attribute name like
"traceloop.gen_ai.usage.total_tokens") when calling otel_span.set_attribute, or
(b) omit setting total_tokens entirely and rely on existing upstream attributes
SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS and
SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS so total can be derived downstream;
update the usage.total_tokens check and the otel_span.set_attribute call (the
lines referencing usage.total_tokens and
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS) accordingly.
- Around line 518-540: The _extract_tool_definitions function currently drops
tools that lack .function or .name; update it to first detect and preserve dicts
and Pydantic models (use isinstance(tool, dict) and check for pydantic.BaseModel
as in utils.py serialization helper) by appending their raw dict form (for
models call .dict() or the same serializer used in utils.py) into tool_defs
before doing attribute fallbacks; also accept params_json_schema as an
alternative to parameters (map params_json_schema -> parameters if present) and
ensure both function-form tools (function.parameters /
function.params_json_schema) and direct tools (parameters / params_json_schema)
are handled so no tool definitions are silently dropped in
_extract_tool_definitions.
- Around line 297-320: In _extract_prompt_attributes, handle the case where
ResponseSpanData.input is a string by converting string input into a single chat
message before iterating: detect if input_data is an instance of str and replace
it with a list containing one element representing a user message (so subsequent
_msg_to_dict/_convert_chat_message paths work); keep existing behavior for list
and None. Update references in _extract_prompt_attributes (and ensure
compatibility with _msg_to_dict, _convert_chat_message, and
_convert_agents_sdk_message) so gen_ai.input.messages is emitted for string
inputs the same way realtime spans do.
- Around line 151-155: The helper _stringify_content can raise via json.dumps
for arbitrary objects, causing on_span_end to abort before otel_span.end(); make
_stringify_content exception-safe by wrapping json.dumps(content) in a
try/except that catches Exception and returns a safe fallback (e.g.,
repr(content) or a short "<unserializable: ...>" string), and ensure the
function always returns a str and never raises; update references to
_stringify_content in on_span_end to rely on this non-throwing behavior so spans
are always closed.
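The total-tokens choice described above can be sketched as follows; the `traceloop.` attribute namespace here is illustrative, not an established convention:

```python
def usage_attributes(input_tokens, output_tokens, emit_custom_total=False):
    """Emit upstream token attributes; the total stays derivable downstream."""
    attrs = {
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    if emit_custom_total:
        # Option (a): a custom namespace instead of a non-upstream gen_ai key.
        attrs["traceloop.gen_ai.usage.total_tokens"] = input_tokens + output_tokens
    return attrs
```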

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c066b7d6-fb7a-4100-acd8-948f0dc7701f

📥 Commits

Reviewing files that changed from the base of the PR and between 1a0bc38 and aa8707c.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (5)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (4)

310-323: ⚠️ Potential issue | 🟠 Major

Handle string ResponseSpanData.input before iterating.

A plain string input is iterated character-by-character here, so no gen_ai.input.messages is emitted for common Runner.run(..., "prompt") inputs. Normalize strings into a single user message first.

🐛 Proposed fix
+def _iter_input_messages(input_data):
+    if isinstance(input_data, str):
+        return [{"role": "user", "content": input_data}]
+    if isinstance(input_data, dict):
+        return [input_data]
+    return input_data
+
+
 def _extract_prompt_attributes(otel_span, input_data, trace_content: bool):
@@
     messages = []
-    for message in input_data:
+    for message in _iter_input_messages(input_data):
         msg = _msg_to_dict(message)
#!/bin/bash
set -euo pipefail
python - <<'PY'
input_data = "What is AI?"
print("Current loop items:", list(input_data[:5]))
print("Expected normalized message:", [{"role": "user", "content": input_data}])
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 310 - 323, input_data may be a plain string (e.g.
ResponseSpanData.input) and the current loop iterates characters; update
_extract_prompt_attributes to detect when input_data is a str and normalize it
into a list containing a single message dict (e.g. {"role": "user", "content":
input_data}) before the for message in input_data loop so _msg_to_dict and
subsequent logic operate on message objects rather than characters; preserve the
existing early return when not input_data or not trace_content and ensure other
callers of _extract_prompt_attributes still receive the same attribute output
shape.
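The normalization can be exercised standalone to confirm the character-iteration problem disappears (a hedged sketch mirroring the proposed `_iter_input_messages`, not the actual implementation):

```python
def iter_input_messages(input_data):
    """Normalize ResponseSpanData.input into a list of message dicts."""
    if isinstance(input_data, str):
        # A bare "prompt" string becomes one user message instead of
        # being iterated character-by-character.
        return [{"role": "user", "content": input_data}]
    if isinstance(input_data, dict):
        return [input_data]
    return input_data or []
```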

535-565: ⚠️ Potential issue | 🟠 Major

Preserve dict/Pydantic tool definitions instead of dropping them.

_extract_tool_definitions() only handles .function and .name attributes, so source-system dict or model-shaped tool definitions are silently omitted from gen_ai.tool.definitions. Preserve already-structured tool definitions before falling back to attribute extraction. Based on learnings, instrumentation packages should leverage semantic conventions to generate spans and tracing data compliant with OpenTelemetry semantic conventions.

🐛 Proposed fix
 def _extract_tool_definitions(tools):
@@
     tool_defs = []
     for tool in tools:
-        if hasattr(tool, "function"):
+        if isinstance(tool, dict):
+            tool_defs.append(tool)
+            continue
+        if hasattr(tool, "model_dump"):
+            tool_defs.append(tool.model_dump(exclude_none=True))
+            continue
+        if hasattr(tool, "dict"):
+            tool_defs.append(tool.dict(exclude_none=True))
+            continue
+
+        if hasattr(tool, "function"):
             function = tool.function
             func_def = {
                 "name": getattr(function, "name", ""),
                 "description": getattr(function, "description", ""),
             }
             if hasattr(function, "parameters"):
                 func_def["parameters"] = function.parameters
+            elif hasattr(function, "params_json_schema"):
+                func_def["parameters"] = function.params_json_schema
@@
             if hasattr(tool, "parameters"):
                 func_def["parameters"] = tool.parameters
+            elif hasattr(tool, "params_json_schema"):
+                func_def["parameters"] = tool.params_json_schema
OpenTelemetry GenAI semantic conventions gen_ai.tool.definitions source system tool definition format
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 535 - 565, The _extract_tool_definitions function currently only
extracts tools via .function or .name and thus drops tools that are already
dicts or Pydantic/model-shaped objects; update _extract_tool_definitions to
first detect and preserve already-structured definitions (e.g., if tool is
instance of dict or has a .dict() / .model_dump() / .model_dump_json() method)
and append that serialized representation directly to tool_defs before falling
back to attribute-based extraction (the code paths referencing tool.function,
tool.name, function.parameters, and building func_def/tool_def should remain for
non-structured tools).
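A runnable sketch of the preserving behavior, simplified to skip the `.function`-wrapper branch (helper and attribute names beyond `params_json_schema` are assumptions):

```python
from types import SimpleNamespace

def extract_tool_definitions(tools):
    """Preserve structured tool definitions; fall back to attribute probing."""
    tool_defs = []
    for tool in tools:
        if isinstance(tool, dict):
            tool_defs.append(tool)  # keep the source system's representation
            continue
        if hasattr(tool, "model_dump"):  # pydantic v2 model
            tool_defs.append(tool.model_dump(exclude_none=True))
            continue
        func_def = {
            "name": getattr(tool, "name", ""),
            "description": getattr(tool, "description", ""),
        }
        # Accept either attribute name for the JSON schema.
        params = getattr(tool, "parameters", None)
        if params is None:
            params = getattr(tool, "params_json_schema", None)
        if params is not None:
            func_def["parameters"] = params
        tool_defs.append({"type": "function", "function": func_def})
    return tool_defs

raw = {"type": "function", "function": {"name": "get_weather"}}
sdk_tool = SimpleNamespace(
    name="lookup", description="", params_json_schema={"type": "object"}
)
defs = extract_tool_definitions([raw, sdk_tool])
```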

152-156: ⚠️ Potential issue | 🟠 Major

Make JSON serialization exception-safe before ending spans.

json.dumps() can raise for arbitrary tool inputs/outputs or content objects; in on_span_end, that skips otel_span.end() and leaks the span because @dont_throw catches the method-level exception. Use a safe serializer everywhere these content attributes are built.

🐛 Proposed fix
+def _safe_json_dumps(value) -> str:
+    try:
+        return json.dumps(value, default=str)
+    except (TypeError, ValueError):
+        return str(value)
+
+
 def _stringify_content(content) -> str:
     """Coerce non-string content to a string for simple text parts."""
     if isinstance(content, str):
         return content
-    return json.dumps(content)
+    return _safe_json_dumps(content)
@@
         if tool_input is not None:
             otel_span.set_attribute(
                 GenAIAttributes.GEN_AI_TOOL_CALL_ARGUMENTS,
-                tool_input if isinstance(tool_input, str) else json.dumps(tool_input),
+                tool_input if isinstance(tool_input, str) else _safe_json_dumps(tool_input),
             )
@@
         if tool_output is not None:
             otel_span.set_attribute(
                 GenAIAttributes.GEN_AI_TOOL_CALL_RESULT,
-                tool_output if isinstance(tool_output, str) else json.dumps(tool_output),
+                tool_output if isinstance(tool_output, str) else _safe_json_dumps(tool_output),
             )
#!/bin/bash
set -euo pipefail
python - <<'PY'
import json

class NonSerializable:
    pass

for value in [NonSerializable(), {"bad": NonSerializable()}, {1, 2, 3}]:
    try:
        json.dumps(value)
    except Exception as exc:
        print(type(value).__name__, "raises", type(exc).__name__)
PY

Also applies to: 912-924

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 152 - 156, The _stringify_content helper currently calls json.dumps
directly which can raise and prevent otel spans from ending; create a small safe
serializer (e.g., safe_json_serialize(value) or _safe_stringify) that wraps
json.dumps in a try/except and falls back to repr(value) or a fixed
"<unserializable>" string on error, then replace json.dumps calls in
_stringify_content and all places that build content for spans (notably
on_span_end) to use this safe serializer so serialization failures never
propagate and spans are always ended.

67-71: ⚠️ Potential issue | 🟠 Major

Do not emit raw unknown finish reasons.

_map_finish_reason() still returns unmapped provider values as-is, so unknown values can leak into gen_ai.response.finish_reasons and output-message finish_reason. Default unknowns to None/empty string instead. Based on learnings, follow the OpenTelemetry GenAI semantic specification at https://opentelemetry.io/docs/specs/semconv/gen-ai/.

🐛 Proposed fix
 def _map_finish_reason(raw):
     """Map a provider-specific finish reason to the OTel enum value."""
     if raw is None:
         return None
-    return _FINISH_REASON_MAP.get(raw, raw)
+    return _FINISH_REASON_MAP.get(raw)

Also applies to: 390-501

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 67 - 71, The _map_finish_reason function currently returns unmapped
provider values as-is; change it so that if raw is not in _FINISH_REASON_MAP it
returns None (or an empty string where callers expect strings) instead of
returning raw, ensuring unknown provider finish reasons are not emitted into
gen_ai.response.finish_reasons or output-message finish_reason; update the
_map_finish_reason(raw) implementation to use _FINISH_REASON_MAP.get(raw) and
default to None, and audit the other uses referenced (the same mapping logic
around the sections noted ~390-501) to apply the same defaulting behavior so
only known enum values from _FINISH_REASON_MAP are emitted.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

266-266: ⚠️ Potential issue | 🟠 Major

Use a well-known GenAI operation name for realtime LLM spans.

These spans serialize chat-style input/output messages, so "chat" is the closest well-known gen_ai.operation.name; "realtime" remains a custom value and was already flagged previously. Based on learnings, gen_ai.operation.name uses standard predefined values such as "chat", "text_completion", "embeddings", etc., and similar chat-like operations should use "chat".


Also applies to: 365-365

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
at line 266, Replace the custom operation name "realtime" with the well-known
GenAI semantic convention value "chat" for realtime LLM spans: update the
assignment setting GenAIAttributes.GEN_AI_OPERATION_NAME in
_realtime_wrappers.py (the occurrences that currently set it to "realtime") to
"chat" so spans use the standard gen_ai.operation.name; apply the same change to
the other occurrence in the file (the second
GenAIAttributes.GEN_AI_OPERATION_NAME instance).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 761-770: The handoff instructions are being added unconditionally
to attributes["openai.agent.handoffs"], which leaks prompt content; update the
loop that builds handoffs_list so it only includes the "instructions" field when
should_send_prompts() returns true—otherwise omit that key entirely (i.e., add
"instructions" only if should_send_prompts() is enabled for each handoff_agent).
Ensure you call should_send_prompts() (import or reference the existing helper)
and then json.dumps the resulting handoffs_list as before.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 325-330: The dedup logic currently hashes only content[:100],
causing false positives; update the logic in the RealTime wrapper that sets
content_hash (refer to the variables content, content_hash,
self._seen_completions and self._seen_completions_max) to compute a stable
digest of the entire content instead of the 100-char prefix (e.g., use
hashlib.sha256 on content.encode("utf-8") and take hexdigest() or a truncated
stable digest) and use that digest as the key when checking/adding to
self._seen_completions and evicting oldest entries.
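The digest-and-eviction bookkeeping described above can be sketched as follows (the class is an illustrative stand-in for the wrapper's `_seen_completions` state, not the actual implementation):

```python
import hashlib
from collections import OrderedDict


class CompletionDeduper:
    """Illustrative stand-in for the realtime wrapper's dedup state."""

    def __init__(self, max_entries: int = 128):
        self._seen = OrderedDict()
        self._max = max_entries

    def is_new(self, content: str) -> bool:
        # Digest the full content, not a 100-char prefix, so long
        # completions sharing a beginning are not falsely deduplicated.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key in self._seen:
            return False
        self._seen[key] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True
```

Two completions that agree on their first 100 characters but differ later now hash to different keys.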

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 310-323: _input_data may be a plain string (e.g.
ResponseSpanData.input) and the current loop iterates characters; update
_extract_prompt_attributes to detect when input_data is a str and normalize it
into a list containing a single message dict (e.g. {"role": "user", "content":
input_data}) before the for message in input_data loop so _msg_to_dict and
subsequent logic operate on message objects rather than characters; preserve the
existing early return when not input_data or not trace_content and ensure other
callers of _extract_prompt_attributes still receive the same attribute output
shape.
- Around line 535-565: The _extract_tool_definitions function currently only
extracts tools via .function or .name and thus drops tools that are already
dicts or Pydantic/model-shaped objects; update _extract_tool_definitions to
first detect and preserve already-structured definitions (e.g., if tool is
instance of dict or has a .dict() / .model_dump() / .model_dump_json() method)
and append that serialized representation directly to tool_defs before falling
back to attribute-based extraction (the code paths referencing tool.function,
tool.name, function.parameters, and building func_def/tool_def should remain for
non-structured tools).
- Around line 152-156: The _stringify_content helper currently calls json.dumps
directly which can raise and prevent otel spans from ending; create a small safe
serializer (e.g., safe_json_serialize(value) or _safe_stringify) that wraps
json.dumps in a try/except and falls back to repr(value) or a fixed
"<unserializable>" string on error, then replace json.dumps calls in
_stringify_content and all places that build content for spans (notably
on_span_end) to use this safe serializer so serialization failures never
propagate and spans are always ended.
- Around line 67-71: The _map_finish_reason function currently returns unmapped
provider values as-is; change it so that if raw is not in _FINISH_REASON_MAP it
returns None (or an empty string where callers expect strings) instead of
returning raw, ensuring unknown provider finish reasons are not emitted into
gen_ai.response.finish_reasons or output-message finish_reason; update the
_map_finish_reason(raw) implementation to use _FINISH_REASON_MAP.get(raw) and
default to None, and audit the other uses referenced (the same mapping logic
around the sections noted ~390-501) to apply the same defaulting behavior so
only known enum values from _FINISH_REASON_MAP are emitted.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Line 266: Replace the custom operation name "realtime" with the well-known
GenAI semantic convention value "chat" for realtime LLM spans: update the
assignment setting GenAIAttributes.GEN_AI_OPERATION_NAME in
_realtime_wrappers.py (the occurrences that currently set it to "realtime") to
"chat" so spans use the standard gen_ai.operation.name; apply the same change to
the other occurrence in the file (the second
GenAIAttributes.GEN_AI_OPERATION_NAME instance).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a8c4e28-e177-4197-82c7-5b09688d908c

📥 Commits

Reviewing files that changed from the base of the PR and between aa8707c and b6b02aa.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_tracing_processor.py
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_realtime_session.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

693-723: ⚠️ Potential issue | 🟠 Major

Guarantee span cleanup with finally.

Any exception raised by _end_generation_span, _end_function_span, realtime attribute extraction, or serialization aborts the method before Line 719 and Line 722 execute. Since @dont_throw swallows it, the OTel span remains open and the context token stays attached.

Proposed structure
         if span in self._otel_spans:
             otel_span = self._otel_spans[span]
-            span_data = getattr(span, "span_data", None)
-            trace_content = should_send_prompts()
-            if span_data and (
-                type(span_data).__name__ == "ResponseSpanData"
-                or isinstance(span_data, GenerationSpanData)
-            ):
-                self._end_generation_span(otel_span, span_data, trace_content)
-
-            elif span_data and isinstance(span_data, FunctionSpanData):
-                self._end_function_span(otel_span, span_data, trace_content)
-
-            elif trace_content and span_data and _has_realtime_spans:
-                ...
-
-            if hasattr(span, "error") and span.error:
-                otel_span.set_status(Status(StatusCode.ERROR, str(span.error)))
-            else:
-                otel_span.set_status(Status(StatusCode.OK))
-
-            otel_span.end()
-            del self._otel_spans[span]
-            if span in self._span_contexts:
-                context.detach(self._span_contexts[span])
-                del self._span_contexts[span]
+            try:
+                span_data = getattr(span, "span_data", None)
+                trace_content = should_send_prompts()
+                if span_data and (
+                    type(span_data).__name__ == "ResponseSpanData"
+                    or isinstance(span_data, GenerationSpanData)
+                ):
+                    self._end_generation_span(otel_span, span_data, trace_content)
+                elif span_data and isinstance(span_data, FunctionSpanData):
+                    self._end_function_span(otel_span, span_data, trace_content)
+                elif trace_content and span_data and _has_realtime_spans:
+                    ...
+
+                if hasattr(span, "error") and span.error:
+                    otel_span.set_status(Status(StatusCode.ERROR, str(span.error)))
+                else:
+                    otel_span.set_status(Status(StatusCode.OK))
+            except Exception as exc:
+                otel_span.set_status(Status(StatusCode.ERROR, str(exc)))
+            finally:
+                otel_span.end()
+                del self._otel_spans[span]
+                if span in self._span_contexts:
+                    context.detach(self._span_contexts[span])
+                    del self._span_contexts[span]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 693 - 723, The span finalization and context cleanup must run in a
finally block to guarantee otel_span.end() and removal from self._otel_spans /
self._span_contexts even if _end_generation_span, _end_function_span,
_set_realtime_io_attributes, serialization, or other calls throw; refactor the
logic inside the if span in self._otel_spans: block so that all processing
(inspecting span_data, calling
_end_generation_span/_end_function_span/_set_realtime_io_attributes and setting
otel_span status) happens in try and the otel_span.set_status(...),
otel_span.end(), del self._otel_spans[span], and context.detach(...) / del
self._span_contexts[span] happen in finally, using the existing symbols
(self._otel_spans, self._span_contexts, _end_generation_span,
_end_function_span, _set_realtime_io_attributes, should_send_prompts, span_data,
otel_span) to locate and modify the code.
♻️ Duplicate comments (7)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (5)

67-71: ⚠️ Potential issue | 🟠 Major

Do not emit raw unknown finish reasons.

Line 71 still returns unmapped provider values as-is, which can put non-enum values into gen_ai.response.finish_reasons and output-message finish_reason. Return None/empty instead so unknowns are omitted or serialized as "".

Proposed fix
 def _map_finish_reason(raw):
     """Map a provider-specific finish reason to the OTel enum value."""
     if raw is None:
         return None
-    return _FINISH_REASON_MAP.get(raw, raw)
+    return _FINISH_REASON_MAP.get(raw)

The OpenTelemetry registry describes gen_ai.response.finish_reasons as the array of reasons received, and its examples use the defined finish-reason values rather than provider-specific fallbacks: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/. Based on learnings, follow the OpenTelemetry GenAI semantic specification at https://opentelemetry.io/docs/specs/semconv/gen-ai/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 67 - 71, The _map_finish_reason function currently returns unmapped
provider-specific values (returning raw) which can violate the GenAI enum for
gen_ai.response.finish_reasons; update _map_finish_reason to only return mapped
values from _FINISH_REASON_MAP and return None (or empty string) when raw is not
found or is unknown so unknown finish reasons are omitted/serialized as empty,
ensuring only defined enum values are emitted for gen_ai.response.finish_reasons
and related finish_reason output fields.

310-323: ⚠️ Potential issue | 🟠 Major

Handle string input_data before iterating.

If ResponseSpanData.input is a string, Line 322 iterates characters and emits no gen_ai.input.messages. Normalize strings to a single user message first, matching the realtime path.

Proposed fix
+def _iter_input_messages(input_data):
+    if isinstance(input_data, str):
+        return [{"role": "user", "content": input_data}]
+    if isinstance(input_data, dict):
+        return [input_data]
+    return input_data
+
+
 def _extract_prompt_attributes(otel_span, input_data, trace_content: bool):
@@
     messages = []
-    for message in input_data:
+    for message in _iter_input_messages(input_data):
         msg = _msg_to_dict(message)

Based on learnings, the OTel GenAI semantic convention formal JSON schema for gen_ai.input.messages uses {"role": "...", "parts": [{"type": "text", "content": "..."}]} format.
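A plain-string input normalized per that schema would serialize roughly as below (a sketch; the helper name is an assumption, not the package's actual function):

```python
import json


def normalize_string_input(input_data):
    """Wrap a plain-string prompt as a single user message in the
    parts-based gen_ai.input.messages shape."""
    if isinstance(input_data, str):
        return [
            {"role": "user", "parts": [{"type": "text", "content": input_data}]}
        ]
    return input_data


messages = normalize_string_input("What is the capital of France?")
print(json.dumps(messages))
```

The resulting JSON array is what would be stored on the gen_ai.input.messages attribute.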

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 310 - 323, In _extract_prompt_attributes, handle the case where
input_data is a plain string before the loop: if isinstance(input_data, str)
convert it into a single-message structure matching the GenAI schema (e.g., a
user message with role "user" and parts = [{"type":"text","content":
input_data}]) so that the subsequent for message in input_data and calls to
_msg_to_dict produce a single gen_ai.input.messages entry; add this
normalization immediately before the messages = [] / for message in input_data
block.

527-530: ⚠️ Potential issue | 🟡 Minor

Avoid emitting non-upstream gen_ai.usage.total_tokens.

The migration goal is upstream GenAI semconv compliance, but Line 529 still emits SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS. The current OpenTelemetry GenAI registry lists input/output usage attributes (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) and not a total-token span attribute; total can be derived downstream or emitted under a custom namespace.

Reference: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/. Based on learnings, Follow the OpenTelemetry GenAI semantic specification at https://opentelemetry.io/docs/specs/semconv/gen-ai/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 527 - 530, The code currently sets a non-upstream attribute
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS using usage.total_tokens in the
otel_span.set_attribute call; remove the emission of GEN_AI_USAGE_TOTAL_TOKENS
and instead emit only the upstream-compliant attributes (e.g.,
gen_ai.usage.input_tokens and gen_ai.usage.output_tokens) when available via
usage.input_tokens / usage.output_tokens, or omit emitting total tokens entirely
(or emit it under a custom namespace) to comply with the OpenTelemetry GenAI
semconv; update the logic around usage.total_tokens, otel_span.set_attribute,
and SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS accordingly.

152-156: ⚠️ Potential issue | 🟠 Major

Make content/tool serialization non-throwing.

json.dumps() can raise for arbitrary SDK/tool objects. Because these paths run inside on_span_end, a serialization failure can skip attributes and, without stronger cleanup, interrupt span lifecycle handling.

Proposed fix
+def _safe_json_dumps(value) -> str:
+    try:
+        return json.dumps(value, default=str)
+    except (TypeError, ValueError):
+        return str(value)
+
+
 def _stringify_content(content) -> str:
     """Coerce non-string content to a string for simple text parts."""
     if isinstance(content, str):
         return content
-    return json.dumps(content)
+    return _safe_json_dumps(content)
@@
         if tool_input is not None:
             otel_span.set_attribute(
                 GenAIAttributes.GEN_AI_TOOL_CALL_ARGUMENTS,
-                tool_input if isinstance(tool_input, str) else json.dumps(tool_input),
+                tool_input if isinstance(tool_input, str) else _safe_json_dumps(tool_input),
             )
@@
         if tool_output is not None:
             otel_span.set_attribute(
                 GenAIAttributes.GEN_AI_TOOL_CALL_RESULT,
-                tool_output if isinstance(tool_output, str) else json.dumps(tool_output),
+                tool_output if isinstance(tool_output, str) else _safe_json_dumps(tool_output),
             )

Also applies to: 913-925

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 152 - 156, The _stringify_content function currently uses
json.dumps(content) which can raise for arbitrary SDK/tool objects and thereby
disrupt on_span_end handling; update _stringify_content (and the analogous
serialization in the other block around the 913-925 range) to catch exceptions
from json.dumps and fall back to a safe, non-throwing representation (e.g., try
json.dumps with default=str inside a try/except, on exception return
repr(content) or str(content) and if that also fails return a fixed placeholder
like "<unserializable>"); ensure the function returns a string in all cases so
span attribute setting cannot throw.

535-565: ⚠️ Potential issue | 🟠 Major

Preserve dict/Pydantic tool definitions.

This only handles objects with .function or .name; dict/Pydantic tool definitions are silently dropped. gen_ai.tool.definitions should preserve the source-system tool definition objects when available.

Also handle params_json_schema as an alias for parameters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`
around lines 535 - 565, _extract_tool_definitions currently only handles objects
with .function or .name and drops dict/Pydantic tool definitions; update that
function to also detect and preserve dict-like or Pydantic definitions by
returning them as-is (or their .dict() for Pydantic models) into the tool_defs
list instead of ignoring them, and treat a params_json_schema key as an alias
for parameters (copy params_json_schema into parameters when parameters is
missing). Locate the code in _extract_tool_definitions, add an early branch like
"if isinstance(tool, dict) or has attr 'dict' (Pydantic): append the original
dict/pydantic dict to tool_defs", and when building func_def handle both
"parameters" and "params_json_schema" keys (prefer parameters, fallback to
params_json_schema) so gen_ai.tool.definitions preserves source-system
definitions.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)

386-390: ⚠️ Potential issue | 🟡 Minor

Avoid emitting non-upstream gen_ai.usage.total_tokens here too.

Realtime spans still set SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS. For upstream GenAI semconv compliance, prefer only gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, or move total tokens under a custom Traceloop namespace.

Reference: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/. Based on learnings, Follow the OpenTelemetry GenAI semantic specification at https://opentelemetry.io/docs/specs/semconv/gen-ai/.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
around lines 386 - 390, The realtime wrapper currently sets the non-upstream
attribute SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS via span.set_attribute;
remove that call and instead emit only upstream GenAI semconv attributes
(SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS and
SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) when those values exist in
self.pending_usage, or if you must keep a total, write it under a Traceloop
custom namespace (e.g., "traceloop.gen_ai.usage.total_tokens") rather than using
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS; update the code around the
span.set_attribute call where self.pending_usage["total_tokens"] is referenced
to perform these replacements and guard emissions behind existence checks.

256-269: ⚠️ Potential issue | 🟠 Major

Use the canonical operation name for realtime LLM calls.

Both realtime spans still emit gen_ai.operation.name = "realtime". The OpenTelemetry GenAI registry lists predefined values and says applicable predefined values should be used; for OpenAI conversational realtime LLM calls, use "chat" unless this package documents a system-specific operation.

Proposed fix
-                GenAIAttributes.GEN_AI_OPERATION_NAME: "realtime",
+                GenAIAttributes.GEN_AI_OPERATION_NAME: "chat",

Reference: https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/. Based on learnings, In the OpenTelemetry GenAI semantic conventions, gen_ai.operation.name uses standard predefined values such as "chat", "text_completion", "embeddings", etc.

Also applies to: 359-369

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
around lines 256 - 269, The span attribute GenAIAttributes.GEN_AI_OPERATION_NAME
is set to the non-canonical value "realtime"; update it to the canonical
OpenTelemetry GenAI operation name "chat" for OpenAI conversational realtime LLM
calls. Locate the span creation in start_audio_span (and the analogous realtime
span creation around lines 359-369) and change the attribute value from
"realtime" to "chat" so the attributes dict uses
GenAIAttributes.GEN_AI_OPERATION_NAME: "chat" while keeping other attributes and
span configuration unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 321-331: The record_completion method currently deduplicates
completions solely by hash(content) using _seen_completions, which can drop
legitimate later replies and cause prompt/completion misalignment; change the
dedupe key to include a unique response or turn identifier (if available from
the event) or include the turn/session id so duplicates are scoped per turn.
Update record_completion to build a composite key (e.g., (response_id or
turn_id, hash(content))) when checking and inserting into _seen_completions,
keep the eviction logic with _seen_completions_max, and ensure
create_llm_span(content) is still called for non-duplicate composite keys.
Reference: record_completion, _seen_completions, _seen_completions_max,
create_llm_span.

---

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 693-723: The span finalization and context cleanup must run in a
finally block to guarantee otel_span.end() and removal from self._otel_spans /
self._span_contexts even if _end_generation_span, _end_function_span,
_set_realtime_io_attributes, serialization, or other calls throw; refactor the
logic inside the if span in self._otel_spans: block so that all processing
(inspecting span_data, calling
_end_generation_span/_end_function_span/_set_realtime_io_attributes and setting
otel_span status) happens in try and the otel_span.set_status(...),
otel_span.end(), del self._otel_spans[span], and context.detach(...) / del
self._span_contexts[span] happen in finally, using the existing symbols
(self._otel_spans, self._span_contexts, _end_generation_span,
_end_function_span, _set_realtime_io_attributes, should_send_prompts, span_data,
otel_span) to locate and modify the code.

---

Duplicate comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py`:
- Around line 67-71: The _map_finish_reason function currently returns unmapped
provider-specific values (returning raw) which can violate the GenAI enum for
gen_ai.response.finish_reasons; update _map_finish_reason to only return mapped
values from _FINISH_REASON_MAP and return None (or empty string) when raw is not
found or is unknown so unknown finish reasons are omitted/serialized as empty,
ensuring only defined enum values are emitted for gen_ai.response.finish_reasons
and related finish_reason output fields.
- Around line 310-323: In _extract_prompt_attributes, handle the case where
input_data is a plain string before the loop: if isinstance(input_data, str)
convert it into a single-message structure matching the GenAI schema (e.g., a
user message with role "user" and parts = [{"type":"text","content":
input_data}]) so that the subsequent for message in input_data and calls to
_msg_to_dict produce a single gen_ai.input.messages entry; add this
normalization immediately before the messages = [] / for message in input_data
block.
- Around line 527-530: The code currently sets a non-upstream attribute
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS using usage.total_tokens in the
otel_span.set_attribute call; remove the emission of GEN_AI_USAGE_TOTAL_TOKENS
and instead emit only the upstream-compliant attributes (e.g.,
gen_ai.usage.input_tokens and gen_ai.usage.output_tokens) when available via
usage.input_tokens / usage.output_tokens, or omit emitting total tokens entirely
(or emit it under a custom namespace) to comply with the OpenTelemetry GenAI
semconv; update the logic around usage.total_tokens, otel_span.set_attribute,
and SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS accordingly.
- Around line 152-156: The _stringify_content function currently uses
json.dumps(content) which can raise for arbitrary SDK/tool objects and thereby
disrupt on_span_end handling; update _stringify_content (and the analogous
serialization in the other block around the 913-925 range) to catch exceptions
from json.dumps and fall back to a safe, non-throwing representation (e.g., try
json.dumps with default=str inside a try/except, on exception return
repr(content) or str(content) and if that also fails return a fixed placeholder
like "<unserializable>"); ensure the function returns a string in all cases so
span attribute setting cannot throw.
- Around line 535-565: _extract_tool_definitions currently only handles objects
with .function or .name and drops dict/Pydantic tool definitions; update that
function to also detect and preserve dict-like or Pydantic definitions by
returning them as-is (or their .dict() for Pydantic models) into the tool_defs
list instead of ignoring them, and treat a params_json_schema key as an alias
for parameters (copy params_json_schema into parameters when parameters is
missing). Locate the code in _extract_tool_definitions, add an early branch like
"if isinstance(tool, dict) or has attr 'dict' (Pydantic): append the original
dict/pydantic dict to tool_defs", and when building func_def handle both
"parameters" and "params_json_schema" keys (prefer parameters, fallback to
params_json_schema) so gen_ai.tool.definitions preserves source-system
definitions.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`:
- Around line 386-390: The realtime wrapper currently sets the non-upstream
attribute SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS via span.set_attribute;
remove that call and instead emit only upstream GenAI semconv attributes
(SpanAttributes.GEN_AI_USAGE_INPUT_TOKENS and
SpanAttributes.GEN_AI_USAGE_OUTPUT_TOKENS) when those values exist in
self.pending_usage, or if you must keep a total, write it under a Traceloop
custom namespace (e.g., "traceloop.gen_ai.usage.total_tokens") rather than using
SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS; update the code around the
span.set_attribute call where self.pending_usage["total_tokens"] is referenced
to perform these replacements and guard emissions behind existence checks.
- Around line 256-269: The span attribute GenAIAttributes.GEN_AI_OPERATION_NAME
is set to the non-canonical value "realtime"; update it to the canonical
OpenTelemetry GenAI operation name "chat" for OpenAI conversational realtime LLM
calls. Locate the span creation in start_audio_span (and the analogous realtime
span creation around lines 359-369) and change the attribute value from
"realtime" to "chat" so the attributes dict uses
GenAIAttributes.GEN_AI_OPERATION_NAME: "chat" while keeping other attributes and
span configuration unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1239d1f-34d5-4765-b850-b8d0c166fc83

📥 Commits

Reviewing files that changed from the base of the PR and between b6b02aa and 8190972.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py

@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1928-openai-agents-insturmentation branch from ac4d39d to 6b78202 on April 20, 2026 13:35

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (2)

323-333: ⚠️ Potential issue | 🟠 Major

Deduping by content text alone can drop legitimate later turns.

Switching to hash(content) over the full string (Line 327) fixes the earlier prefix-collision bug, but the dedupe key is still text-only. A valid later turn returning identical content (e.g., "OK", "Got it", "Yes") is silently dropped, and create_llm_span is never called — leaving the next unique completion to pair with a stale pending prompt and producing misaligned input/output spans.

Scope the dedupe by a turn/response identifier (e.g., response_id or item_id from the raw model event) so identical content across separate turns isn't conflated.

🔧 Proposed approach
-    def record_completion(self, role: str, content: str):
+    def record_completion(self, role: str, content: str, response_id: Optional[str] = None):
         """Record a completion message - creates an LLM span with prompt and completion."""
         if not content:
             return
-        content_hash = hash(content)
-        if content_hash in self._seen_completions:
+        dedupe_key = (response_id, hash(content)) if response_id else hash(content)
+        if dedupe_key in self._seen_completions:
             return
-        self._seen_completions[content_hash] = None
+        self._seen_completions[dedupe_key] = None
         if len(self._seen_completions) > self._seen_completions_max:
             self._seen_completions.popitem(last=False)
         self.create_llm_span(content)

Then thread the response/item id through the call sites in traced_put_event (e.g., response.done provides a response.id).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
around lines 323 - 333, The dedupe currently uses only hash(content) in
record_completion which will drop legitimate later turns with identical text;
change the dedupe key to include the response/item identifier so identical
content from different turns aren’t conflated (e.g., use a tuple like
(response_id, content_hash) or (response_id, content) as the key in
self._seen_completions and keep the existing LRU eviction logic), and thread
that response/item id from traced_put_event (response.done -> response.id or
item_id) into the record_completion call so create_llm_span is invoked per
response instance.

268-269: ⚠️ Potential issue | 🟠 Major

gen_ai.operation.name = "realtime" is not a valid well-known value.

Both the audio span (Line 268) and the realtime LLM span (Line 367) still emit "realtime". The OTel GenAI semconv defines well-known values: chat, create_agent, embeddings, execute_tool, generate_content, invoke_agent, retrieval, text_completion. Since the OpenAI Realtime API is a multi-turn conversational surface, "chat" is the correct value (mirroring how _start_generation_span uses "chat" for ResponseSpanData per the test on Line 425/444 in test_tracing_processor.py).

Based on learnings: In the OpenTelemetry GenAI semantic conventions, gen_ai.operation.name uses standard predefined values such as "chat", "text_completion", "embeddings", etc. — NOT method-specific names.

OpenTelemetry GenAI semantic conventions gen_ai.operation.name well-known values realtime
🔧 Proposed fix
@@ Line 267-270 (start_audio_span)
             attributes={
-                GenAIAttributes.GEN_AI_OPERATION_NAME: "realtime",
+                GenAIAttributes.GEN_AI_OPERATION_NAME: "chat",
                 GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai",
             },
@@ Line 366-370 (create_llm_span)
             attributes={
-                GenAIAttributes.GEN_AI_OPERATION_NAME: "realtime",
+                GenAIAttributes.GEN_AI_OPERATION_NAME: "chat",
                 GenAIAttributes.GEN_AI_PROVIDER_NAME: "openai",
                 GenAIAttributes.GEN_AI_REQUEST_MODEL: model_name_str,
             },

Note: corresponding assertions in test_realtime_session.py / test_realtime.py will need updating accordingly.

Also applies to: 367-368

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
around lines 268 - 269, Replace the invalid gen_ai.operation.name value
"realtime" with the well-known value "chat" where
GenAIAttributes.GEN_AI_OPERATION_NAME is set for the realtime audio span and the
realtime LLM span; update the assignments in the functions that create those
spans (e.g., the methods that start the audio span and the realtime-LLM span
where GenAIAttributes.GEN_AI_OPERATION_NAME is currently "realtime") to use
"chat" and adjust the corresponding assertions in test_realtime_session.py /
test_realtime.py to expect "chat".
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py (1)

416-419: Stale comment block — consider removing.

These three comment lines describe behavior that's no longer present in the surrounding code (no top-level gen_ai.response.finish_reasons is being conditionally set here). Either drop the comment or move it to where the omission actually happens, to avoid confusing future readers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py`
around lines 416 - 419, Remove the stale three-line comment block in
_realtime_wrappers.py (the lines referencing "Realtime API does not provide
finish reasons" and top-level gen_ai.response.finish_reasons) because it no
longer matches the surrounding code; either delete these lines or move them to
the actual location where finish_reasons is conditionally omitted (refer to the
handling in _hooks.py) so comments match the implementation.
packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py (1)

304-306: Assertion logic could be tightened for clarity.

assert "id" not in part or part["id"] is correct but a bit indirect — it allows both "key missing" and "key present with truthy value". Given the implementation (if tool_call_id: part["id"] = tool_call_id), the stronger and clearer assertion is simply assert "id" not in part, which directly encodes the spec intent ("id must be omitted when call_id is absent").

🔧 Proposed change
-        assert "id" not in part or part["id"], (
-            f"id must be absent or non-empty when call_id not provided, got: {part}"
-        )
+        assert "id" not in part, (
+            f"id must be omitted when call_id not provided, got: {part}"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py`
around lines 304 - 306, The assertion in the test uses a permissive condition
(`assert "id" not in part or part["id"]`) which is indirect; update the test to
assert the stricter intended behavior by replacing that condition with a direct
check that the "id" key is absent (i.e., assert "id" not in part) so it matches
the implementation where `part["id"]` is only set when `tool_call_id` is truthy
and must be omitted otherwise; locate the assertion referencing `part` in the
test and change it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai-agents/tests/test_content_parts.py`:
- Around line 40-46: The test
test_data_url_no_mime_produces_blob_without_mime_type currently uses
"data:base64,abc123" which yields a non-empty mime extracted by _url_to_part;
update the test to use a data URL that produces an empty mime_part (e.g.
"data:,abc123" or "data:;base64,abc123") so the code path that omits mime_type
is exercised, and add an assertion that "mime_type" not in part (referencing
_dict_block and _url_to_part to locate the logic).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2c3f06f6-66f0-4361-acaf-c99cdbd033ef

📥 Commits

Reviewing files that changed from the base of the PR and between b6b02aa and 1beb398.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai-agents/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (11)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_handoff_span_operation_name.yaml
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_span_operation_name.yaml
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_content_parts.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_finish_reasons.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_semconv_messages.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_tracing_processor.py
✅ Files skipped from review due to trivial changes (5)
  • packages/opentelemetry-instrumentation-openai-agents/pyproject.toml
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_handoff_span_operation_name.yaml
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_span_operation_name.yaml
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py

@max-deygin-traceloop max-deygin-traceloop merged commit abbcc11 into main Apr 30, 2026
12 of 13 checks passed
