
fix(openai-agents): span attribute handling for tool calls and results #3422

Merged
doronkopit5 merged 4 commits into main from dk/openai-agents-tool-responses
Oct 29, 2025
Conversation

Member

@doronkopit5 doronkopit5 commented Oct 26, 2025

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Unified agent message normalization and added tests for tool call and result span attributes in OpenAI agents.

  • Behavior:
    • Unified normalization of agent messages in _hooks.py to consistently capture roles, message content, tool call IDs, and tool call details across OpenAI chat and Agents formats.
  • Tests:
    • Added test_tool_call_and_result_attributes in test_openai_agents.py to validate that agent spans record tool calls and tool results, including tool_call IDs and content.
    • New test cassette test_tool_call_and_result_attributes.yaml added to support the new test.

This description was created by Ellipsis for adbc139. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features

    • Unified normalization of agent messages so telemetry consistently captures roles, message content (JSON-serialized when needed), tool call IDs, and tool call details (id/name/arguments) across OpenAI chat and Agents formats; nested tool call details are now extracted and emitted.
  • Tests

    • Added tests and cassette validating spans record tool calls and tool results (including tool_call IDs, names, arguments, and content).

- Refactored the OpenTelemetryTracingProcessor to support unified handling of message formats, including both standard OpenAI chat and OpenAI Agents SDK formats.
- Added logic to extract and set attributes for tool calls and results, ensuring proper recording in span attributes.
- Introduced a new test to validate the correct propagation of tool call and result attributes in spans.
- Added a YAML cassette for testing tool call interactions with the OpenAI API.

coderabbitai Bot commented Oct 26, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Normalize OpenAI Agent messages into a unified dict format and emit hierarchical span attributes (LLM_PROMPTS.{i}.*) for role, content, tool_call_id, and tool_calls; add tests and a cassette verifying tool_call and tool result attributes in response spans.

Changes

Cohort / File(s) Summary
Message Normalization & Attribute Handling
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
Replace per-message ad-hoc handling with a unified normalization flow that converts messages to dicts, supports standard chat and OpenAI Agents SDK formats (function_call, function_call_output), and emits hierarchical attributes under LLM_PROMPTS.{i} for role, content, tool_call_id, and tool_calls. Normalizes tool_calls (id, name, arguments), JSON-stringifies non-string content, and applies logic uniformly across main and legacy paths.
Test Cassette
packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_call_and_result_attributes.yaml
Add cassette capturing an agent interaction sequence: initial /v1/responses request, trace ingest (/v1/traces/ingest) with function-call spans, and a final /v1/responses including function_call and function_call_output, plus headers and compressed payloads.
Test Suite Updates
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
Import function_tool from agents; add test_tool_call_and_result_attributes which asserts that response spans include assistant messages with tool_calls and subsequent tool result messages with tool_call_id and content. Note: the test appears duplicated in the file (duplicate insertion).

Sequence Diagram(s)

sequenceDiagram
    participant Agent as OpenAI Agent
    participant Span as Instrumentation Span
    participant Normalizer as Message Normalizer

    Agent->>Span: Response messages arrive
    Span->>Normalizer: on_span_end receives messages
    Normalizer->>Normalizer: Normalize each message to dict
    alt standard chat message
        Normalizer->>Normalizer: extract role, content
    else function_call (Agents SDK)
        Normalizer->>Normalizer: map to assistant role, build tool_calls (id,name,arguments)
    else function_call_output (tool result)
        Normalizer->>Normalizer: map to tool role, set tool_call_id + content
    end
    Normalizer->>Span: set attributes LLM_PROMPTS.{i}.role/content/tool_call_id/tool_calls
    note over Span: Tool calls normalized (id,name,args stringified) and emitted
    Span->>Span: emit final span attributes

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect _hooks.py normalization branches for correctness and edge cases (nested function objects, non-dict tool_call items).
  • Verify JSON/stringification behavior and consistent attribute naming (LLM_PROMPTS.{i}).
  • Review the new test and cassette for correctness and remove duplicate test entry.

Suggested reviewers

  • nirga
  • galkleinman

Poem

🐰 Hop! I tidy messages, big and small,

I stitch their roles and tool calls into a hall.
IDs and args now line in neat rows,
Spans hum softly where the carrot grows.
A hopped-up test and cassette that glows. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "fix(openai-agents): span attribute handling for tool calls and results" accurately captures the primary change: improving how span attributes are recorded for tool calls and results in OpenAI agent interactions. It is specific (naming both the package scope and the aspect being fixed), free of vague language and noise, and uses proper conventional commit formatting.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 2ba1445 in 1 minute and 36 seconds. Click for details.
  • Reviewed 523 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:267
  • Draft comment:
    The branching for handling tool call messages based on the 'type' field is clear, but consider adding inline comments to emphasize how each branch maps to the respective OpenAI formats. This minor improvement could aid future maintainers.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_54Qm2El5iGf4Xcvn

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

content = message.content
prefix = f"{SpanAttributes.LLM_PROMPTS}.{i}"

# Convert message to dict for unified handling
Contributor


Consider extracting the object-to-dict conversion (lines 244–253) into a helper function to improve readability and reduce duplication. Using something like vars(message) or a dedicated converter might simplify the logic.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

452-471: Apply normalization logic to legacy fallback path for consistency across all span types.

The legacy fallback path (lines 452-500+) handles AgentSpanData, HandoffSpanData, and FunctionSpanData but only extracts role and content from messages, missing tool_call_id and tool_calls that the main normalization path (lines 244-335) handles. This creates inconsistent span attributes across different span data types. Apply the same message normalization and tool extraction logic from lines 244-335 to the legacy fallback block.

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (2)

242-253: Consider adding error handling for attribute extraction.

The attribute extraction loop uses hasattr() and getattr() to convert message objects to dicts. If attribute access raises an exception (e.g., property with side effects), the code will fail. Consider wrapping in a try-except to handle unexpected attribute access errors gracefully.



260-282: Silent fallthrough when message format is unrecognized.

If a message has neither 'role' nor 'type' keys, the code silently skips it without setting role, content, tool_call_id, or tool_calls. This could make debugging difficult if unexpected message formats appear.

Consider logging a debug message when an unrecognized message format is encountered:

                        elif 'type' in msg:
                            # OpenAI Agents SDK format
                            msg_type = msg['type']
                            if msg_type == 'function_call':
                                # ... existing code ...
                            elif msg_type == 'function_call_output':
                                # ... existing code ...
+                       else:
+                           # Unrecognized format - log for debugging
+                           import logging
+                           logging.debug(f"Unrecognized message format: {msg.keys()}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a693d39 and 2ba1445.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_call_and_result_attributes.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_call_and_result_attributes.yaml
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (3)
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (1)
  • exporter (55-65)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

299-335: Robust tool_calls normalization and attribute population.

The code handles multiple tool_call formats comprehensively:

  • Converts objects to dicts by extracting attributes
  • Handles nested function details (standard OpenAI format)
  • Flattens to a consistent structure
  • Properly JSON-stringifies non-string arguments

The implementation correctly accommodates both OpenAI SDK format (nested function) and direct format.

packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_call_and_result_attributes.yaml (1)

202-203: Verify cookie values are scrubbed or expired.

The cassette contains cookie values (__cf_bm and _cfuvid) that could be sensitive session tokens. Ensure these are either scrubbed placeholders or naturally expired tokens before committing.

As per coding guidelines.

packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2)

6-6: Import addition looks good.

The function_tool import is correctly added and used in the new test case below.


459-543: Comprehensive test for tool_call and tool_result attributes.

The test validates:

  • Tool call attributes (name, arguments) in assistant messages
  • Tool result attributes (content, tool_call_id) in tool messages
  • Both attributes are found in the second response span's conversation history

The test logic is thorough and correctly structured. The iteration limit of 20 messages should be sufficient for typical agent conversations.
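The scan the test performs over conversation-history attributes might look roughly like this. This is a sketch, not code from the PR: the `gen_ai.prompt` key prefix is assumed from the LLM_PROMPTS semantic convention, and `find_tool_call` is a hypothetical helper.

```python
def find_tool_call(attrs: dict, max_entries: int = 20):
    """Return (index, tool name) of the first prompt entry carrying a tool call.

    Walks prompt indices 0..max_entries-1 of a span's attribute dict and
    checks for a tool_calls name under each hierarchical prefix.
    """
    for i in range(max_entries):
        name = attrs.get(f"gen_ai.prompt.{i}.tool_calls.0.name")
        if name:
            return i, name
    return None
```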

Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed b0d7b2b in 45 seconds. Click for details.
  • Reviewed 52 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:250
  • Draft comment:
    Improved readability: converting the inline list of attribute names to a multi-line format makes the code clearer. Ensure that all necessary attributes are still captured and no key is accidentally omitted.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py:467
  • Draft comment:
    Refactored get_city_info: Using multi-line string literals via implicit concatenation improves readability. Verify that the resulting string maintains the intended format without unintended whitespace changes.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_IJUIrfUag1j34B6x



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

279-286: Critical: Missing 'id' field in tool_calls created from function_call format.

When converting function_call messages to the tool_calls structure, the code creates a dict with only name and arguments but omits the id field. However, line 337 expects tool_call.get('id') to exist. This means tool call IDs won't be recorded in span attributes for messages in the function_call format, resulting in incomplete observability data.

This issue was previously flagged and remains unresolved.

Apply this diff to include the call_id:

                             elif msg_type == 'function_call':
                                 # Tool calls are assistant messages
                                 role = 'assistant'
                                 # Create tool_calls structure matching OpenAI SDK format
                                 tool_calls = [{
+                                    'id': msg.get('call_id', ''),
                                     'name': msg.get('name', ''),
                                     'arguments': msg.get('arguments', '')
                                 }]
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

502-502: Replace magic number with a documented constant.

The value 20 appears arbitrary. Consider extracting it as a named constant (e.g., MAX_CONVERSATION_HISTORY_ENTRIES = 20) with a comment explaining why this limit is chosen, improving code maintainability.

Apply this diff:

+    # Maximum conversation history entries to check (sufficient for typical tool call scenarios)
+    MAX_CONVERSATION_HISTORY_ENTRIES = 20
+
     # The tool call and result appear in the SECOND response span as part of conversation history
     # Find the assistant message with tool call
     tool_call_found = False
     tool_result_found = False
 
-    for i in range(20):  # Check conversation history
+    for i in range(MAX_CONVERSATION_HISTORY_ENTRIES):
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

244-262: Consider extracting message-to-dict conversion to a helper function.

The object-to-dict conversion logic is repeated inline and could benefit from extraction to a dedicated helper function (e.g., _message_to_dict(message)) for improved readability and reusability. This aligns with feedback from previous reviews.

Example helper function:

def _message_to_dict(message) -> dict:
    """Convert a message object to a dict for unified processing."""
    if isinstance(message, dict):
        return message
    
    msg = {}
    for attr in [
        "role", "content", "tool_call_id", "tool_calls",
        "type", "name", "arguments", "call_id", "output",
    ]:
        if hasattr(message, attr):
            msg[attr] = getattr(message, attr)
    return msg

Then replace lines 244-262 with:

-                        # Convert message to dict for unified handling
-                        if isinstance(message, dict):
-                            msg = message
-                        else:
-                            # Convert object to dict
-                            msg = {}
-                            for attr in [
-                                "role",
-                                "content",
-                                "tool_call_id",
-                                "tool_calls",
-                                "type",
-                                "name",
-                                "arguments",
-                                "call_id",
-                                "output",
-                            ]:
-                                if hasattr(message, attr):
-                                    msg[attr] = getattr(message, attr)
+                        msg = self._message_to_dict(message)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2ba1445 and b0d7b2b.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (3)
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (1)
  • exporter (55-65)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2)

6-6: LGTM: Import addition supports new test.

The function_tool import is appropriately added to support the new test case.


459-550: Well-structured test with comprehensive assertions.

The test effectively validates that tool calls and tool results are properly recorded in span attributes. The logic correctly identifies the second response span (which contains conversation history) and verifies both tool call attributes (name, arguments) and tool result attributes (content, tool_call_id).

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (2)

293-305: LGTM: Proper attribute handling with correct null checks.

The role, content, and tool_call_id attribute handling is well-implemented. The check for content is not None (line 298) correctly allows empty strings while preventing null values, and JSON encoding for non-string content ensures proper serialization.


307-345: Well-structured tool_calls processing with comprehensive format support.

The tool_calls handling logic correctly converts various object formats to dicts, extracts nested function details, and sets appropriate span attributes. The JSON encoding for non-string arguments (lines 342-344) is appropriate.

Note: This logic depends on the id field being present in tool_calls. Ensure the critical issue at lines 283-286 is addressed so that tool call IDs are properly recorded for all message formats.

Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed adbc139 in 1 minute and 35 seconds. Click for details.
  • Reviewed 269 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py:284
  • Draft comment:
    Added the 'id' attribute for tool calls. Consider adding an inline comment explaining that this field is used to correlate tool call requests with their results. Also, if a tool call message might sometimes lack an 'id', you might consider generating a fallback (instead of defaulting to empty string) to satisfy test assertions expecting a non‐empty value.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The comment has two parts. The first, about adding documentation, is purely informative and not actionable. The second, about generating a fallback ID, is speculative: it assumes there are test assertions expecting non-empty values, but there is no evidence of this. The code is clearly trying to match the OpenAI SDK format, where an empty string is a valid default. Without actual test failures or requirements around non-empty IDs, the change would be premature. The comment was deleted because it combines a non-actionable documentation request with speculative suggestions about test issues that are not evidenced.
2. packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py:519
  • Draft comment:
    The test now verifies that the tool call ID attribute exists and is non-empty. This helps ensure consistency between the instrumentation and expected behavior. No changes needed, but double-check that all tool call messages consistently provide an 'id'.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_call_and_result_attributes.yaml:294
  • Draft comment:
    The test cassette now includes tool call IDs in the recorded interactions. Ensure that these cassette values remain stable across runs, or consider marking them as dynamic if they can change.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_CT9NorqIp4rWSaQK



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1)

244-263: Consider extracting object-to-dict conversion to a helper function.

The manual attribute extraction is verbose and reduces readability. Extracting this logic into a helper function would improve maintainability and make the code more testable.

This aligns with the previous review suggestion from ellipsis-dev[bot].

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (2)

310-326: Consider simplifying tool_call object-to-dict conversion.

The nested conditionals for converting tool_call objects to dictionaries are complex and could be more maintainable. Consider extracting this logic into a small helper function.

Example helper function:

def _tool_call_to_dict(tool_call):
    """Convert a tool_call object to dict format."""
    if isinstance(tool_call, dict):
        return tool_call
    
    tc_dict = {}
    if hasattr(tool_call, 'id'):
        tc_dict['id'] = tool_call.id
    if hasattr(tool_call, 'function'):
        func = tool_call.function
        if hasattr(func, 'name'):
            tc_dict['name'] = func.name
        if hasattr(func, 'arguments'):
            tc_dict['arguments'] = func.arguments
    elif hasattr(tool_call, 'name'):
        tc_dict['name'] = tool_call.name
    if hasattr(tool_call, 'arguments'):
        tc_dict['arguments'] = tool_call.arguments
    return tc_dict

This would simplify the loop to:

for j, tool_call in enumerate(tool_calls):
    tool_call = _tool_call_to_dict(tool_call)
    # ... rest of processing

463-481: Legacy path doesn't use unified message normalization.

The legacy fallback path (lines 463-481) uses simpler role/content extraction and doesn't leverage the new unified normalization logic introduced in lines 244-293. This creates inconsistent message handling between response spans and other span types.

If legacy spans also need to support tool calls and the OpenAI Agents SDK format, consider reusing the normalization logic. If they don't, adding a comment explaining why the simpler handling is appropriate would improve clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b0d7b2b and adbc139.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py (1 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_call_and_result_attributes.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_tool_call_and_result_attributes.yaml
  • packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/_hooks.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Lint

@doronkopit5 doronkopit5 merged commit 049f7e3 into main Oct 29, 2025
12 checks passed
@doronkopit5 doronkopit5 deleted the dk/openai-agents-tool-responses branch October 29, 2025 12:17