fix(openai-agents): fix broken traces with agents handoff on run_stream #3143

galkleinman merged 20 commits into `main`
Conversation
Walkthrough

The changes introduce advanced tracing and streaming support for the OpenAI Agents SDK instrumentation, including thread-local span management for trace continuity, enhanced tool and agent telemetry, and comprehensive multi-agent workflow testing. A new streaming recipe-management chat application is added as a sample, featuring multi-agent handoffs, function tool integration, and OpenAI-powered recipe editing. Additionally, JSON serialization is improved with a custom encoder handling dataclasses and special cases.
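The custom encoder mentioned in the walkthrough could look roughly like this. This is a sketch, not the PR's actual implementation; the class name `SafeJSONEncoder` and the fallback-to-`str` behavior are assumptions.

```python
import dataclasses
import json


class SafeJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes dataclasses, falls back to str()."""

    def default(self, o):
        # Dataclass instances become plain dicts.
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            return dataclasses.asdict(o)
        try:
            return super().default(o)
        except TypeError:
            # Last-resort fallback for otherwise non-serializable objects.
            return str(o)


@dataclasses.dataclass
class Recipe:
    name: str
    servings: int


print(json.dumps({"recipe": Recipe("carbonara", 2)}, cls=SafeJSONEncoder))
```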
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant MainChatAgent
    participant RecipeEditorAgent
    participant FunctionTool
    participant OpenAIAPI
    User->>MainChatAgent: User query (e.g. "Make carbonara vegetarian")
    MainChatAgent->>RecipeEditorAgent: Handoff with context
    RecipeEditorAgent->>FunctionTool: plan_and_apply_recipe_modifications
    FunctionTool->>OpenAIAPI: ChatCompletion (modification plan)
    OpenAIAPI-->>FunctionTool: Plan response
    FunctionTool-->>RecipeEditorAgent: Modified recipe result
    RecipeEditorAgent-->>MainChatAgent: Edited recipe output
    MainChatAgent-->>User: Streaming assistant response
```
Caution
Changes requested ❌
Reviewed everything up to dc1247b in 3 minutes and 16 seconds.

- Reviewed 2815 lines of code in 6 files
- Skipped 2 files when reviewing
- Skipped posting 5 draft comments; view those below.
1. packages/tests/test_openai_agents.py:39
   - Draft comment: Instead of comparing span.kind with itself (e.g. `span.kind == span.kind.CLIENT`), use the constant from OpenTelemetry (e.g. `SpanKind.CLIENT`) for clarity and correctness.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/tests/test_openai_agents.py:110
   - Draft comment: Similarly, when verifying tool spans, avoid `tool_span.kind == tool_span.kind.INTERNAL`; use the OpenTelemetry constant (`SpanKind.INTERNAL`) instead.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
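A sketch of the assertion style these two draft comments suggest. A stand-in enum is used here so the snippet is self-contained; a real test would import `SpanKind` from `opentelemetry.trace`.

```python
from enum import Enum


class SpanKind(Enum):
    """Stand-in for opentelemetry.trace.SpanKind (assumption for this sketch)."""

    INTERNAL = 0
    SERVER = 1
    CLIENT = 3


def assert_client_span(kind: SpanKind) -> None:
    # Compare against the module-level constant rather than `kind.CLIENT`,
    # so the intent is unambiguous and a wrong kind fails loudly.
    assert kind == SpanKind.CLIENT, f"expected CLIENT span, got {kind}"


assert_client_span(SpanKind.CLIENT)
```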
3. packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py:49
   - Draft comment: Consider whether directly clearing internal globals (`_root_span_storage` and `_instrumented_tools`) in tests is acceptable; if these are internal details, they may need to be encapsulated to avoid accidental misuse.
   - Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%
4. packages/opentelemetry-instrumentation-openai-agents/tests/cassettes/test_openai_agents/test_recipe_workflow_agent_handoffs_with_function_tools.yaml:5
   - Draft comment: Typo: the phrase "Recipe Editor Agent agent" appears redundant. Consider changing it to "Recipe Editor Agent".
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The redundant "agent" is a trivial text issue in a function description; the meaning stays clear and functionality is unaffected, so the comment was deleted as unimportant.
5. packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py:193
   - Draft comment: Typographical suggestion: change the message from "Found 1 recipes matching \"{query}\"" to "Found 1 recipe matching \"{query}\"" to maintain number agreement.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). It points out a minor grammatical issue in test-only code that doesn't impact functionality or readability significantly, so the comment was deleted.
```python
if content_parts:
    print("".join(content_parts), end="", flush=True)
# ...
elif event.name == "handoff_occured":
```
Typographical error: 'handoff_occured' is misspelled. Consider changing it to 'handoff_occurred' to reflect correct spelling.
Actionable comments posted: 1
🧹 Nitpick comments (5)
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (1)

21-21: Consider using modern type union syntax. Since the project requires Python >=3.10 (as specified in pyproject.toml), you can use the modern union syntax with `|` instead of `Union`:

```diff
-from typing import List, Dict, Union
+from typing import List, Dict
```

Then update the type annotations:

```diff
-    recipes: Union[Dict[str, Recipe], None] = None
-    recipe_count: Union[int, None] = None
-    query: Union[str, None] = None
+    recipes: Dict[str, Recipe] | None = None
+    recipe_count: int | None = None
+    query: str | None = None
```

```diff
-    modified_recipe: Union[Recipe, None] = None
-    changes_made: Union[List[str], None] = None
-    original_recipe: Union[Recipe, None] = None
+    modified_recipe: Recipe | None = None
+    changes_made: List[str] | None = None
+    original_recipe: Recipe | None = None
```

Also applies to: 161-163, 168-170
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

299-302: Remove or use the unused loop variable. The loop variable `event` is not used within the loop body.

```diff
-    async for event in recipe_runner.stream_events():
-        pass  # Just consume the events to let the execution complete
+    async for _ in recipe_runner.stream_events():
+        pass  # Just consume the events to let the execution complete
```

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (2)
132-132: Simplify `.get()` calls as suggested by static analysis. The default value `None` is redundant when using `.get()`:

```diff
-    root_span = _root_span_storage.get(thread_id, None)
+    root_span = _root_span_storage.get(thread_id)
```

Also applies to: 292-292, 527-527
163-188: Consider extracting common input capture logic. The input capture logic is duplicated between `_wrap_agent_run_streamed` and `_wrap_agent_run`. Consider extracting this into a shared helper function to improve maintainability. Would you like me to provide a refactored version that extracts the common input/output capture logic into reusable helper functions?

Also applies to: 323-347
packages/sample-app/sample_app/openai_agents_example.py (1)

579-587: Simplify nested if statements. The nested if statements can be combined for better readability.

```diff
-    if event.type == "raw_response_event":
-        if isinstance(event.data, ResponseTextDeltaEvent):
-            # Stream the assistant's token delta
-            print(event.data.delta, end="", flush=True)
-        elif isinstance(event.data, ResponseOutputItemAddedEvent):
-            if isinstance(event.data.item, ResponseFunctionToolCall):
-                # Tool call metadata becomes available here
-                print(f"\n[Calling tool: {event.data.item.name}]\n")
+    if event.type == "raw_response_event":
+        if isinstance(event.data, ResponseTextDeltaEvent):
+            # Stream the assistant's token delta
+            print(event.data.delta, end="", flush=True)
+        elif isinstance(event.data, ResponseOutputItemAddedEvent) and isinstance(
+            event.data.item, ResponseFunctionToolCall
+        ):
+            # Tool call metadata becomes available here
+            print(f"\n[Calling tool: {event.data.item.name}]\n")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)

- `package-lock.json` is excluded by `!**/package-lock.json`
- `packages/sample-app/poetry.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (5)

- `packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py` (7 hunks)
- `packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py` (3 hunks)
- `packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py` (2 hunks)
- `packages/sample-app/pyproject.toml` (2 hunks)
- `packages/sample-app/sample_app/openai_agents_example.py` (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (1)

- packages/sample-app/sample_app/openai_agents_example.py (5): `Recipe` (46-53), `SearchResponse` (65-70), `EditResponse` (56-62), `search_recipes` (357-484), `plan_and_apply_recipe_modifications` (255-353)

packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (3)

- packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (2): `exporter` (27-37), `recipe_workflow_agents` (145-249)
- packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1): `get_finished_spans` (40-43)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1): `TraceloopSpanKindValues` (269-274)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
301-301: Loop control variable event not used within loop body
(B007)
packages/sample-app/sample_app/openai_agents_example.py
583-584: Use a single if statement instead of nested if statements
(SIM102)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
132-132: Use `_root_span_storage.get(thread_id)` instead of `_root_span_storage.get(thread_id, None)`

(SIM910)

292-292: Use `_root_span_storage.get(thread_id)` instead of `_root_span_storage.get(thread_id, None)`

(SIM910)

527-527: Use `_root_span_storage.get(thread_id)` instead of `_root_span_storage.get(thread_id, None)`
(SIM910)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test Packages (3.12)
🔇 Additional comments (7)
packages/sample-app/pyproject.toml (1)

58-58: LGTM! Dependencies properly configured. The addition of the `openai-agents` package and the local development dependency for the instrumentation package are correctly configured to support the new OpenAI Agents SDK integration.

Also applies to: 97-99
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (2)

49-52: Excellent addition for test isolation! Clearing the global span storage and instrumented tools between tests ensures clean state and prevents test interference. This is a critical fix for reliable test execution.
144-249: Well-structured test fixture for multi-agent workflows. The `recipe_workflow_agents` fixture provides comprehensive mock data and properly configured agents with function tools and handoff capabilities. The mock implementations of `search_recipes` and `plan_and_apply_recipe_modifications` are realistic and will enable thorough testing of the multi-agent tracing functionality.

packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2)
275-398: Comprehensive test coverage for multi-agent workflows! This test thoroughly validates:

- Agent handoffs between Main Chat Agent and Recipe Editor Agent
- Function tool invocations with proper span attributes
- Parent-child span relationships
- Trace continuity across multiple agents
- Input/output attribute capture

The test structure and assertions provide excellent coverage for the new streaming and handoff instrumentation features.

101-101: Span count reduction verified as intentional. No TODO/FIXME comments or related references to span-count changes exist in the repository, and the current instrumentation emits exactly three spans (agent, tool, and underlying internal span) for the function-tool test. The updated assertion `assert len(spans) == 3` correctly reflects the actual behavior.
assert len(spans) == 3correctly reflects the actual behavior.packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
108-270: Excellent implementation of streaming and trace continuity!The implementation properly handles:
- Thread-local root span storage for maintaining trace context across agent handoffs
- Comprehensive attribute capture for both streaming and non-streaming execution
- Proper error handling and status setting
- Metrics recording for duration and token usage
This effectively solves the broken traces issue with agent handoffs.
Also applies to: 272-428
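The thread-local root span storage described above can be sketched as follows. Helper names are hypothetical; the actual module keys the dict by `threading.get_ident()`, as discussed elsewhere in this review.

```python
import threading
from typing import Any, Dict, Optional

# One root span per thread, so spans created during an agent handoff
# can re-attach to the workflow's root span on the same thread.
_root_span_storage: Dict[int, Any] = {}


def store_root_span(span: Any) -> None:
    _root_span_storage[threading.get_ident()] = span


def get_root_span() -> Optional[Any]:
    return _root_span_storage.get(threading.get_ident())


def clear_root_span() -> None:
    # Pop instead of del so repeated cleanup is safe.
    _root_span_storage.pop(threading.get_ident(), None)


# Usage: the run wrapper stores the root span; later tool/handoff wrappers
# on the same thread retrieve it to parent their spans correctly.
store_root_span("root-span-sentinel")
assert get_root_span() == "root-span-sentinel"
clear_root_span()
assert get_root_span() is None
```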
packages/sample-app/sample_app/openai_agents_example.py (1)

1-695: Excellent sample application demonstrating multi-agent workflows! This comprehensive example effectively showcases:

- Multi-agent architecture with specialized roles
- Function tools for recipe search and AI-powered modifications
- Streaming responses with proper event handling
- Agent handoffs based on conversation context
- Well-structured data models using Pydantic

The application serves as a valuable reference for users implementing similar multi-agent systems with the OpenAI Agents SDK.
Important
Looks good to me! 👍
Reviewed 6ea260a in 2 minutes and 33 seconds.

- Reviewed 463 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments; view those below.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:123
   - Draft comment: Consider using a context-based solution (e.g. `contextvars`) instead of `threading.get_ident()` for storing the root span. Relying on thread IDs may be problematic in asynchronous scenarios where the thread context can change.
   - Reason this comment was not posted: Comment was on unchanged code.
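A sketch of the `contextvars` alternative that draft comment suggests: the variable follows the logical async context rather than the OS thread, so the stored root span survives awaits and task switches within the same workflow.

```python
import asyncio
import contextvars

# The context variable travels with the logical async context, unlike a
# dict keyed by thread id ("root_span" is an illustrative name).
_root_span_var: contextvars.ContextVar = contextvars.ContextVar(
    "root_span", default=None
)


async def child():
    # Sees the value set in the parent task's context, because
    # create_task snapshots the current context at creation time.
    return _root_span_var.get()


async def main():
    _root_span_var.set("root-span")
    return await asyncio.create_task(child())


assert asyncio.run(main()) == "root-span"
```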
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:166
   - Draft comment: Avoid using `eval()` to parse input data. This approach can be a security risk if untrusted data is processed; consider replacing it with a safer alternative such as `json.loads()`.
   - Reason this comment was not posted: Comment was on unchanged code.
3. packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py:364
   - Draft comment: The new trace ID consistency check using the `all_trace_ids` set in the handoff test is a solid improvement. It robustly ensures that all spans belong to the same trace.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
Actionable comments posted: 1
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (2)

103-245: Well-structured streaming wrapper with a security concern. The new streaming wrapper function provides comprehensive tracing for streaming execution with excellent thread-local context management and detailed attribute extraction. However, there's a critical security issue: avoid using `eval()`. Calling `eval()` on string input is a security risk; consider `json.loads()` or `ast.literal_eval()` for safer parsing.

```diff
-    item_dict = eval(item) if isinstance(item, str) else item
+    item_dict = json.loads(item) if isinstance(item, str) else item
```

The function excels in:

- Thread-local root span management
- Comprehensive input/output serialization
- Proper error handling and status setting
- Metrics recording with detailed attributes
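A hedged sketch of the safer parsing the comment asks for (`safe_parse` is a hypothetical helper, not the PR's code): try JSON first, fall back to `ast.literal_eval` for Python-literal reprs, and return the raw string if neither applies.

```python
import ast
import json


def safe_parse(item):
    """Parse a string payload without eval()."""
    if not isinstance(item, str):
        return item
    try:
        # Common case: the item is serialized JSON.
        return json.loads(item)
    except (ValueError, TypeError):
        pass
    try:
        # Fallback for Python-literal reprs like "{'role': 'user'}".
        return ast.literal_eval(item)
    except (ValueError, SyntaxError):
        # Neither format: keep the raw string rather than guessing.
        return item


assert safe_parse('{"role": "user"}') == {"role": "user"}
assert safe_parse("{'role': 'user'}") == {"role": "user"}
assert safe_parse("not a dict") == "not a dict"
```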
262-359: Enhanced wrapper with thread-local storage support and minor issues. The wrapper now properly supports thread-local root span storage for trace continuity across handoffs, with improved input/output attribute handling. Issues to address: avoid using `eval()` for security reasons (consider `json.loads()` or `ast.literal_eval()`), and simplify the dictionary `get()` call:

```diff
-    root_span = _root_span_storage.get(thread_id, None)
+    root_span = _root_span_storage.get(thread_id)
```

The enhanced functionality is excellent:

- Thread-local root span context propagation
- Comprehensive input/output serialization
- Proper context management for trace continuity
🧹 Nitpick comments (1)

packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)

479-566: Excellent tool instrumentation with thread-local context support. The enhanced tool details extraction provides comprehensive instrumentation with proper thread-local context management and duplicate prevention. Simplify the dictionary `get()` call:

```diff
-    root_span = _root_span_storage.get(thread_id, None)
+    root_span = _root_span_storage.get(thread_id)
```

Key improvements:

- Thread-local root span context propagation
- Registry-based duplicate prevention for tool instrumentation
- Proper wrapping of FunctionTool execution with spans
- Immediate span creation for hosted tools with detailed attributes
- Maintained parent-child relationships through context management
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py(7 hunks)packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
297-297: Loop control variable event not used within loop body
(B007)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
125-125: Use `_root_span_storage.get(thread_id)` instead of `_root_span_storage.get(thread_id, None)`

(SIM910)

263-263: Use `_root_span_storage.get(thread_id)` instead of `_root_span_storage.get(thread_id, None)`

(SIM910)

481-481: Use `_root_span_storage.get(thread_id)` instead of `_root_span_storage.get(thread_id, None)`
(SIM910)
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

101-101: LGTM: Updated span count expectation aligns with instrumentation changes. The reduction from 4 to 3 expected spans correctly reflects the optimized span management introduced by the new thread-local root span storage system.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (3)

5-5: LGTM: Essential imports and global state for thread-local span management. The new imports and global variables are well-designed for the enhanced instrumentation:

- `threading` for thread identification
- `context` and `set_span_in_context` for proper span context propagation
- `_root_span_storage` for thread-local root span tracking
- `_instrumented_tools` for preventing duplicate tool instrumentation

Also applies to: 8-8, 10-10, 29-30

80-81: LGTM: Proper cleanup of new instrumentation components. The uninstrumentation method correctly unwraps the new streaming method and clears the instrumented tools registry, ensuring proper cleanup.

67-76: LGTM: Proper registration of streaming method wrapper. The wrapper registration for `_run_single_turn_streamed` follows the established pattern and correctly integrates the new streaming instrumentation.
Important
Looks good to me! 👍
Reviewed 7d221ab in 1 minute and 11 seconds.

- Reviewed 31 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 3 draft comments; view those below.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:122
   - Draft comment: Simplify `dict.get` usage: specifying `None` is redundant since `get()` returns `None` by default.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:166
   - Draft comment: Replaced `eval` with `json.loads` to safely parse JSON strings and avoid security risks.
   - Reason this comment was not posted: Purely informative (usefulness confidence = 0% <= threshold 50%). It explains a change that was made without providing a suggestion, asking for confirmation, or pointing out a potential issue, so it was removed.
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:300
   - Draft comment: Using `json.loads` instead of `eval` enhances security by preventing arbitrary code execution.
   - Reason this comment was not posted: Purely informative (usefulness confidence = 0% <= threshold 50%). It explains why `json.loads` is preferred over `eval` without suggesting a change or asking about the code's intention.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (2)

262-269: Thread-local span management implemented correctly. The root span storage logic ensures trace continuity. Minor suggestion: line 263 can be simplified, as suggested by static analysis:

```diff
-    root_span = _root_span_storage.get(thread_id, None)
+    root_span = _root_span_storage.get(thread_id)
```
478-531: Excellent refactoring for FunctionTool instrumentation. The implementation:

- Correctly prevents duplicate instrumentation using the `_instrumented_tools` set
- Properly wraps async tool invocations with tracing spans
- Maintains context propagation from the root span
- Captures input/output for observability

Minor suggestion for line 481 (from static analysis):

```diff
-    root_span = _root_span_storage.get(thread_id, None)
+    root_span = _root_span_storage.get(thread_id)
```
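The duplicate-prevention idea above can be sketched as a small registry guard. Names are hypothetical, and the real wrapper opens OpenTelemetry spans rather than passing calls through untouched.

```python
# Registry mirroring _instrumented_tools: wrap each tool's invoke
# function at most once, even if instrumentation runs again.
_instrumented_tools = set()


def instrument_tool(tool_name, invoke):
    if tool_name in _instrumented_tools:
        return invoke  # already wrapped; avoid double spans

    _instrumented_tools.add(tool_name)

    def traced_invoke(*args, **kwargs):
        # A real wrapper would open a span here, parented to the root span.
        return invoke(*args, **kwargs)

    return traced_invoke


def search(q):
    return f"results for {q}"


wrapped_once = instrument_tool("search_recipes", search)
# The second attempt is a no-op: the original callable comes back unchanged.
assert instrument_tool("search_recipes", wrapped_once) is wrapped_once
assert wrapped_once("pasta") == "results for pasta"
```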
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)

- `packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py` (7 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
263-263: Use `_root_span_storage.get(thread_id)` instead of `_root_span_storage.get(thread_id, None)`

(SIM910)

481-481: Use `_root_span_storage.get(thread_id)` instead of `_root_span_storage.get(thread_id, None)`
(SIM910)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (4)

67-76: LGTM! Streaming instrumentation added correctly. The wrapper for `_run_single_turn_streamed` follows the established pattern and properly passes the required parameters.

80-82: Proper cleanup in uninstrumentation. Good addition of clearing the `_instrumented_tools` set to prevent memory leaks and ensure clean re-instrumentation.
103-245: Well-implemented streaming wrapper with proper trace continuity. The implementation correctly:

- Uses thread-local storage for maintaining root span context across async operations
- Handles various input/output data formats with appropriate serialization
- Records duration metrics and sets proper span attributes
- Uses safe `json.loads()` for parsing (not `eval()`)
532-567: Proper handling of hosted tools with immediate span creation. The implementation correctly:

- Creates and ends spans immediately for hosted tools
- Sets tool-specific attributes for observability
- Maintains trace context from the root span
Important
Looks good to me! 👍
Reviewed 76dcbcb in 2 minutes and 33 seconds.

- Reviewed 652 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 6 draft comments; view those below.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:124
   - Draft comment: Consider cleaning up the stored root span (in `_root_span_storage`) once the span's work is complete. In long-running applications, keeping stale span references may lead to memory leaks. A `finally` block to remove the entry after span completion might be beneficial.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:158
   - Draft comment: The change converting single-quoted attribute names to double quotes improves consistency. Ensure that this styling change is aligned with project conventions and does not inadvertently alter expected key names.
   - Reason this comment was not posted: Confidence changes required: 20% <= threshold 50%
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:136
   - Draft comment: The use of trailing commas in multi-line dictionary literals enhances readability and minimizes diff noise. This looks good; ensure consistency throughout the code base.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
4. packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py:56
   - Draft comment: The reformatting of assert statements to remove redundant parentheses improves readability. Verify that all assertions evaluate correctly.
   - Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%
5. packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py:257
   - Draft comment: In the test for recipe workflow agent handoffs, the expectation of exactly 3 'Recipe Editor Agent.agent' spans enforces a specific behavior. Consider abstracting these repeated span attribute checks into helper methods for better maintainability as the handoff logic evolves.
   - Reason this comment was not posted: Confidence changes required: 33% <= threshold 50%
6. packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py:215
   - Draft comment: It appears that the assertion starting on this line is incomplete. The diff shows an unfinished expression (e.g. a missing attribute key or closing bracket) at line 216. Please complete this assertion to ensure the test is valid.
   - Reason this comment was not posted: Comment looked like it was already resolved.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)

226-366: Comprehensive test coverage with an unused loop variable. This test provides excellent coverage of the multi-agent workflow with streaming, handoffs, and function tools. The assertions thoroughly validate span attributes, relationships, and trace continuity. However, there's an unused loop variable that should be addressed:

```diff
-    async for event in recipe_runner.stream_events():
-        pass
+    async for _ in recipe_runner.stream_events():
+        pass
```

The test design is well-structured and validates key aspects:

- Span counts and names
- Agent and tool attributes
- Parent-child relationships
- Trace ID continuity across handoffs
- Proper status codes
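The trace-continuity check at the heart of the handoff test can be sketched like this, using stand-in span objects; real tests would read `span.context.trace_id` from the exporter's finished spans.

```python
from dataclasses import dataclass


@dataclass
class FakeContext:
    trace_id: int


@dataclass
class FakeSpan:
    name: str
    context: FakeContext


def assert_single_trace(spans):
    """Every span in a handoff workflow must share one trace id."""
    all_trace_ids = {span.context.trace_id for span in spans}
    assert len(all_trace_ids) == 1, f"broken trace: {all_trace_ids}"


spans = [
    FakeSpan("Main Chat Agent.agent", FakeContext(0xABC)),
    FakeSpan("Recipe Editor Agent.agent", FakeContext(0xABC)),
    FakeSpan("search_recipes.tool", FakeContext(0xABC)),
]
assert_single_trace(spans)  # passes: one shared trace id
```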
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py(10 hunks)packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py(7 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py
250-250: Loop control variable event not used within loop body
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.9)
🔇 Additional comments (12)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (6)
57-68: Assertion refactoring improves readability. The conversion from multi-line span attribute checks to single-line assertions makes the test code more concise and easier to read while maintaining the same validation logic.
97-105: Assertions correctly validate tool attributes. The single-line assertions properly validate the tool name, type, and schema attributes for the FunctionTool.
134-139: Tool type validation is correct. The assertions properly validate WebSearchTool attributes and confirm the absence of the optional user_location attribute.
159-167: Handoff attribute parsing is properly tested. The test correctly validates that handoff information is stored as JSON strings and can be parsed back to verify the agent names and instructions.
196-214: Metrics validation is comprehensive. The refactored assertions properly validate token usage and duration metrics with appropriate attribute checks.
82-82: Span count matches updated instrumentation. I've confirmed that the agent instrumentation now emits exactly three spans (one agent span, one tool span, and one OpenAI API/LLM span), so reducing the test assertion from 4 to 3 is correct and no spans are missing.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (6)
102-255: Streaming wrapper implementation is comprehensive. The new `_wrap_agent_run_streamed` function provides excellent support for streaming execution with proper context propagation and detailed input/output attribute extraction. The implementation correctly:
- Maintains thread-local root span storage for trace continuity
- Handles various input/output data formats with fallback error handling
- Records metrics and maintains proper span lifecycle
The structured approach to handling different data types (model_dump, dict, string) with graceful fallbacks ensures robustness across various agent configurations.
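That fallback chain can be sketched roughly as follows. This is a simplified illustration, not the actual instrumentation code; `model_dump` is the pydantic-style method the wrapper probes for, and `serialize_value` is a hypothetical helper name:

```python
import json

def serialize_value(value):
    """Best-effort serialization: model_dump -> dict -> plain string."""
    if hasattr(value, "model_dump"):
        try:
            return json.dumps(value.model_dump())
        except Exception:
            pass  # fall through to the next strategy
    if isinstance(value, dict):
        try:
            return json.dumps(value)
        except Exception:
            pass
    # Last resort: a string keeps the span attribute usable even for
    # objects that cannot be serialized to JSON.
    return str(value)

print(serialize_value({"role": "user", "content": "hi"}))  # JSON string
print(serialize_value(object()))  # repr fallback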
272-396: Enhanced agent run wrapper maintains trace continuity. The modifications to `_wrap_agent_run` properly implement thread-local context propagation to ensure trace continuity across agent handoffs. The input/output attribute extraction logic mirrors the streaming wrapper for consistency. The thread-aware context management ensures that child spans maintain proper parent-child relationships even in complex multi-agent workflows.
489-602: Tool instrumentation provides comprehensive telemetry. The enhanced `extract_tool_details` function properly wraps FunctionTool execution with tracing spans while creating immediate spans for hosted tools. The implementation correctly:
- Avoids duplicate instrumentation through `_instrumented_tools` tracking
- Maintains proper context propagation for tool spans
- Captures detailed attributes for different tool types
- Handles both sync and async tool execution patterns
The closure-based wrapping pattern for FunctionTool ensures proper scope capture while maintaining the original tool interface.
6-6: Threading import supports new thread-aware features. The addition of the `threading` import is necessary for the new thread-local span management functionality.
9-11: Context imports enable proper span propagation. The additional imports for `set_span_in_context` and `context` are essential for the new thread-aware context propagation mechanisms.
67-80: Instrumentation registration includes streaming wrapper. The addition of the streaming wrapper registration and cleanup in `_uninstrument` properly manages the new instrumentation lifecycle.
```python
_root_span_storage = {}
_instrumented_tools = set()
```
🛠️ Refactor suggestion
Global state management needs thread safety consideration.
The global dictionaries _root_span_storage and _instrumented_tools are accessed from multiple threads but lack explicit synchronization. While _root_span_storage uses thread ID as key (inherently thread-safe for access), _instrumented_tools could have race conditions during concurrent tool instrumentation.
Consider adding thread-safe access patterns:
```diff
+import threading
+
+_storage_lock = threading.Lock()
 _root_span_storage = {}
 _instrumented_tools = set()
```

Then protect critical sections:

```diff
-if tool_id in _instrumented_tools:
-    continue
-_instrumented_tools.add(tool_id)
+with _storage_lock:
+    if tool_id in _instrumented_tools:
+        continue
+    _instrumented_tools.add(tool_id)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# At the top of the file, add the lock alongside the globals
import threading

_storage_lock = threading.Lock()
_root_span_storage = {}
_instrumented_tools = set()

# … rest of imports and code …

# Then in the tool-instrumentation section, replace the non-atomic check/add:
#   if tool_id in _instrumented_tools:
#       continue
#   _instrumented_tools.add(tool_id)
# with a locked critical section:
with _storage_lock:
    if tool_id in _instrumented_tools:
        continue
    _instrumented_tools.add(tool_id)

# … rest of the code …
```
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
around lines 30 to 31, the global variables _root_span_storage and
_instrumented_tools are accessed concurrently without synchronization, risking
race conditions especially on _instrumented_tools. To fix this, introduce a
threading.Lock to protect access to _instrumented_tools and wrap all
modifications and reads of this set within lock acquire/release blocks to ensure
thread safety.
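To see why the locked check-and-add matters, here is a minimal runnable sketch of the pattern (a hedged illustration assuming a shared set of tool ids, as in the suggestion above; `instrument_once` is a hypothetical helper name):

```python
import threading

_storage_lock = threading.Lock()
_instrumented_tools = set()

def instrument_once(tool_id):
    """Return True only for the first caller to claim this tool id."""
    with _storage_lock:
        if tool_id in _instrumented_tools:
            return False
        _instrumented_tools.add(tool_id)
        return True

results = []
threads = [
    threading.Thread(target=lambda: results.append(instrument_once(42)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The lock guarantees exactly one thread wins the race.
print(results.count(True))  # → 1
```

Without the lock, two threads could both pass the membership check before either adds the id, instrumenting the same tool twice.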
Important
Looks good to me! 👍
Reviewed bafbc91 in 1 minute and 36 seconds. Click for details.
- Reviewed 518 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 5 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py:247
- Draft comment:
Use '_' for the unused loop variable in async iteration to clearly indicate it's not needed. - Reason this comment was not posted:
Comment looked like it was already resolved.
2. packages/sample-app/sample_app/openai_agents_example.py:97
- Draft comment:
Added trailing commas to list literals for consistency and better diffs in future edits. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, explaining why trailing commas were added. It doesn't provide a suggestion or ask for confirmation about the code's functionality.
3. packages/sample-app/sample_app/openai_agents_example.py:256
- Draft comment:
Flatten function parameters into a single line in plan_and_apply_recipe_modifications for conciseness without losing clarity. - Reason this comment was not posted:
Comment looked like it was already resolved.
4. packages/sample-app/sample_app/openai_agents_example.py:604
- Draft comment:
Correct the event name typo from 'handoff_occured' to 'handoff_occurred' to match expected trigger names. - Reason this comment was not posted:
Comment looked like it was already resolved.
5. packages/sample-app/sample_app/openai_agents_example.py:304
- Draft comment:
Consistent dictionary formatting with trailing commas in the messages list improves maintainability. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide a specific suggestion or ask for a change. It simply states a general benefit of a coding style without addressing a specific issue in the code.
Workflow ID: wflow_Ljko8daf2FsaXKhB
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
250-250: Use underscore for unused loop variable. This issue has already been identified in a previous review. Replace `event` with `_` to indicate the variable is intentionally unused.
packages/sample-app/sample_app/openai_agents_example.py (1)
269-270: Consider using logging framework instead of print statements. For production-ready sample applications, replace print statements with proper logging to allow better control over log levels and formatting.
Example refactor:
```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 @function_tool
 async def plan_and_apply_recipe_modifications(
     cw: RunContextWrapper[ChatContext], recipe: Recipe, modification_request: str
 ) -> EditResponse:
-    print(f"Planning and applying modifications for recipe: {recipe.name}")
-    print(f"Modification request: {modification_request}")
+    logger.info(f"Planning and applying modifications for recipe: {recipe.name}")
+    logger.info(f"Modification request: {modification_request}")
```

Also applies to: 474-474
🧹 Nitpick comments (1)
packages/sample-app/sample_app/openai_agents_example.py (1)
569-572: Simplify nested if statements. Combine the nested conditions for cleaner code.

```diff
-        elif isinstance(event.data, ResponseOutputItemAddedEvent):
-            if isinstance(event.data.item, ResponseFunctionToolCall):
-                print(f"\n[Calling tool: {event.data.item.name}]\n")
+        elif isinstance(event.data, ResponseOutputItemAddedEvent) and isinstance(event.data.item, ResponseFunctionToolCall):
+            print(f"\n[Calling tool: {event.data.item.name}]\n")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (7 hunks)
- packages/sample-app/sample_app/openai_agents_example.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/sample-app/sample_app/openai_agents_example.py
569-570: Use a single if statement instead of nested if statements
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (6)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (1)
57-57: Clean formatting improvements. The simplified single-line assertions improve readability without changing functionality.
Also applies to: 60-60, 63-68, 82-82, 97-98, 104-104, 134-134, 139-139, 159-159, 164-164, 196-199, 206-206, 209-209, 212-212
packages/sample-app/sample_app/openai_agents_example.py (5)
1-23: Well-organized imports and proper initialization. The imports are cleanly organized and Traceloop is correctly initialized with batch disabled for streaming support.
25-88: Well-structured Pydantic models. The data models provide excellent type safety and clear API contracts for the recipe management system.
349-478: Excellent implementation with semantic search and fallback strategy. The search function provides intelligent semantic search using AI with a robust keyword-based fallback, ensuring reliability even when AI parsing fails.
480-548: Well-designed agent architecture with clear separation of concerns. The two-agent system effectively separates general conversation handling from specialized recipe management, with clear handoff triggers.
612-663: Clean implementation of the streaming chat workflow. The main execution flow effectively demonstrates multi-agent handoffs and streaming responses with a practical recipe editing example.
Important
Looks good to me! 👍
Reviewed eb1377b in 1 minute and 47 seconds. Click for details.
- Reviewed 556 lines of code in 4 files
- Skipped 0 files when reviewing.
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:148
- Draft comment:
The new JSON serialization logic for inputs (converting each arg with json.dumps and json.loads using a custom JSONEncoder) is a clear improvement in ensuring that the span attribute for input is standardized. However, note that performing a double JSON conversion on every argument can incur performance overhead. Consider checking whether the argument is already a basic type or pre-serializable before applying the conversion. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%. None
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:183
- Draft comment:
The output handling logic has been simplified by using a try/catch block that wraps json.dumps(result, cls=JSONEncoder). This unifies processing and reduces complexity compared to the previous multi-branch logic. Ensure the custom JSONEncoder covers all cases; otherwise, the fallback to converting to string is a reasonable backup. - Reason this comment was not posted:
Confidence changes required: 30% <= threshold 50%. None
3. packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py:282
- Draft comment:
The removal of assertions that check for the SpanAttributes.LLM_SYSTEM attribute in the agent spans suggests an intentional change in metadata recording. Confirm that downstream systems do not rely on this attribute. If it is no longer needed, this simplification is acceptable. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%. None
4. packages/sample-app/sample_app/openai_agents_example.py:477
- Draft comment:
The removal of the @agent() decorators for the RecipeEditorAgent and MainChatAgent classes simplifies the code and avoids redundant decoration. Verify that this change does not affect the agent registration or behavior, and that the agents still initialize as expected. - Reason this comment was not posted:
Confidence changes required: 30% <= threshold 50%. None
Workflow ID: wflow_NL9Pyw7zfVVfcEbw
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
402-515: Comprehensive tool instrumentation enhancement. This enhancement provides excellent support for different tool types:
- Dynamic wrapping of FunctionTool methods with proper span creation
- Immediate span handling for hosted tools (WebSearchTool, etc.)
- Comprehensive attribute setting for telemetry
- Proper context propagation using thread-local storage
The implementation correctly handles the different tool execution patterns while maintaining trace continuity.
Note: The thread safety concern for `_instrumented_tools` mentioned in past reviews should still be addressed in a future iteration.
🧹 Nitpick comments (1)
packages/sample-app/sample_app/openai_agents_example.py (1)
479-657: Excellent agent definitions and application flow. The implementation provides:
- Well-defined agents with clear roles and comprehensive instructions
- Good stream handling utility for real-time user interaction
- Proper demonstration of agent handoffs and streaming execution
- Practical example of the enhanced instrumentation in action
Consider simplifying the nested if statements in the stream handling (lines 566-567) as suggested by static analysis:
```diff
-        if isinstance(event.data, ResponseOutputItemAddedEvent):
-            if isinstance(event.data.item, ResponseFunctionToolCall):
+        if isinstance(event.data, ResponseOutputItemAddedEvent) and isinstance(event.data.item, ResponseFunctionToolCall):
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (10 hunks)
- packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (5 hunks)
- packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (7 hunks)
- packages/sample-app/sample_app/openai_agents_example.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (1)
packages/sample-app/sample_app/openai_agents_example.py (5)
Recipe(45-52)SearchResponse(64-69)EditResponse(55-61)search_recipes(349-476)plan_and_apply_recipe_modifications(255-345)
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (3)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (3)
- SpanAttributes (36-229)
- Meters (6-33)
- TraceloopSpanKindValues (269-274)
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (2)
- exporter (27-37)
- recipe_workflow_agents (148-269)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
- get_finished_spans (40-43)
packages/sample-app/sample_app/openai_agents_example.py (2)
packages/traceloop-sdk/traceloop/sdk/__init__.py (2)
- Traceloop (37-227)
- init (49-202)
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (5)
- Recipe (151-158)
- EditResponse (167-172)
- SearchResponse (160-165)
- plan_and_apply_recipe_modifications (218-253)
- search_recipes (197-215)
🪛 Ruff (0.12.2)
packages/sample-app/sample_app/openai_agents_example.py
566-567: Use a single if statement instead of nested if statements
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (12)
packages/opentelemetry-instrumentation-openai-agents/tests/conftest.py (3)
47-55: Good enhancement for test isolation. The fixture now properly clears the global instrumentation state between tests, preventing test interference. The local imports and explicit clearing of `_root_span_storage` and `_instrumented_tools` ensure a clean state for each test run.
147-270: Excellent test infrastructure for multi-agent workflows. This fixture provides comprehensive support for testing multi-agent scenarios with:
- Well-defined Pydantic models for structured data
- Realistic function tools with proper async implementations
- Mock recipe data that supports test scenarios
- Proper agent configuration with tools and handoffs
The implementation effectively mirrors the sample application structure while providing a controlled test environment.
99-99: Minor formatting improvements. These cosmetic changes improve code readability and consistency without affecting functionality.
Also applies to: 118-120, 142-142
packages/opentelemetry-instrumentation-openai-agents/tests/test_openai_agents.py (2)
57-68: Test formatting improvements with span count adjustment. The formatting changes improve code readability by consolidating assertions. However, note the span count change from 4 to 3 on line 82; ensure this reflects the actual instrumentation behavior change and is not a regression.
Also applies to: 82-82, 97-98, 104-104, 134-134, 139-139, 159-159, 164-164, 196-213
226-380: Comprehensive test for multi-agent workflows. This test provides excellent coverage of the enhanced instrumentation features:
- Validates streaming execution and agent handoffs
- Checks span counts, attributes, and relationships
- Verifies trace continuity across agent boundaries
- Tests both agent and tool span creation
- Validates JSON input/output serialization
The test structure is well-organized and the assertions are thorough, providing confidence in the instrumentation's correctness.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (3)
104-208: Well-implemented streaming wrapper. This new wrapper provides comprehensive support for streaming execution:
- Proper thread-local span storage for trace continuity
- Consistent input/output serialization using JSONEncoder
- Comprehensive error handling with appropriate span status
- Maintains compatibility with existing instrumentation patterns
The implementation correctly handles the async nature of streaming calls while preserving trace context.
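The thread-local root-span bookkeeping can be illustrated with a toy sketch (a dict keyed by thread id, mirroring the approach described; the span values here are plain dicts standing in for real OpenTelemetry spans):

```python
import threading

_root_span_storage = {}  # thread id -> root "span" for that thread

def set_root_span(span):
    _root_span_storage[threading.get_ident()] = span

def get_root_span():
    # Child spans look up the root created on the same thread, so agent
    # handoffs executed on that thread stay inside the same trace.
    return _root_span_storage.get(threading.get_ident())

set_root_span({"name": "Main Chat Agent.workflow"})
assert get_root_span()["name"] == "Main Chat Agent.workflow"

def worker(out):
    out.append(get_root_span())  # different thread -> no root stored

seen = []
t = threading.Thread(target=worker, args=(seen,))
t.start()
t.join()
print(seen)  # → [None]: each thread sees only its own root span
```

This is why the uninstrument path must clear the storage: leftover entries would leak spans across test runs.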
224-244: Enhanced agent run wrapper with improved context propagation. The enhancements provide:
- Consistent thread-local span storage for trace continuity
- Improved input/output serialization with fallback handling
- Better context propagation across agent calls
These changes align with the streaming wrapper implementation and improve overall instrumentation quality.
Also applies to: 249-275
6-6: Supporting changes for enhanced instrumentation. These changes provide necessary infrastructure:
- New imports for threading and context management
- Global state storage for trace continuity
- Enhanced uninstrumentation with proper cleanup
- Minor improvements to agent details extraction
The changes are well-integrated and support the main functionality enhancements.
Also applies to: 9-9, 11-11, 25-25, 31-32, 67-83, 313-313, 336-336, 345-345
packages/sample-app/sample_app/openai_agents_example.py (4)
1-87: Well-structured data models and initialization. The implementation provides:
- Comprehensive Pydantic models with proper type hints
- Clean import organization
- Appropriate Traceloop initialization for sample usage
- Well-designed data structures for recipe management
The foundation is solid for demonstrating the instrumentation capabilities.
88-240: Comprehensive recipe database for testing. The sample recipe data provides excellent variety:
- Multiple cuisines and cooking styles
- Complete recipe information with ingredients, instructions, and timing
- Good coverage for testing search and modification scenarios
This data structure effectively supports the demonstration of the application's capabilities.
254-346: Well-implemented AI-powered recipe modification. This function demonstrates excellent AI integration:
- Comprehensive prompt engineering covering various modification scenarios
- Proper structured output parsing with Pydantic models
- Robust error handling and validation
- Appropriate global context updates
The implementation shows practical usage of OpenAI's structured output capabilities.
348-477: Sophisticated AI-powered recipe search. This function provides advanced search capabilities:
- AI-powered semantic search with intelligent recipe matching
- Robust fallback to keyword search when AI parsing fails
- Comprehensive error handling and logging
- Well-structured response formatting
The implementation demonstrates practical AI search techniques with proper error resilience.
Caution
Changes requested ❌
Reviewed fe4fa8f in 1 minute and 53 seconds. Click for details.
- Reviewed 49 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 1 draft comment. View it below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:21
- Draft comment:
Consolidated import from local utils replaces the external JSONEncoder. Confirm that the local version meets all previous expectations. - Reason this comment was not posted:
Confidence changes required: 30% <= threshold 50%. None
Workflow ID: wflow_Hu5dYCLm3nFFdgmL
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
30-31: Thread safety concerns with global state management. The global dictionaries are accessed from multiple threads but lack explicit synchronization. This is a known issue from previous reviews that still needs to be addressed.
Consider adding thread-safe access patterns as suggested in the previous review:
+import threading + +_storage_lock = threading.Lock() _root_span_storage = {} _instrumented_tools = set()Then protect critical sections where
_instrumented_toolsis accessed.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py (1)
24-42: Consider consolidating nested if statements and handling dictionary mutation. The `JSONEncoder` implementation looks solid for handling various object types, but there are two improvements to consider:
Dictionary mutation: The code directly modifies the input dictionary by deleting the "callbacks" key, which could have side effects on the original object.
Nested if statements: As noted by static analysis, the nested if statements can be combined.
Apply this diff to address both issues:
```diff
-        if isinstance(o, dict):
-            if "callbacks" in o:
-                del o["callbacks"]
-            return o
+        if isinstance(o, dict) and "callbacks" in o:
+            # Create a copy to avoid mutating the original dictionary
+            o_copy = dict(o)
+            del o_copy["callbacks"]
+            return o_copy
```

This prevents unintended side effects while also simplifying the conditional logic.
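For context, a dataclass-aware encoder in the spirit of the one under review might look like this. This is a hedged sketch, not the actual `utils.JSONEncoder` (which also special-cases dicts with a `callbacks` key); the `Recipe` dataclass is an illustrative stand-in:

```python
import dataclasses
import json

class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        # Dataclass instances become plain dicts.
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            return dataclasses.asdict(o)
        # Anything else falls back to its string representation
        # instead of raising TypeError.
        return str(o)

@dataclasses.dataclass
class Recipe:
    name: str
    servings: int

print(json.dumps(Recipe("carbonara", 4), cls=JSONEncoder))
# → {"name": "carbonara", "servings": 4}
```

Note that `default` is only invoked for objects the base encoder cannot serialize, so built-in types pass through untouched.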
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (10 hunks)
- packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
26-27: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (6)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py (1)
1-2: LGTM on the necessary imports. The `dataclasses` and `json` imports are properly added to support the new `JSONEncoder` functionality.
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (5)
67-75: LGTM on the streaming wrapper integration. The addition of the streaming wrapper properly extends the instrumentation to support the `_run_single_turn_streamed` method, maintaining consistency with the existing wrapper pattern.
103-206: Excellent implementation of streaming agent tracing. The `_wrap_agent_run_streamed` function provides comprehensive tracing for streaming execution with proper:
- Thread-local root span management
- Input/output serialization with fallback handling
- Duration metrics recording
- Error handling and span status management
The implementation correctly mirrors the non-streaming wrapper while adding streaming-specific considerations.
223-275: Enhanced tracing with consistent thread-local management. The modifications to `_wrap_agent_run` properly integrate thread-local root span storage and improved input/output serialization, ensuring consistency with the streaming wrapper.
79-81: Good cleanup implementation in uninstrumentation. The cleanup of global state in `_uninstrument` properly prevents memory leaks and ensures clean test isolation.
462-513: Proper hosted tool span creation with context propagation. The implementation correctly creates and immediately ends spans for hosted tools (`WebSearchTool`, `FileSearchTool`, `ComputerTool`) with proper context propagation from the root span. The tool-specific attribute extraction is comprehensive and well-structured.
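The create-and-immediately-end pattern for hosted tools can be illustrated with a toy span class (a hypothetical stand-in for the OpenTelemetry tracer API; attribute keys here echo, but do not exactly reproduce, the instrumentation's `gen_ai.completion.tool.*` conventions):

```python
import time

class ToySpan:
    """Minimal stand-in for an OTel span: record attributes, then end."""
    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.end_time = None

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def end(self):
        self.end_time = time.time_ns()

def record_hosted_tool(tool_type, extra_attrs):
    # Hosted tools run server-side, so there is no local invocation to
    # wrap; the span is opened, annotated, and closed in one step.
    span = ToySpan(f"{tool_type}.tool")
    span.set_attribute("gen_ai.completion.tool.type", tool_type)
    for key, value in extra_attrs.items():
        span.set_attribute(f"gen_ai.completion.tool.{key}", value)
    span.end()
    return span

span = record_hosted_tool("WebSearchTool", {"user_location": "None"})
print(span.name, span.end_time is not None)  # → WebSearchTool.tool True
```

This contrasts with FunctionTool, whose span must stay open around the awaited local call to capture input, output, and errors.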
```python
tool_id = id(tool)
if tool_id in _instrumented_tools:
    continue

_instrumented_tools.add(tool_id)

original_on_invoke_tool = tool.on_invoke_tool

def create_wrapped_tool(original_tool, original_func):
    async def wrapped_on_invoke_tool(tool_context, args_json):
        tool_name = getattr(original_tool, "name", "tool")
        if root_span:
            ctx = set_span_in_context(root_span, context.get_current())
        else:
            ctx = context.get_current()

        with tracer.start_as_current_span(
            f"{tool_name}.tool",
            kind=SpanKind.INTERNAL,
            attributes={
                SpanAttributes.TRACELOOP_SPAN_KIND: (
                    TraceloopSpanKindValues.TOOL.value
                )
            },
            context=ctx,
        ) as span:
            try:
                span.set_attribute(
                    f"{GEN_AI_COMPLETION}.tool.name", tool_name
                )
                span.set_attribute(
                    f"{GEN_AI_COMPLETION}.tool.type", "FunctionTool"
                )
                span.set_attribute(
                    f"{GEN_AI_COMPLETION}.tool.description",
                    original_tool.description,
                )
                span.set_attribute(
                    f"{GEN_AI_COMPLETION}.tool.strict_json_schema",
                    original_tool.strict_json_schema,
                )
                span.set_attribute("traceloop.entity.input", args_json)
                result = await original_func(tool_context, args_json)
                span.set_attribute("traceloop.entity.output", str(result))
                span.set_status(Status(StatusCode.OK))
                return result
            except Exception as e:
                span.set_status(Status(StatusCode.ERROR, str(e)))
                raise

    return wrapped_on_invoke_tool

tool.on_invoke_tool = create_wrapped_tool(tool, original_on_invoke_tool)
```
🛠️ Refactor suggestion
Potential memory leak with function tool instrumentation.
The current implementation stores tool IDs in _instrumented_tools but never removes them, which could lead to memory leaks in long-running applications where tools are dynamically created and destroyed.
Consider implementing a cleanup mechanism or using weak references:
```python
import weakref

# Option 1: Use WeakSet for automatic cleanup
_instrumented_tools = weakref.WeakSet()

# Option 2: Add a cleanup method to be called when tools are no longer needed
def cleanup_tool_instrumentation(tool):
    tool_id = id(tool)
    _instrumented_tools.discard(tool_id)
```

Additionally, the function wrapping approach modifies the original tool object, which could have side effects. Consider using a more isolated approach if possible.
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
between lines 408 and 461, the _instrumented_tools set stores tool IDs
indefinitely, risking memory leaks in long-running processes with dynamic tool
lifecycles. Replace _instrumented_tools with a weakref.WeakSet to allow
automatic cleanup of tool references when they are garbage collected.
Additionally, consider adding a cleanup function to explicitly remove tool IDs
from _instrumented_tools when tools are no longer needed. Also, review the
function wrapping approach to avoid modifying the original tool object directly,
possibly by using a proxy or decorator pattern to isolate instrumentation side
effects.
…etry/instrumentation/openai_agents/utils.py Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
26-27: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Lint
🔇 Additional comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py (1)
1-2: LGTM! Imports are appropriate for the JSONEncoder implementation. The `dataclasses` and `json` imports are necessary for the new JSONEncoder class functionality.
```python
class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, dict):
            if "callbacks" in o:
                o = {k: v for k, v in o.items() if k != "callbacks"}
            return o
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)

        if hasattr(o, "to_json"):
            return o.to_json()

        if hasattr(o, "json"):
            return o.json()

        if hasattr(o, "__class__"):
            return o.__class__.__name__

        return super().default(o)
```
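A quick way to see the encoder's behavior is to serialize a dataclass. This self-contained sketch trims the encoder to its dataclass and class-name-fallback branches:

```python
import dataclasses
import json


class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        # Dataclasses become plain dicts; anything else unserializable
        # falls back to its class name.
        if dataclasses.is_dataclass(o):
            return dataclasses.asdict(o)
        return o.__class__.__name__


@dataclasses.dataclass
class Recipe:
    name: str
    servings: int


print(json.dumps({"recipe": Recipe("carbonara", 2)}, cls=JSONEncoder))
# {"recipe": {"name": "carbonara", "servings": 2}}
```

One subtlety worth knowing: `default()` is only invoked for objects `json` cannot serialize natively, so plain dicts nested in the payload never reach the `isinstance(o, dict)` branch at all.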
Fix the dict handling logic and consider simplifying nested conditions.
The JSONEncoder implementation has a logical flaw in the dict handling section. Currently, it only returns the filtered dict if "callbacks" is present, but it should process all dicts consistently.
Apply this diff to fix the dict handling logic:

```diff
-        if isinstance(o, dict):
-            if "callbacks" in o:
-                o = {k: v for k, v in o.items() if k != "callbacks"}
-            return o
+        if isinstance(o, dict):
+            return {k: v for k, v in o.items() if k != "callbacks"}
```

Additionally, consider the static analysis suggestion to combine the nested if statements in the set_span_attribute function (lines 8-10), though this is a minor improvement:

```diff
-    if value is not None:
-        if value != "":
-            span.set_attribute(name, value)
+    if value is not None and value != "":
+        span.set_attribute(name, value)
```

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.12.2)
26-27: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py
lines 24-42, the JSONEncoder's dict handling only returns a filtered dict when
"callbacks" is present, skipping other dicts. Modify the logic to always process
dicts by filtering out "callbacks" if present, otherwise returning the dict as
is. Also, simplify the nested if statements in the set_span_attribute function
(lines 8-10) by combining conditions into a single if statement for clarity.
Important
Looks good to me! 👍
Reviewed ca0b49e in 45 seconds. Click for details.
- Reviewed 13 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 1 draft comment. View it below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py:28
- Draft comment: Nice improvement avoiding in-place mutation. Ensure returning a new dict with callbacks removed aligns with expected JSON encoding semantics.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%. None.
Workflow ID: wflow_Bta6yLadqWDavHGo
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
402-405: Add synchronization around `_instrumented_tools` to prevent race conditions.

Your `extract_tool_details` logic instruments `FunctionTool` instances without any lock, which can lead to double-wrapping or missed instrumentation when multiple threads race through the membership check and add. Introduce a module-level lock and wrap the check/add in a critical section:

- File: packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py
- Function: `extract_tool_details`
- Changes needed:
  - Define a lock alongside `_instrumented_tools`
  - Enclose the `if tool in …` and `.add(tool)` calls in `with _instrumentation_lock:`

Example diff snippet:
```diff
--- a/.../openai_agents/__init__.py
+++ b/.../openai_agents/__init__.py
@@
-_instrumented_tools = weakref.WeakSet()
+_instrumented_tools = weakref.WeakSet()
+_instrumentation_lock = threading.Lock()

 def extract_tool_details(tracer: Tracer, tools):
     """Create spans for hosted tools and wrap FunctionTool execution."""
     thread_id = threading.get_ident()
     root_span = _root_span_storage.get(thread_id)
@@
     for tool in tools:
         if isinstance(tool, FunctionTool):
-            if tool in _instrumented_tools:
-                continue
-            _instrumented_tools.add(tool)
+            with _instrumentation_lock:
+                if tool in _instrumented_tools:
+                    continue
+                _instrumented_tools.add(tool)
             original_on_invoke_tool = tool.on_invoke_tool
             …
```

This ensures that membership checks and additions to the `WeakSet` are atomic and thread-safe.
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (1)
312-318: Model name extraction needs error handling.

The nested attribute access in `get_model_name` could raise `AttributeError` if the agent structure is unexpected. Consider adding proper error handling:

```diff
 def get_model_name(agent):
-    model_attr = getattr(getattr(agent, "model", None), "model", "unknown_model")
-    if model_attr == "unknown_model":
-        model_attr = getattr(agent, "model", None)
-        return model_attr
-    else:
-        return model_attr
+    try:
+        model = getattr(agent, "model", None)
+        if model is None:
+            return "unknown_model"
+        return getattr(model, "model", str(model))
+    except (AttributeError, TypeError):
+        return "unknown_model"
```
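A defensive version can be exercised against a few agent shapes; the `_Agent`/`_Inner` classes below are hypothetical stand-ins, not SDK types:

```python
def get_model_name(agent):
    """Resolve a model name without assuming the agent's structure (sketch)."""
    model = getattr(agent, "model", None)
    if model is None:
        return "unknown_model"
    # Agents may nest the name one level deeper (model.model);
    # otherwise fall back to the string form of whatever we found.
    return getattr(model, "model", None) or str(model)


class _Inner:
    model = "gpt-4o"


class _Agent:
    model = _Inner()


print(get_model_name(_Agent()))   # gpt-4o
print(get_model_name(object()))   # unknown_model
```

A plain string model (`agent.model = "gpt-4o-mini"`) also resolves correctly, since strings have no `.model` attribute and the `str(model)` fallback applies.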
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (10 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.9)
- GitHub Check: Lint
🔇 Additional comments (10)
packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py (10)
6-7: Threading and weakref imports support the new concurrency model. The addition of the `threading` and `weakref` imports properly supports the thread-local span storage and memory-safe tool tracking introduced in this PR.
31-32: Thread-local span storage addresses trace continuity issues. The implementation uses the thread ID as the key for span storage, which is appropriate for maintaining trace context across async operations within the same thread. The use of `weakref.WeakSet` for `_instrumented_tools` helps prevent memory leaks by allowing garbage collection of tool references.
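The thread-keyed storage idea can be sketched in a few lines; the names here are illustrative rather than the package's actual internals:

```python
import threading

# Each thread registers and reads its own root span under its thread id,
# so concurrent agent runs never clobber each other's trace context.
_root_span_storage = {}


def set_root_span(span):
    _root_span_storage[threading.get_ident()] = span


def get_root_span():
    return _root_span_storage.get(threading.get_ident())


results = {}


def worker(name):
    set_root_span(f"span-{name}")
    results[name] = get_root_span()


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.items()))  # [(0, 'span-0'), (1, 'span-1'), (2, 'span-2')]
```

One caveat: thread ids can be reused after a thread exits, so entries should be removed when a run completes (the PR's uninstrument path clears the storage in bulk).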
68-76: Streaming instrumentation wrapper correctly added. The new wrapper for `_run_single_turn_streamed` properly extends tracing support to streaming execution paths, maintaining consistency with the existing non-streaming wrapper pattern.
80-82: Proper cleanup in uninstrumentation. The cleanup of both `_instrumented_tools` and `_root_span_storage` in the uninstrumentation method ensures no memory leaks when the instrumentation is disabled.
104-207: Streaming wrapper implementation looks comprehensive. The streaming wrapper correctly:
- Manages thread-local root span storage
- Handles context propagation with proper fallback
- Captures input/output with JSON serialization and fallback
- Records metrics and span attributes
- Handles errors appropriately
The implementation mirrors the non-streaming wrapper while adding streaming-specific handling.
224-244: Enhanced context propagation in non-streaming wrapper. The addition of thread-local span storage and context propagation ensures proper trace linkage across agent handoffs. The logic correctly falls back to the current context when no root span exists.
250-275: JSON serialization with proper fallback handling. The input serialization logic correctly uses the custom `JSONEncoder` with proper exception handling and fallback to string representation. This ensures robust attribute capture even for complex objects.
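The serialize-with-fallback pattern amounts to a few lines; this sketch uses a simplified encoder and a circular dict to trigger the fallback:

```python
import json


class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        # Simplified last resort, in the spirit of the utils.py encoder.
        return o.__class__.__name__


def serialize_attr(value):
    """Serialize a span attribute, falling back to str() when dumps fails."""
    try:
        return json.dumps(value, cls=JSONEncoder)
    except Exception:
        return str(value)


print(serialize_attr({"input": "make the carbonara vegetarian"}))

circular = {}
circular["self"] = circular  # json.dumps raises ValueError on circular refs
print(serialize_attr(circular))  # {'self': {...}}
```

The broad `except Exception` is deliberate here: attribute capture should degrade gracefully rather than break the instrumented call.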
284-288: Consistent output serialization pattern. The output serialization follows the same pattern as input serialization, ensuring consistency in attribute capture across the instrumentation.
402-461: Function tool instrumentation with proper span wrapping. The implementation correctly:
- Checks for already instrumented tools to avoid duplicate wrapping
- Uses closure to preserve original function reference
- Maintains proper context propagation from root span
- Captures comprehensive tool attributes and I/O
- Handles errors appropriately
The use of `weakref.WeakSet` for `_instrumented_tools` addresses the memory leak concern from past reviews.
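The wrap-via-closure idea looks roughly like this; `FunctionTool` here is a minimal hypothetical stand-in for the SDK class, with the span logic elided:

```python
import asyncio


class FunctionTool:
    """Minimal stand-in for the SDK's FunctionTool (hypothetical)."""

    def __init__(self, name, on_invoke_tool):
        self.name = name
        self.on_invoke_tool = on_invoke_tool


def create_wrapped_tool(tool, original_func):
    # The closure captures the *original* callable; without the membership
    # check, instrumenting twice would stack wrappers on top of wrappers.
    async def wrapped_on_invoke_tool(ctx, args_json):
        # ... span creation and input capture would go here ...
        result = await original_func(ctx, args_json)
        # ... output capture and status would go here ...
        return result

    return wrapped_on_invoke_tool


async def echo(ctx, args_json):
    return "done:" + args_json


tool = FunctionTool("demo", echo)
tool.on_invoke_tool = create_wrapped_tool(tool, tool.on_invoke_tool)
print(asyncio.run(tool.on_invoke_tool(None, "{}")))  # done:{}
```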
462-513: Hosted tool span creation with proper context. The implementation for hosted tools (`WebSearchTool`, `FileSearchTool`, `ComputerTool`) correctly:
- Uses the same context propagation pattern as function tools
- Creates spans with appropriate attributes
- Immediately ends spans to preserve original behavior
- Captures tool-specific configuration attributes
Important
Looks good to me! 👍
Reviewed e93d8ea in 1 minute and 13 seconds. Click for details.
- Reviewed 15 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 1 draft comment. View it below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/utils.py:24
- Draft comment: Removed filtering for 'callbacks' from dicts. Ensure that including 'callbacks' in the serialized output won't cause non-serializable or sensitive data issues.
- Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_p4FMCg5DEgCVloWZ
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Caution
Changes requested ❌
Reviewed 634af4b in 1 minute and 59 seconds. Click for details.
- Reviewed 97 lines of code in 2 files
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:32
- Draft comment: Changed `_instrumented_tools` from a WeakSet to a plain set (storing tool IDs). Ensure this change is intentional and that it won't cause memory retention issues.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. The change actually seems well thought out. By storing IDs instead of objects, we avoid any memory retention issues that could have existed with the WeakSet. The _uninstrument() method properly clears the set. The WeakSet was likely unnecessary since we have explicit cleanup. The comment seems to raise a non-issue and could mislead developers. Could there be edge cases where tools need to be garbage collected before _uninstrument() is called? Am I missing some threading concerns? The tools are only used during instrumentation and explicitly cleaned up. Even if there are edge cases, storing IDs is actually safer than storing object references. The comment should be deleted. The code change is an improvement that avoids potential memory issues by storing IDs instead of object references, and has proper cleanup in place.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:193
- Draft comment: Removed LLM_SYSTEM and LLM_RESPONSE_MODEL attributes from histogram record calls. Confirm that this removal is intended and that downstream metrics processing doesn't rely on these attributes.
- Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_ZaKPi4gKKFjXDgX4
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
```python
        except Exception:
            span.set_attribute("traceloop.entity.output", json.dumps(str(response)))
        if duration_histogram:
            duration_histogram.record(
```
Note the inconsistency: the streamed agent run attaches an attribute ('gen_ai.agent.name') while the non-streamed run does not pass any attributes when recording duration. Confirm if this difference is intentional.
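If the difference is unintentional, one fix is a small shared helper that both the streamed and non-streamed paths call. The sketch below uses a fake histogram and an assumed attribute key, not the PR's actual code:

```python
def record_duration(histogram, seconds, agent_name=None):
    """Record run duration with a consistent attribute set (sketch)."""
    attributes = {}
    if agent_name is not None:
        attributes["gen_ai.agent.name"] = agent_name  # assumed key
    histogram.record(seconds, attributes=attributes)


class FakeHistogram:
    """Test double standing in for an OpenTelemetry Histogram."""

    def __init__(self):
        self.records = []

    def record(self, value, attributes=None):
        self.records.append((value, attributes or {}))


h = FakeHistogram()
record_duration(h, 0.42, agent_name="Recipe Editor Agent")
record_duration(h, 0.17)
print(h.records)
# [(0.42, {'gen_ai.agent.name': 'Recipe Editor Agent'}), (0.17, {})]
```

Routing both call sites through one helper makes any future attribute divergence a deliberate choice rather than an accident.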
Important
Looks good to me! 👍
Reviewed 993b90e in 1 minute and 6 seconds. Click for details.
- Reviewed 20 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:6
- Draft comment: Removed unused import 'weakref'.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:120
- Draft comment: Removed unused variable 'model_name' in _wrap_agent_run_streamed.
- Reason this comment was not posted: Confidence changes required: 0% <= threshold 50%. None.
Workflow ID: wflow_SGJxCr7dxqyCbTKE
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 1673c03 in 1 minute and 21 seconds. Click for details.
- Reviewed 72 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 4 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:165
- Draft comment:
Good refactor: using SpanAttributes constants (TRACELOOP_ENTITY_INPUT) instead of hardcoded strings for input attributes improves maintainability and consistency. Verify that these constants are defined as expected.
- Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment starts with a positive note about the refactor, which is informative. However, it ends with a request to verify that the constants are defined as expected, which is asking the author to double-check something. This violates the rule against asking the author to confirm or verify things.
2. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:183
- Draft comment:
Similarly, replacing the hardcoded output attribute key with SpanAttributes.TRACELOOP_ENTITY_OUTPUT enhances consistency across streaming and single-turn execution paths.
- Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment seems to be purely informative, explaining the benefit of a change without suggesting any action or asking for clarification. It doesn't align with the rules for useful comments.
3. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:264
- Draft comment:
The same improvements are applied in the _wrap_agent_run function for both input and output attribute settings using SpanAttributes constants. This reinforces uniform telemetry tagging.
- Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states what has been done in the code without offering any critique or asking for confirmation of intent.
4. packages/opentelemetry-instrumentation-openai-agents/opentelemetry/instrumentation/openai_agents/__init__.py:441
- Draft comment:
In the wrapped tool invocation, replacing hardcoded attribute keys with SpanAttributes constants for input and output further standardizes telemetry data tagging across tools.
- Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is purely informative, explaining what a change does without suggesting any action or asking for clarification. It doesn't fit the criteria for a useful comment as per the rules.
Workflow ID: wflow_tb3BBxwvBzR3HCYJ
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Enhances OpenAI Agents SDK with improved tracing, multi-agent handoffs, and expanded test coverage, including a new sample app for recipe management.
- New `Recipe`, `SearchResponse`, `EditResponse`, and `RecipeModificationResult` models in `openai_agents_example.py`.
- New `plan_and_apply_recipe_modifications()` and `search_recipes()` functions in `openai_agents_example.py`.
- New `JSONEncoder` in `utils.py` for customized JSON serialization.
- New `test_recipe_workflow_agent_handoffs_with_function_tools.yaml` for testing agent handoffs.
- Updated `conftest.py` with fixtures for recipe workflow agents.
- Extended `test_openai_agents.py` for new functionalities.
- Updated `pyproject.toml` to include `openai-agents` and local OpenTelemetry packages.
This description was created by
for 1673c03. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Bug Fixes
Chores