fix(langchain): fix nesting of langgraph spans (#3206)
Conversation
Walkthrough

Adds a LangGraph + OpenTelemetry reproduction script and test; hardens LangChain callback context attach/detach for async scenarios; updates VCR redaction and recorded test cassettes; adjusts several test expectations and tool_call IDs.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner as Test/Script
    participant OTel as OpenTelemetry SDK
    participant Instr as LangChain Instrumentor
    participant Graph as LangGraph Agent
    participant HTTP as httpx.AsyncClient
    participant Exporter as InMemorySpanExporter
    Runner->>OTel: setup_tracing()
    OTel->>Instr: instrument(LangChain)
    Runner->>OTel: start root span (test_agent_execution_root)
    Runner->>Graph: ainvoke(initial_state)
    Graph->>OTel: start span (LangGraph.workflow)
    Graph->>Graph: execute node http_call.task
    Graph->>OTel: start span (http_call.task)
    Graph->>HTTP: POST /sum
    HTTP-->>Graph: response (35)
    Graph->>OTel: end span (http_call.task)
    Graph->>Graph: execute node otel_span.task
    Graph->>OTel: start span (test_agent_span)
    Graph->>OTel: end span (test_agent_span)
    Graph->>OTel: end span (LangGraph.workflow)
    Runner->>OTel: end root span
    OTel->>Exporter: export spans
    Runner->>Instr: uninstrument()
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Important
Looks good to me! 👍
Reviewed everything up to 5ff9bbf in 1 minute and 51 seconds.

- Reviewed 507 lines of code in 4 files
- Skipped 0 files when reviewing
- Skipped posting 5 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves, and react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py:55
   - Draft comment: Consider using a descriptive variable name instead of `_` for the AsyncClient, to improve code clarity.
   - Reason this comment was not posted: decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). 1. The `_` is intentionally used here to indicate the variable is unused. 2. This follows Python conventions where `_` indicates a throwaway variable. 3. Using a descriptive name for an unused variable could be misleading, since it suggests the variable has a purpose. 4. The code is actually clearer with `_`, since it signals to readers that this variable isn't used. Maybe having a named client would make debugging easier if there are connection issues, or the code will need to use the client in the future? But the code is a test reproduction case demonstrating a specific issue; it's not production code that needs to handle real HTTP calls or be maintained long term. The comment should be deleted: `_` is the correct Python idiom for an intentionally unused variable, and a descriptive name would make the code less clear, not more.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:55
   - Draft comment: Explicitly setting the global tracer provider (lines 55–59) ensures that spans created by user code are properly nested. This is a solid improvement.
   - Reason this comment was not posted: comment did not seem useful (usefulness confidence = 0% <= threshold 50%). The comment is purely informative and does not provide any actionable feedback or suggestions for improvement; it simply states that the change is a solid improvement, which violates the rule against making purely informative comments.
3. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:189
   - Draft comment: Wrapping `context_api.detach(token)` in a try/except block to handle `ValueError` in asynchronous scenarios is a good and necessary fix.
   - Reason this comment was not posted: comment did not seem useful (usefulness confidence = 0% <= threshold 50%). The comment is purely informative and does not provide any actionable feedback or suggestions for improvement; it simply states that the change is good and necessary, which violates the rule against making purely informative comments.
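The failure mode behind that draft comment can be reproduced without any OpenTelemetry code at all. Below is a minimal, illustrative sketch using plain `contextvars` (which the OpenTelemetry context API wraps); the variable and coroutine names are invented for the example. A token created in one asyncio task's context cannot be reset from another task's context, which is exactly the `ValueError` the try/except in `_end_span` guards against.

```python
import asyncio
import contextvars

var = contextvars.ContextVar("span_ctx")


async def attach():
    # The token is bound to this task's (copied) context.
    return var.set("active-span")


async def detach(token):
    try:
        var.reset(token)
        return "detached"
    except ValueError:
        # Same failure mode as context_api.detach(token) across async contexts:
        # the token was created in a different Context.
        return "caught ValueError"


async def main():
    # Attach in one task, detach in another: each task runs in its own
    # copy of the context, so reset() sees a foreign token.
    token = await asyncio.create_task(attach())
    return await asyncio.create_task(detach(token))


result = asyncio.run(main())
print(result)  # caught ValueError
```

This is why silently ignoring the `ValueError` is safe here: the span hierarchy was already established when the token was attached, and the failed detach only affects the local context copy.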
4. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:227
   - Draft comment: Changing the conditional span context attachment to unconditional (removing the check for async callbacks) simplifies management. Verify that this change does not inadvertently affect synchronous callback behavior.
   - Reason this comment was not posted: comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py:292
   - Draft comment: The tests, including the exact reproduction of GitHub Issue #3203, thoroughly assert span names and parent-child relationships. They provide good coverage to validate the corrected span nesting.
   - Reason this comment was not posted: comment did not seem useful (usefulness confidence = 0% <= threshold 50%). The comment is purely informative, providing details about the tests and their coverage; it does not suggest any changes or ask for confirmation on specific code behavior.
Workflow ID: wflow_TaEhtkkh8QGGmEE7
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Important
Looks good to me! 👍
Reviewed 15476d1 in 46 seconds.

- Reviewed 27 lines of code in 1 file
- Skipped 0 files when reviewing
- Skipped posting 2 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves, and react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:50
   - Draft comment: Removed global tracer provider setting. Please document the rationale behind removing the global setting, as it could affect span nesting behavior.
   - Reason this comment was not posted: comment did not seem useful (usefulness confidence = 0% <= threshold 50%). The comment asks for documentation of the rationale behind a change, which is similar to asking for an update to the PR description; this violates the rule against asking the PR author to update the PR description or provide explanations.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:29
   - Draft comment: Minor formatting change: extra space added in the `_instruments` tuple. Confirm if this is intentional.
   - Reason this comment was not posted: confidence changes required = 0% <= threshold 50%. None.
Workflow ID: wflow_yksaE0uDHchF83Hz
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 2
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
190-195: Consider using `contextlib.suppress` for cleaner exception handling

While the current implementation is functionally correct, using `contextlib.suppress` would be more idiomatic and concise for ignoring exceptions.

```diff
+from contextlib import suppress
```

Then update the exception handling:

```diff
- try:
-     context_api.detach(token)
- except ValueError:
-     # Context detach can fail in async scenarios when tokens are created in different contexts
-     # This is expected behavior and doesn't affect the correct span hierarchy
-     pass
+ # Context detach can fail in async scenarios when tokens are created in different contexts
+ # This is expected behavior and doesn't affect the correct span hierarchy
+ with suppress(ValueError):
+     context_api.detach(token)
```

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
297-309: Consider extracting span debugging logic to a test utility

The span hierarchy printing logic (lines 297-309) could be useful for debugging other tests. Consider extracting it to a shared test utility function.
Would you like me to create a utility function for debugging span hierarchies that could be reused across tests?
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (1)
55-56: Unused httpx.AsyncClient context manager

The code creates an `httpx.AsyncClient` context manager but doesn't actually use it (the variable is named `_`). Since this is meant to simulate an HTTP call, either use the client or add a comment explaining why it's not used.

```diff
- async with httpx.AsyncClient() as _:
+ # Simulating HTTP call context without actual network request
+ async with httpx.AsyncClient() as _:
```

Or, if you want to make it more realistic:

```diff
- async with httpx.AsyncClient() as _:
+ async with httpx.AsyncClient() as client:
+     # In a real scenario, we would make an actual HTTP call:
+     # response = await client.post("https://httpbin.org/post", json=data)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (1 hunks)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (3)

- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (1): `LangchainInstrumentor` (32-185)
- packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3): `export` (45-51), `InMemorySpanExporter` (22-61), `get_finished_spans` (40-43)
- packages/opentelemetry-instrumentation-mcp/tests/conftest.py (2): `tracer_provider` (32-44), `tracer` (48-49)

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (3)

- packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1): `instrument_legacy` (77-93)
- packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (5): `TestAgentState` (45-48), `http_call_node` (50-72), `opentelemetry_span_node` (74-99), `create_test_agent` (101-115), `run_test_agent` (117-148)
- packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1): `get_finished_spans` (40-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
190-195: Use contextlib.suppress(ValueError) instead of try-except-pass
(SIM105)
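For reference, the `contextlib.suppress` pattern Ruff's SIM105 suggests is a drop-in replacement for try/except/pass; a quick stdlib-only illustration:

```python
from contextlib import suppress

results = []
with suppress(ValueError):
    # int() raises ValueError here, so the append never runs;
    # suppress() swallows the exception and execution continues.
    results.append(int("not a number"))
results.append("after")
print(results)  # ['after']
```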
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
236-236: Good fix for async context propagation

The unconditional attachment of the span context ensures that a token is always available for later detachment, which is critical for maintaining proper span hierarchy in async scenarios. This change, combined with the error handling in `_end_span`, properly addresses GitHub issue #3203.

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
188-343: Comprehensive test coverage for the GitHub issue reproduction

The test thoroughly reproduces GitHub issue #3203 with:
- Proper async execution flow with two LangGraph nodes
- Correct verification of span hierarchy and parent-child relationships
- Detailed assertions for all expected spans
- Good debugging output for troubleshooting
This test will effectively prevent regression of the span nesting issue.
```python
#!/usr/bin/env python3
"""
Reproduce the exact GitHub issue #3203 and show the trace waterfall.
This demonstrates the span hierarchy issue before and after the fix.
"""

import asyncio
import sys
from typing import TypedDict

import httpx
from langgraph.graph import END, START, StateGraph
from opentelemetry import trace
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

from waterfall_visualizer import visualize_trace_waterfall, print_raw_span_data
```
💡 Verification agent
🧩 Analysis chain
Missing import for waterfall_visualizer module
The script imports waterfall_visualizer (line 18) but this module doesn't appear to be part of the package. This will cause an ImportError when running the script.
🏁 Script executed:
```shell
#!/bin/bash
# Check if waterfall_visualizer module exists in the repository
fd -t f "waterfall_visualizer.py"
# Check if it's imported elsewhere in the codebase
rg "from waterfall_visualizer import" -A 2
# Check if there's any reference to this module
rg "waterfall_visualizer" --type py
```

Length of output: 580
Add the missing waterfall_visualizer module or correct the import
The script at packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (line 18) imports waterfall_visualizer, but no such file exists in the repository. This will raise an ImportError at runtime.
To resolve:
- Include `waterfall_visualizer.py` in the package (e.g., under `packages/opentelemetry-instrumentation-langchain/`), defining `visualize_trace_waterfall` and `print_raw_span_data`.
- Or, if it belongs in a different module, update the import path and add the dependency to `setup.py` or `pyproject.toml`.
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py
around lines 1 to 19, the script imports waterfall_visualizer which does not
exist and will cause ImportError; either add a new file
packages/opentelemetry-instrumentation-langchain/waterfall_visualizer.py that
defines visualize_trace_waterfall and print_raw_span_data and ensure it is
included in the package manifest, or update the import to the correct existing
module path and add the dependency to setup.py/pyproject.toml so the module is
available at runtime.
```python
print("\n🎉 GitHub Issue #3203 has been FIXED!")
print("The visualization above shows the CORRECTED span hierarchy.")
print("Note how POST and test_agent_span are now properly nested")
print("under their respective task spans (http_call.task and otel_span.task)!")
```
🛠️ Refactor suggestion
Misleading success message before fix verification
The script prints "GitHub Issue #3203 has been FIXED!" and describes the corrected hierarchy without actually verifying whether the fix is in place. This could be confusing if someone runs this script on a version without the fix.
Consider dynamically checking the span hierarchy to determine if the issue is fixed:
```diff
- print("\n🎉 GitHub Issue #3203 has been FIXED!")
- print("The visualization above shows the CORRECTED span hierarchy.")
- print("Note how POST and test_agent_span are now properly nested")
- print("under their respective task spans (http_call.task and otel_span.task)!")
+ # Verify if the hierarchy is correct
+ post_span = next((s for s in spans if s.name == "POST"), None)
+ http_call_task = next((s for s in spans if s.name == "http_call.task"), None)
+
+ if post_span and http_call_task and post_span.parent:
+     if post_span.parent.span_id == http_call_task.context.span_id:
+         print("\n🎉 GitHub Issue #3203 has been FIXED!")
+         print("The visualization above shows the CORRECTED span hierarchy.")
+         print("Note how POST and test_agent_span are now properly nested")
+         print("under their respective task spans (http_call.task and otel_span.task)!")
+     else:
+         print("\n⚠️ GitHub Issue #3203 is still present!")
+         print("The spans are not properly nested under their task spans.")
+ else:
+     print("\n⚠️ Unable to verify span hierarchy - expected spans not found")
```
+ print("\n⚠️ Unable to verify span hierarchy - expected spans not found")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Verify if the hierarchy is correct
post_span = next((s for s in spans if s.name == "POST"), None)
http_call_task = next((s for s in spans if s.name == "http_call.task"), None)

if post_span and http_call_task and post_span.parent:
    if post_span.parent.span_id == http_call_task.context.span_id:
        print("\n🎉 GitHub Issue #3203 has been FIXED!")
        print("The visualization above shows the CORRECTED span hierarchy.")
        print("Note how POST and test_agent_span are now properly nested")
        print("under their respective task spans (http_call.task and otel_span.task)!")
    else:
        print("\n⚠️ GitHub Issue #3203 is still present!")
        print("The spans are not properly nested under their task spans.")
else:
    print("\n⚠️ Unable to verify span hierarchy - expected spans not found")
```
✅ There are no secrets present in this pull request anymore. If these secrets were true positives and are still valid, we highly recommend you revoke them.
Important
Looks good to me! 👍
Reviewed ec1b5b6 in 2 minutes and 29 seconds.

- Reviewed 1149 lines of code in 3 files
- Skipped 0 files when reviewing
- Skipped posting 8 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves, and react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:150
   - Draft comment: Explicit `match_on` keys added to vcr_config improve request matching. Confirm these keys cover all necessary URI components.
   - Reason this comment was not posted: comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:122
   - Draft comment: The tool call ID has been updated to 'call_RQXCf1bdiBiFrtwfOTP3GCCd'. Ensure this change aligns with the underlying agent implementation.
   - Reason this comment was not posted: comment did not seem useful (usefulness confidence = 0% <= threshold 50%). The comment asks the PR author to ensure that the change aligns with the underlying agent implementation, which is against the rules; it does not provide a specific suggestion or ask for a specific test to be written.
3. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:220
   - Draft comment: The tool call ID in the no-content scenario is now 'call_Joct6NnIJFGGWfvMqZJPRB4C'. Verify that this updated ID aligns with expected behavior.
   - Reason this comment was not posted: comment did not seem useful (usefulness confidence = 0% <= threshold 50%). The comment asks the PR author to verify that the updated tool call ID aligns with expected behavior, which falls under asking the author to confirm their intention; that is against the rules.
4. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:104
   - Draft comment: Log exporter assertions now expect 8 logs. Confirm that this new count accurately reflects the improved filtering/consolidation.
   - Reason this comment was not posted: comment looked like it was already resolved.
5. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:202
   - Draft comment: The assertion that all logs have GenAIAttributes.GEN_AI_SYSTEM equal to 'langchain' is helpful. Ensure this remains valid for future behavior.
   - Reason this comment was not posted: comment was not on a location in the diff, so it can't be submitted as a review comment.
6. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:19
   - Draft comment: Typographical: there appears to be an extra stray apostrophe on this line that may be unintentional.
   - Reason this comment was not posted: comment was on unchanged code.
7. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:82
   - Draft comment: The JSON string for the "template" property now starts with "are a helpful assistant" instead of "You are a helpful assistant", which results in an incomplete sentence. Please verify whether the leading "You" was inadvertently removed.
   - Reason this comment was not posted: comment was on unchanged code.
8. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:256
   - Draft comment: Typo alert: the query string "OpenLLMetry" appears to have an extra 'L'. If the intent was to refer to OpenTelemetry, please update it accordingly.
   - Reason this comment was not posted: comment was on unchanged code.
Workflow ID: wflow_f2BGg6t3wtktllTA
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)
122-129: Don't hardcode tool_call IDs; derive them from the logs

Tool call IDs are provider-generated and will change when cassettes are re-recorded. Hardcoding them forces frequent fixture updates. Extract the ID from the assistant message event and reuse it in subsequent assertions.
Apply this diff:
```diff
@@ def test_agents_with_events_with_content(
-    # Validate that the assistant message Event exists
+    # Extract tool_call id from assistant message for stable assertions
+    tool_call_id = None
+    for log in logs:
+        if log.log_record.attributes.get(EventAttributes.EVENT_NAME) == "gen_ai.assistant.message":
+            body = dict(log.log_record.body)
+            calls = body.get("tool_calls") or []
+            if calls:
+                tool_call_id = calls[0].get("id")
+                break
+    assert tool_call_id, "assistant tool_call id not found in logs"
+
+    # Validate that the assistant message Event exists
     assert_message_in_logs(
         logs,
         "gen_ai.assistant.message",
         {
             "content": "",
             "tool_calls": [
                 {
-                    "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                    "id": tool_call_id,
                     "function": {
                         "name": "tavily_search_results_json",
                         "arguments": {"query": "OpenLLMetry"},
                     },
                     "type": "function",
                 }
             ],
         },
     )
@@
     choice_event = {
         "index": 0,
         "finish_reason": "tool_calls",
         "message": {"content": ""},
         "tool_calls": [
             {
-                "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                "id": tool_call_id,
                 "function": {
                     "name": "tavily_search_results_json",
                     "arguments": {"query": "OpenLLMetry"},
                 },
                 "type": "function",
             }
         ],
     }
```

Repeat the same pattern for the no_content test (see next comment).
Also applies to: 140-148
220-226: Mirror dynamic tool_call ID extraction in the no-content test

Avoid hardcoding the streaming tool_call id here as well; reuse the id derived from the assistant.message log.
Apply this diff:
```diff
@@ def test_agents_with_events_with_no_content(
-    # Validate that the assistant message Event exists
+    # Extract tool_call id from assistant message for stable assertions
+    tool_call_id = None
+    for log in logs:
+        if log.log_record.attributes.get(EventAttributes.EVENT_NAME) == "gen_ai.assistant.message":
+            body = dict(log.log_record.body)
+            calls = body.get("tool_calls") or []
+            if calls:
+                tool_call_id = calls[0].get("id")
+                break
+    assert tool_call_id, "assistant tool_call id not found in logs"
+
+    # Validate that the assistant message Event exists
     assert_message_in_logs(
         logs,
         "gen_ai.assistant.message",
         {
             "tool_calls": [
                 {
-                    "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                    "id": tool_call_id,
                     "function": {"name": "tavily_search_results_json"},
                     "type": "function",
                 }
             ]
         },
     )
@@
     choice_event = {
         "index": 0,
         "finish_reason": "tool_calls",
         "message": {},
         "tool_calls": [
             {
-                "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                "id": tool_call_id,
                 "function": {"name": "tavily_search_results_json"},
                 "type": "function",
             }
         ],
     }
```
🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
149-156: Avoid weakening HTTP cassette matching by omitting the request body

Adding `match_on` without "body" increases the risk of mismatching multiple requests to the same endpoint (e.g., two POSTs to /v1/chat/completions in a single test). This can hide regressions and cause flaky replays across tests that hit identical routes with different payloads.
Consider including "body" back in the matchers and scrubbing dynamic fields instead. Also scrub sensitive data in JSON bodies to prevent leaks (see separate comment on leaked API key). Example:
```diff
 def vcr_config():
-    return {
-        "filter_headers": ["authorization", "x-api-key"],
-        "filter_body": ["api_key"],
-        "match_on": ["method", "scheme", "host", "port", "path", "query"],
-    }
+    return {
+        "filter_headers": ["authorization", "x-api-key", "cookie", "set-cookie"],
+        # If you use pytest-recording, prefer JSON scrubbing via before_record_request (below)
+        "match_on": ["method", "scheme", "host", "port", "path", "query", "body"],
+        "before_record_request": _scrub_request_body,
+        "before_record_response": _scrub_response_headers,
+    }
```

Add these helpers (outside this function):
```python
# Add at the top of the file:
import json


def _scrub_request_body(request):
    # Scrub JSON api_key in request bodies
    try:
        if request.body and "application/json" in (request.headers or {}).get("Content-Type", ""):
            data = json.loads(request.body)
            if isinstance(data, dict) and "api_key" in data:
                data["api_key"] = "<REDACTED>"
                request.body = json.dumps(data)
    except Exception:
        pass
    # Remove cookies
    if request.headers:
        request.headers.pop("cookie", None)
    return request


def _scrub_response_headers(response):
    # Remove set-cookie header to reduce churn
    headers = response["headers"] or {}
    headers.pop("Set-Cookie", None)
    headers.pop("set-cookie", None)
    return response
```

packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml (1)
380-383: Reduce cassette churn: filter volatile cookies and request IDs

Set-Cookie, Cookie, CF-RAY, and x-request-id values are highly volatile and cause unnecessary cassette diffs. Filtering them improves stability, reduces noise, and avoids committing ephemeral identifiers.
In conftest vcr_config, add:
- filter_headers: ["cookie", "set-cookie"]
- before_record_response: drop Set-Cookie (see the helper suggested there).
Optionally also filter "CF-RAY" and "x-request-id" in response headers if you find them noisy.
Also applies to: 200-216, 995-1041
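Putting the suggestions in this comment together, a conftest-level `vcr_config` might look like the following sketch. The exact header list and hook names are suggestions built on vcrpy's documented `filter_headers` and `before_record_response` options, not code from this PR.

```python
def vcr_config():
    # Drop response headers that change on every recording and would
    # otherwise churn the committed cassettes.
    def _drop_volatile_response_headers(response):
        headers = response.get("headers") or {}
        for name in ("Set-Cookie", "set-cookie", "CF-RAY", "x-request-id"):
            headers.pop(name, None)
        return response

    return {
        # Never record credentials or cookies from requests.
        "filter_headers": ["authorization", "x-api-key", "cookie", "set-cookie"],
        "before_record_response": _drop_volatile_response_headers,
    }


cfg = vcr_config()
print(sorted(cfg["filter_headers"]))
```

Combined with the request-body scrubbing helpers suggested earlier in this review, this keeps cassettes stable across re-recordings while still matching strictly on request shape.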
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)
104-104: Avoid brittle assertion on total log count

Asserting an exact log count (8) is brittle across provider or semconv changes and can fail even if the required events are present. Prefer validating the presence of required events (which you already do) and, if needed, assert a minimum count.
For example:
```diff
- assert len(logs) == 8
+ assert len(logs) >= 6
```
248-256: Use a broader type for logs to reflect actual usage

`assert_message_in_logs` accepts a `Tuple[LogData]`, but callers pass a list. Use `Sequence[LogData]` to match both lists and tuples.
```diff
-from typing import Tuple
+from typing import Sequence
@@
-def assert_message_in_logs(
-    logs: Tuple[LogData], event_name: str, expected_content: dict
-):
+def assert_message_in_logs(
+    logs: Sequence[LogData], event_name: str, expected_content: dict
+):
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml (16 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Lint
- GitHub Check: Build Packages (3.11)
Force-pushed from ec1b5b6 to a5296a4.
Important
Looks good to me! 👍
Reviewed a5296a4 in 1 minute and 59 seconds.

- Reviewed 93 lines of code in 2 files
- Skipped 0 files when reviewing
- Skipped posting 5 draft comments; view those below
- Modify your settings and rules to customize what types of comments Ellipsis leaves, and react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:77
   - Draft comment: The multi-line formatting for the `OpenAIInstrumentor.instrument` call improves readability without altering functionality.
   - Reason this comment was not posted: confidence changes required = 0% <= threshold 50%. None.
2. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:151
   - Draft comment: The VCR configuration now uses a `match_on` list and a `before_record_request` callback to filter the API key in JSON bodies. This stricter matching approach looks appropriate; just ensure non-JSON request bodies are handled as expected.
   - Reason this comment was not posted: confidence changes required = 0% <= threshold 50%. None.
3. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:104
   - Draft comment: In test_agents_with_events_with_content, the expected log count is updated from 15 to 8 and the tool call id is updated to 'call_RQXCf1bdiBiFrtwfOTP3GCCd', reflecting the corrected span nesting. Confirm that these new expectations are stable.
   - Reason this comment was not posted: comment did not seem useful (usefulness confidence = 0% <= threshold 50%). The comment asks the PR author to confirm the stability of new expectations, which violates the rule against asking for confirmation or verification; it does not provide a specific code suggestion or point out a clear issue that needs addressing.
4. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:219
   - Draft comment: In test_agents_with_events_with_no_content, the tool call id has been updated to 'call_Joct6NnIJFGGWfvMqZJPRB4C'. This change aligns with the fix; ensure these IDs remain consistent across test runs.
   - Reason this comment was not posted: comment did not seem useful (usefulness confidence = 0% <= threshold 50%). The comment is informative and asks the author to ensure consistency across test runs, which violates the rules; it doesn't provide a specific code suggestion or ask for a specific test to be written.
5. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:143
   - Draft comment: Typographical error: the string "OpenLLMetry" on this line may be a typo. Did you mean "OpenTelemetry"?
   - Reason this comment was not posted: comment was on unchanged code.
Workflow ID: wflow_HFjyES3oDMzdI1Bd
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
|
Generated with ❤️ by ellipsis.dev |
Actionable comments posted: 2
🔭 Outside diff range comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)
114-131: Avoid brittle assertions on dynamic tool_call IDs

Hard-coding OpenAI tool_call ids makes tests tightly coupled to cassettes and upstream behavior. Prefer structural assertions and treat the id as "any value".

Apply this diff to assert a structural subset and allow any id:

```diff
-    assert_message_in_logs(
+    assert_message_in_logs_contains(
         logs,
         "gen_ai.assistant.message",
         {
             "content": "",
             "tool_calls": [
                 {
-                    "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                    "id": ANY,
                     "function": {
                         "name": "tavily_search_results_json",
                         "arguments": {"query": "OpenLLMetry"},
                     },
                     "type": "function",
                 }
             ],
         },
     )
@@
-    assert_message_in_logs(logs, "gen_ai.choice", choice_event)
+    assert_message_in_logs_contains(logs, "gen_ai.choice", choice_event)
```

Additionally, add this helper and import outside this hunk to support partial matching and `ANY`:

```python
from unittest.mock import ANY


def _deep_contains(superset, subset):
    if subset is ANY:
        return True
    if isinstance(subset, dict):
        if not isinstance(superset, dict):
            return False
        for k, v in subset.items():
            if k not in superset or not _deep_contains(superset[k], v):
                return False
        return True
    if isinstance(subset, list):
        if not isinstance(superset, list) or len(subset) > len(superset):
            return False
        # every expected item must match at least one actual item
        for expected in subset:
            if not any(_deep_contains(actual, expected) for actual in superset):
                return False
        return True
    return superset == subset


def assert_message_in_logs_contains(
    logs: Tuple[LogData], event_name: str, expected_subset: dict
):
    assert any(
        log.log_record.attributes.get(EventAttributes.EVENT_NAME) == event_name
        for log in logs
    )
    assert any(
        _deep_contains(dict(log.log_record.body), expected_subset) for log in logs
    )
```

Also applies to: 133-149
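The deep-subset idea above can be exercised on its own. The sketch below uses plain dicts (no `LogData`/`EventAttributes` plumbing) and a hypothetical `deep_contains` that mirrors the suggested `_deep_contains`:

```python
from unittest.mock import ANY


def deep_contains(superset, subset):
    """Return True if `subset` is structurally contained in `superset`; ANY matches anything."""
    if subset is ANY:
        return True
    if isinstance(subset, dict):
        return isinstance(superset, dict) and all(
            k in superset and deep_contains(superset[k], v) for k, v in subset.items()
        )
    if isinstance(subset, list):
        # every expected item must match at least one actual item
        return isinstance(superset, list) and all(
            any(deep_contains(actual, expected) for actual in superset)
            for expected in subset
        )
    return superset == subset


# A tool_call body with a dynamic id still matches the structural expectation.
body = {
    "content": "",
    "tool_calls": [{"id": "call_RQXCf1bdiBiFrtwfOTP3GCCd", "type": "function"}],
}
expected = {"tool_calls": [{"id": ANY, "type": "function"}]}
print(deep_contains(body, expected))  # True
print(deep_contains(body, {"content": "x"}))  # False
```

Because the id slot is `ANY`, re-recording a cassette with a fresh tool_call id no longer breaks the assertion.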
213-226: Apply the same robust matching for no-content flow

Mirror the structural assertion approach to avoid brittle id expectations here as well.

Apply this diff:

```diff
-    assert_message_in_logs(
+    assert_message_in_logs_contains(
         logs,
         "gen_ai.assistant.message",
         {
             "tool_calls": [
                 {
-                    "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                    "id": ANY,
                     "function": {"name": "tavily_search_results_json"},
                     "type": "function",
                 }
             ]
         },
     )
@@
-    assert_message_in_logs(logs, "gen_ai.choice", choice_event)
+    assert_message_in_logs_contains(logs, "gen_ai.choice", choice_event)
```

Also applies to: 228-241
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (3)
- packages/opentelemetry-instrumentation-pinecone/tests/conftest.py (1): `instrument` (44-46)
- packages/opentelemetry-instrumentation-mcp/tests/conftest.py (1): `tracer_provider` (32-44)
- packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1): `meter_provider` (45-50)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
79-81: Formatting-only change — OK

Multi-line call for `OpenAIInstrumentor.instrument` is functionally equivalent. No action needed.
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (1)
104-104: Confirm reduced log count is stable across providers/versions

The expected log count changed to 8. If the intent is to assert "no extra logs", this is fine. If providers or semconv evolve, consider asserting presence of required events instead of an exact count to avoid brittle failures.
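A presence-based check along those lines can be sketched with plain dicts; the event names below are illustrative, not the exact set the suite requires:

```python
# Instead of pinning an exact count (e.g. assert len(logs) == 8), assert that
# every required event is present; extra provider-dependent logs won't break the test.
REQUIRED_EVENTS = {
    "gen_ai.assistant.message",
    "gen_ai.choice",
}


def assert_required_events(logs):
    seen = {log["event_name"] for log in logs}
    missing = REQUIRED_EVENTS - seen
    assert not missing, f"missing events: {missing}"


logs = [
    {"event_name": "gen_ai.assistant.message"},
    {"event_name": "gen_ai.choice"},
    {"event_name": "gen_ai.choice"},  # a duplicate/extra log is tolerated
]
assert_required_events(logs)
print("all required events present")
```

The trade-off is that a regression that *adds* spurious logs would pass silently; the exact-count form catches that but breaks whenever providers change.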
```python
def before_record_request(request):
    if hasattr(request, "body") and request.body:
        import json

        try:
            if isinstance(request.body, (str, bytes)):
                body_str = (
                    request.body.decode("utf-8")
                    if isinstance(request.body, bytes)
                    else request.body
                )
                body_data = json.loads(body_str)
                if "api_key" in body_data:
                    body_data["api_key"] = "FILTERED"
                request.body = json.dumps(body_data)
        except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
            pass
    return request
```
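The hook's behavior can be checked in isolation; this sketch uses a stand-in request object (the real hook receives a VCR request):

```python
import json


class FakeRequest:
    """Stand-in for a VCR request object; only `body` is needed here."""

    def __init__(self, body):
        self.body = body


def before_record_request(request):
    if hasattr(request, "body") and request.body:
        try:
            if isinstance(request.body, (str, bytes)):
                body_str = (
                    request.body.decode("utf-8")
                    if isinstance(request.body, bytes)
                    else request.body
                )
                body_data = json.loads(body_str)
                if "api_key" in body_data:
                    body_data["api_key"] = "FILTERED"
                request.body = json.dumps(body_data)
        except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
            pass
    return request


req = before_record_request(FakeRequest('{"api_key": "sk-secret", "model": "gpt-4"}'))
print(req.body)  # {"api_key": "FILTERED", "model": "gpt-4"}

# Non-JSON bodies pass through untouched.
raw = before_record_request(FakeRequest("not json"))
print(raw.body)  # not json
```

Note that a bytes body comes back as a str after redaction, which is the regression the review below flags.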
🛠️ Refactor suggestion
Preserve request.body type and redact nested JSON keys (avoid str/bytes regression)
Current logic may change a bytes body to str (Line 167), which can break VCR matching/serialization. It also redacts only top-level "api_key" and misses nested occurrences and other common variants.
Proposed fix: preserve original body type and perform recursive redaction with a small, conservative secret key set (case-insensitive). Optionally, also redact sensitive query params in the same hook.
Apply this diff:
def before_record_request(request):
- if hasattr(request, "body") and request.body:
- import json
-
- try:
- if isinstance(request.body, (str, bytes)):
- body_str = (
- request.body.decode("utf-8")
- if isinstance(request.body, bytes)
- else request.body
- )
- body_data = json.loads(body_str)
- if "api_key" in body_data:
- body_data["api_key"] = "FILTERED"
- request.body = json.dumps(body_data)
- except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
- pass
- return request
+ # Redact secrets in JSON body (preserve original type) and in query parameters.
+ if hasattr(request, "body") and request.body and isinstance(request.body, (str, bytes)):
+ import json
+ try:
+ orig_is_bytes = isinstance(request.body, bytes)
+ body_str = request.body.decode("utf-8") if orig_is_bytes else request.body
+ body_data = json.loads(body_str)
+
+ # Conservative, case-insensitive set of sensitive keys
+ secret_keys = {"api_key", "apikey", "api-key", "access_token"}
+
+ def _redact(obj):
+ if isinstance(obj, dict):
+ redacted = {}
+ for k, v in obj.items():
+ k_l = k.lower() if isinstance(k, str) else k
+ if isinstance(k_l, str) and k_l in secret_keys:
+ redacted[k] = "FILTERED"
+ else:
+ redacted[k] = _redact(v)
+ return redacted
+ if isinstance(obj, list):
+ return [_redact(i) for i in obj]
+ return obj
+
+ body_data = _redact(body_data)
+ new_body = json.dumps(body_data)
+ request.body = new_body.encode("utf-8") if orig_is_bytes else new_body
+ except (json.JSONDecodeError, UnicodeDecodeError, AttributeError, TypeError):
+ # Non-JSON or non-decodable bodies are left as-is
+ pass
+
+ # Redact secrets in query parameters as well
+ try:
+ from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
+ parts = urlsplit(request.uri)
+ qs = parse_qsl(parts.query, keep_blank_values=True)
+ redacted_qs = [(k, "FILTERED" if k.lower() in {"api_key", "apikey", "api-key", "access_token"} else v) for k, v in qs]
+ new_query = urlencode(redacted_qs)
+ request.uri = urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))
+ except Exception:
+ pass
+
+ return request📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
The suggested replacement in full:

```python
def before_record_request(request):
    # Redact secrets in JSON body (preserve original type) and in query parameters.
    if hasattr(request, "body") and request.body and isinstance(request.body, (str, bytes)):
        import json
        try:
            orig_is_bytes = isinstance(request.body, bytes)
            body_str = request.body.decode("utf-8") if orig_is_bytes else request.body
            body_data = json.loads(body_str)

            # Conservative, case-insensitive set of sensitive keys
            secret_keys = {"api_key", "apikey", "api-key", "access_token"}

            def _redact(obj):
                if isinstance(obj, dict):
                    redacted = {}
                    for k, v in obj.items():
                        k_l = k.lower() if isinstance(k, str) else k
                        if isinstance(k_l, str) and k_l in secret_keys:
                            redacted[k] = "FILTERED"
                        else:
                            redacted[k] = _redact(v)
                    return redacted
                if isinstance(obj, list):
                    return [_redact(i) for i in obj]
                return obj

            body_data = _redact(body_data)
            new_body = json.dumps(body_data)
            request.body = new_body.encode("utf-8") if orig_is_bytes else new_body
        except (json.JSONDecodeError, UnicodeDecodeError, AttributeError, TypeError):
            # Non-JSON or non-decodable bodies are left as-is
            pass

    # Redact secrets in query parameters as well
    try:
        from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
        parts = urlsplit(request.uri)
        qs = parse_qsl(parts.query, keep_blank_values=True)
        redacted_qs = [
            (k, "FILTERED" if k.lower() in {"api_key", "apikey", "api-key", "access_token"} else v)
            for k, v in qs
        ]
        new_query = urlencode(redacted_qs)
        request.uri = urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))
    except Exception:
        pass

    return request
```
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-langchain/tests/conftest.py around
lines 153 to 171, the before_record_request hook currently decodes bytes to str
and reassigns request.body as a str (breaking VCR matching/serialization) and
only redacts a top-level "api_key"; instead, retain the original body type (if
bytes, operate on a bytes-compatible copy and re-encode before assigning back),
parse JSON safely, walk the parsed structure recursively to redact any keys
matching a small case-insensitive set of sensitive names (e.g., api_key, apikey,
secret, token), and when done serialize back to the same type as the original
body; additionally, inspect and redact matching query params on request.url
using the same key set, and ensure all exceptions (JSON decode, Unicode) are
caught and leave request unchanged on error.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opentelemetry-instrumentation-langchain/tests/test_agents.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
- GitHub Check: Lint
🔇 Additional comments (6)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2)

79-81: Minor formatting improvement - good for readability.

The multi-line formatting of the `OpenAIInstrumentor.instrument()` call improves code readability.

153-176: Robust implementation for filtering sensitive data in VCR cassettes.

The `before_record_request` function properly handles various edge cases:

- Handles both string and bytes request bodies
- Includes proper error handling for JSON decode errors
- Uses safe fallback on exceptions
- Filters API keys from JSON request bodies

The addition of explicit `match_on` configuration will ensure more predictable cassette matching behavior.

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (4)
24-28: Minor formatting improvement with trailing comma.

The addition of trailing commas and consistent indentation improves code maintainability.

38-41: Improved test assertion formatting.

The single-line set comparison is more readable and concise.

85-89: Consistent formatting applied to async test.

Same formatting improvements as the sync version for consistency.

99-101: Consistent assertion formatting in async test.

Matches the formatting improvement made in the sync version.
```python
@pytest.mark.vcr
def test_langgraph_github_issue_3203_exact_reproduction(
    instrument_legacy, span_exporter, tracer_provider
):
    """Test that exactly reproduces the GitHub issue #3203 with the exact same code structure."""
    from opentelemetry import trace
    import asyncio
    import httpx
    from langgraph.graph import END, START, StateGraph

    trace.set_tracer_provider(tracer_provider)
    tracer = trace.get_tracer(__name__)

    class TestAgentState(TypedDict):
        http_result: str
        span_result: str
        messages: list

    async def http_call_node(state: TestAgentState) -> dict:
        try:
            data = {"a": 10, "b": 25}
            async with httpx.AsyncClient() as _:
                with tracer.start_as_current_span("POST") as span:
                    span.set_attribute("http.method", "POST")
                    span.set_attribute("http.url", "https://httpbin.org/post")
                    sum_result = data.get("a", 0) + data.get("b", 0)
                    http_result = f"HTTP call successful! Sum of {data.get('a')} + {data.get('b')} = {sum_result}"

                    span.set_attribute("http.response.status_code", 200)
                    span.set_attribute("calculation.result", sum_result)

        except Exception as e:
            http_result = f"HTTP call error: {str(e)}"

        return {"http_result": http_result}

    async def opentelemetry_span_node(state: TestAgentState) -> dict:
        with tracer.start_as_current_span("test_agent_span") as span:
            span.set_attribute("node.name", "opentelemetry_span_node")
            span.set_attribute("agent.type", "test_agent")
            span.set_attribute("operation.type", "span_creation")

            span.add_event("Starting span processing")

            await asyncio.sleep(0.01)

            http_result = state.get("http_result", "No HTTP result available")
            span.set_attribute("previous.http_result", http_result)

            span.add_event("Processing HTTP result from previous node")

            span_result = f"OpenTelemetry span created successfully! Span ID: {span.get_span_context().span_id}"

            span.add_event("Span processing completed")
            span.set_attribute("processing.status", "completed")

            return {"span_result": span_result}

    def create_test_agent():
        """Create a simple LangGraph agent with 2 nodes matching the GitHub issue exactly."""
        builder = StateGraph(TestAgentState)

        builder.add_node("http_call", http_call_node)
        builder.add_node("otel_span", opentelemetry_span_node)

        builder.add_edge(START, "http_call")
        builder.add_edge("http_call", "otel_span")
        builder.add_edge("otel_span", END)

        agent = builder.compile()
        return agent

    async def run_test_agent():
        with tracer.start_as_current_span("test_agent_execution_root") as root_span:
            root_span.set_attribute("agent.name", "test_agent")
            root_span.set_attribute("agent.version", "1.0.0")
            root_span.set_attribute("execution.type", "full_agent_run")

            root_span.add_event("Agent execution started")

            try:
                root_span.add_event("Creating agent graph")
                agent = create_test_agent()
                root_span.set_attribute("agent.nodes_count", 2)

                initial_state = {"http_result": "", "span_result": "", "messages": []}
                root_span.add_event("Initial state prepared")

                root_span.add_event("Starting agent invocation")
                final_state = await agent.ainvoke(initial_state)

                root_span.set_attribute("execution.status", "completed")
                return final_state

            except Exception as e:
                root_span.set_attribute("execution.status", "failed")
                root_span.set_attribute("error.type", type(e).__name__)
                root_span.set_attribute("error.message", str(e))
                root_span.add_event("Agent execution failed", {"error": str(e)})
                raise

    final_state = asyncio.run(run_test_agent())

    assert "http_result" in final_state
    assert "span_result" in final_state
    assert "Sum of 10 + 25 = 35" in final_state["http_result"]

    spans = span_exporter.get_finished_spans()
    span_names = [span.name for span in spans]

    print(f"\nCaptured {len(spans)} spans:")
    for span in spans:
        parent_name = "None"
        if span.parent:
            parent_span = next(
                (s for s in spans if s.context.span_id == span.parent.span_id), None
            )
            if parent_span:
                parent_name = parent_span.name
            else:
                parent_name = f"Unknown({span.parent.span_id})"
        print(f"  - {span.name} (parent: {parent_name})")

    assert "test_agent_execution_root" in span_names
    assert "POST" in span_names
    assert "test_agent_span" in span_names
    assert "http_call.task" in span_names
    assert "otel_span.task" in span_names
    assert "LangGraph.workflow" in span_names

    root_span = next(span for span in spans if span.name == "test_agent_execution_root")
    post_span = next(span for span in spans if span.name == "POST")
    test_agent_span = next(span for span in spans if span.name == "test_agent_span")
    http_call_task_span = next(span for span in spans if span.name == "http_call.task")
    otel_span_task_span = next(span for span in spans if span.name == "otel_span.task")
    workflow_span = next(span for span in spans if span.name == "LangGraph.workflow")

    print("\nHierarchy check:")
    print(f"POST parent: {post_span.parent.span_id if post_span.parent else 'None'}")
    print(f"http_call.task ID: {http_call_task_span.context.span_id}")
    print(
        f"test_agent_span parent: {test_agent_span.parent.span_id if test_agent_span.parent else 'None'}"
    )
    print(f"otel_span.task ID: {otel_span_task_span.context.span_id}")

    assert (
        post_span.parent.span_id == http_call_task_span.context.span_id
    ), "POST span should be child of http_call.task span"
    assert (
        test_agent_span.parent.span_id == otel_span_task_span.context.span_id
    ), "test_agent_span should be child of otel_span.task span"

    assert http_call_task_span.parent.span_id == workflow_span.context.span_id
    assert otel_span_task_span.parent.span_id == workflow_span.context.span_id
    assert workflow_span.parent.span_id == root_span.context.span_id
```
💡 Verification agent
🧩 Analysis chain
Comprehensive test reproduction for GitHub issue #3203.
This is an excellent reproduction test that validates the span hierarchy fix. The test:
- Creates a realistic scenario with HTTP calls and OpenTelemetry spans within LangGraph nodes
- Validates the complete span hierarchy ensuring proper parent-child relationships
- Includes detailed assertions for span nesting and context propagation
- Provides debugging output with span hierarchy visualization
The test structure effectively exercises the callback handler fixes for async context handling and span attachment.
However, there are a few considerations:
Let me verify if the async HTTP client usage could be simplified since it's not actually making real HTTP calls:
🏁 Script executed:
```shell
#!/bin/bash
# Check if httpx is being used elsewhere in the codebase and if it's a dependency
echo "Checking httpx usage in the codebase:"
rg -l "httpx" --type py
echo -e "\nChecking if httpx is declared as a dependency:"
fd -t f -e txt -e cfg -e toml -e yaml -e yml -e json | xargs grep -l "httpx" 2>/dev/null || echo "No httpx dependency declarations found"
```

Length of output: 6561
Add httpx to the project’s test/dev dependencies
The tracing tests (e.g., packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py) rely on httpx under @pytest.mark.vcr to record and replay HTTP interactions. Although the client isn’t making “real” calls at runtime, it’s essential for the VCR setup and is used across many instrumentation tests.
• Confirm and update your dependency declarations (e.g., in pyproject.toml, setup.cfg/setup.py, or requirements-dev.txt) to include:
```toml
[tool.poetry.dev-dependencies]
httpx = "^YOUR_SUPPORTED_VERSION"
```

or, for a Pipfile:

```toml
[dev-packages]
httpx = { version = "*" }
```

or:

```
# requirements-dev.txt
httpx>=x.y.z
```
• You can ignore the suggestion to “simplify” the HTTPX usage here, as the real calls are captured/replayed via VCR.
🤖 Prompt for AI Agents
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py lines
188-342: the test uses httpx under pytest.mark.vcr but httpx is not declared in
the project test/dev dependencies, causing test runs to fail in CI; add httpx to
the repository's test/dev dependency declarations (pyproject.toml under
[tool.poetry.dev-dependencies], setup.cfg/setup.py extras or
requirements-dev.txt) with a supported version range consistent with other
tests, update lockfile or run your dependency manager (poetry lock/install or
pip install -r requirements-dev.txt) and re-run tests to ensure httpx is
available for VCR playback.
Important
Looks good to me! 👍
Reviewed 5bd2330 in 2 minutes and 25 seconds. Click for details.
- Reviewed 13 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. `packages/opentelemetry-instrumentation-langchain/tests/conftest.py:172`
   - Draft comment: Removed 'body' from match_on. Ensure that request bodies are not needed for VCR matching in tests.
   - Reason this comment was not posted: Comment looked like it was already resolved.
2. `packages/opentelemetry-instrumentation-langchain/tests/conftest.py:173`
   - Draft comment: Removed filter_query_parameters for API keys. Confirm that query parameters do not leak sensitive info.
   - Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). This is a test configuration file, not production code, and the code is specifically about VCR test recording configuration. The removal of filter_query_parameters suggests the author intentionally wants to record query parameters in tests. Security in test recordings is important, but the author has already demonstrated security awareness through other filters. The comment raises a valid concern: sensitive data in query parameters could potentially be recorded in tests. However, the author has already shown careful consideration of security by filtering headers and body content, and the change appears intentional for test purposes. The comment should be deleted because it asks for confirmation of intention ("Confirm that...") which violates our rules, and because it is speculative about potential issues rather than pointing out a definite problem.
Workflow ID: wflow_cBifz1dXyjBr0oE8
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
188-342: Comprehensive test reproduction for GitHub issue #3203.

This test effectively validates the span hierarchy fix by:

- Creating realistic scenarios with HTTP calls and OpenTelemetry spans within LangGraph nodes
- Validating the complete span hierarchy with proper parent-child relationships
- Including debugging output for span hierarchy visualization

The test structure properly exercises the callback handler fixes for async context handling and span attachment.

Verify that httpx is declared in the project's test/dev dependencies:

```shell
#!/bin/bash
# Check if httpx is declared in dependency files
echo "Checking for httpx in dependency declarations:"
fd -t f "(pyproject.toml|setup.py|setup.cfg|requirements.*\.txt)" | xargs grep -l "httpx" 2>/dev/null

# Check the specific package's dependencies
echo -e "\nChecking package-specific dependencies:"
cat packages/opentelemetry-instrumentation-langchain/pyproject.toml 2>/dev/null | grep -A 10 "\[tool.poetry.dev-dependencies\]"
cat packages/opentelemetry-instrumentation-langchain/requirements-dev.txt 2>/dev/null | grep "httpx"
```
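The parent-resolution logic the test relies on can be illustrated with plain records; the span ids below are made up, but the hierarchy mirrors the one the test asserts (real spans come from the SDK's `InMemorySpanExporter`):

```python
from collections import namedtuple

Span = namedtuple("Span", ["name", "span_id", "parent_id"])

# The hierarchy the fix is supposed to produce.
spans = [
    Span("test_agent_execution_root", 1, None),
    Span("LangGraph.workflow", 2, 1),
    Span("http_call.task", 3, 2),
    Span("POST", 4, 3),
    Span("otel_span.task", 5, 2),
    Span("test_agent_span", 6, 5),
]


def parent_name(span, spans):
    # Same pattern as the test: resolve a parent id back to a span name.
    parent = next((s for s in spans if s.span_id == span.parent_id), None)
    return parent.name if parent else "None"


for s in spans:
    print(f"  - {s.name} (parent: {parent_name(s, spans)})")

post = next(s for s in spans if s.name == "POST")
http_task = next(s for s in spans if s.name == "http_call.task")
assert post.parent_id == http_task.span_id  # POST is nested under its task span
```

Before the fix, manually created spans like `POST` parented onto the outer root instead of their enclosing `.task` span; the assertions pin down the corrected nesting.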
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
190-195: Consider using `contextlib.suppress` for cleaner exception handling.

While the current implementation is functionally correct and well-documented, using `contextlib.suppress` would be more idiomatic.

```diff
+from contextlib import suppress
+
 def _end_span(self, span: Span, run_id: UUID) -> None:
     for child_id in self.spans[run_id].children:
         if child_id in self.spans:
             child_span = self.spans[child_id].span
             if child_span.end_time is None:  # avoid warning on ended spans
                 child_span.end()
     span.end()
     token = self.spans[run_id].token
     if token:
-        try:
-            context_api.detach(token)
-        except ValueError:
-            # Context detach can fail in async scenarios when tokens are created in different contexts
-            # This is expected behavior and doesn't affect the correct span hierarchy
-            pass
+        # Context detach can fail in async scenarios when tokens are created in different contexts
+        # This is expected behavior and doesn't affect the correct span hierarchy
+        with suppress(ValueError):
+            context_api.detach(token)
```
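For reference, `contextlib.suppress(ValueError)` behaves identically to the try/except/pass form; a quick sketch with a stand-in `detach` (the real call is `context_api.detach`):

```python
from contextlib import suppress


def detach(token):
    # Stand-in for context_api.detach: raise the way a cross-context detach would.
    raise ValueError("token was created in a different Context")


# try/except/pass form
try:
    detach("tok")
except ValueError:
    pass

# equivalent, more idiomatic form
with suppress(ValueError):
    detach("tok")

print("both forms swallow the ValueError")
```

Only the exception types listed in `suppress(...)` are swallowed; anything else still propagates, exactly as with the explicit `except` clause.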
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
- packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/opentelemetry-instrumentation-langchain/tests/conftest.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (5)
- `TestAgentState` (45-48)
- `http_call_node` (50-72)
- `opentelemetry_span_node` (74-99)
- `create_test_agent` (101-115)
- `run_test_agent` (117-148)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
190-195: Use contextlib.suppress(ValueError) instead of try-except-pass
(SIM105)
🔇 Additional comments (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
236-236: Good fix for span context attachment.

Making the token attachment unconditional ensures consistent span context propagation in both sync and async scenarios, which is critical for maintaining proper span hierarchy in LangGraph flows.
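The async failure mode this guards against comes from `contextvars`, which OpenTelemetry's context API is built on: a token can only be reset in the context where it was created. The stdlib reproduces it without any OTel dependency:

```python
import contextvars

current_span = contextvars.ContextVar("current_span", default=None)

# "Attach" in the original context returns a token bound to that context.
token = current_span.set("workflow-span")


def detach_elsewhere():
    # "Detach" from a different (copied) context: contextvars refuses.
    try:
        current_span.reset(token)
        return "detached"
    except ValueError as exc:
        return f"ValueError: {exc}"


result = contextvars.copy_context().run(detach_elsewhere)
print(result)  # the reset fails: the token belongs to a different Context

# In the original context, detaching works fine.
current_span.reset(token)
print(current_span.get())  # None
```

Async LangGraph execution hops between contexts in exactly this way, which is why the detach path tolerates `ValueError` while attach is now unconditional.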
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2)
38-40: Formatting improvement for readability.

The set comprehension is now more readable on a single line.
Also applies to: 99-101
71-72: Good: Enabling async LangGraph test.

Removing the `@pytest.mark.xfail` decorator enables this important async test to validate that the context propagation fixes work correctly in asynchronous scenarios.
Important
Fixes LangGraph span hierarchy issue and adds tests for validation, with updates to test configurations for security and accuracy.
- Fixes span nesting in `callback_handler.py`.
- Adds `langgraph_with_otel_sdk.py` to demonstrate the issue and fix.
- Updates `test_langgraph.py` and adds a new test for span hierarchy validation.
- Updates `test_agents.py` and `test_langgraph.py` to reflect changes in span IDs and log counts.
- Updates VCR configuration in `conftest.py`.

This description was created by Ellipsis for 5bd2330. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
Bug Fixes
Tests
Chores