
fix(langchain): fix nesting of langgraph spans#3206

Merged
nirga merged 10 commits into main from gk/fix-langgraph-nesting-issue
Aug 12, 2025

Conversation

@galkleinman (Contributor) commented Jul 31, 2025

Important

Fixes a LangGraph span-hierarchy issue and adds tests for validation, with updates to test configurations for security and accuracy.

  • Behavior:
    • Fixes span hierarchy issue in LangGraph by ensuring consistent context attachment and guarded detach in callback_handler.py.
    • Adds a reproducible tracing script in langgraph_with_otel_sdk.py to demonstrate the issue and fix.
  • Tests:
    • Enables async LangGraph tracing test in test_langgraph.py and adds a new test for span hierarchy validation.
    • Updates test fixtures in test_agents.py and test_langgraph.py to reflect changes in span IDs and log counts.
    • Strengthens VCR matching and adds request pre-processing to redact API keys in conftest.py.
  • Misc:
    • Minor formatting cleanups in test cassettes.

This description was created by Ellipsis for 5bd2330. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • Bug Fixes

    • Made span cleanup resilient in async callback flows (guarded detach) and ensured consistent context attachment to preserve span hierarchy.
  • Tests

    • Enabled async LangGraph tracing test and added a reproduction test that validates spans, hierarchy, and end-to-end results.
    • Updated test cassettes/fixtures, adjusted expected IDs and log counts, and improved formatting.
    • Strengthened VCR handling and added pre-record request redaction of API keys.
  • Chores

    • Added a reproducible tracing script to run and visualize tracing behavior.


coderabbitai Bot commented Jul 31, 2025

Walkthrough

Adds a LangGraph + OpenTelemetry reproduction script and test; hardens LangChain callback context attach/detach for async scenarios; updates VCR redaction and recorded test cassettes; adjusts several test expectations and tool_call IDs.

Changes

Cohort / File(s) Summary
Reproduction script
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py
New script that wires OpenTelemetry (InMemorySpanExporter, TracerProvider) with a minimal two-node LangGraph (http_call → otel_span). Adds setup_tracing() and run_github_issue_reproduction() to run the async workflow, collect/display spans, and uninstrument on exit.
Callback handler (context management)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
_create_span now always attaches a context token via context_api.attach(set_span_in_context(span)). _end_span now wraps context_api.detach(token) in try/except ValueError to avoid detach failures across async/mismatched contexts. No public API signature changes.
LangGraph tests & new reproduction test
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py
Formatting tweaks, removal of @pytest.mark.xfail on async test, and addition of test_nesting_of_langgraph_spans(instrument_legacy, span_exporter, tracer_provider) which builds/runs a two-node LangGraph under OTel and asserts spans, hierarchy, and outputs.
Test fixtures (cassettes) updates
packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml
Recorded HTTP interactions and OpenAI streaming payloads refreshed: API versions, headers, timestamps, request/response bodies, and streaming token sequence updated to newer recordings. No application logic changes.
VCR config change + request redaction
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
vcr_config() updated to remove filter_body, set match_on to ["method","scheme","host","port","path","query"], and add before_record_request to redact "api_key" from JSON request bodies; minor reformat of instrumentor invocation.
Agent tests expectations
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py
Adjusted expected log counts and updated several tool_calls IDs in tests to match updated fixtures/recordings.

Sequence Diagram(s)

sequenceDiagram
  participant Runner as Test/Script
  participant OTel as OpenTelemetry SDK
  participant Instr as LangChain Instrumentor
  participant Graph as LangGraph Agent
  participant HTTP as httpx.AsyncClient
  participant Exporter as InMemorySpanExporter

  Runner->>OTel: setup_tracing()
  OTel->>Instr: instrument(LangChain)
  Runner->>OTel: start root span (test_agent_execution_root)
  Runner->>Graph: ainvoke(initial_state)
  Graph->>OTel: start span (LangGraph.workflow)
  Graph->>Graph: execute node http_call.task
  Graph->>OTel: start span (http_call.task)
  Graph->>HTTP: POST /sum
  HTTP-->>Graph: response (35)
  Graph->>OTel: end span (http_call.task)
  Graph->>Graph: execute node otel_span.task
  Graph->>OTel: start span (test_agent_span)
  Graph->>OTel: end span (test_agent_span)
  Graph->>OTel: end span (LangGraph.workflow)
  Runner->>OTel: end root span
  OTel->>Exporter: export spans
  Runner->>Instr: uninstrument()
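The diagrammed flow can be mimicked without LangGraph or OTel at all; the sketch below models the two nodes as plain async functions over a shared state dict. All names are hypothetical, and the hard-coded 35 stands in for the recorded POST /sum response.

```python
import asyncio


async def http_call(state):
    # Stands in for the http_call.task node's POST /sum request.
    state["sum"] = 35
    return state


async def otel_span(state):
    # Stands in for the otel_span.task node starting/ending test_agent_span.
    state["span_emitted"] = True
    return state


async def run_graph(state):
    # LangGraph's http_call -> otel_span edge, reduced to sequential awaits.
    for node in (http_call, otel_span):
        state = await node(state)
    return state
```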

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes


Suggested reviewers

  • doronkopit5
  • nirga

Poem

A rabbit hops through spans and threads,
I bind and unbind where context treads.
Two nodes trace a tiny row of light, 🥕
I count the hops and keep them tight.
Little paws, big traces — what a sight.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 293b501 and 5bd2330.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py


@galkleinman galkleinman marked this pull request as ready for review August 12, 2025 13:07

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 5ff9bbf in 1 minute and 51 seconds. Click for details.
  • Reviewed 507 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py:55
  • Draft comment:
    Consider using a descriptive variable name instead of '_' for the AsyncClient, to improve code clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%):
    1. The _ is intentionally used here to indicate the variable is unused.
    2. This follows Python conventions where _ indicates a throwaway variable.
    3. Using a descriptive name for an unused variable could be misleading since it suggests the variable has a purpose.
    4. The code is actually clearer with _ since it signals to readers that this variable isn't used.
    Maybe having a named client would make debugging easier if there are connection issues? Maybe the code will need to use the client in the future? The code is a test reproduction case demonstrating a specific issue - it's not production code that needs to handle real HTTP calls or be maintained long term. The _ is actually more appropriate for this context. The comment should be deleted: using _ is the correct Python idiom here since the variable is intentionally unused, and a descriptive name would make the code less clear, not more clear.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:55
  • Draft comment:
    Explicitly setting the global tracer provider (line 55–59) ensures that spans created by user code are properly nested. This is a solid improvement.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the change is a solid improvement, which violates the rule against making purely informative comments.
3. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:189
  • Draft comment:
    Wrapping context_api.detach(token) in a try/except block to handle ValueError in asynchronous scenarios is a good and necessary fix.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the change is good and necessary, which violates the rule against making purely informative comments.
4. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:227
  • Draft comment:
    Changing the conditional span context attachment to unconditional (removing the check for async callbacks) simplifies management. Verify that this change does not inadvertently affect synchronous callback behavior.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py:292
  • Draft comment:
    The tests, including the exact reproduction of GitHub Issue #3203, thoroughly assert span names and parent-child relationships. They provide good coverage to validate the corrected span nesting.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is purely informative, providing details about the tests and their coverage. It does not suggest any changes or ask for confirmation on specific code behavior.

Workflow ID: wflow_TaEhtkkh8QGGmEE7

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed 15476d1 in 46 seconds. Click for details.
  • Reviewed 27 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:50
  • Draft comment:
    Removed global tracer provider setting. Please document the rationale behind removing the global setting, as it could affect span nesting behavior.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment is asking for documentation of the rationale behind a change, which is similar to asking for an update to the PR description. This violates the rule against asking the PR author to update the PR description or provide explanations.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:29
  • Draft comment:
    Minor formatting change: extra space added in the _instruments tuple. Confirm if this is intentional.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%. None.

Workflow ID: wflow_yksaE0uDHchF83Hz

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

190-195: Consider using contextlib.suppress for cleaner exception handling

While the current implementation is functionally correct, using contextlib.suppress would be more idiomatic and concise for ignoring exceptions.

+from contextlib import suppress

Then update the exception handling:

-            try:
-                context_api.detach(token)
-            except ValueError:
-                # Context detach can fail in async scenarios when tokens are created in different contexts
-                # This is expected behavior and doesn't affect the correct span hierarchy
-                pass
+            # Context detach can fail in async scenarios when tokens are created in different contexts
+            # This is expected behavior and doesn't affect the correct span hierarchy
+            with suppress(ValueError):
+                context_api.detach(token)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

297-309: Consider extracting span debugging logic to a test utility

The span hierarchy printing logic (lines 297-309) could be useful for debugging other tests. Consider extracting it to a shared test utility function.

Would you like me to create a utility function for debugging span hierarchies that could be reused across tests?

packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (1)

55-56: Unused httpx.AsyncClient context manager

The code creates an httpx.AsyncClient context manager but doesn't actually use it (the variable is named _). Since this is meant to simulate an HTTP call, either use the client or add a comment explaining why it's not used.

-            async with httpx.AsyncClient() as _:
+            # Simulating HTTP call context without actual network request
+            async with httpx.AsyncClient() as _:

Or if you want to make it more realistic:

-            async with httpx.AsyncClient() as _:
+            async with httpx.AsyncClient() as client:
+                # In a real scenario, we would make an actual HTTP call:
+                # response = await client.post("https://httpbin.org/post", json=data)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba68225 and 644435f.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (1)
  • LangchainInstrumentor (32-185)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
  • export (45-51)
  • InMemorySpanExporter (22-61)
  • get_finished_spans (40-43)
packages/opentelemetry-instrumentation-mcp/tests/conftest.py (2)
  • tracer_provider (32-44)
  • tracer (48-49)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (3)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
  • instrument_legacy (77-93)
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (5)
  • TestAgentState (45-48)
  • http_call_node (50-72)
  • opentelemetry_span_node (74-99)
  • create_test_agent (101-115)
  • run_test_agent (117-148)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py

190-195: Use contextlib.suppress(ValueError) instead of try-except-pass

(SIM105)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

236-236: Good fix for async context propagation

The unconditional attachment of the span context ensures that a token is always available for later detachment, which is critical for maintaining proper span hierarchy in async scenarios. This change, combined with the error handling in _end_span, properly addresses the GitHub issue #3203.

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

188-343: Comprehensive test coverage for the GitHub issue reproduction

The test thoroughly reproduces GitHub issue #3203 with:

  • Proper async execution flow with two LangGraph nodes
  • Correct verification of span hierarchy and parent-child relationships
  • Detailed assertions for all expected spans
  • Good debugging output for troubleshooting

This test will effectively prevent regression of the span nesting issue.

Comment on lines +1 to +19
#!/usr/bin/env python3
"""
Reproduce the exact GitHub issue #3203 and show the trace waterfall.
This demonstrates the span hierarchy issue before and after the fix.
"""

import asyncio
import sys
from typing import TypedDict
import httpx
from langgraph.graph import END, START, StateGraph
from opentelemetry import trace
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

from waterfall_visualizer import visualize_trace_waterfall, print_raw_span_data



💡 Verification agent

🧩 Analysis chain

Missing import for waterfall_visualizer module

The script imports waterfall_visualizer (line 18) but this module doesn't appear to be part of the package. This will cause an ImportError when running the script.


🏁 Script executed:

#!/bin/bash
# Check if waterfall_visualizer module exists in the repository
fd -t f "waterfall_visualizer.py" 

# Check if it's imported elsewhere in the codebase
rg "from waterfall_visualizer import" -A 2

# Check if there's any reference to this module
rg "waterfall_visualizer" --type py

Length of output: 580


Add the missing waterfall_visualizer module or correct the import

The script at packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (line 18) imports waterfall_visualizer, but no such file exists in the repository. This will raise an ImportError at runtime.

To resolve:

  • Include waterfall_visualizer.py in the package (e.g., under packages/opentelemetry-instrumentation-langchain/), defining visualize_trace_waterfall and print_raw_span_data.
  • Or, if it belongs in a different module, update the import path and add the dependency to setup.py or pyproject.toml.
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py
around lines 1 to 19, the script imports waterfall_visualizer which does not
exist and will cause ImportError; either add a new file
packages/opentelemetry-instrumentation-langchain/waterfall_visualizer.py that
defines visualize_trace_waterfall and print_raw_span_data and ensure it is
included in the package manifest, or update the import to the correct existing
module path and add the dependency to setup.py/pyproject.toml so the module is
available at runtime.

Comment on lines +179 to +182
print("\n🎉 GitHub Issue #3203 has been FIXED!")
print("The visualization above shows the CORRECTED span hierarchy.")
print("Note how POST and test_agent_span are now properly nested")
print("under their respective task spans (http_call.task and otel_span.task)!")


🛠️ Refactor suggestion

Misleading success message before fix verification

The script prints "GitHub Issue #3203 has been FIXED!" and describes the corrected hierarchy without actually verifying whether the fix is in place. This could be confusing if someone runs this script on a version without the fix.

Consider dynamically checking the span hierarchy to determine if the issue is fixed:

-        print("\n🎉 GitHub Issue #3203 has been FIXED!")
-        print("The visualization above shows the CORRECTED span hierarchy.")
-        print("Note how POST and test_agent_span are now properly nested")
-        print("under their respective task spans (http_call.task and otel_span.task)!")
+        # Verify if the hierarchy is correct
+        post_span = next((s for s in spans if s.name == "POST"), None)
+        http_call_task = next((s for s in spans if s.name == "http_call.task"), None)
+        
+        if post_span and http_call_task and post_span.parent:
+            if post_span.parent.span_id == http_call_task.context.span_id:
+                print("\n🎉 GitHub Issue #3203 has been FIXED!")
+                print("The visualization above shows the CORRECTED span hierarchy.")
+                print("Note how POST and test_agent_span are now properly nested")
+                print("under their respective task spans (http_call.task and otel_span.task)!")
+            else:
+                print("\n⚠️ GitHub Issue #3203 is still present!")
+                print("The spans are not properly nested under their task spans.")
+        else:
+            print("\n⚠️ Unable to verify span hierarchy - expected spans not found")


gitguardian Bot commented Aug 12, 2025

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. Find more information about risks here.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.


@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed ec1b5b6 in 2 minutes and 29 seconds. Click for details.
  • Reviewed 1149 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:150
  • Draft comment:
    Explicit 'match_on' keys added to vcr_config improve request matching. Confirm these keys cover all necessary URI components.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:122
  • Draft comment:
    The tool call ID has been updated to 'call_RQXCf1bdiBiFrtwfOTP3GCCd'. Ensure this change aligns with the underlying agent implementation.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment is asking the PR author to ensure that the change aligns with the underlying agent implementation, which is against the rules. It does not provide a specific suggestion or ask for a specific test to be written.
3. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:220
  • Draft comment:
    The tool call ID in the no-content scenario is now 'call_Joct6NnIJFGGWfvMqZJPRB4C'. Verify that this updated ID aligns with expected behavior.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment is asking the PR author to verify whether the updated tool call ID aligns with expected behavior. This falls under asking the author to confirm their intention or ensure behavior, which is against the rules.
4. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:104
  • Draft comment:
    Log exporter assertions now expect 8 logs. Confirm that this new count accurately reflects the improved filtering/consolidation.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:202
  • Draft comment:
    The assertion that all logs have GenAIAttributes.GEN_AI_SYSTEM equal to 'langchain' is helpful. Ensure this remains valid for future behavior.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:19
  • Draft comment:
    Typographical: There appears to be an extra stray apostrophe on this line that may be unintentional.
  • Reason this comment was not posted:
    Comment was on unchanged code.
7. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:82
  • Draft comment:
    The JSON string for the "template" property now starts with "are a helpful assistant" instead of "You are a helpful assistant". This results in an incomplete sentence. Please verify if the leading "You" was inadvertently removed.
  • Reason this comment was not posted:
    Comment was on unchanged code.
8. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:256
  • Draft comment:
    Typo alert: The query string "OpenLLMetry" appears to have an extra 'L'. If the intent was to refer to OpenTelemetry, please update it accordingly.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_f2BGg6t3wtktllTA

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)

122-129: Don’t hardcode tool_call IDs; derive them from the logs

Tool call IDs are provider-generated and will change when cassettes are re-recorded. Hardcoding them forces frequent fixture updates. Extract the ID from the assistant message event and reuse it in subsequent assertions.

Apply this diff:

@@ def test_agents_with_events_with_content(
-    # Validate that the assistant message Event exists
+    # Extract tool_call id from assistant message for stable assertions
+    tool_call_id = None
+    for log in logs:
+        if log.log_record.attributes.get(EventAttributes.EVENT_NAME) == "gen_ai.assistant.message":
+            body = dict(log.log_record.body)
+            calls = body.get("tool_calls") or []
+            if calls:
+                tool_call_id = calls[0].get("id")
+                break
+    assert tool_call_id, "assistant tool_call id not found in logs"
+
+    # Validate that the assistant message Event exists
     assert_message_in_logs(
         logs,
         "gen_ai.assistant.message",
         {
             "content": "",
             "tool_calls": [
                 {
-                    "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                    "id": tool_call_id,
                     "function": {
                         "name": "tavily_search_results_json",
                         "arguments": {"query": "OpenLLMetry"},
                     },
                     "type": "function",
                 }
             ],
         },
     )
@@
     choice_event = {
         "index": 0,
         "finish_reason": "tool_calls",
         "message": {"content": ""},
         "tool_calls": [
             {
-                "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                "id": tool_call_id,
                 "function": {
                     "name": "tavily_search_results_json",
                     "arguments": {"query": "OpenLLMetry"},
                 },
                 "type": "function",
             }
         ],
     }

Repeat the same pattern for the no_content test (see next comment).

Also applies to: 140-148


220-226: Mirror dynamic tool_call ID extraction in the no-content test

Avoid hardcoding the streaming tool_call id here as well; reuse the id derived from the assistant.message log.

Apply this diff:

@@ def test_agents_with_events_with_no_content(
-    # Validate that the assistant message Event exists
+    # Extract tool_call id from assistant message for stable assertions
+    tool_call_id = None
+    for log in logs:
+        if log.log_record.attributes.get(EventAttributes.EVENT_NAME) == "gen_ai.assistant.message":
+            body = dict(log.log_record.body)
+            calls = body.get("tool_calls") or []
+            if calls:
+                tool_call_id = calls[0].get("id")
+                break
+    assert tool_call_id, "assistant tool_call id not found in logs"
+
+    # Validate that the assistant message Event exists
     assert_message_in_logs(
         logs,
         "gen_ai.assistant.message",
         {
             "tool_calls": [
                 {
-                    "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                    "id": tool_call_id,
                     "function": {"name": "tavily_search_results_json"},
                     "type": "function",
                 }
             ]
         },
     )
@@
     choice_event = {
         "index": 0,
         "finish_reason": "tool_calls",
         "message": {},
         "tool_calls": [
             {
-                "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                "id": tool_call_id,
                 "function": {"name": "tavily_search_results_json"},
                 "type": "function",
             }
         ],
     }

Also applies to: 235-240

🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)

149-156: Avoid weakening HTTP cassette matching by omitting the request body

Adding match_on without "body" increases the risk of mismatching multiple requests to the same endpoint (e.g., two POSTs to /v1/chat/completions in a single test). This can hide regressions and cause flaky replays across tests that hit identical routes with different payloads.

Consider including "body" back in the matchers and scrubbing dynamic fields instead. Also scrub sensitive data in JSON bodies to prevent leaks (see separate comment on leaked API key). Example:

 def vcr_config():
-    return {
-        "filter_headers": ["authorization", "x-api-key"],
-        "filter_body": ["api_key"],
-        "match_on": ["method", "scheme", "host", "port", "path", "query"],
-    }
+    return {
+        "filter_headers": ["authorization", "x-api-key", "cookie", "set-cookie"],
+        # If you use pytest-recording, prefer JSON scrubbing via before_record_request (below)
+        "match_on": ["method", "scheme", "host", "port", "path", "query", "body"],
+        "before_record_request": _scrub_request_body,
+        "before_record_response": _scrub_response_headers,
+    }

Add these helpers (outside this function):

# Add at the top of the file:
import json

def _scrub_request_body(request):
    # Scrub JSON api_key in request bodies
    try:
        if request.body and "application/json" in (request.headers or {}).get("Content-Type", ""):
            data = json.loads(request.body)
            if isinstance(data, dict) and "api_key" in data:
                data["api_key"] = "<REDACTED>"
                request.body = json.dumps(data)
    except Exception:
        pass
    # Remove cookies
    if request.headers:
        request.headers.pop("cookie", None)
    return request

def _scrub_response_headers(response):
    # Remove set-cookie header to reduce churn
    headers = response["headers"] or {}
    headers.pop("Set-Cookie", None)
    headers.pop("set-cookie", None)
    return response
packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml (1)

380-383: Reduce cassette churn: filter volatile cookies and request IDs

Set-Cookie, Cookie, CF-RAY, and x-request-id values are highly volatile and cause unnecessary cassette diffs. Filtering them improves stability, reduces noise, and avoids committing ephemeral identifiers.

In conftest vcr_config, add:

  • filter_headers: ["cookie", "set-cookie"]
  • before_record_response: drop Set-Cookie (see the helper suggested there).

Optionally also filter "CF-RAY" and "x-request-id" in response headers if you find them noisy.

Also applies to: 200-216, 995-1041
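For reference, a minimal sketch (not from this PR) of the `before_record_response` helper suggested above. The helper name and the exact header set are assumptions; VCR.py passes the serialized response to this hook as a dict with a "headers" mapping:

```python
# Hypothetical helper: drop headers whose values change on every recording,
# so re-recorded cassettes don't churn on cookies and request IDs.
VOLATILE_HEADERS = {"set-cookie", "cf-ray", "x-request-id"}


def scrub_response_headers(response):
    headers = response.get("headers") or {}
    for name in list(headers):  # copy keys; we mutate while iterating
        if name.lower() in VOLATILE_HEADERS:
            del headers[name]
    response["headers"] = headers
    return response
```

Wired in via `"before_record_response": scrub_response_headers` in the vcr_config dict, this removes the volatile headers before the cassette is written.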

packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)

104-104: Avoid brittle assertion on total log count

Asserting an exact log count (8) is brittle across provider or semconv changes and can fail even if the required events are present. Prefer validating the presence of required events (which you already do) and, if needed, assert a minimum count.

For example:

-    assert len(logs) == 8
+    assert len(logs) >= 6

248-256: Use a broader type for logs to reflect actual usage

assert_message_in_logs accepts a Tuple[LogData], but callers pass a list. Use Sequence[LogData] to match both lists and tuples.

-from typing import Tuple
+from typing import Sequence
@@
-def assert_message_in_logs(
-    logs: Tuple[LogData], event_name: str, expected_content: dict
-):
+def assert_message_in_logs(
+    logs: Sequence[LogData], event_name: str, expected_content: dict
+):
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 644435f and ec1b5b6.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml (16 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)

@galkleinman galkleinman force-pushed the gk/fix-langgraph-nesting-issue branch from ec1b5b6 to a5296a4 on August 12, 2025 14:09
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed a5296a4 in 1 minute and 59 seconds.
  • Reviewed 93 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:77
  • Draft comment:
    The multi-line formatting for the OpenAIInstrumentor.instrument call improves readability without altering functionality.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:151
  • Draft comment:
    The VCR configuration now uses a 'match_on' list and a before_record_request callback to filter the API key in JSON bodies. This stricter matching approach looks appropriate; just ensure non-JSON request bodies are handled as expected.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:104
  • Draft comment:
    In test_agents_with_events_with_content, the expected log count is updated from 15 to 8 and the tool call id is updated to 'call_RQXCf1bdiBiFrtwfOTP3GCCd', reflecting the corrected span nesting. Confirm that these new expectations are stable.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment asks the PR author to confirm the stability of new expectations, which violates the rule against asking for confirmation or verification. It does not provide a specific code suggestion or point out a clear issue that needs addressing.
4. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:219
  • Draft comment:
    In test_agents_with_events_with_no_content, the tool call id has been updated to 'call_Joct6NnIJFGGWfvMqZJPRB4C'. This change aligns with the fix; ensure these IDs remain consistent across test runs.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is informative and asks the author to ensure consistency across test runs, which violates the rules. It doesn't provide a specific code suggestion or ask for a specific test to be written.
5. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:143
  • Draft comment:
    Typographical error: The string "OpenLLMetry" on this line may be a typo. Did you mean "OpenTelemetry"?
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_HFjyES3oDMzdI1Bd

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@ellipsis-dev
Contributor

ellipsis-dev Bot commented Aug 12, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)

114-131: Avoid brittle assertions on dynamic tool_call IDs

Hard-coding OpenAI tool_call ids makes tests tightly coupled to cassettes and upstream behavior. Prefer structural assertions and treat the id as “any value”.

Apply this diff to assert a structural subset and allow any id:

-    assert_message_in_logs(
+    assert_message_in_logs_contains(
         logs,
         "gen_ai.assistant.message",
         {
             "content": "",
             "tool_calls": [
                 {
-                    "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                    "id": ANY,
                     "function": {
                         "name": "tavily_search_results_json",
                         "arguments": {"query": "OpenLLMetry"},
                     },
                     "type": "function",
                 }
             ],
         },
     )
@@
-    assert_message_in_logs(logs, "gen_ai.choice", choice_event)
+    assert_message_in_logs_contains(logs, "gen_ai.choice", choice_event)

Additionally add this helper and import to support partial matching and ANY outside this hunk:

from unittest.mock import ANY

def _deep_contains(superset, subset):
    if subset is ANY:
        return True
    if isinstance(subset, dict):
        if not isinstance(superset, dict):
            return False
        for k, v in subset.items():
            if k not in superset or not _deep_contains(superset[k], v):
                return False
        return True
    if isinstance(subset, list):
        if not isinstance(superset, list) or len(subset) > len(superset):
            return False
        # every expected item must match at least one actual item
        for expected in subset:
            if not any(_deep_contains(actual, expected) for actual in superset):
                return False
        return True
    return superset == subset

def assert_message_in_logs_contains(logs: Tuple[LogData], event_name: str, expected_subset: dict):
    assert any(
        log.log_record.attributes.get(EventAttributes.EVENT_NAME) == event_name
        for log in logs
    )
    assert any(
        _deep_contains(dict(log.log_record.body), expected_subset) for log in logs
    )

Also applies to: 133-149


213-226: Apply the same robust matching for no-content flow

Mirror the structural assertion approach to avoid brittle id expectations here as well.

Apply this diff:

-    assert_message_in_logs(
+    assert_message_in_logs_contains(
         logs,
         "gen_ai.assistant.message",
         {
             "tool_calls": [
                 {
-                    "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                    "id": ANY,
                     "function": {"name": "tavily_search_results_json"},
                     "type": "function",
                 }
             ]
         },
     )
@@
-    assert_message_in_logs(logs, "gen_ai.choice", choice_event)
+    assert_message_in_logs_contains(logs, "gen_ai.choice", choice_event)

Also applies to: 228-241

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec1b5b6 and a5296a4.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (3)
packages/opentelemetry-instrumentation-pinecone/tests/conftest.py (1)
  • instrument (44-46)
packages/opentelemetry-instrumentation-mcp/tests/conftest.py (1)
  • tracer_provider (32-44)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • meter_provider (45-50)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)

79-81: Formatting-only change — OK

Multi-line call for OpenAIInstrumentor.instrument is functionally equivalent. No action needed.

packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (1)

104-104: Confirm reduced log count is stable across providers/versions

The expected log count changed to 8. If the intent is to assert “no extra logs”, this is fine. If providers or semconv evolve, consider asserting presence of required events instead of an exact count to avoid brittle failures.
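A count-insensitive alternative could look like the following sketch. The helper name is hypothetical, and the event-name set is an assumption based on the gen_ai event names these tests already assert on:

```python
# Hypothetical helper: require each expected event at least once instead of
# pinning the exact total, so extra provider logs don't break the test.
REQUIRED_EVENTS = {
    "gen_ai.user.message",
    "gen_ai.assistant.message",
    "gen_ai.tool.message",
    "gen_ai.choice",
}


def assert_required_events(event_names):
    """event_names: iterable of event-name strings extracted from the logs."""
    missing = REQUIRED_EVENTS - set(event_names)
    assert not missing, f"missing required events: {sorted(missing)}"
```

This keeps the test meaningful (all required events must appear) while tolerating benign additions as providers or semconv evolve.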

Comment on lines +153 to +171

    def before_record_request(request):
        if hasattr(request, "body") and request.body:
            import json

            try:
                if isinstance(request.body, (str, bytes)):
                    body_str = (
                        request.body.decode("utf-8")
                        if isinstance(request.body, bytes)
                        else request.body
                    )
                    body_data = json.loads(body_str)
                    if "api_key" in body_data:
                        body_data["api_key"] = "FILTERED"
                        request.body = json.dumps(body_data)
            except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
                pass
        return request



🛠️ Refactor suggestion

⚠️ Potential issue

Preserve request.body type and redact nested JSON keys (avoid str/bytes regression)

Current logic may change a bytes body to str (Line 167), which can break VCR matching/serialization. It also redacts only top-level "api_key" and misses nested occurrences and other common variants.

Proposed fix: preserve original body type and perform recursive redaction with a small, conservative secret key set (case-insensitive). Optionally, also redact sensitive query params in the same hook.

Apply this diff:

 def before_record_request(request):
-        if hasattr(request, "body") and request.body:
-            import json
-
-            try:
-                if isinstance(request.body, (str, bytes)):
-                    body_str = (
-                        request.body.decode("utf-8")
-                        if isinstance(request.body, bytes)
-                        else request.body
-                    )
-                    body_data = json.loads(body_str)
-                    if "api_key" in body_data:
-                        body_data["api_key"] = "FILTERED"
-                        request.body = json.dumps(body_data)
-            except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
-                pass
-        return request
+        # Redact secrets in JSON body (preserve original type) and in query parameters.
+        if hasattr(request, "body") and request.body and isinstance(request.body, (str, bytes)):
+            import json
+            try:
+                orig_is_bytes = isinstance(request.body, bytes)
+                body_str = request.body.decode("utf-8") if orig_is_bytes else request.body
+                body_data = json.loads(body_str)
+
+                # Conservative, case-insensitive set of sensitive keys
+                secret_keys = {"api_key", "apikey", "api-key", "access_token"}
+
+                def _redact(obj):
+                    if isinstance(obj, dict):
+                        redacted = {}
+                        for k, v in obj.items():
+                            k_l = k.lower() if isinstance(k, str) else k
+                            if isinstance(k_l, str) and k_l in secret_keys:
+                                redacted[k] = "FILTERED"
+                            else:
+                                redacted[k] = _redact(v)
+                        return redacted
+                    if isinstance(obj, list):
+                        return [_redact(i) for i in obj]
+                    return obj
+
+                body_data = _redact(body_data)
+                new_body = json.dumps(body_data)
+                request.body = new_body.encode("utf-8") if orig_is_bytes else new_body
+            except (json.JSONDecodeError, UnicodeDecodeError, AttributeError, TypeError):
+                # Non-JSON or non-decodable bodies are left as-is
+                pass
+
+        # Redact secrets in query parameters as well
+        try:
+            from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
+            parts = urlsplit(request.uri)
+            qs = parse_qsl(parts.query, keep_blank_values=True)
+            redacted_qs = [(k, "FILTERED" if k.lower() in {"api_key", "apikey", "api-key", "access_token"} else v) for k, v in qs]
+            new_query = urlencode(redacted_qs)
+            request.uri = urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))
+        except Exception:
+            pass
+
+        return request
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def before_record_request(request):
    if hasattr(request, "body") and request.body:
        import json

        try:
            if isinstance(request.body, (str, bytes)):
                body_str = (
                    request.body.decode("utf-8")
                    if isinstance(request.body, bytes)
                    else request.body
                )
                body_data = json.loads(body_str)
                if "api_key" in body_data:
                    body_data["api_key"] = "FILTERED"
                    request.body = json.dumps(body_data)
        except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
            pass
    return request

def before_record_request(request):
    # Redact secrets in JSON body (preserve original type) and in query parameters.
    if hasattr(request, "body") and request.body and isinstance(request.body, (str, bytes)):
        import json
        try:
            orig_is_bytes = isinstance(request.body, bytes)
            body_str = request.body.decode("utf-8") if orig_is_bytes else request.body
            body_data = json.loads(body_str)

            # Conservative, case-insensitive set of sensitive keys
            secret_keys = {"api_key", "apikey", "api-key", "access_token"}

            def _redact(obj):
                if isinstance(obj, dict):
                    redacted = {}
                    for k, v in obj.items():
                        k_l = k.lower() if isinstance(k, str) else k
                        if isinstance(k_l, str) and k_l in secret_keys:
                            redacted[k] = "FILTERED"
                        else:
                            redacted[k] = _redact(v)
                    return redacted
                if isinstance(obj, list):
                    return [_redact(i) for i in obj]
                return obj

            body_data = _redact(body_data)
            new_body = json.dumps(body_data)
            request.body = new_body.encode("utf-8") if orig_is_bytes else new_body
        except (json.JSONDecodeError, UnicodeDecodeError, AttributeError, TypeError):
            # Non-JSON or non-decodable bodies are left as-is
            pass

    # Redact secrets in query parameters as well
    try:
        from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
        parts = urlsplit(request.uri)
        qs = parse_qsl(parts.query, keep_blank_values=True)
        redacted_qs = [
            (k, "FILTERED" if k.lower() in {"api_key", "apikey", "api-key", "access_token"} else v)
            for k, v in qs
        ]
        new_query = urlencode(redacted_qs)
        request.uri = urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))
    except Exception:
        pass

    return request
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-langchain/tests/conftest.py around
lines 153 to 171, the before_record_request hook currently decodes bytes to str
and reassigns request.body as a str (breaking VCR matching/serialization) and
only redacts a top-level "api_key"; instead, retain the original body type (if
bytes, operate on a bytes-compatible copy and re-encode before assigning back),
parse JSON safely, walk the parsed structure recursively to redact any keys
matching a small case-insensitive set of sensitive names (e.g., api_key, apikey,
secret, token), and when done serialize back to the same type as the original
body; additionally, inspect and redact matching query params on request.url
using the same key set, and ensure all exceptions (JSON decode, Unicode) are
caught and leave request unchanged on error.

Comment thread packages/opentelemetry-instrumentation-langchain/tests/conftest.py
@traceloop traceloop deleted a comment from coderabbitai Bot Aug 12, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec1b5b6 and bb72946.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-langchain/tests/test_agents.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (6)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2)

79-81: Minor formatting improvement - good for readability.

The multi-line formatting of the OpenAIInstrumentor.instrument() call improves code readability.


153-176: Robust implementation for filtering sensitive data in VCR cassettes.

The before_record_request function properly handles various edge cases:

  • Handles both string and bytes request bodies
  • Includes proper error handling for JSON decode errors
  • Uses safe fallback on exceptions
  • Filters API keys from JSON request bodies

The addition of explicit match_on configuration will ensure more predictable cassette matching behavior.
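Illustratively, explicit matching means a replayed request must agree with the recorded one on every listed component (body excluded). A sketch of such a fixture, with keys following VCR.py's vcr_config conventions and values mirroring this PR's description:

```python
# Sketch of an explicit VCR matcher configuration: cassette lookups compare
# method, URL components, and query string, but not the request body.
def vcr_config():
    return {
        "filter_headers": ["authorization", "x-api-key"],
        "match_on": ["method", "scheme", "host", "port", "path", "query"],
    }
```

The trade-off: omitting "body" avoids churn from volatile payload fields, at the cost of weaker disambiguation when a test issues multiple requests to the same endpoint.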

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (4)

24-28: Minor formatting improvement with trailing comma.

The addition of trailing commas and consistent indentation improves code maintainability.


38-41: Improved test assertion formatting.

The single-line set comparison is more readable and concise.


85-89: Consistent formatting applied to async test.

Same formatting improvements as the sync version for consistency.


99-101: Consistent assertion formatting in async test.

Matches the formatting improvement made in the sync version.

Comment on lines +188 to +342
@pytest.mark.vcr
def test_langgraph_github_issue_3203_exact_reproduction(
instrument_legacy, span_exporter, tracer_provider
):
"""Test that exactly reproduces the GitHub issue #3203 with the exact same code structure."""
from opentelemetry import trace
import asyncio
import httpx
from langgraph.graph import END, START, StateGraph

trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer(__name__)

class TestAgentState(TypedDict):
http_result: str
span_result: str
messages: list

async def http_call_node(state: TestAgentState) -> dict:
try:
data = {"a": 10, "b": 25}
async with httpx.AsyncClient() as _:
with tracer.start_as_current_span("POST") as span:
span.set_attribute("http.method", "POST")
span.set_attribute("http.url", "https://httpbin.org/post")
sum_result = data.get("a", 0) + data.get("b", 0)
http_result = f"HTTP call successful! Sum of {data.get('a')} + {data.get('b')} = {sum_result}"

span.set_attribute("http.response.status_code", 200)
span.set_attribute("calculation.result", sum_result)

except Exception as e:
http_result = f"HTTP call error: {str(e)}"

return {"http_result": http_result}

async def opentelemetry_span_node(state: TestAgentState) -> dict:
with tracer.start_as_current_span("test_agent_span") as span:
span.set_attribute("node.name", "opentelemetry_span_node")
span.set_attribute("agent.type", "test_agent")
span.set_attribute("operation.type", "span_creation")

span.add_event("Starting span processing")

await asyncio.sleep(0.01)

http_result = state.get("http_result", "No HTTP result available")
span.set_attribute("previous.http_result", http_result)

span.add_event("Processing HTTP result from previous node")

span_result = f"OpenTelemetry span created successfully! Span ID: {span.get_span_context().span_id}"

span.add_event("Span processing completed")
span.set_attribute("processing.status", "completed")

return {"span_result": span_result}

def create_test_agent():
"""Create a simple LangGraph agent with 2 nodes matching the GitHub issue exactly."""
builder = StateGraph(TestAgentState)

builder.add_node("http_call", http_call_node)
builder.add_node("otel_span", opentelemetry_span_node)

builder.add_edge(START, "http_call")
builder.add_edge("http_call", "otel_span")
builder.add_edge("otel_span", END)

agent = builder.compile()
return agent

async def run_test_agent():
with tracer.start_as_current_span("test_agent_execution_root") as root_span:
root_span.set_attribute("agent.name", "test_agent")
root_span.set_attribute("agent.version", "1.0.0")
root_span.set_attribute("execution.type", "full_agent_run")

root_span.add_event("Agent execution started")

try:
root_span.add_event("Creating agent graph")
agent = create_test_agent()
root_span.set_attribute("agent.nodes_count", 2)

initial_state = {"http_result": "", "span_result": "", "messages": []}
root_span.add_event("Initial state prepared")

root_span.add_event("Starting agent invocation")
final_state = await agent.ainvoke(initial_state)

root_span.set_attribute("execution.status", "completed")
return final_state

except Exception as e:
root_span.set_attribute("execution.status", "failed")
root_span.set_attribute("error.type", type(e).__name__)
root_span.set_attribute("error.message", str(e))
root_span.add_event("Agent execution failed", {"error": str(e)})
raise

final_state = asyncio.run(run_test_agent())

assert "http_result" in final_state
assert "span_result" in final_state
assert "Sum of 10 + 25 = 35" in final_state["http_result"]

spans = span_exporter.get_finished_spans()
span_names = [span.name for span in spans]

print(f"\nCaptured {len(spans)} spans:")
for span in spans:
parent_name = "None"
if span.parent:
parent_span = next(
(s for s in spans if s.context.span_id == span.parent.span_id), None
)
if parent_span:
parent_name = parent_span.name
else:
parent_name = f"Unknown({span.parent.span_id})"
print(f" - {span.name} (parent: {parent_name})")

assert "test_agent_execution_root" in span_names
assert "POST" in span_names
assert "test_agent_span" in span_names
assert "http_call.task" in span_names
assert "otel_span.task" in span_names
assert "LangGraph.workflow" in span_names

root_span = next(span for span in spans if span.name == "test_agent_execution_root")
post_span = next(span for span in spans if span.name == "POST")
test_agent_span = next(span for span in spans if span.name == "test_agent_span")
http_call_task_span = next(span for span in spans if span.name == "http_call.task")
otel_span_task_span = next(span for span in spans if span.name == "otel_span.task")
workflow_span = next(span for span in spans if span.name == "LangGraph.workflow")

print("\nHierarchy check:")
print(f"POST parent: {post_span.parent.span_id if post_span.parent else 'None'}")
print(f"http_call.task ID: {http_call_task_span.context.span_id}")
print(
f"test_agent_span parent: {test_agent_span.parent.span_id if test_agent_span.parent else 'None'}"
)
print(f"otel_span.task ID: {otel_span_task_span.context.span_id}")

assert (
post_span.parent.span_id == http_call_task_span.context.span_id
), "POST span should be child of http_call.task span"
assert (
test_agent_span.parent.span_id == otel_span_task_span.context.span_id
), "test_agent_span should be child of otel_span.task span"

assert http_call_task_span.parent.span_id == workflow_span.context.span_id
assert otel_span_task_span.parent.span_id == workflow_span.context.span_id
assert workflow_span.parent.span_id == root_span.context.span_id


💡 Verification agent

🧩 Analysis chain

Comprehensive test reproduction for GitHub issue #3203.

This is an excellent reproduction test that validates the span hierarchy fix. The test:

  1. Creates a realistic scenario with HTTP calls and OpenTelemetry spans within LangGraph nodes
  2. Validates the complete span hierarchy ensuring proper parent-child relationships
  3. Includes detailed assertions for span nesting and context propagation
  4. Provides debugging output with span hierarchy visualization

The test structure effectively exercises the callback handler fixes for async context handling and span attachment.

However, there are a few considerations:

Let me verify if the async HTTP client usage could be simplified since it's not actually making real HTTP calls:


🏁 Script executed:

#!/bin/bash
# Check if httpx is being used elsewhere in the codebase and if it's a dependency
echo "Checking httpx usage in the codebase:"
rg -l "httpx" --type py

echo -e "\nChecking if httpx is declared as a dependency:"
fd -t f -e txt -e cfg -e toml -e yaml -e yml -e json | xargs grep -l "httpx" 2>/dev/null || echo "No httpx dependency declarations found"

Length of output: 6561


Add httpx to the project’s test/dev dependencies

The tracing tests (e.g., packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py) rely on httpx under @pytest.mark.vcr to record and replay HTTP interactions. Although the client isn’t making “real” calls at runtime, it’s essential for the VCR setup and is used across many instrumentation tests.

• Confirm and update your dependency declarations (e.g., in pyproject.toml, setup.cfg/setup.py, or requirements-dev.txt) to include:

[tool.poetry.dev-dependencies]
httpx = "^YOUR_SUPPORTED_VERSION"

—or—

[dev-packages]
httpx = {version = "*"}

—or—

# requirements-dev.txt
httpx>=x.y.z

• You can ignore the suggestion to “simplify” the HTTPX usage here, as the real calls are captured/replayed via VCR.

🤖 Prompt for AI Agents
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py lines
188-342: the test uses httpx under pytest.mark.vcr but httpx is not declared in
the project test/dev dependencies, causing test runs to fail in CI; add httpx to
the repository's test/dev dependency declarations (pyproject.toml under
[tool.poetry.dev-dependencies], setup.cfg/setup.py extras or
requirements-dev.txt) with a supported version range consistent with other
tests, update lockfile or run your dependency manager (poetry lock/install or
pip install -r requirements-dev.txt) and re-run tests to ensure httpx is
available for VCR playback.

Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed 5bd2330 in 2 minutes and 25 seconds. Click for details.
  • Reviewed 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:172
  • Draft comment:
    Removed 'body' from match_on. Ensure that request bodies are not needed for VCR matching in tests.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:173
  • Draft comment:
    Removed filter_query_parameters for API keys. Confirm that query parameters do not leak sensitive info.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This is a test configuration file, not production code, and the code is specifically about VCR test recording configuration. The removal of filter_query_parameters suggests the author intentionally wants to record query parameters in tests. Security in test recordings is important, but the author has already demonstrated security awareness through other filters. The comment raises a valid security concern: sensitive data in query parameters could potentially be recorded in tests. However, this is a test configuration where the author has already shown careful consideration of security by filtering headers and body content, so the change appears intentional for test purposes. The comment should be deleted because it asks for confirmation of intention ("Confirm that..."), which violates our rules, and it is speculative about potential issues rather than pointing out a definite problem.

Workflow ID: wflow_cBifz1dXyjBr0oE8

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

188-342: Comprehensive test reproduction for GitHub issue #3203.

This test effectively validates the span hierarchy fix by:

  1. Creating realistic scenarios with HTTP calls and OpenTelemetry spans within LangGraph nodes
  2. Validating the complete span hierarchy with proper parent-child relationships
  3. Including debugging output for span hierarchy visualization

The test structure properly exercises the callback handler fixes for async context handling and span attachment.

Verify that httpx is declared in the project's test/dev dependencies:

#!/bin/bash
# Check if httpx is declared in dependency files
echo "Checking for httpx in dependency declarations:"
fd -t f "(pyproject.toml|setup.py|setup.cfg|requirements.*\.txt)" | xargs grep -l "httpx" 2>/dev/null

# Check the specific package's dependencies
echo -e "\nChecking package-specific dependencies:"
cat packages/opentelemetry-instrumentation-langchain/pyproject.toml 2>/dev/null | grep -A 10 "\[tool.poetry.dev-dependencies\]"
cat packages/opentelemetry-instrumentation-langchain/requirements-dev.txt 2>/dev/null | grep "httpx"
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

190-195: Consider using contextlib.suppress for cleaner exception handling.

While the current implementation is functionally correct and well-documented, using contextlib.suppress would be more idiomatic.

+from contextlib import suppress
+
 def _end_span(self, span: Span, run_id: UUID) -> None:
     for child_id in self.spans[run_id].children:
         if child_id in self.spans:
             child_span = self.spans[child_id].span
             if child_span.end_time is None:  # avoid warning on ended spans
                 child_span.end()
     span.end()
     token = self.spans[run_id].token
     if token:
-        try:
-            context_api.detach(token)
-        except ValueError:
-            # Context detach can fail in async scenarios when tokens are created in different contexts
-            # This is expected behavior and doesn't affect the correct span hierarchy
-            pass
+        # Context detach can fail in async scenarios when tokens are created in different contexts
+        # This is expected behavior and doesn't affect the correct span hierarchy
+        with suppress(ValueError):
+            context_api.detach(token)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb72946 and 293b501.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (5)
  • TestAgentState (45-48)
  • http_call_node (50-72)
  • opentelemetry_span_node (74-99)
  • create_test_agent (101-115)
  • run_test_agent (117-148)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py

190-195: Use contextlib.suppress(ValueError) instead of try-except-pass

(SIM105)

🔇 Additional comments (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

236-236: Good fix for span context attachment.

Making the token attachment unconditional ensures consistent span context propagation in both sync and async scenarios, which is critical for maintaining proper span hierarchy in LangGraph flows.

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2)

38-40: Formatting improvement for readability.

The set comprehension is now more readable on a single line.

Also applies to: 99-101


71-72: Good: Enabling async LangGraph test.

Removing the @pytest.mark.xfail decorator enables this important async test to validate that the context propagation fixes work correctly in asynchronous scenarios.

@nirga nirga merged commit 3bb39f7 into main Aug 12, 2025
9 checks passed