
fix(langchain): fix nesting of langgraph spans#3206

Merged
nirga merged 10 commits into main from gk/fix-langgraph-nesting-issue
Aug 12, 2025

Conversation

@galkleinman (Contributor) commented Jul 31, 2025

Important

Fixes a LangGraph span-hierarchy issue and adds tests for validation, with updates to test configurations for security and accuracy.

  • Behavior:
    • Fixes span hierarchy issue in LangGraph by ensuring consistent context attachment and guarded detach in callback_handler.py.
    • Adds a reproducible tracing script in langgraph_with_otel_sdk.py to demonstrate the issue and fix.
  • Tests:
    • Enables async LangGraph tracing test in test_langgraph.py and adds a new test for span hierarchy validation.
    • Updates test fixtures in test_agents.py and test_langgraph.py to reflect changes in span IDs and log counts.
    • Strengthens VCR matching and adds request pre-processing to redact API keys in conftest.py.
  • Misc:
    • Minor formatting cleanups in test cassettes.

This description was created by Ellipsis for 5bd2330. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • Bug Fixes

    • Made span cleanup resilient in async callback flows (guarded detach) and ensured consistent context attachment to preserve span hierarchy.
  • Tests

    • Enabled async LangGraph tracing test and added a reproduction test that validates spans, hierarchy, and end-to-end results.
    • Updated test cassettes/fixtures, adjusted expected IDs and log counts, and improved formatting.
    • Strengthened VCR handling and added pre-record request redaction of API keys.
  • Chores

    • Added a reproducible tracing script to run and visualize tracing behavior.


coderabbitai Bot commented Jul 31, 2025

Walkthrough

Adds a LangGraph + OpenTelemetry reproduction script and test; hardens LangChain callback context attach/detach for async scenarios; updates VCR redaction and recorded test cassettes; adjusts several test expectations and tool_call IDs.

Changes

Cohort / File(s) Summary
Reproduction script
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py
New script that wires OpenTelemetry (InMemorySpanExporter, TracerProvider) with a minimal two-node LangGraph (http_call → otel_span). Adds setup_tracing() and run_github_issue_reproduction() to run the async workflow, collect/display spans, and uninstrument on exit.
Callback handler (context management)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
_create_span now always attaches a context token via context_api.attach(set_span_in_context(span)). _end_span now wraps context_api.detach(token) in try/except ValueError to avoid detach failures across async/mismatched contexts. No public API signature changes.
LangGraph tests & new reproduction test
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py
Formatting tweaks, removal of @pytest.mark.xfail on async test, and addition of test_nesting_of_langgraph_spans(instrument_legacy, span_exporter, tracer_provider) which builds/runs a two-node LangGraph under OTel and asserts spans, hierarchy, and outputs.
Test fixtures (cassettes) updates
packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml
Recorded HTTP interactions and OpenAI streaming payloads refreshed: API versions, headers, timestamps, request/response bodies, and streaming token sequence updated to newer recordings. No application logic changes.
VCR config change + request redaction
packages/opentelemetry-instrumentation-langchain/tests/conftest.py
vcr_config() updated to remove filter_body, set match_on to ["method","scheme","host","port","path","query"], and add before_record_request to redact "api_key" from JSON request bodies; minor reformat of instrumentor invocation.
Agent tests expectations
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py
Adjusted expected log counts and updated several tool_calls IDs in tests to match updated fixtures/recordings.

Sequence Diagram(s)

sequenceDiagram
  participant Runner as Test/Script
  participant OTel as OpenTelemetry SDK
  participant Instr as LangChain Instrumentor
  participant Graph as LangGraph Agent
  participant HTTP as httpx.AsyncClient
  participant Exporter as InMemorySpanExporter

  Runner->>OTel: setup_tracing()
  OTel->>Instr: instrument(LangChain)
  Runner->>OTel: start root span (test_agent_execution_root)
  Runner->>Graph: ainvoke(initial_state)
  Graph->>OTel: start span (LangGraph.workflow)
  Graph->>Graph: execute node http_call.task
  Graph->>OTel: start span (http_call.task)
  Graph->>HTTP: POST /sum
  HTTP-->>Graph: response (35)
  Graph->>OTel: end span (http_call.task)
  Graph->>Graph: execute node otel_span.task
  Graph->>OTel: start span (test_agent_span)
  Graph->>OTel: end span (test_agent_span)
  Graph->>OTel: end span (LangGraph.workflow)
  Runner->>OTel: end root span
  OTel->>Exporter: export spans
  Runner->>Instr: uninstrument()
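The diagrammed flow can be mimicked without LangGraph or OTel at all; the sketch below models the two nodes as plain async functions over a shared state dict. All names are hypothetical, and the hard-coded 35 stands in for the recorded POST /sum response.

```python
import asyncio


async def http_call(state):
    # Stands in for the http_call.task node's POST /sum request.
    state["sum"] = 35
    return state


async def otel_span(state):
    # Stands in for the otel_span.task node starting/ending test_agent_span.
    state["span_emitted"] = True
    return state


async def run_graph(state):
    # LangGraph's http_call -> otel_span edge, reduced to sequential awaits.
    for node in (http_call, otel_span):
        state = await node(state)
    return state
```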

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes


Suggested reviewers

  • doronkopit5
  • nirga

Poem

A rabbit hops through spans and threads,
I bind and unbind where context treads.
Two nodes trace a tiny row of light, 🥕
I count the hops and keep them tight.
Little paws, big traces — what a sight.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 293b501 and 5bd2330.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py


@galkleinman galkleinman marked this pull request as ready for review August 12, 2025 13:07

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to 5ff9bbf in 1 minute and 51 seconds. Click for details.
  • Reviewed 507 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py:55
  • Draft comment:
    Consider using a descriptive variable name instead of '_' for the AsyncClient, to improve code clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%):
    1. The _ is intentionally used here to indicate the variable is unused.
    2. This follows Python conventions where _ indicates a throwaway variable.
    3. Using a descriptive name for an unused variable could be misleading since it suggests the variable has a purpose.
    4. The code is actually clearer with _ since it signals to readers that this variable isn't used.
    Maybe having a named client would make debugging easier if there are connection issues? Maybe the code will need to use the client in the future? The code is a test reproduction case demonstrating a specific issue - it's not production code that needs to handle real HTTP calls or be maintained long term. The _ is actually more appropriate for this context. The comment should be deleted: using _ is the correct Python idiom here since the variable is intentionally unused, and a descriptive name would make the code less clear, not more clear.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:55
  • Draft comment:
    Explicitly setting the global tracer provider (line 55–59) ensures that spans created by user code are properly nested. This is a solid improvement.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the change is a solid improvement, which violates the rule against making purely informative comments.
3. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:189
  • Draft comment:
    Wrapping context_api.detach(token) in a try/except block to handle ValueError in asynchronous scenarios is a good and necessary fix.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the change is good and necessary, which violates the rule against making purely informative comments.
4. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:227
  • Draft comment:
    Changing the conditional span context attachment to unconditional (removing the check for async callbacks) simplifies management. Verify that this change does not inadvertently affect synchronous callback behavior.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py:292
  • Draft comment:
    The tests, including the exact reproduction of GitHub Issue #3203, thoroughly assert span names and parent-child relationships. They provide good coverage to validate the corrected span nesting.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). This comment is purely informative, providing details about the tests and their coverage. It does not suggest any changes or ask for confirmation on specific code behavior.

Workflow ID: wflow_TaEhtkkh8QGGmEE7

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed 15476d1 in 46 seconds. Click for details.
  • Reviewed 27 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:50
  • Draft comment:
    Removed global tracer provider setting. Please document the rationale behind removing the global setting, as it could affect span nesting behavior.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment is asking for documentation of the rationale behind a change, which is similar to asking for an update to the PR description. This violates the rule against asking the PR author to update the PR description or provide explanations.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:29
  • Draft comment:
    Minor formatting change: extra space added in the _instruments tuple. Confirm if this is intentional.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%. None.

Workflow ID: wflow_yksaE0uDHchF83Hz

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

190-195: Consider using contextlib.suppress for cleaner exception handling

While the current implementation is functionally correct, using contextlib.suppress would be more idiomatic and concise for ignoring exceptions.

+from contextlib import suppress

Then update the exception handling:

-            try:
-                context_api.detach(token)
-            except ValueError:
-                # Context detach can fail in async scenarios when tokens are created in different contexts
-                # This is expected behavior and doesn't affect the correct span hierarchy
-                pass
+            # Context detach can fail in async scenarios when tokens are created in different contexts
+            # This is expected behavior and doesn't affect the correct span hierarchy
+            with suppress(ValueError):
+                context_api.detach(token)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

297-309: Consider extracting span debugging logic to a test utility

The span hierarchy printing logic (lines 297-309) could be useful for debugging other tests. Consider extracting it to a shared test utility function.

Would you like me to create a utility function for debugging span hierarchies that could be reused across tests?

packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (1)

55-56: Unused httpx.AsyncClient context manager

The code creates an httpx.AsyncClient context manager but doesn't actually use it (the variable is named _). Since this is meant to simulate an HTTP call, either use the client or add a comment explaining why it's not used.

-            async with httpx.AsyncClient() as _:
+            # Simulating HTTP call context without actual network request
+            async with httpx.AsyncClient() as _:

Or if you want to make it more realistic:

-            async with httpx.AsyncClient() as _:
+            async with httpx.AsyncClient() as client:
+                # In a real scenario, we would make an actual HTTP call:
+                # response = await client.post("https://httpbin.org/post", json=data)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba68225 and 644435f.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (1)
  • LangchainInstrumentor (32-185)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
  • export (45-51)
  • InMemorySpanExporter (22-61)
  • get_finished_spans (40-43)
packages/opentelemetry-instrumentation-mcp/tests/conftest.py (2)
  • tracer_provider (32-44)
  • tracer (48-49)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (3)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
  • instrument_legacy (77-93)
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (5)
  • TestAgentState (45-48)
  • http_call_node (50-72)
  • opentelemetry_span_node (74-99)
  • create_test_agent (101-115)
  • run_test_agent (117-148)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py

190-195: Use contextlib.suppress(ValueError) instead of try-except-pass

(SIM105)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

236-236: Good fix for async context propagation

The unconditional attachment of the span context ensures that a token is always available for later detachment, which is critical for maintaining proper span hierarchy in async scenarios. This change, combined with the error handling in _end_span, properly addresses the GitHub issue #3203.

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

188-343: Comprehensive test coverage for the GitHub issue reproduction

The test thoroughly reproduces GitHub issue #3203 with:

  • Proper async execution flow with two LangGraph nodes
  • Correct verification of span hierarchy and parent-child relationships
  • Detailed assertions for all expected spans
  • Good debugging output for troubleshooting

This test will effectively prevent regression of the span nesting issue.

Comment on lines +1 to +19
#!/usr/bin/env python3
"""
Reproduce the exact GitHub issue #3203 and show the trace waterfall.
This demonstrates the span hierarchy issue before and after the fix.
"""

import asyncio
import sys
from typing import TypedDict
import httpx
from langgraph.graph import END, START, StateGraph
from opentelemetry import trace
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

from waterfall_visualizer import visualize_trace_waterfall, print_raw_span_data



💡 Verification agent

🧩 Analysis chain

Missing import for waterfall_visualizer module

The script imports waterfall_visualizer (line 18) but this module doesn't appear to be part of the package. This will cause an ImportError when running the script.


🏁 Script executed:

#!/bin/bash
# Check if waterfall_visualizer module exists in the repository
fd -t f "waterfall_visualizer.py" 

# Check if it's imported elsewhere in the codebase
rg "from waterfall_visualizer import" -A 2

# Check if there's any reference to this module
rg "waterfall_visualizer" --type py

Length of output: 580


Add the missing waterfall_visualizer module or correct the import

The script at packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (line 18) imports waterfall_visualizer, but no such file exists in the repository. This will raise an ImportError at runtime.

To resolve:

  • Include waterfall_visualizer.py in the package (e.g., under packages/opentelemetry-instrumentation-langchain/), defining visualize_trace_waterfall and print_raw_span_data.
  • Or, if it belongs in a different module, update the import path and add the dependency to setup.py or pyproject.toml.
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py
around lines 1 to 19, the script imports waterfall_visualizer which does not
exist and will cause ImportError; either add a new file
packages/opentelemetry-instrumentation-langchain/waterfall_visualizer.py that
defines visualize_trace_waterfall and print_raw_span_data and ensure it is
included in the package manifest, or update the import to the correct existing
module path and add the dependency to setup.py/pyproject.toml so the module is
available at runtime.

Comment on lines +179 to +182
print("\n🎉 GitHub Issue #3203 has been FIXED!")
print("The visualization above shows the CORRECTED span hierarchy.")
print("Note how POST and test_agent_span are now properly nested")
print("under their respective task spans (http_call.task and otel_span.task)!")


🛠️ Refactor suggestion

Misleading success message before fix verification

The script prints "GitHub Issue #3203 has been FIXED!" and describes the corrected hierarchy without actually verifying whether the fix is in place. This could be confusing if someone runs this script on a version without the fix.

Consider dynamically checking the span hierarchy to determine if the issue is fixed:

-        print("\n🎉 GitHub Issue #3203 has been FIXED!")
-        print("The visualization above shows the CORRECTED span hierarchy.")
-        print("Note how POST and test_agent_span are now properly nested")
-        print("under their respective task spans (http_call.task and otel_span.task)!")
+        # Verify if the hierarchy is correct
+        post_span = next((s for s in spans if s.name == "POST"), None)
+        http_call_task = next((s for s in spans if s.name == "http_call.task"), None)
+        
+        if post_span and http_call_task and post_span.parent:
+            if post_span.parent.span_id == http_call_task.context.span_id:
+                print("\n🎉 GitHub Issue #3203 has been FIXED!")
+                print("The visualization above shows the CORRECTED span hierarchy.")
+                print("Note how POST and test_agent_span are now properly nested")
+                print("under their respective task spans (http_call.task and otel_span.task)!")
+            else:
+                print("\n⚠️ GitHub Issue #3203 is still present!")
+                print("The spans are not properly nested under their task spans.")
+        else:
+            print("\n⚠️ Unable to verify span hierarchy - expected spans not found")


gitguardian Bot commented Aug 12, 2025

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. Find more information about risks here.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.


@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed ec1b5b6 in 2 minutes and 29 seconds. Click for details.
  • Reviewed 1149 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:150
  • Draft comment:
    Explicit 'match_on' keys added to vcr_config improve request matching. Confirm these keys cover all necessary URI components.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:122
  • Draft comment:
    The tool call ID has been updated to 'call_RQXCf1bdiBiFrtwfOTP3GCCd'. Ensure this change aligns with the underlying agent implementation.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment is asking the PR author to ensure that the change aligns with the underlying agent implementation, which is against the rules. It does not provide a specific suggestion or ask for a specific test to be written.
3. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:220
  • Draft comment:
    The tool call ID in the no-content scenario is now 'call_Joct6NnIJFGGWfvMqZJPRB4C'. Verify that this updated ID aligns with expected behavior.
  • Reason this comment was not posted:
    Comment did not seem useful (confidence useful = 0% <= threshold 50%). The comment is asking the PR author to verify whether the updated tool call ID aligns with expected behavior. This falls under asking the author to confirm their intention or ensure behavior, which is against the rules.
4. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:104
  • Draft comment:
    Log exporter assertions now expect 8 logs. Confirm that this new count accurately reflects the improved filtering/consolidation.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
5. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:202
  • Draft comment:
    The assertion that all logs have GenAIAttributes.GEN_AI_SYSTEM equal to 'langchain' is helpful. Ensure this remains valid for future behavior.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:19
  • Draft comment:
    Typographical: There appears to be an extra stray apostrophe on this line that may be unintentional.
  • Reason this comment was not posted:
    Comment was on unchanged code.
7. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:82
  • Draft comment:
    The JSON string for the "template" property now starts with "are a helpful assistant" instead of "You are a helpful assistant". This results in an incomplete sentence. Please verify if the leading "You" was inadvertently removed.
  • Reason this comment was not posted:
    Comment was on unchanged code.
8. packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml:256
  • Draft comment:
    Typo alert: The query string "OpenLLMetry" appears to have an extra 'L'. If the intent was to refer to OpenTelemetry, please update it accordingly.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_f2BGg6t3wtktllTA

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)

122-129: Don’t hardcode tool_call IDs; derive them from the logs

Tool call IDs are provider-generated and will change when cassettes are re-recorded. Hardcoding them forces frequent fixture updates. Extract the ID from the assistant message event and reuse it in subsequent assertions.

Apply this diff:

@@ def test_agents_with_events_with_content(
-    # Validate that the assistant message Event exists
+    # Extract tool_call id from assistant message for stable assertions
+    tool_call_id = None
+    for log in logs:
+        if log.log_record.attributes.get(EventAttributes.EVENT_NAME) == "gen_ai.assistant.message":
+            body = dict(log.log_record.body)
+            calls = body.get("tool_calls") or []
+            if calls:
+                tool_call_id = calls[0].get("id")
+                break
+    assert tool_call_id, "assistant tool_call id not found in logs"
+
+    # Validate that the assistant message Event exists
     assert_message_in_logs(
         logs,
         "gen_ai.assistant.message",
         {
             "content": "",
             "tool_calls": [
                 {
-                    "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                    "id": tool_call_id,
                     "function": {
                         "name": "tavily_search_results_json",
                         "arguments": {"query": "OpenLLMetry"},
                     },
                     "type": "function",
                 }
             ],
         },
     )
@@
     choice_event = {
         "index": 0,
         "finish_reason": "tool_calls",
         "message": {"content": ""},
         "tool_calls": [
             {
-                "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                "id": tool_call_id,
                 "function": {
                     "name": "tavily_search_results_json",
                     "arguments": {"query": "OpenLLMetry"},
                 },
                 "type": "function",
             }
         ],
     }

Repeat the same pattern for the no_content test (see next comment).

Also applies to: 140-148


220-226: Mirror dynamic tool_call ID extraction in the no-content test

Avoid hardcoding the streaming tool_call id here as well; reuse the id derived from the assistant.message log.

Apply this diff:

@@ def test_agents_with_events_with_no_content(
-    # Validate that the assistant message Event exists
+    # Extract tool_call id from assistant message for stable assertions
+    tool_call_id = None
+    for log in logs:
+        if log.log_record.attributes.get(EventAttributes.EVENT_NAME) == "gen_ai.assistant.message":
+            body = dict(log.log_record.body)
+            calls = body.get("tool_calls") or []
+            if calls:
+                tool_call_id = calls[0].get("id")
+                break
+    assert tool_call_id, "assistant tool_call id not found in logs"
+
+    # Validate that the assistant message Event exists
     assert_message_in_logs(
         logs,
         "gen_ai.assistant.message",
         {
             "tool_calls": [
                 {
-                    "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                    "id": tool_call_id,
                     "function": {"name": "tavily_search_results_json"},
                     "type": "function",
                 }
             ]
         },
     )
@@
     choice_event = {
         "index": 0,
         "finish_reason": "tool_calls",
         "message": {},
         "tool_calls": [
             {
-                "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                "id": tool_call_id,
                 "function": {"name": "tavily_search_results_json"},
                 "type": "function",
             }
         ],
     }

Also applies to: 235-240

🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)

149-156: Avoid weakening HTTP cassette matching by omitting the request body

Adding match_on without "body" increases the risk of mismatching multiple requests to the same endpoint (e.g., two POSTs to /v1/chat/completions in a single test). This can hide regressions and cause flaky replays across tests that hit identical routes with different payloads.

Consider including "body" back in the matchers and scrubbing dynamic fields instead. Also scrub sensitive data in JSON bodies to prevent leaks (see separate comment on leaked API key). Example:

 def vcr_config():
-    return {
-        "filter_headers": ["authorization", "x-api-key"],
-        "filter_body": ["api_key"],
-        "match_on": ["method", "scheme", "host", "port", "path", "query"],
-    }
+    return {
+        "filter_headers": ["authorization", "x-api-key", "cookie", "set-cookie"],
+        # If you use pytest-recording, prefer JSON scrubbing via before_record_request (below)
+        "match_on": ["method", "scheme", "host", "port", "path", "query", "body"],
+        "before_record_request": _scrub_request_body,
+        "before_record_response": _scrub_response_headers,
+    }

Add these helpers (outside this function):

# Add at the top of the file:
import json

def _scrub_request_body(request):
    # Scrub JSON api_key in request bodies
    try:
        if request.body and "application/json" in (request.headers or {}).get("Content-Type", ""):
            data = json.loads(request.body)
            if isinstance(data, dict) and "api_key" in data:
                data["api_key"] = "<REDACTED>"
                request.body = json.dumps(data)
    except Exception:
        pass
    # Remove cookies
    if request.headers:
        request.headers.pop("cookie", None)
    return request

def _scrub_response_headers(response):
    # Remove set-cookie header to reduce churn
    headers = response["headers"] or {}
    headers.pop("Set-Cookie", None)
    headers.pop("set-cookie", None)
    return response
packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml (1)

380-383: Reduce cassette churn: filter volatile cookies and request IDs

Set-Cookie, Cookie, CF-RAY, and x-request-id values are highly volatile and cause unnecessary cassette diffs. Filtering them improves stability, reduces noise, and avoids committing ephemeral identifiers.

In conftest vcr_config, add:

  • filter_headers: ["cookie", "set-cookie"]
  • before_record_response: drop Set-Cookie (see the helper suggested there).

Optionally also filter "CF-RAY" and "x-request-id" in response headers if you find them noisy.

Also applies to: 200-216, 995-1041
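For reference, a minimal sketch (not from this PR) of the `before_record_response` helper suggested above. The helper name and the exact header set are assumptions; VCR.py passes the serialized response to this hook as a dict with a "headers" mapping:

```python
# Hypothetical helper: drop headers whose values change on every recording,
# so re-recorded cassettes don't churn on cookies and request IDs.
VOLATILE_HEADERS = {"set-cookie", "cf-ray", "x-request-id"}


def scrub_response_headers(response):
    headers = response.get("headers") or {}
    for name in list(headers):  # copy keys; we mutate while iterating
        if name.lower() in VOLATILE_HEADERS:
            del headers[name]
    response["headers"] = headers
    return response
```

Wired in via `"before_record_response": scrub_response_headers` in the vcr_config dict, this removes the volatile headers before the cassette is written.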

packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)

104-104: Avoid brittle assertion on total log count

Asserting an exact log count (8) is brittle across provider or semconv changes and can fail even if the required events are present. Prefer validating the presence of required events (which you already do) and, if needed, assert a minimum count.

For example:

-    assert len(logs) == 8
+    assert len(logs) >= 6

248-256: Use a broader type for logs to reflect actual usage

assert_message_in_logs accepts a Tuple[LogData], but callers pass a list. Use Sequence[LogData] to match both lists and tuples.

-from typing import Tuple
+from typing import Sequence
@@
-def assert_message_in_logs(
-    logs: Tuple[LogData], event_name: str, expected_content: dict
-):
+def assert_message_in_logs(
+    logs: Sequence[LogData], event_name: str, expected_content: dict
+):
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 644435f and ec1b5b6.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-langchain/tests/cassettes/test_agents/test_agents_with_events_with_no_content.yaml (16 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)

@galkleinman galkleinman force-pushed the gk/fix-langgraph-nesting-issue branch from ec1b5b6 to a5296a4 on August 12, 2025 14:09
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed a5296a4 in 1 minute and 59 seconds.
  • Reviewed 93 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:77
  • Draft comment:
    The multi-line formatting for the OpenAIInstrumentor.instrument call improves readability without altering functionality.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:151
  • Draft comment:
    The VCR configuration now uses a 'match_on' list and a before_record_request callback to filter the API key in JSON bodies. This stricter matching approach looks appropriate; just ensure non-JSON request bodies are handled as expected.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:104
  • Draft comment:
    In test_agents_with_events_with_content, the expected log count is updated from 15 to 8 and the tool call id is updated to 'call_RQXCf1bdiBiFrtwfOTP3GCCd', reflecting the corrected span nesting. Confirm that these new expectations are stable.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment asks the PR author to confirm the stability of new expectations, which violates the rule against asking for confirmation or verification. It does not provide a specific code suggestion or point out a clear issue that needs addressing.
4. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:219
  • Draft comment:
    In test_agents_with_events_with_no_content, the tool call id has been updated to 'call_Joct6NnIJFGGWfvMqZJPRB4C'. This change aligns with the fix; ensure these IDs remain consistent across test runs.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is informative and asks the author to ensure consistency across test runs, which violates the rules. It doesn't provide a specific code suggestion or ask for a specific test to be written.
5. packages/opentelemetry-instrumentation-langchain/tests/test_agents.py:143
  • Draft comment:
    Typographical error: The string "OpenLLMetry" on this line may be a typo. Did you mean "OpenTelemetry"?
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_HFjyES3oDMzdI1Bd

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@ellipsis-dev
Contributor

ellipsis-dev Bot commented Aug 12, 2025

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (2)

114-131: Avoid brittle assertions on dynamic tool_call IDs

Hard-coding OpenAI tool_call ids makes tests tightly coupled to cassettes and upstream behavior. Prefer structural assertions and treat the id as “any value”.

Apply this diff to assert a structural subset and allow any id:

-    assert_message_in_logs(
+    assert_message_in_logs_contains(
         logs,
         "gen_ai.assistant.message",
         {
             "content": "",
             "tool_calls": [
                 {
-                    "id": "call_RQXCf1bdiBiFrtwfOTP3GCCd",
+                    "id": ANY,
                     "function": {
                         "name": "tavily_search_results_json",
                         "arguments": {"query": "OpenLLMetry"},
                     },
                     "type": "function",
                 }
             ],
         },
     )
@@
-    assert_message_in_logs(logs, "gen_ai.choice", choice_event)
+    assert_message_in_logs_contains(logs, "gen_ai.choice", choice_event)

Additionally add this helper and import to support partial matching and ANY outside this hunk:

from unittest.mock import ANY

def _deep_contains(superset, subset):
    if subset is ANY:
        return True
    if isinstance(subset, dict):
        if not isinstance(superset, dict):
            return False
        for k, v in subset.items():
            if k not in superset or not _deep_contains(superset[k], v):
                return False
        return True
    if isinstance(subset, list):
        if not isinstance(superset, list) or len(subset) > len(superset):
            return False
        # every expected item must match at least one actual item
        for expected in subset:
            if not any(_deep_contains(actual, expected) for actual in superset):
                return False
        return True
    return superset == subset

def assert_message_in_logs_contains(logs: Tuple[LogData], event_name: str, expected_subset: dict):
    assert any(
        log.log_record.attributes.get(EventAttributes.EVENT_NAME) == event_name
        for log in logs
    )
    assert any(
        _deep_contains(dict(log.log_record.body), expected_subset) for log in logs
    )

Also applies to: 133-149


213-226: Apply the same robust matching for no-content flow

Mirror the structural assertion approach to avoid brittle id expectations here as well.

Apply this diff:

-    assert_message_in_logs(
+    assert_message_in_logs_contains(
         logs,
         "gen_ai.assistant.message",
         {
             "tool_calls": [
                 {
-                    "id": "call_Joct6NnIJFGGWfvMqZJPRB4C",
+                    "id": ANY,
                     "function": {"name": "tavily_search_results_json"},
                     "type": "function",
                 }
             ]
         },
     )
@@
-    assert_message_in_logs(logs, "gen_ai.choice", choice_event)
+    assert_message_in_logs_contains(logs, "gen_ai.choice", choice_event)

Also applies to: 228-241

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec1b5b6 and a5296a4.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (3)
packages/opentelemetry-instrumentation-pinecone/tests/conftest.py (1)
  • instrument (44-46)
packages/opentelemetry-instrumentation-mcp/tests/conftest.py (1)
  • tracer_provider (32-44)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • meter_provider (45-50)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)

79-81: Formatting-only change — OK

Multi-line call for OpenAIInstrumentor.instrument is functionally equivalent. No action needed.

packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (1)

104-104: Confirm reduced log count is stable across providers/versions

The expected log count changed to 8. If the intent is to assert “no extra logs”, this is fine. If providers or semconv evolve, consider asserting presence of required events instead of an exact count to avoid brittle failures.
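A count-insensitive alternative could look like the following sketch. The helper name is hypothetical, and the event-name set is an assumption based on the gen_ai event names these tests already assert on:

```python
# Hypothetical helper: require each expected event at least once instead of
# pinning the exact total, so extra provider logs don't break the test.
REQUIRED_EVENTS = {
    "gen_ai.user.message",
    "gen_ai.assistant.message",
    "gen_ai.tool.message",
    "gen_ai.choice",
}


def assert_required_events(event_names):
    """event_names: iterable of event-name strings extracted from the logs."""
    missing = REQUIRED_EVENTS - set(event_names)
    assert not missing, f"missing required events: {sorted(missing)}"
```

This keeps the test meaningful (all required events must appear) while tolerating benign additions as providers or semconv evolve.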

Comment on lines +153 to +171

    def before_record_request(request):
        if hasattr(request, "body") and request.body:
            import json

            try:
                if isinstance(request.body, (str, bytes)):
                    body_str = (
                        request.body.decode("utf-8")
                        if isinstance(request.body, bytes)
                        else request.body
                    )
                    body_data = json.loads(body_str)
                    if "api_key" in body_data:
                        body_data["api_key"] = "FILTERED"
                        request.body = json.dumps(body_data)
            except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
                pass
        return request



🛠️ Refactor suggestion

⚠️ Potential issue

Preserve request.body type and redact nested JSON keys (avoid str/bytes regression)

Current logic may change a bytes body to str (Line 167), which can break VCR matching/serialization. It also redacts only top-level "api_key" and misses nested occurrences and other common variants.

Proposed fix: preserve original body type and perform recursive redaction with a small, conservative secret key set (case-insensitive). Optionally, also redact sensitive query params in the same hook.

Apply this diff:

 def before_record_request(request):
-        if hasattr(request, "body") and request.body:
-            import json
-
-            try:
-                if isinstance(request.body, (str, bytes)):
-                    body_str = (
-                        request.body.decode("utf-8")
-                        if isinstance(request.body, bytes)
-                        else request.body
-                    )
-                    body_data = json.loads(body_str)
-                    if "api_key" in body_data:
-                        body_data["api_key"] = "FILTERED"
-                        request.body = json.dumps(body_data)
-            except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
-                pass
-        return request
+        # Redact secrets in JSON body (preserve original type) and in query parameters.
+        if hasattr(request, "body") and request.body and isinstance(request.body, (str, bytes)):
+            import json
+            try:
+                orig_is_bytes = isinstance(request.body, bytes)
+                body_str = request.body.decode("utf-8") if orig_is_bytes else request.body
+                body_data = json.loads(body_str)
+
+                # Conservative, case-insensitive set of sensitive keys
+                secret_keys = {"api_key", "apikey", "api-key", "access_token"}
+
+                def _redact(obj):
+                    if isinstance(obj, dict):
+                        redacted = {}
+                        for k, v in obj.items():
+                            k_l = k.lower() if isinstance(k, str) else k
+                            if isinstance(k_l, str) and k_l in secret_keys:
+                                redacted[k] = "FILTERED"
+                            else:
+                                redacted[k] = _redact(v)
+                        return redacted
+                    if isinstance(obj, list):
+                        return [_redact(i) for i in obj]
+                    return obj
+
+                body_data = _redact(body_data)
+                new_body = json.dumps(body_data)
+                request.body = new_body.encode("utf-8") if orig_is_bytes else new_body
+            except (json.JSONDecodeError, UnicodeDecodeError, AttributeError, TypeError):
+                # Non-JSON or non-decodable bodies are left as-is
+                pass
+
+        # Redact secrets in query parameters as well
+        try:
+            from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
+            parts = urlsplit(request.uri)
+            qs = parse_qsl(parts.query, keep_blank_values=True)
+            redacted_qs = [(k, "FILTERED" if k.lower() in {"api_key", "apikey", "api-key", "access_token"} else v) for k, v in qs]
+            new_query = urlencode(redacted_qs)
+            request.uri = urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))
+        except Exception:
+            pass
+
+        return request
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def before_record_request(request):
    if hasattr(request, "body") and request.body:
        import json

        try:
            if isinstance(request.body, (str, bytes)):
                body_str = (
                    request.body.decode("utf-8")
                    if isinstance(request.body, bytes)
                    else request.body
                )
                body_data = json.loads(body_str)
                if "api_key" in body_data:
                    body_data["api_key"] = "FILTERED"
                    request.body = json.dumps(body_data)
        except (json.JSONDecodeError, UnicodeDecodeError, AttributeError):
            pass
    return request

def before_record_request(request):
    # Redact secrets in JSON body (preserve original type) and in query parameters.
    if hasattr(request, "body") and request.body and isinstance(request.body, (str, bytes)):
        import json
        try:
            orig_is_bytes = isinstance(request.body, bytes)
            body_str = request.body.decode("utf-8") if orig_is_bytes else request.body
            body_data = json.loads(body_str)

            # Conservative, case-insensitive set of sensitive keys
            secret_keys = {"api_key", "apikey", "api-key", "access_token"}

            def _redact(obj):
                if isinstance(obj, dict):
                    redacted = {}
                    for k, v in obj.items():
                        k_l = k.lower() if isinstance(k, str) else k
                        if isinstance(k_l, str) and k_l in secret_keys:
                            redacted[k] = "FILTERED"
                        else:
                            redacted[k] = _redact(v)
                    return redacted
                if isinstance(obj, list):
                    return [_redact(i) for i in obj]
                return obj

            body_data = _redact(body_data)
            new_body = json.dumps(body_data)
            request.body = new_body.encode("utf-8") if orig_is_bytes else new_body
        except (json.JSONDecodeError, UnicodeDecodeError, AttributeError, TypeError):
            # Non-JSON or non-decodable bodies are left as-is
            pass

    # Redact secrets in query parameters as well
    try:
        from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
        parts = urlsplit(request.uri)
        qs = parse_qsl(parts.query, keep_blank_values=True)
        redacted_qs = [
            (k, "FILTERED" if k.lower() in {"api_key", "apikey", "api-key", "access_token"} else v)
            for k, v in qs
        ]
        new_query = urlencode(redacted_qs)
        request.uri = urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, parts.fragment))
    except Exception:
        pass

    return request
🤖 Prompt for AI Agents
In packages/opentelemetry-instrumentation-langchain/tests/conftest.py around
lines 153 to 171, the before_record_request hook currently decodes bytes to str
and reassigns request.body as a str (breaking VCR matching/serialization) and
only redacts a top-level "api_key"; instead, retain the original body type (if
bytes, operate on a bytes-compatible copy and re-encode before assigning back),
parse JSON safely, walk the parsed structure recursively to redact any keys
matching a small case-insensitive set of sensitive names (e.g., api_key, apikey,
secret, token), and when done serialize back to the same type as the original
body; additionally, inspect and redact matching query params on request.url
using the same key set, and ensure all exceptions (JSON decode, Unicode) are
caught and leave request unchanged on error.

Comment thread packages/opentelemetry-instrumentation-langchain/tests/conftest.py
@traceloop traceloop deleted a comment from coderabbitai Bot Aug 12, 2025

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec1b5b6 and bb72946.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (5 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-langchain/tests/test_agents.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (6)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2)

79-81: Minor formatting improvement - good for readability.

The multi-line formatting of the OpenAIInstrumentor.instrument() call improves code readability.


153-176: Robust implementation for filtering sensitive data in VCR cassettes.

The before_record_request function properly handles various edge cases:

  • Handles both string and bytes request bodies
  • Includes proper error handling for JSON decode errors
  • Uses safe fallback on exceptions
  • Filters API keys from JSON request bodies

The addition of explicit match_on configuration will ensure more predictable cassette matching behavior.
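Illustratively, explicit matching means a replayed request must agree with the recorded one on every listed component (body excluded). A sketch of such a fixture, with keys following VCR.py's vcr_config conventions and values mirroring this PR's description:

```python
# Sketch of an explicit VCR matcher configuration: cassette lookups compare
# method, URL components, and query string, but not the request body.
def vcr_config():
    return {
        "filter_headers": ["authorization", "x-api-key"],
        "match_on": ["method", "scheme", "host", "port", "path", "query"],
    }
```

The trade-off: omitting "body" avoids churn from volatile payload fields, at the cost of weaker disambiguation when a test issues multiple requests to the same endpoint.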

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (4)

24-28: Minor formatting improvement with trailing comma.

The addition of trailing commas and consistent indentation improves code maintainability.


38-41: Improved test assertion formatting.

The single-line set comparison is more readable and concise.


85-89: Consistent formatting applied to async test.

Same formatting improvements as the sync version for consistency.


99-101: Consistent assertion formatting in async test.

Matches the formatting improvement made in the sync version.

Comment on lines +188 to +342
@pytest.mark.vcr
def test_langgraph_github_issue_3203_exact_reproduction(
instrument_legacy, span_exporter, tracer_provider
):
"""Test that exactly reproduces the GitHub issue #3203 with the exact same code structure."""
from opentelemetry import trace
import asyncio
import httpx
from langgraph.graph import END, START, StateGraph

trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer(__name__)

class TestAgentState(TypedDict):
http_result: str
span_result: str
messages: list

async def http_call_node(state: TestAgentState) -> dict:
try:
data = {"a": 10, "b": 25}
async with httpx.AsyncClient() as _:
with tracer.start_as_current_span("POST") as span:
span.set_attribute("http.method", "POST")
span.set_attribute("http.url", "https://httpbin.org/post")
sum_result = data.get("a", 0) + data.get("b", 0)
http_result = f"HTTP call successful! Sum of {data.get('a')} + {data.get('b')} = {sum_result}"

span.set_attribute("http.response.status_code", 200)
span.set_attribute("calculation.result", sum_result)

except Exception as e:
http_result = f"HTTP call error: {str(e)}"

return {"http_result": http_result}

async def opentelemetry_span_node(state: TestAgentState) -> dict:
with tracer.start_as_current_span("test_agent_span") as span:
span.set_attribute("node.name", "opentelemetry_span_node")
span.set_attribute("agent.type", "test_agent")
span.set_attribute("operation.type", "span_creation")

span.add_event("Starting span processing")

await asyncio.sleep(0.01)

http_result = state.get("http_result", "No HTTP result available")
span.set_attribute("previous.http_result", http_result)

span.add_event("Processing HTTP result from previous node")

span_result = f"OpenTelemetry span created successfully! Span ID: {span.get_span_context().span_id}"

span.add_event("Span processing completed")
span.set_attribute("processing.status", "completed")

return {"span_result": span_result}

def create_test_agent():
"""Create a simple LangGraph agent with 2 nodes matching the GitHub issue exactly."""
builder = StateGraph(TestAgentState)

builder.add_node("http_call", http_call_node)
builder.add_node("otel_span", opentelemetry_span_node)

builder.add_edge(START, "http_call")
builder.add_edge("http_call", "otel_span")
builder.add_edge("otel_span", END)

agent = builder.compile()
return agent

async def run_test_agent():
with tracer.start_as_current_span("test_agent_execution_root") as root_span:
root_span.set_attribute("agent.name", "test_agent")
root_span.set_attribute("agent.version", "1.0.0")
root_span.set_attribute("execution.type", "full_agent_run")

root_span.add_event("Agent execution started")

try:
root_span.add_event("Creating agent graph")
agent = create_test_agent()
root_span.set_attribute("agent.nodes_count", 2)

initial_state = {"http_result": "", "span_result": "", "messages": []}
root_span.add_event("Initial state prepared")

root_span.add_event("Starting agent invocation")
final_state = await agent.ainvoke(initial_state)

root_span.set_attribute("execution.status", "completed")
return final_state

except Exception as e:
root_span.set_attribute("execution.status", "failed")
root_span.set_attribute("error.type", type(e).__name__)
root_span.set_attribute("error.message", str(e))
root_span.add_event("Agent execution failed", {"error": str(e)})
raise

final_state = asyncio.run(run_test_agent())

assert "http_result" in final_state
assert "span_result" in final_state
assert "Sum of 10 + 25 = 35" in final_state["http_result"]

spans = span_exporter.get_finished_spans()
span_names = [span.name for span in spans]

print(f"\nCaptured {len(spans)} spans:")
for span in spans:
parent_name = "None"
if span.parent:
parent_span = next(
(s for s in spans if s.context.span_id == span.parent.span_id), None
)
if parent_span:
parent_name = parent_span.name
else:
parent_name = f"Unknown({span.parent.span_id})"
print(f" - {span.name} (parent: {parent_name})")

assert "test_agent_execution_root" in span_names
assert "POST" in span_names
assert "test_agent_span" in span_names
assert "http_call.task" in span_names
assert "otel_span.task" in span_names
assert "LangGraph.workflow" in span_names

root_span = next(span for span in spans if span.name == "test_agent_execution_root")
post_span = next(span for span in spans if span.name == "POST")
test_agent_span = next(span for span in spans if span.name == "test_agent_span")
http_call_task_span = next(span for span in spans if span.name == "http_call.task")
otel_span_task_span = next(span for span in spans if span.name == "otel_span.task")
workflow_span = next(span for span in spans if span.name == "LangGraph.workflow")

print("\nHierarchy check:")
print(f"POST parent: {post_span.parent.span_id if post_span.parent else 'None'}")
print(f"http_call.task ID: {http_call_task_span.context.span_id}")
print(
f"test_agent_span parent: {test_agent_span.parent.span_id if test_agent_span.parent else 'None'}"
)
print(f"otel_span.task ID: {otel_span_task_span.context.span_id}")

assert (
post_span.parent.span_id == http_call_task_span.context.span_id
), "POST span should be child of http_call.task span"
assert (
test_agent_span.parent.span_id == otel_span_task_span.context.span_id
), "test_agent_span should be child of otel_span.task span"

assert http_call_task_span.parent.span_id == workflow_span.context.span_id
assert otel_span_task_span.parent.span_id == workflow_span.context.span_id
assert workflow_span.parent.span_id == root_span.context.span_id


💡 Verification agent

🧩 Analysis chain

Comprehensive test reproduction for GitHub issue #3203.

This is an excellent reproduction test that validates the span hierarchy fix. The test:

  1. Creates a realistic scenario with HTTP calls and OpenTelemetry spans within LangGraph nodes
  2. Validates the complete span hierarchy ensuring proper parent-child relationships
  3. Includes detailed assertions for span nesting and context propagation
  4. Provides debugging output with span hierarchy visualization

The test structure effectively exercises the callback handler fixes for async context handling and span attachment.

However, there are a few considerations:

Let me verify if the async HTTP client usage could be simplified since it's not actually making real HTTP calls:


🏁 Script executed:

#!/bin/bash
# Check if httpx is being used elsewhere in the codebase and if it's a dependency
echo "Checking httpx usage in the codebase:"
rg -l "httpx" --type py

echo -e "\nChecking if httpx is declared as a dependency:"
fd -t f -e txt -e cfg -e toml -e yaml -e yml -e json | xargs grep -l "httpx" 2>/dev/null || echo "No httpx dependency declarations found"

Length of output: 6561


Add httpx to the project’s test/dev dependencies

The tracing tests (e.g., packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py) rely on httpx under @pytest.mark.vcr to record and replay HTTP interactions. Although the client isn’t making “real” calls at runtime, it’s essential for the VCR setup and is used across many instrumentation tests.

• Confirm and update your dependency declarations (e.g., in pyproject.toml, setup.cfg/setup.py, or requirements-dev.txt) to include:

[tool.poetry.dev-dependencies]
httpx = "^YOUR_SUPPORTED_VERSION"

—or—

[dev-packages]
httpx = {version = "*"}

—or—

# requirements-dev.txt
httpx>=x.y.z

• You can ignore the suggestion to “simplify” the HTTPX usage here, as the real calls are captured/replayed via VCR.

🤖 Prompt for AI Agents
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py lines
188-342: the test uses httpx under pytest.mark.vcr but httpx is not declared in
the project test/dev dependencies, causing test runs to fail in CI; add httpx to
the repository's test/dev dependency declarations (pyproject.toml under
[tool.poetry.dev-dependencies], setup.cfg/setup.py extras or
requirements-dev.txt) with a supported version range consistent with other
tests, update lockfile or run your dependency manager (poetry lock/install or
pip install -r requirements-dev.txt) and re-run tests to ensure httpx is
available for VCR playback.

Contributor

@ellipsis-dev ellipsis-dev Bot left a comment


Important

Looks good to me! 👍

Reviewed 5bd2330 in 2 minutes and 25 seconds. Click for details.
  • Reviewed 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:172
  • Draft comment:
    Removed 'body' from match_on. Ensure that request bodies are not needed for VCR matching in tests.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. packages/opentelemetry-instrumentation-langchain/tests/conftest.py:173
  • Draft comment:
    Removed filter_query_parameters for API keys. Confirm that query parameters do not leak sensitive info.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This is a test configuration file, not production code, and the code is specifically about VCR test recording configuration. The removal of filter_query_parameters suggests the author intentionally wants to record query parameters in tests. Security in test recordings is important, but the author has already demonstrated security awareness through other filters. The comment raises a valid security concern: sensitive data in query parameters could potentially be recorded in tests. However, this is a test configuration where the author has already shown careful consideration of security by filtering headers and body content, so the change appears intentional for test purposes. The comment should be deleted because it asks for confirmation of intention ("Confirm that..."), which violates our rules, and it is speculative about potential issues rather than pointing out a definite problem.

Workflow ID: wflow_cBifz1dXyjBr0oE8

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)

188-342: Comprehensive test reproduction for GitHub issue #3203.

This test effectively validates the span hierarchy fix by:

  1. Creating realistic scenarios with HTTP calls and OpenTelemetry spans within LangGraph nodes
  2. Validating the complete span hierarchy with proper parent-child relationships
  3. Including debugging output for span hierarchy visualization

The test structure properly exercises the callback handler fixes for async context handling and span attachment.

Verify that httpx is declared in the project's test/dev dependencies:

#!/bin/bash
# Check if httpx is declared in dependency files
echo "Checking for httpx in dependency declarations:"
fd -t f "(pyproject.toml|setup.py|setup.cfg|requirements.*\.txt)" | xargs grep -l "httpx" 2>/dev/null

# Check the specific package's dependencies
echo -e "\nChecking package-specific dependencies:"
cat packages/opentelemetry-instrumentation-langchain/pyproject.toml 2>/dev/null | grep -A 10 "\[tool.poetry.dev-dependencies\]"
cat packages/opentelemetry-instrumentation-langchain/requirements-dev.txt 2>/dev/null | grep "httpx"
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

190-195: Consider using contextlib.suppress for cleaner exception handling.

While the current implementation is functionally correct and well-documented, using contextlib.suppress would be more idiomatic.

+from contextlib import suppress
+
 def _end_span(self, span: Span, run_id: UUID) -> None:
     for child_id in self.spans[run_id].children:
         if child_id in self.spans:
             child_span = self.spans[child_id].span
             if child_span.end_time is None:  # avoid warning on ended spans
                 child_span.end()
     span.end()
     token = self.spans[run_id].token
     if token:
-        try:
-            context_api.detach(token)
-        except ValueError:
-            # Context detach can fail in async scenarios when tokens are created in different contexts
-            # This is expected behavior and doesn't affect the correct span hierarchy
-            pass
+        # Context detach can fail in async scenarios when tokens are created in different contexts
+        # This is expected behavior and doesn't affect the correct span hierarchy
+        with suppress(ValueError):
+            context_api.detach(token)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb72946 and 293b501.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py (2 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-langchain/tests/conftest.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1)
packages/opentelemetry-instrumentation-langchain/langgraph_with_otel_sdk.py (5)
  • TestAgentState (45-48)
  • http_call_node (50-72)
  • opentelemetry_span_node (74-99)
  • create_test_agent (101-115)
  • run_test_agent (117-148)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py

190-195: Use contextlib.suppress(ValueError) instead of try-except-pass

(SIM105)

🔇 Additional comments (3)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)

236-236: Good fix for span context attachment.

Making the token attachment unconditional ensures consistent span context propagation in both sync and async scenarios, which is critical for maintaining proper span hierarchy in LangGraph flows.

packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2)

38-40: Formatting improvement for readability.

The set comprehension is now more readable on a single line.

Also applies to: 99-101


71-72: Good: Enabling async LangGraph test.

Removing the @pytest.mark.xfail decorator enables this important async test to validate that the context propagation fixes work correctly in asynchronous scenarios.

@nirga nirga merged commit 3bb39f7 into main Aug 12, 2025
9 checks passed