Skip to content

fix(langchain): context detach exception#3255

Merged
nirga merged 2 commits intomainfrom
gk/fix-langgraph-context-issue
Aug 13, 2025
Merged

fix(langchain): context detach exception#3255
nirga merged 2 commits intomainfrom
gk/fix-langgraph-context-issue

Conversation

@galkleinman
Copy link
Copy Markdown
Contributor

@galkleinman galkleinman commented Aug 13, 2025

Important

Fix context detachment issue in async scenarios by modifying _end_span() in callback_handler.py and adding a test in test_langgraph.py.

  • Behavior:
    • Modify _end_span() in callback_handler.py to use _RUNTIME_CONTEXT.detach() instead of context_api.detach() to avoid logging errors.
    • Extend exception handling in _end_span() to include RuntimeError and Exception.
  • Testing:
    • Add test_context_detachment_error_handling() in test_langgraph.py to verify context detachment errors are handled without logging.

This description was created by Ellipsis for b8b6bef. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • Bug Fixes

    • Improved tracing context handling to prevent spurious detachment errors and reduce noisy logs during async workflows, enhancing stability and preserving correct span hierarchies.
  • Tests

    • Added coverage to validate no detachment errors under concurrent async execution and to verify expected span counts and parent-child relationships, ensuring reliable behavior in complex workflows.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Aug 13, 2025

Walkthrough

Replaces context_api.detach with _RUNTIME_CONTEXT.detach and broadens exception handling in callback_handler._end_span. Adds a new test validating context detachment error handling under concurrent async LangGraph execution; the test appears duplicated identically within the file.

Changes

Cohort / File(s) Summary
Instrumentation: context detach path
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
Switches span context detachment to opentelemetry.context._RUNTIME_CONTEXT.detach; expands caught exceptions to (ValueError, RuntimeError, Exception); comments updated; no other logic changes.
Tests: LangGraph detachment handling
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py
Adds test_context_detachment_error_handling to assert no detachment errors and correct span hierarchy under concurrent async runs; identical test function is added twice.

Sequence Diagram(s)

sequenceDiagram
  participant App as LangGraph workflow
  participant Handler as CallbackHandler._end_span
  participant RT as _RUNTIME_CONTEXT
  participant OTel as Tracer/Exporter

  App->>Handler: end_span(span, token)
  Handler->>RT: detach(token)
  alt detach succeeds
    RT-->>Handler: ok
    Handler->>OTel: span.end()
  else detach raises
    RT-->>Handler: error
    note over Handler: Swallow exception
    Handler->>OTel: span.end()
  end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Poem

A rabbit unhooks a context thread,
With gentle paws, no errors spread.
Spans hop home, neatly aligned,
Even when detach falls behind.
Two tests echoed, thump-thump cheer—
Trace burrows tidy, crystal clear. 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch gk/fix-langgraph-context-issue

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@galkleinman galkleinman marked this pull request as ready for review August 13, 2025 16:47
Copy link
Copy Markdown
Contributor

@ellipsis-dev ellipsis-dev Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to b8b6bef in 2 minutes and 15 seconds. Click for details.
  • Reviewed 168 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:190
  • Draft comment:
    Using the private _RUNTIME_CONTEXT to detach the token avoids unwanted error logging, but it relies on an internal API. If no public alternative exists, that's acceptable; however, please add a note about the potential fragility if the internal API changes.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment makes a valid point about private API usage, but the code already has a clear comment explaining why this approach was chosen. The suggested change is to add a warning comment, but this doesn't materially improve the code. The existing comment already documents the intent and tradeoff. Adding another comment about API fragility would be redundant since using a private API inherently implies potential fragility. The comment raises a legitimate concern about private API usage that could cause future maintenance issues. Documentation about potential breakage could help future maintainers. While the concern is valid, the existing comment already explains why this approach was chosen. The private API nature is obvious from the underscore prefix, and adding another warning comment would be redundant without providing actionable guidance. Delete the comment. While it raises a valid concern, the code already has sufficient documentation explaining the tradeoff, and adding another warning comment would be redundant without providing actionable value.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:194
  • Draft comment:
    Catching Exception (in addition to ValueError and RuntimeError) may hide unexpected issues. If this broad catch is intended to ensure robustness in async contexts, consider documenting why all exceptions are silently ignored.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py:343
  • Draft comment:
    The new test for context detachment error handling is comprehensive and simulates concurrent asynchronous operations well. It verifies that no 'Failed to detach context' errors are logged. Ensure that future maintainers understand the intent by keeping the inline comments clear.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It praises the test and suggests keeping inline comments clear, which is not specific or actionable enough to be useful.

Workflow ID: wflow_8DJHiTBMzu84OWxy

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2)

458-470: Make log assertion resilient to upstream message wording changes

Matching the exact message string is brittle. Filter by logger name and semantic keyword to future-proof the assertion while still guarding against detach errors.

Apply this diff:

-        error_logs = [
-            record.message
-            for record in caplog.records
-            if record.levelno >= logging.ERROR
-        ]
-        context_errors = [
-            msg for msg in error_logs if "Failed to detach context" in msg
-        ]
-
-        assert len(context_errors) == 0, (
-            f"Found {len(context_errors)} context detachment errors in logs. "
-            f"This indicates the fix is not working properly. Errors: {context_errors}"
-        )
+        context_error_records = [
+            r
+            for r in caplog.records
+            if r.levelno >= logging.ERROR
+            and r.name == "opentelemetry.context"
+            and "detach" in r.getMessage()
+        ]
+        assert not context_error_records, (
+            f"Found {len(context_error_records)} context detachment errors in logs. "
+            f"This indicates the fix is not working properly. Errors: "
+            f"{[r.getMessage() for r in context_error_records]}"
+        )

472-481: Reduce O(n^2) parent lookup to O(n) for readability and efficiency

Create an index of spans by span_id to simplify the parent assertions and avoid repeated scans.

Apply this diff:

-        for nested_span in nested_spans:
-            assert nested_span.parent is not None, "Nested spans should have parents"
-            parent_span = next(
-                (s for s in spans if s.context.span_id == nested_span.parent.span_id),
-                None,
-            )
-            assert parent_span is not None, "Parent span should exist"
-            assert (
-                parent_span.name == "concurrent_async_span"
-            ), "Nested span should be child of concurrent_async_span"
+        spans_by_id = {s.context.span_id: s for s in spans}
+        for nested_span in nested_spans:
+            assert nested_span.parent is not None, "Nested spans should have parents"
+            parent_span = spans_by_id.get(nested_span.parent.span_id)
+            assert parent_span is not None, "Parent span should exist"
+            assert parent_span.name == "concurrent_async_span", (
+                "Nested span should be child of concurrent_async_span"
+            )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9603e1b and b8b6bef.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1 hunks)
  • packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2)
packages/opentelemetry-instrumentation-langchain/tests/conftest.py (1)
  • instrument_legacy (77-95)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2)

343-356: Good, targeted async-concurrency test to exercise context detachment paths

The test sets up realistic async/concurrent spans and validates no error logs. This directly covers the regressions observed around detach. Looks solid.


343-471: Confirmed: Unique Test and Exporter Reset per Test

  • No duplicate definitions of test_context_detachment_error_handling were found in test_langgraph.py.
  • Although span_exporter is session-scoped, the clear_exporter autouse fixture clears it before each test, ensuring a fresh exporter for span-count assertions.

No changes needed here.

Comment on lines +191 to 197
# Use the runtime context directly to avoid logging from context_api.detach()
from opentelemetry.context import _RUNTIME_CONTEXT
_RUNTIME_CONTEXT.detach(token)
except (ValueError, RuntimeError, Exception):
# Context detach can fail in async scenarios when tokens are created in different contexts
# This includes ValueError, RuntimeError, and other context-related exceptions
# This is expected behavior and doesn't affect the correct span hierarchy
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid importing and using private _RUNTIME_CONTEXT inline; add a safe, module-level fallback and narrow exception scope

Importing a private symbol inside the method on every call is brittle and adds overhead. Also, catching broad Exception can hide unrelated bugs. Prefer a module-level optional import (with fallback to context_api.detach) and only catch the expected detach errors.

Apply this diff within the selected range:

-            try:
-                # Use the runtime context directly to avoid logging from context_api.detach()
-                from opentelemetry.context import _RUNTIME_CONTEXT
-                _RUNTIME_CONTEXT.detach(token)
-            except (ValueError, RuntimeError, Exception):
-                # Context detach can fail in async scenarios when tokens are created in different contexts
-                # This includes ValueError, RuntimeError, and other context-related exceptions
-                # This is expected behavior and doesn't affect the correct span hierarchy
-                pass
+            try:
+                # Prefer the runtime context to avoid error-level logging in context_api.detach()
+                if _OTEL_RUNTIME_CONTEXT is not None:
+                    _OTEL_RUNTIME_CONTEXT.detach(token)
+                else:
+                    context_api.detach(token)
+            except (ValueError, RuntimeError):
+                # Detach can fail when token belongs to a different async context; ignore.
+                pass

And at the top of the file (module scope), add a safe optional import:

# Prefer runtime context when available to avoid logging in context_api.detach()
try:
    from opentelemetry.context import _RUNTIME_CONTEXT as _OTEL_RUNTIME_CONTEXT  # type: ignore[attr-defined]
except Exception:
    _OTEL_RUNTIME_CONTEXT = None  # type: ignore[assignment]

Rationale:

  • _RUNTIME_CONTEXT is a private API; guarding with a fallback limits blast radius if the upstream package changes.
  • Narrowing to (ValueError, RuntimeError) prevents masking unexpected failures while still addressing the async token-mismatch cases.
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
around lines 191-197, replace the inline import and broad Exception catch: add a
module-level optional import of _RUNTIME_CONTEXT (named _OTEL_RUNTIME_CONTEXT)
at the top of the file with a safe fallback to None, then in this block use
_OTEL_RUNTIME_CONTEXT.detach(token) when available and otherwise call
context_api.detach(token); also narrow the except clause to only catch
ValueError and RuntimeError so unrelated exceptions are not swallowed.

@nirga nirga merged commit c7e309c into main Aug 13, 2025
9 checks passed
@nirga nirga deleted the gk/fix-langgraph-context-issue branch August 13, 2025 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants