fix(langchain): langgraph application crash due to context detach#3256
fix(langchain): langgraph application crash due to context detach#3256galkleinman merged 2 commits intomainfrom
Conversation
WalkthroughAdds guarded context attach/detach operations across LangChain instrumentation. Introduces safe helper methods in the callback handler and wraps SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY updates with try/except blocks. Adjusts span creation and end paths to tolerate context API failures without raising, while leaving public APIs unchanged. Changes
Sequence Diagram(s)sequenceDiagram
participant LangChain
participant CallbackHandler
participant ContextAPI
participant Tracer/Span
LangChain->>CallbackHandler: _create_span(...)
CallbackHandler->>ContextAPI: safe attach span context
alt attach fails
CallbackHandler->>CallbackHandler: proceed without token
end
CallbackHandler->>Tracer/Span: create/start span
LangChain->>CallbackHandler: _create_llm_span(...)
CallbackHandler->>ContextAPI: set SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY (guarded)
alt suppress set fails
CallbackHandler->>CallbackHandler: continue without suppression token
end
LangChain->>CallbackHandler: _end_span(token)
CallbackHandler->>ContextAPI: safe detach(token)
alt detach fails
CallbackHandler->>CallbackHandler: swallow error
end
CallbackHandler->>Tracer/Span: end span
sequenceDiagram
participant Caller
participant OpenAITracingWrapper
participant ContextAPI
Caller->>OpenAITracingWrapper: __call__(...)
OpenAITracingWrapper->>ContextAPI: set/attach suppression key (guarded)
alt failure
OpenAITracingWrapper->>OpenAITracingWrapper: continue call without suppression
end
OpenAITracingWrapper->>Caller: result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to 3666eaf in 55 seconds. Click for details.
- Reviewed
160lines of code in2files - Skipped
0files when reviewing. - Skipped posting
6draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py:242
- Draft comment:
Wraps context API attach in a try/except to avoid crash. Consider logging the caught exception at debug level for extra diagnostics. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:194
- Draft comment:
The new _safe_attach_context helper cleanly wraps context attachment. Optionally log the exception on failure to aid debugging. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
3. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:208
- Draft comment:
The _safe_detach_context method gracefully handles context detach errors. Adding optional debug logging for exceptions might help diagnose context issues in async scenarios. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
4. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:261
- Draft comment:
The association properties update is now wrapped in a try/except block. Consider logging the exception at a debug level to record if the context update fails. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
5. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:363
- Draft comment:
In _create_llm_span, the try/except for setting the suppression flag is consistent. Consider adding optional debug logging if context setting fails. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
6. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:461
- Draft comment:
The on_chain_end method resets the suppression flag within a try/except block. Similar to other places, consider logging exceptions to ease future troubleshooting. - Reason this comment was not posted:
Confidence changes required:33%<= threshold50%None
Workflow ID: wflow_TYJUKATW0BFQh1D8
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed 8caba09 in 51 seconds. Click for details.
- Reviewed
57lines of code in1files - Skipped
0files when reviewing. - Skipped posting
3draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:194
- Draft comment:
Minor docstring whitespace cleanup in _safe_attach_context; looks fine. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:221
- Draft comment:
Docstring formatting and inline comment cleanup in _safe_detach_context improve readability. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
3. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:564
- Draft comment:
Reformatted association_properties block in on_llm_end for improved readability. - Reason this comment was not posted:
Confidence changes required:0%<= threshold50%None
Workflow ID: wflow_CSdr3vNvUD2WmWxH
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Actionable comments posted: 2
🔭 Outside diff range comments (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py (1)
245-255: Attach without detach leaks suppression context; use a token and detach in finally (also satisfies Ruff SIM105).Current code sets SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY but never restores the previous context, potentially suppressing downstream instrumentation longer than intended. Wrap the call with a token and detach in a finally block. Also replace try/except/pass with contextlib.suppress per SIM105.
Apply this diff within call:
- # In legacy chains like LLMChain, suppressing model instrumentations - # within create_llm_span doesn't work, so this should helps as a fallback - try: - context_api.attach( - context_api.set_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True) - ) - except Exception: - # If context setting fails, continue without suppression - # This is not critical for core functionality - pass - - return wrapped(*args, **kwargs) + # In legacy chains like LLMChain, suppressing model instrumentations + # within create_llm_span doesn't work, so this helps as a fallback. + token = None + with contextlib.suppress(Exception): + token = context_api.attach( + context_api.set_value( + SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True + ) + ) + try: + return wrapped(*args, **kwargs) + finally: + if token: + with contextlib.suppress(Exception): + context_api.detach(token)Additionally, add the missing import at the top of the file:
import contextlibpackages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
283-299: Critical: span context token is overwritten by suppression token → leaked current-span context and broken LIFO detach.In _create_span you now attach the span to context and store its token in SpanHolder. In _create_llm_span you attach a suppression token and then overwrite self.spans[run_id] with a new SpanHolder that stores the suppression token, losing the span token. _end_span detaches only one token, so the span’s context token will leak and may corrupt parentage of subsequent spans.
Fix by:
- Preserving the span token from _create_span.
- Tracking the suppression token separately and detaching it first (then the span token, then the association token if any) to respect LIFO.
Apply these diffs:
- Detach suppression, then span token, then association token in _end_span:
span.end() - token = self.spans[run_id].token - if token: - self._safe_detach_context(token) + # Detach in LIFO order: suppression -> span -> association + supp_token = getattr(self, "_suppression_tokens", {}).pop(run_id, None) + if supp_token: + self._safe_detach_context(supp_token) + + token = self.spans[run_id].token # span context token + if token: + self._safe_detach_context(token) + + assoc_token = getattr(self, "_association_tokens", {}).pop(run_id, None) + if assoc_token: + self._safe_detach_context(assoc_token)
- Store only the suppression token (do not overwrite SpanHolder) in _create_llm_span:
- # we already have an LLM span by this point, - # so skip any downstream instrumentation from here - try: - token = context_api.attach( - context_api.set_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True) - ) - except Exception: - # If context setting fails, continue without suppression token - token = None - - self.spans[run_id] = SpanHolder( - span, token, None, [], workflow_name, None, entity_path - ) + # We already have an LLM span; suppress downstream instrumentation from here + supp_token = None + with contextlib.suppress(Exception): + supp_token = context_api.attach( + context_api.set_value( + SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True + ) + ) + # Track suppression token separately to detach it in LIFO order on end + if supp_token is not None: + self._suppression_tokens[run_id] = supp_token
- Track association_properties context token in _create_span for proper cleanup later:
- try: - context_api.attach( - context_api.set_value( - "association_properties", - {**current_association_properties, **sanitized_metadata}, - ) - ) - except Exception: - # If setting association properties fails, continue without them - # This doesn't affect the core span functionality - pass + with contextlib.suppress(Exception): + assoc_token = context_api.attach( + context_api.set_value( + "association_properties", + {**current_association_properties, **sanitized_metadata}, + ) + ) + # Store token for LIFO-detach at end of run + if assoc_token is not None: + self._association_tokens[run_id] = assoc_tokenAdditions required outside these hunks:
# In TraceloopCallbackHandler.__init__: self._suppression_tokens: dict[UUID, object] = {} self._association_tokens: dict[UUID, object] = {}And ensure contextlib is imported at the top of this file:
import contextlibThis preserves correct context stack behavior and prevents leaks across async runs.
Also applies to: 362-374, 188-193
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2)
194-207: Annotate return type of _safe_attach_context for clarity.Explicitly annotate the token type as Optional[object] to improve readability and help static analysis.
- def _safe_attach_context(self, span: Span): + def _safe_attach_context(self, span: Span) -> Optional[object]:
462-470: Use contextlib.suppress instead of try/except/pass (Ruff SIM105).This resets suppression without emitting errors, consistent with the resilience goals.
- try: - context_api.attach( - context_api.set_value( - SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, False - ) - ) - except Exception: - # If context reset fails, it's not critical for functionality - pass + with contextlib.suppress(Exception): + context_api.attach( + context_api.set_value( + SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, False + ) + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py(1 hunks)packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.py: Python code must conform to Flake8 linting rules
Do not hardcode API keys in source code; read them from environment variables or a secure vault
Files:
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.pypackages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
🧬 Code Graph Analysis (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py (1)
_extract_model_name_from_association_metadata(373-376)
🪛 Ruff (0.12.2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py
245-252: Use contextlib.suppress(Exception) instead of try-except-pass
(SIM105)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
262-272: Use contextlib.suppress(Exception) instead of try-except-pass
(SIM105)
462-470: Use contextlib.suppress(Exception) instead of try-except-pass
(SIM105)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Packages (3.12)
- GitHub Check: Test Packages (3.11)
- GitHub Check: Lint
- GitHub Check: Test Packages (3.10)
- GitHub Check: Build Packages (3.11)
🔇 Additional comments (2)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (2)
568-573: LGTM: safer, clearer association properties retrieval.The multi-line assignment improves readability without changing behavior.
241-300: SpanHolder is expandable and context attach is safe – please audit callers of_create_span
- Verified that
SpanHolder(packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/span_utils.py) is a@dataclasswithout__slots__, so it can safely be extended with additional fields.- The updated
_create_spanincallback_handler.py(lines 241–300) now attaches the new span to the current context, altering the previous “non-current” span pattern. The two-token detach in_safe_attach_contextshould restore the original context correctly.- Please review any callers of
_create_spanto ensure none relied on the old behavior of leaving the span non-current.
| def _safe_detach_context(self, token): | ||
| """ | ||
| Safely detach context token without causing application crashes. | ||
|
|
||
| This method implements a fail-safe approach to context detachment that handles | ||
| all known edge cases in async/concurrent scenarios where context tokens may | ||
| become invalid or be detached in different execution contexts. | ||
|
|
||
| We use the runtime context directly to avoid logging errors from context_api.detach() | ||
| """ | ||
| if not token: | ||
| return | ||
|
|
||
| try: | ||
| # Use the runtime context directly to avoid error logging from context_api.detach() | ||
| from opentelemetry.context import _RUNTIME_CONTEXT | ||
|
|
||
| _RUNTIME_CONTEXT.detach(token) | ||
| except Exception: | ||
| # Context detach can fail in async scenarios when tokens are created in different contexts | ||
| # This includes ValueError, RuntimeError, and other context-related exceptions | ||
| # This is expected behavior and doesn't affect the correct span hierarchy | ||
| # | ||
| # Common scenarios where this happens: | ||
| # 1. Token created in one async task/thread, detached in another | ||
| # 2. Context was already detached by another process | ||
| # 3. Token became invalid due to context switching | ||
| # 4. Race conditions in highly concurrent scenarios | ||
| # | ||
| # This is safe to ignore as the span itself was properly ended | ||
| # and the tracing data is correctly captured. | ||
| pass | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Prefer contextlib.suppress and avoid relying solely on private _RUNTIME_CONTEXT.
Using a private API may break across OTel versions. Try context_api.detach first, then fall back to _RUNTIME_CONTEXT only if needed. Also adopt contextlib.suppress to satisfy linting and reduce noise.
- def _safe_detach_context(self, token):
+ def _safe_detach_context(self, token: Optional[object]) -> None:
@@
- try:
- # Use the runtime context directly to avoid error logging from context_api.detach()
- from opentelemetry.context import _RUNTIME_CONTEXT
-
- _RUNTIME_CONTEXT.detach(token)
- except Exception:
- # Context detach can fail in async scenarios when tokens are created in different contexts
- # This includes ValueError, RuntimeError, and other context-related exceptions
- # This is expected behavior and doesn't affect the correct span hierarchy
- #
- # Common scenarios where this happens:
- # 1. Token created in one async task/thread, detached in another
- # 2. Context was already detached by another process
- # 3. Token became invalid due to context switching
- # 4. Race conditions in highly concurrent scenarios
- #
- # This is safe to ignore as the span itself was properly ended
- # and the tracing data is correctly captured.
- pass
+ # Attempt a normal detach first; fall back to runtime context if needed.
+ with contextlib.suppress(Exception):
+ context_api.detach(token)
+ return
+ with contextlib.suppress(Exception):
+ from opentelemetry.context import _RUNTIME_CONTEXT # type: ignore
+ _RUNTIME_CONTEXT.detach(token)Committable suggestion skipped: line range outside the PR's diff.
| try: | ||
| context_api.attach( | ||
| context_api.set_value( | ||
| "association_properties", | ||
| {**current_association_properties, **sanitized_metadata}, | ||
| ) | ||
| ) | ||
| ) | ||
| except Exception: | ||
| # If setting association properties fails, continue without them | ||
| # This doesn't affect the core span functionality | ||
| pass |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Replace try/except/pass with contextlib.suppress and retain the token for later detach (Ruff SIM105).
This aligns with lint guidance and prevents leaking association_properties in the runtime context beyond the span lifecycle.
Apply this diff:
- try:
- context_api.attach(
- context_api.set_value(
- "association_properties",
- {**current_association_properties, **sanitized_metadata},
- )
- )
- except Exception:
- # If setting association properties fails, continue without them
- # This doesn't affect the core span functionality
- pass
+ with contextlib.suppress(Exception):
+ assoc_token = context_api.attach(
+ context_api.set_value(
+ "association_properties",
+ {**current_association_properties, **sanitized_metadata},
+ )
+ )
+ if assoc_token is not None:
+ self._association_tokens[run_id] = assoc_token📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| context_api.attach( | |
| context_api.set_value( | |
| "association_properties", | |
| {**current_association_properties, **sanitized_metadata}, | |
| ) | |
| ) | |
| ) | |
| except Exception: | |
| # If setting association properties fails, continue without them | |
| # This doesn't affect the core span functionality | |
| pass | |
| with contextlib.suppress(Exception): | |
| assoc_token = context_api.attach( | |
| context_api.set_value( | |
| "association_properties", | |
| {**current_association_properties, **sanitized_metadata}, | |
| ) | |
| ) | |
| if assoc_token is not None: | |
| self._association_tokens[run_id] = assoc_token |
🧰 Tools
🪛 Ruff (0.12.2)
262-272: Use contextlib.suppress(Exception) instead of try-except-pass
(SIM105)
🤖 Prompt for AI Agents
In
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py
around lines 262-272, replace the bare try/except/pass with contextlib.suppress
while capturing the token returned by context_api.attach so it can be detached
later; specifically, import contextlib, use contextlib.suppress(Exception)
around the context_api.attach(...) call and assign its return to a variable
(e.g., token = context_api.attach(...)) and ensure that elsewhere after the span
lifecycle you call context_api.detach(token) to avoid leaking
association_properties in the runtime context beyond the span.
Important
Fixes Langchain application crash by implementing safe context attachment and detachment in async scenarios.
__call__()in__init__.pyand_create_span(),_create_llm_span(),_end_span()incallback_handler.py._safe_attach_context()and_safe_detach_context()incallback_handler.pyto handle context operations safely._create_span(),_create_llm_span(), and_end_span()to use the new safe context functions.This description was created by
for 8caba09. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
Bug Fixes
Refactor