fix(langchain): ensure llm span is created #2805
obs-gh-abhishekrao wants to merge 24 commits into traceloop:main
Conversation
❌ Changes requested. Reviewed everything up to 08116d9 in 2 minutes and 8 seconds
More details
- Looked at 98 lines of code in 3 files
- Skipped 0 files when reviewing
- Skipped posting 4 drafted comments based on config settings
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:215
   - Draft comment: Avoid using 'SpanAttributes.LLM_RESPONSE_MODEL or "unknown"' as the attribute name. Instead, pass SpanAttributes.LLM_RESPONSE_MODEL as the key and handle the fallback in the value (e.g., assistants[assistant_id]["model"] or "unknown").
   - Reason this comment was not posted: Marked as duplicate.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:362
   - Draft comment: Changed token attachment from suppressing instrumentation to setting the span context with set_span_in_context(span). Ensure this change correctly retains the parent context without re-enabling downstream instrumentation.
   - Reason this comment was not posted: Comment did not seem useful. Confidence it is useful = 0% <= threshold 50%. The comment asks the author to confirm intended behavior, which violates the rules.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:130
   - Draft comment: The expression 'SpanAttributes.LLM_RESPONSE_MODEL or "unknown"' is used as the attribute key. Likely you meant to assign a default value to the attribute value (i.e. assistant["model"] or "unknown"). Use _set_span_attribute(span, SpanAttributes.LLM_RESPONSE_MODEL, assistant["model"] or "unknown") instead.
   - Reason this comment was not posted: Marked as duplicate.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:215
   - Draft comment: Similar issue: 'SpanAttributes.LLM_RESPONSE_MODEL or "unknown"' is incorrectly used as the key. It should be applied to the model value instead. Consider using _set_span_attribute(span, SpanAttributes.LLM_RESPONSE_MODEL, assistants[assistant_id]["model"] or "unknown") for proper defaulting.
   - Reason this comment was not posted: Marked as duplicate.
Workflow ID: wflow_x4Sz3CnV2AcX7OLr
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
```python
_set_span_attribute(
    span,
    SpanAttributes.LLM_RESPONSE_MODEL,
    SpanAttributes.LLM_RESPONSE_MODEL or "unknown",
)
```
Avoid using 'SpanAttributes.LLM_RESPONSE_MODEL or "unknown"' as the attribute name. Instead, pass SpanAttributes.LLM_RESPONSE_MODEL as the key and handle fallback in the value (e.g., assistant["model"] or "unknown").
```diff
- SpanAttributes.LLM_RESPONSE_MODEL or "unknown",
+ SpanAttributes.LLM_RESPONSE_MODEL,
```
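(For clarity, the call shape the reviewer is suggesting, as spelled out in the draft comments above; `assistant["model"]` comes from that suggestion:)

```python
# Suggested form: the semantic-convention constant stays the attribute key,
# and the "unknown" fallback applies to the model value instead.
_set_span_attribute(
    span,
    SpanAttributes.LLM_RESPONSE_MODEL,
    assistant["model"] or "unknown",
)
```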
nirga left a comment:
Thanks for submitting @obs-gh-abhishekrao - can you take a look into the failing tests and lint, and sign the CLA?
Hi @nirga, sorry it's taking time, I'm still working on getting the CLA signed. Took a look at the test failures. The langchain test failure happening on SequentialChain raises some suspicions for me. It looks like context isn't being updated / propagated. So far I'm wondering if this is because the langchain instrumentation uses BaseCallbackHandler and is (probably) missing an AsyncCallbackHandler.
I tried adding opentelemetry-instrumentation-threading to the test to see if it solves the context propagation, but it didn't help. Suggestions would be really helpful here, so I'm curious to hear your thoughts.
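(For reference, a minimal sketch of what that attempt presumably looked like; `ThreadingInstrumentor` is the entry point of the opentelemetry-instrumentation-threading package:)

```python
# Hedged sketch: ThreadingInstrumentor copies the active OTel context into
# threads started via threading.Thread and ThreadPoolExecutor workers.
from opentelemetry.instrumentation.threading import ThreadingInstrumentor

ThreadingInstrumentor().instrument()
```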
obs-gh-abhishekrao left a comment:
This should also fix #2661. I just ran into that bug as well.
Force-pushed from 25fe3fd to 6fcbdd9 (Compare)
@nirga Please review the PR at your convenience. CLA is signed.
Force-pushed from 6fcbdd9 to bb1e024 (Compare)
Force-pushed from f9d466d to 3d4f99f (Compare)
nirga left a comment:
Thanks @obs-gh-abhishekrao - I think it's almost there. Can you add a test for this specific use case?
```python
_set_span_attribute(
    span,
    SpanAttributes.LLM_RESPONSE_MODEL,
    SpanAttributes.LLM_RESPONSE_MODEL or "unknown",
)
```
Hmm, that was incorrectly coalesced. The reason I added this was that I'd encountered silent failures in metric generation, which I resolved in 4263495. I was hoping similar logic was needed here, although I don't see any metrics generation as part of the assistant wrapper.
My latest commit a2daa86 fixes the coalescing mistake, but I'm happy to remove it altogether too.
Force-pushed from 3d4f99f to 4263495 (Compare)
Added a test based on the sample code, but noticing some unexpected behaviour that I'm still debugging.
My analysis so far:
Force-pushed from 6301498 to 73648f1 (Compare)
Found the reason: this happens because all the (langchain) tests run under a shared trace context (and hence all spans nest under the same trace).
Force-pushed from 7ed511a to 5b491f8 (Compare)
ronensc left a comment:
@obs-gh-abhishekrao Thanks for working on fixing this issue! I’ve come across a subtle problem with the proposed fix. Please see the details below.
```diff
- token = context_api.attach(
-     context_api.set_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True)
- )
+ token = context_api.attach(set_span_in_context(span))
```
I believe this should have a corresponding detach() in _end_span(). Without it, the context of the caller could be altered, potentially breaking the parent-child relationship between spans and causing unrelated spans to be linked together.
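(A minimal self-contained sketch of the pairing being asked for, assuming the handler keeps the token between span start and span end:)

```python
from opentelemetry import context as context_api, trace
from opentelemetry.trace import set_span_in_context

tracer = trace.get_tracer(__name__)

# Every attach() needs a matching detach(); otherwise the caller's context
# keeps pointing at this span after it has ended.
span = tracer.start_span("demo")
token = context_api.attach(set_span_in_context(span))
try:
    pass  # work that should see `span` as the current span
finally:
    span.end()
    context_api.detach(token)  # restores the caller's previous context
```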
Consider the following unit test, which currently fails due to this issue:
```python
@pytest.mark.vcr
def test_langgraph_double_invoke(exporter):
    class DummyGraphState(TypedDict):
        result: str

    def mynode_func(state: DummyGraphState) -> DummyGraphState:
        return state

    def build_graph():
        workflow = StateGraph(DummyGraphState)
        workflow.add_node("mynode", mynode_func)
        workflow.set_entry_point("mynode")
        langgraph = workflow.compile()
        return langgraph

    graph = build_graph()

    from opentelemetry import trace

    assert "test_langgraph_double_invoke" == trace.get_current_span().name

    graph.invoke({"result": "init"})
    assert "test_langgraph_double_invoke" == trace.get_current_span().name  # incorrectly set to 'mynode.task'

    graph.invoke({"result": "init"})
    assert "test_langgraph_double_invoke" == trace.get_current_span().name  # still incorrectly set to 'mynode.task'
```

As you can see, the span name unexpectedly switches to 'mynode.task', indicating that the span context is not properly detached after each invocation.
Please take a look at my proposed fix here:
Fix Suggestion
Feel free to review and adjust as needed.
Thanks @ronensc! Great catch, will look into adding this test and fix.
Force-pushed from 023a52c to be86077 (Compare)
```python
@pytest.mark.vcr
def test_langgraph_double_invoke(exporter):
```
The test_langgraph_double_invoke() and test_langgraph_double_ainvoke() tests don't catch the regression I mentioned.
If I comment out the detach() invocation, the tests pass even though they should have failed.
https://github.com/traceloop/openllmetry/pull/2805/files#diff-caff857eafc9121a585e40cdac1c0815d97f474cd3d0bf1a23755572e3b38105R702-R703
Apologies, I agree the test case is incorrect. However, the fix you suggested earlier does not work for me (at least not locally). Not only does the (fixed) test fail, but I'm also noticing there's a problem with detaching context in _end_span. There appears to be a flurry of these errors when it's detached there, possibly due to async behaviour:

```
ValueError: <Token var=<ContextVar name='current_context' default={} at 0xXXXXX> at 0xYYYYY> was created in a different Context
```
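(The error is reproducible with plain contextvars, independent of OpenTelemetry; a minimal repro sketch:)

```python
# A Token can only be reset in the Context that created it; resetting it
# from a different context raises the ValueError quoted above.
import contextvars

var = contextvars.ContextVar("current_context", default={})

def set_in_copy():
    return var.set({"k": "v"})  # token is bound to the copied context

token = contextvars.copy_context().run(set_in_copy)
try:
    var.reset(token)  # back in the original context, so this raises
except ValueError as e:
    print(e)  # "... was created in a different Context"
```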
I wasn’t able to reproduce the error - could you share more details on how to reproduce it? For example, the Python version you're using, how you created the virtual environment, and how you ran the tests.
This only happens when you test for the async case, i.e. ainvoke, which I don't think was included in your suggested fix.
- Python 3.11.8
- I set up the virtual environment using `nx run opentelemetry-instrumentation-langchain:lock` and then `nx run opentelemetry-instrumentation-langchain:install`
- Ran the tests within the langchain instrumentation directory: `poetry run pytest -svv tests/` (I could have run it through nx, but I preferred the verbosity of `-svv` as I was logging some stuff out.)
Thanks, I'm able to reproduce the issue.
It looks like the root cause is that the on_XXX_end() callbacks run in a different task (and thus a different context) from the on_XXX_start() callbacks. I added the following lines to each callback to verify this:

```python
task = asyncio.current_task()
print(f"Task ID: {id(task)} Task name: {task.get_name()}")
```

I also came across this LangChain discussion, though it hasn't received any input from the maintainers yet.
From what I can tell, the on_XXX_end() callbacks are dispatched on a different task due to the use of the @shielded decorator:
https://github.com/langchain-ai/langchain/blob/1ebcbf1d11578cb55db26013c745dc1e5722966e/libs/core/langchain_core/callbacks/manager.py#L962-L963
Removing the decorator might resolve the issue, but I couldn't find any explanation in the PR that introduced it for why it was added in the first place.
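(A hedged illustration of the mechanism: asyncio.shield wraps its argument in a new Task, which is why a shielded callback observes a different task, and therefore a different copy of the context:)

```python
import asyncio

async def whoami(label: str) -> None:
    task = asyncio.current_task()
    print(f"{label}: task id={id(task)} name={task.get_name()}")

async def main() -> None:
    await whoami("plain await")               # runs in the current task
    await asyncio.shield(whoami("shielded"))  # shield wraps it in a new Task

asyncio.run(main())
```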
Thanks for taking a closer look!
For what it's worth, there's a similar discussion here - langchain-ai/langsmith-sdk#1725. Could it also be that opentelemetry is not propagating context correctly? I wonder if injecting the OTel asyncio instrumentor and tagging the appropriate coroutine might take care of that.
Another issue I noticed was that the contextvar error doesn't happen (or rather is silently ignored) if the test case "passes". But when I added some logging, I found out that a bunch of tests were passing despite the context.detach happening outside the token's original context. That doesn't sound right to me.
I also wonder if Openllmetry could benefit from forking the callback handler into Sync vs Async variants like the langchain tracer does, i.e. BaseTracer vs AsyncBaseTracer.
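(A hedged sketch of the split being suggested, mirroring LangChain's BaseTracer / AsyncBaseTracer; the handler class names below are made up for illustration:)

```python
from langchain_core.callbacks import AsyncCallbackHandler, BaseCallbackHandler

class SyncTraceloopCallbackHandler(BaseCallbackHandler):  # hypothetical name
    def on_llm_start(self, serialized, prompts, **kwargs):
        # sync path: attach/detach happen on the caller's thread
        ...

class AsyncTraceloopCallbackHandler(AsyncCallbackHandler):  # hypothetical name
    async def on_llm_start(self, serialized, prompts, **kwargs):
        # async path: may run on another task, so context handling has to
        # account for task-local contextvars
        ...
```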
I've created an issue in langchain's repo. You're right about removing the decorator solving the issue; I was able to verify that locally.
One other approach I tried here is to safeguard the detach by storing copy_context() in SpanHolder:

```python
ctx = contextvars.copy_context()
self.spans[run_id] = SpanHolder(
    span, token, None, ctx, [], workflow_name, entity_name, entity_path
)

# later, when detaching, detach it in a ctx.run(...)
if token:
    ctx = self.spans[run_id].ctx
    ctx.run(context_api.detach, token)
```

But this does not fix the issue, and somehow I still get the error for incorrect context used during detach, and the test still fails with an incorrect span name :-(
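(That outcome is consistent with how contextvars works; a minimal sketch of why the stored copy can't detach the token:)

```python
# copy_context() returns a *different* Context object, but a Token is bound
# to the exact Context that created it, so resetting inside the copy fails.
import contextvars

var = contextvars.ContextVar("current_span", default=None)

token = var.set("span")           # token bound to the current context
ctx = contextvars.copy_context()  # a snapshot copy, not the same object

try:
    ctx.run(var.reset, token)     # reset runs inside the copy, so it raises
except ValueError as e:
    print(e)

var.reset(token)                  # resetting in the creating context works
```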
I'm of the opinion that we should skip fixing span relationships for now and retain the behavior from main. The main issue this PR addresses is missing downstream spans, which should be fixed. Aside from reporting upstream in langchain, I've also asked the OTel folks if there are any gotchas with shielded functions. I have, however, retained the double invoke/ainvoke tests for posterity, for when we get to addressing the span relationships.
AFAIU, context_api.detach() should be invoked only from the context that created the token, and the contextvars package doesn't have an API to get the current context (only a copy of it).
ronensc left a comment:
I've reviewed the rest of the PR and shared a few thoughts and questions.
Force-pushed from e76c1cf to e304cd4 (Compare)
nirga left a comment:
Thanks @obs-gh-abhishekrao - but can you help me understand what this changes in terms of functionality? I'm not sure it resolves the context propagation issue, right?
```python
return None
if isinstance(value, (bool, str, bytes, int, float)):
    return value
if isinstance(value, (list, tuple)):
```
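(For context, a hedged sketch of the kind of attribute-value sanitizer this snippet appears to belong to; the helper name is hypothetical, not the PR's exact code:)

```python
from typing import Any, Optional

def _sanitize_attribute_value(value: Any) -> Optional[Any]:  # hypothetical name
    """OTel span attributes accept only primitives and sequences of them."""
    if value is None:
        return None
    if isinstance(value, (bool, str, bytes, int, float)):
        return value
    if isinstance(value, (list, tuple)):
        return [_sanitize_attribute_value(v) for v in value]
    return str(value)  # stringify anything else rather than dropping it
```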
There's some edge case that's causing span attributes and metrics to not get added / computed. I (locally) added tests but could not find a way to replicate this.
However, the sample code, when run through sample-app/, consistently reproduces the missing spans and metrics.
I took a closer look and have narrowed it down to this line of code. What's really strange is that between the above change and main, there is no difference in the response being passed to model_as_dict and subsequently model.model_dump().
All that said, I'm happy to revert this and file a separate bug. I can also add the sample code to sample-app for posterity.
Found the root cause and fixed it (and reverted the change from langchain). This was actually happening within the SDK here.
Thanks for reviewing again @nirga. Apologies for the flurry of commits. To sum it up, the PR will address:
Context will not be propagated from langchain to downstreams, unfortunately, and this means the new LLM spans get disconnected from the trace.
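(To illustrate what "disconnected from the trace" means in OTel terms, a hedged sketch; the span names are illustrative:)

```python
from opentelemetry import trace
from opentelemetry.trace import set_span_in_context

tracer = trace.get_tracer(__name__)

parent = tracer.start_span("langchain.workflow")

# If the parent is neither in the active context nor passed explicitly,
# a downstream span starts its own trace instead of nesting under it:
orphan = tracer.start_span("openai.chat")

# Passing the parent context explicitly restores the parent-child link:
child = tracer.start_span("openai.chat", context=set_span_in_context(parent))

orphan.end()
child.end()
parent.end()
```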
…nt fix on langchain and add tests
Force-pushed from 45ff713 to d4f5403 (Compare)
Sorry for the hassle. I don't see any such option. Apparently this is a limitation when a fork was made from an org :( (https://github.com/orgs/community/discussions/5634). I'm checking to see what my options are. Worst case, I'll have to re-create this PR off a personal fork.
Since #3201 has been merged, I believe we can close this PR.
feat(instrumentation): ... or fix(instrumentation): ...
Fixes #2271
Reproducible code from bug report
Courtesy #2271. Thanks @jemo21k!
Expected behaviour: LLM span should be created.
Before
No openai.chat span. [screenshot]
After
openai.chat span is present.

sample_app/langgraph_example
Expected behaviour: No change. An additional LLM span should not be created since the ChatOpenAI callback already creates one.
Before
After
Important
Fixes LLM span creation and context retention in Langchain and OpenAI instrumentation by setting default values for missing attributes and ensuring spans are correctly attached to context.
- openai.chat span creation in callback_handler.py fixed by ensuring LLM_RESPONSE_MODEL is set to model or "unknown".
- _create_llm_span() in callback_handler.py.
- LLM_RESPONSE_MODEL set to model or "unknown" in _build_from_streaming_response() and _abuild_from_streaming_response() in chat_wrappers.py.
- messages_list_wrapper() and runs_create_and_stream_wrapper() in assistant_wrappers.py updated to set LLM_RESPONSE_MODEL to model or "unknown".
- callback_handler.py.

This description was created by Ellipsis for 08116d9. It will automatically update as commits are pushed.