fix(langchain): ensure llm span is created #2805

Closed

obs-gh-abhishekrao wants to merge 24 commits into traceloop:main from observeinc:abhishek/fix-langchain-propagation

Conversation

@obs-gh-abhishekrao (Contributor) commented Apr 3, 2025

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Fixes #2271

Reproducible code from bug report

Courtesy #2271. Thanks @jemo21k!
Expected behaviour: LLM span should be created.
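The original snippet lives in #2271; a minimal sketch of the same shape (hypothetical app name, model, and node name — not the reporter's exact code):

from typing import TypedDict

from langgraph.graph import StateGraph
from openai import OpenAI
from traceloop.sdk import Traceloop

Traceloop.init(app_name="langgraph-repro")  # hypothetical app name
client = OpenAI()


class State(TypedDict):
    answer: str


def call_llm(state: State) -> State:
    # A direct OpenAI call from inside a LangGraph node; before this fix,
    # no openai.chat span was emitted for this request.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hi"}],
    )
    return {"answer": resp.choices[0].message.content}


workflow = StateGraph(State)
workflow.add_node("call_llm", call_llm)
workflow.set_entry_point("call_llm")
workflow.compile().invoke({"answer": ""})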

Before

No openai.chat span.
[screenshot]

After

openai.chat span is present.
[screenshot]

sample_app/langgraph_example

Expected behaviour: No change. Additional LLM span should not be created since ChatOpenAI callback already creates one.

Before

[screenshot]

After

[screenshot]

Important

Fixes LLM span creation and context retention in Langchain and OpenAI instrumentation by setting default values for missing attributes and ensuring spans are correctly attached to context.

  • Behavior:
    • Fixes missing openai.chat span creation in callback_handler.py by ensuring LLM_RESPONSE_MODEL is set to model or "unknown" (the defaulting pattern is sketched below).
    • Ensures parent context is retained and appropriate LLM span is created in _create_llm_span() in callback_handler.py.
  • Attributes:
    • Sets LLM_RESPONSE_MODEL to model or "unknown" in _build_from_streaming_response() and _abuild_from_streaming_response() in chat_wrappers.py.
    • Updates messages_list_wrapper() and runs_create_and_stream_wrapper() in assistant_wrappers.py to set LLM_RESPONSE_MODEL to model or "unknown".
  • Misc:
    • Minor updates to ensure spans are correctly attached to context in callback_handler.py.

This description was created by Ellipsis for 08116d9. It will automatically update as commits are pushed.
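For context, the defaulting pattern the bullets above refer to looks roughly like this — an illustrative sketch, not the exact diff, with set_response_model as a hypothetical helper:

from opentelemetry.semconv_ai import SpanAttributes


def set_response_model(span, model):
    # The semantic-convention key stays fixed; the fallback is applied to the
    # value, never to the key (the mistake flagged in review below).
    span.set_attribute(SpanAttributes.LLM_RESPONSE_MODEL, model or "unknown")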

@CLAassistant commented Apr 3, 2025

CLA assistant check
All committers have signed the CLA.

@ellipsis-dev (Bot) left a comment


❌ Changes requested. Reviewed everything up to 08116d9 in 2 minutes and 8 seconds

More details
  • Looked at 98 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:215
  • Draft comment:
    Avoid using 'SpanAttributes.LLM_RESPONSE_MODEL or "unknown"' as the attribute name. Instead, pass SpanAttributes.LLM_RESPONSE_MODEL as the key and handle fallback in the value (e.g., assistants[assistant_id]["model"] or "unknown").
  • Reason this comment was not posted:
    Marked as duplicate.
2. packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py:362
  • Draft comment:
    Changed token attachment from suppressing instrumentation to setting the span context with set_span_in_context(span). Ensure this change correctly retains the parent context without re-enabling downstream instrumentation.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
    The comment is asking the author to ensure that the change retains the parent context without re-enabling downstream instrumentation. This is a request for confirmation of intended behavior, which violates the rules.
3. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:130
  • Draft comment:
    The expression 'SpanAttributes.LLM_RESPONSE_MODEL or "unknown"' is used as the attribute key. Likely you meant to assign a default value to the attribute value (i.e. assistant["model"] or "unknown"). Use _set_span_attribute(span, SpanAttributes.LLM_RESPONSE_MODEL, assistant["model"] or "unknown") instead.
  • Reason this comment was not posted:
    Marked as duplicate.
4. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py:215
  • Draft comment:
    Similar issue: 'SpanAttributes.LLM_RESPONSE_MODEL or "unknown"' is incorrectly used as the key. It should be applied to the model value instead. Consider using _set_span_attribute(span, SpanAttributes.LLM_RESPONSE_MODEL, assistants[assistant_id]["model"] or "unknown") for proper defaulting.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_x4Sz3CnV2AcX7OLr


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

_set_span_attribute(
    span,
    SpanAttributes.LLM_RESPONSE_MODEL,
    SpanAttributes.LLM_RESPONSE_MODEL or "unknown",
)
Contributor comment:

Avoid using 'SpanAttributes.LLM_RESPONSE_MODEL or "unknown"' as the attribute name. Instead, pass SpanAttributes.LLM_RESPONSE_MODEL as the key and handle fallback in the value (e.g., assistant["model"] or "unknown").

Suggested change:
-    SpanAttributes.LLM_RESPONSE_MODEL or "unknown",
+    SpanAttributes.LLM_RESPONSE_MODEL,

@nirga (Member) left a comment

Thanks for submitting @obs-gh-abhishekrao - can you take a look at the failing tests and lint, and sign the CLA?

@obs-gh-abhishekrao (Contributor Author)

Hi @nirga, sorry it's taking time, I'm still working on getting the CLA signed.

Took a look at the test failures. The langchain test failure happening on SequentialChain raises some suspicions for me. It looks like context isn't updated / propagated during on_llm_start for SequentialChain.ainvoke. What seems to be happening is that the OpenAI instrumentation is triggered (asynchronously?) as a child of the task span, as opposed to as a child of the _create_llm_span call, and hence ends up not seeing SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY.

So far I'm wondering if this is because the langchain instrumentation uses BaseCallbackHandler and is (probably) missing an AsyncCallbackHandler.
Some references I was reading:

If you use a sync CallbackHandler while using an async method to run your LLM / Chain / Tool / Agent, it will still work. However, under the hood, it will be called with run_in_executor (https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor), which can cause issues if your CallbackHandler is not thread-safe.

I tried adding opentelemetry-instrumentation-threading to the test to see if it solves the context propagation but it didn't help.

Suggestions would be really helpful here, so I'm curious to hear your thoughts.

@obs-gh-abhishekrao (Contributor Author) left a comment

This should also fix #2661. I just ran into that bug as well.

@obs-gh-abhishekrao force-pushed the abhishek/fix-langchain-propagation branch from 25fe3fd to 6fcbdd9 on April 22, 2025 14:32
@obs-gh-abhishekrao (Contributor Author)

@nirga Please review the PR at your convenience. CLA is signed.

@obs-gh-abhishekrao force-pushed the abhishek/fix-langchain-propagation branch from 6fcbdd9 to bb1e024 on April 25, 2025 16:09
@obs-gh-abhishekrao force-pushed the abhishek/fix-langchain-propagation branch from f9d466d to 3d4f99f on May 8, 2025 14:32
@nirga (Member) left a comment

Thanks @obs-gh-abhishekrao - I think it's almost there. Can you add a test for this specific use case?

_set_span_attribute(
    span,
    SpanAttributes.LLM_RESPONSE_MODEL,
    SpanAttributes.LLM_RESPONSE_MODEL or "unknown",
)
@nirga (Member) commented:

why do you need this?

@obs-gh-abhishekrao (Contributor Author) replied:

Hmm, that was incorrectly coalesced. The reason I added this was that I'd encountered silent failures in metric generation, which I resolved in 4263495. I was hoping similar logic was needed here, although I don't see any metrics generation as part of the assistant wrapper.
My latest commit a2daa86 fixes the coalescing mistake, but I'm happy to remove it altogether too.

@obs-gh-abhishekrao force-pushed the abhishek/fix-langchain-propagation branch from 3d4f99f to 4263495 on May 12, 2025 19:27
@obs-gh-abhishekrao (Contributor Author)

Added test based on the sample code, but noticing some unexpected behaviour that I'm still debugging.

  • If I run the sample code in isolation, I'm able to successfully validate this PR
  • If I run the same code through the test, it does not produce openai.chat spans at all, nor spans for some other operations.
    For what it's worth, I tried linking our own LLM app to this PR, and saw the openai.chat spans produced correctly.

My analysis so far:
Somehow during the test, the SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY is set to True, so the openai instrumentor ends up not creating a span at all. However, going by this PR, this context key is only set as part of _create_llm_span in the langchain instrumentor, which isn't even called through the test code, so I'm not sure where else or how it's getting set.

@obs-gh-abhishekrao force-pushed the abhishek/fix-langchain-propagation branch from 6301498 to 73648f1 on May 12, 2025 19:57
@obs-gh-abhishekrao (Contributor Author)

Found the reason (re: the unexpected behaviour described in my previous comment).

This happens because all the (langchain) tests run under a shared trace context (and hence all spans nest under the same trace). So test A may set a context key that test B then accidentally reads in its OpenAI instrumentor. Adding a pytest fixture that starts a root span before each test solves this (see the sketch below). However, some cases now fail and need fixing.
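A minimal sketch of such a fixture (my assumption of its shape, not necessarily the exact fixture added in this PR):

import pytest
from opentelemetry import trace


@pytest.fixture(autouse=True)
def root_span(request):
    # Start a fresh root span per test so context keys set by one test
    # cannot leak into another through a shared trace context.
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span(request.node.name):
        yield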

@obs-gh-abhishekrao force-pushed the abhishek/fix-langchain-propagation branch from 7ed511a to 5b491f8 on May 14, 2025 15:52
@ronensc (Collaborator) left a comment

@obs-gh-abhishekrao Thanks for working on fixing this issue! I’ve come across a subtle problem with the proposed fix. Please see the details below.

token = context_api.attach(
    context_api.set_value(SUPPRESS_LANGUAGE_MODEL_INSTRUMENTATION_KEY, True)
)
token = context_api.attach(set_span_in_context(span))
@ronensc (Collaborator) commented:

I believe this should have a corresponding detach() in _end_span(). Without it, the context of the caller could be altered, potentially breaking the parent-child relationship between spans and causing unrelated spans to be linked together.
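A sketch of the pairing being requested (hypothetical method shapes, not the PR's actual code):

from opentelemetry import context as context_api
from opentelemetry.trace import set_span_in_context


def _create_llm_span(self, run_id, span):
    token = context_api.attach(set_span_in_context(span))
    self.spans[run_id] = (span, token)  # keep the token for the later detach


def _end_span(self, run_id):
    span, token = self.spans.pop(run_id)
    span.end()
    context_api.detach(token)  # restores the caller's context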

Consider the following unit test, which currently fails due to this issue:

import pytest
from typing import TypedDict

from langgraph.graph import StateGraph
from opentelemetry import trace


@pytest.mark.vcr
def test_langgraph_double_invoke(exporter):
    class DummyGraphState(TypedDict):
        result: str

    def mynode_func(state: DummyGraphState) -> DummyGraphState:
        return state

    def build_graph():
        workflow = StateGraph(DummyGraphState)
        workflow.add_node("mynode", mynode_func)
        workflow.set_entry_point("mynode")
        langgraph = workflow.compile()
        return langgraph

    graph = build_graph()
    assert "test_langgraph_double_invoke" == trace.get_current_span().name
    graph.invoke({"result": "init"})
    assert "test_langgraph_double_invoke" == trace.get_current_span().name  # incorrectly set to 'mynode.task'
    graph.invoke({"result": "init"})
    assert "test_langgraph_double_invoke" == trace.get_current_span().name  # still incorrectly set to 'mynode.task'

As you can see, the span name unexpectedly switches to 'mynode.task', indicating that the span context is not properly detached after each invocation.

Please take a look at my proposed fix here:
Fix Suggestion

Feel free to review and adjust as needed.

@obs-gh-abhishekrao (Contributor Author) replied:

Thanks @ronensc! Great catch, will look into adding this test and fix.

@obs-gh-abhishekrao force-pushed the abhishek/fix-langchain-propagation branch 2 times, most recently from 023a52c to be86077 on May 20, 2025 15:29


@pytest.mark.vcr
def test_langgraph_double_invoke(exporter):
@ronensc (Collaborator) commented:

The test_langgraph_double_invoke() and test_langgraph_double_ainvoke() tests don't catch the regression I mentioned. If I comment out the detach() invocation, the tests pass even though they should have failed.
https://github.com/traceloop/openllmetry/pull/2805/files#diff-caff857eafc9121a585e40cdac1c0815d97f474cd3d0bf1a23755572e3b38105R702-R703

@obs-gh-abhishekrao (Contributor Author) replied:

Apologies, I agree the test case is incorrect. However, the fix you suggested earlier does not work for me (at least not locally). Not only does the (fixed) test fail, but I'm also noticing there's a problem with detaching context in _end_span. There appears to be a flurry of these errors when it's detached there, possibly due to async behaviour:

ValueError: <Token var=<ContextVar name='current_context' default={} at 0xXXXXX> at 0xYYYYY> was created in a different Context

@ronensc (Collaborator) commented:

I wasn’t able to reproduce the error - could you share more details on how to reproduce it? For example, the Python version you're using, how you created the virtual environment, and how you ran the tests.

@obs-gh-abhishekrao (Contributor Author) replied:

This only happens when you test the async case, i.e. ainvoke, which I don't think was included in your suggested fix.

  • Python 3.11.8
  • I set up the virtual environment using nx run opentelemetry-instrumentation-langchain:lock and then nx run opentelemetry-instrumentation-langchain:install
  • Ran the tests from the langchain instrumentation directory with poetry run pytest -svv tests/ (I could have run them through nx but I preferred the verbosity of -svv as I was logging some stuff out).

@ronensc (Collaborator) commented:

Thanks, I'm able to reproduce the issue.

It looks like the root cause is that the on_XXX_end() callbacks run in a different task (and thus a different context) from the on_XXX_start() callbacks. I added the following lines to each callback to verify this:

import asyncio

task = asyncio.current_task()
print(f"Task ID: {id(task)} Task name: {task.get_name()}")

I also came across this LangChain discussion, though it hasn't received any input from the maintainers yet.

From what I can tell, the on_XXX_end() callbacks are dispatched on a different task due to the use of the @shielded decorator:
https://github.com/langchain-ai/langchain/blob/1ebcbf1d11578cb55db26013c745dc1e5722966e/libs/core/langchain_core/callbacks/manager.py#L962-L963

Removing the decorator might resolve the issue, but I couldn't find any explanation in the PR that introduced it for why it was added in the first place.
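A self-contained illustration of the failure mode (my sketch, not code from the PR): a coroutine passed to asyncio.shield() is wrapped in its own task, which gets a copy of the current context, so contextvar changes made inside it never reach the caller — and, conversely, a token attached in the caller's task cannot be detached from inside the shielded task.

import asyncio
import contextvars

var = contextvars.ContextVar("current", default="outer")


async def shielded_end_callback():
    print("inside shielded task:", var.get())  # sees a copy: "caller"
    var.set("inner")  # stays confined to this task's context copy


async def main():
    var.set("caller")
    await asyncio.shield(shielded_end_callback())
    print("back in caller:", var.get())  # still "caller", not "inner"


asyncio.run(main())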

@obs-gh-abhishekrao (Contributor Author) replied:

Thanks for taking a closer look!

For what it's worth, there's a similar discussion here - langchain-ai/langsmith-sdk#1725. Could it also be that opentelemetry is not propagating context correctly? I wonder if injecting the OTel asyncio instrumentor and tagging the appropriate coroutine might take care of that.

Another issue I noticed was that the contextvar error doesn't happen (or rather is silently ignored) if the test case "passes". But when I added some logging, I found that a bunch of tests were passing despite the context.detach happening outside the token's original context. That doesn't sound right to me.

I also wonder if OpenLLMetry could benefit from forking the callback handler into Sync vs Async like the langchain tracer does, i.e., BaseTracer vs AsyncBaseTracer.
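A rough sketch of that split, using LangChain's real base classes but hypothetical handler names:

from langchain_core.callbacks import AsyncCallbackHandler, BaseCallbackHandler


class SyncTraceloopCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        ...  # attach context / start the span in the caller's thread


class AsyncTraceloopCallbackHandler(AsyncCallbackHandler):
    async def on_llm_start(self, serialized, prompts, **kwargs):
        ...  # runs as a coroutine in the caller's task, so contextvars propagate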

@obs-gh-abhishekrao (Contributor Author) replied:

I've created an issue in langchain's repo. You're right about removing the decorator solving the issue; I was able to verify that locally.
One other approach I tried was to safeguard the detach by storing copy_context() in SpanHolder:

ctx = contextvars.copy_context()
self.spans[run_id] = SpanHolder(
    span, token, None, ctx, [], workflow_name, entity_name, entity_path
)

# later when detaching, detach it in a ctx.run(...)
if token:
    ctx = self.spans[run_id].ctx
    ctx.run(context_api.detach, token)

But this does not fix the issue: somehow I still get the error about an incorrect context being used during detach, and the test still fails with an incorrect span name :-(

@obs-gh-abhishekrao (Contributor Author) replied:

I'm of the opinion that we should skip fixing span relationships for now and retain the behavior from main. The main issue this PR addresses is missing downstream spans, which should be fixed. Aside from reporting upstream in langchain, I've also asked the OTel folks if there are any gotchas with shielded functions. I have, however, retained the double invoke/ainvoke tests for posterity, for when we get to addressing the span relationships.

@ronensc (Collaborator) commented:

AFAIU, context_api.detach() should be invoked only from the context that created the token and the contextvars package doesn't have an API to get the current context (only a copy of it).
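For reference, a minimal sketch of the token semantics using the public OTel context API:

from opentelemetry import context as context_api

token = context_api.attach(context_api.set_value("k", "v"))
assert context_api.get_value("k") == "v"
context_api.detach(token)  # fine here: same context that created the token
assert context_api.get_value("k") is None

Calling detach(token) from a different context (e.g. another task) is what raises the "was created in a different Context" error quoted above.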

@ronensc (Collaborator) left a comment

I've reviewed the rest of the PR and shared a few thoughts and questions.

Comment thread packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py Outdated
Comment thread packages/opentelemetry-instrumentation-langchain/pyproject.toml Outdated


@nirga (Member) left a comment

Thanks @obs-gh-abhishekrao, but can you help me understand what this changes in terms of functionality? I'm not sure it resolves the context propagation issue, right?

    return None
if isinstance(value, (bool, str, bytes, int, float)):
    return value
if isinstance(value, (list, tuple)):
@nirga (Member) commented:

revert this?

@obs-gh-abhishekrao (Contributor Author) replied:

There's some edge case that's causing span attributes and metrics to not get added / computed. I (locally) added tests but could not find a way to replicate it.
However, the sample code, when run through sample-app/, consistently reproduces the missing spans and metrics.

I took a closer look, and have narrowed it down to this line of code.

What's really strange is that between the above change and main, there is no difference in the response being passed to model_as_dict and subsequently model.model_dump().

All that said I'm happy to revert this and file a separate bug. I can also add the sample code to sample-app for posterity.

@obs-gh-abhishekrao (Contributor Author) replied:

Found the root cause and fixed it (and reverted the change from langchain). This was actually happening within the SDK here

@obs-gh-abhishekrao (Contributor Author) commented Jun 3, 2025

Thanks @obs-gh-abhishekrao, but can you help me understand what this changes in terms of functionality? I'm not sure it resolves the context propagation issue, right?

Thanks for reviewing again @nirga. Apologies for the flurry of commits.
Initially, I thought I could address the missing LLM span as well as fix the parent span scope from the langchain task / workflow that triggers it. However, based on @ronensc's feedback and further discovery, context.attach and context.detach don't work predictably within the scope of langchain callbacks.

To sum it up, the PR will address:

  • The main concern i.e., missing LLM spans
  • The newly created LLM spans did not have many span attributes. I addressed that as well.

Context will not be propagated from langchain to downstreams unfortunately, and this means the new LLM spans get disconnected from the trace.

@obs-gh-abhishekrao force-pushed the abhishek/fix-langchain-propagation branch from 45ff713 to d4f5403 on July 7, 2025 16:15
@obs-gh-abhishekrao (Contributor Author)

@obs-gh-abhishekrao To allow edits by maintainers on this PR, you simply mark the Allow edits from maintainers checkbox as described here: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork#enabling-repository-maintainer-permissions-on-existing-pull-requests

Sorry for the hassle. I don't see any such option. Apparently this is a limitation when the fork was made from an org :( (https://github.com/orgs/community/discussions/5634). I'm checking to see what my options are. Worst case, I'll have to re-create this PR off a personal fork.

@obs-gh-abhishekrao (Contributor Author)

@nirga I've submitted a different PR here - #3094. I see the option to allow maintainer edits now, so it should work.

@ronensc (Collaborator) commented Aug 4, 2025

Since #3201 has been merged, I believe we can close this PR.



Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: OpenAI Requests Not Traced When Sent from a LangGraph Node
