
Conversation

@AstraBert (Member)

Description

Added an integration for OpenTelemetry with a custom EventHandler that can trace all events and log their details. The interface is highly customizable yet easy to set up.
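For context, a minimal sketch of what a custom event handler looks like against llama-index's instrumentation API, which this integration builds on (the handler name and the print call are illustrative stand-ins for the PR's actual OpenTelemetry logic):

from typing import Any

import llama_index.core.instrumentation as instrument
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events import BaseEvent


class PrintEventHandler(BaseEventHandler):
    """Toy handler: report every instrumentation event as it fires."""

    @classmethod
    def class_name(cls) -> str:
        return "PrintEventHandler"

    def handle(self, event: BaseEvent, **kwargs: Any) -> None:
        print(event.class_name(), event.dict())


# attach to the root dispatcher so every llama-index event flows through it
instrument.get_dispatcher().add_event_handler(PrintEventHandler())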

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran uv run make format; uv run make lint to appease the lint gods

@AstraBert self-assigned this on May 15, 2025
@AstraBert added the enhancement (New feature or request) label on May 15, 2025
@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files) label on May 15, 2025
@AstraBert (Member, Author)

Resolving the conflicts; this will be ready to test soon :)

@logan-markewich (Collaborator) left a comment:

This works! But, I think we can make it better 💪🏻

Right now, we create a new span for each event, so you get something like this:

[screenshot]

I think if we combine a span handler and event handler, we can get more accurate traces that better represent the execution paths inside llama-index

Thoughts? Do you think it's possible?

@AstraBert (Member, Author)

> This works! But, I think we can make it better 💪🏻
>
> Right now, we create a new span for each event, so you get something like this: [screenshot]
>
> I think if we combine a span handler and event handler, we can get more accurate traces that better represent the execution paths inside llama-index
>
> Thoughts? Do you think it's possible?

@logan-markewich I think it is possible to do this, I'll have a look later :))

@dosubot (bot) added the size:XL (This PR changes 500-999 lines, ignoring generated files) label and removed the size:L label on May 16, 2025
@AstraBert (Member, Author)

Hey @logan-markewich, I added some more logic so that spans emitted through OpenTelemetry can be traced back to one another (we have parent-child relationships now!)... Let me know if this aligns with what you had in mind :)

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

span_handler = OpenTelemetrySpanHandler()
event_handler = OpenTelemetryEventHandler(span_handler=span_handler)
@logan-markewich (Collaborator) commented on May 16, 2025:

Just for ergonomics, I wonder if we should instead define some parent instrument_otel function that automatically sets up the span and event handlers

Some helper function like:

import llama_index.core.instrumentation as instrument  # for get_dispatcher()

def instrument_otel(tracer_operator=None, dispatcher=None):
    # default to the root llama-index dispatcher
    dispatcher = dispatcher or instrument.get_dispatcher()

    # wire the event handler to the span handler so events land on spans
    span_handler = OpenTelemetrySpanHandler(tracer_operator=tracer_operator)
    event_handler = OpenTelemetryEventHandler(span_handler=span_handler)

    dispatcher.add_event_handler(event_handler)
    dispatcher.add_span_handler(span_handler)

@AstraBert (Member, Author) replied:

Yes, I am working exactly on that😁

@AstraBert (Member, Author)

Ok, with this version you can create your own LlamaIndexOpenTelemetry instrumentation class, start listening for and recording events, and then turn those events into an OpenTelemetry span whenever you want!

This is more or less the code:

from llama_index.observability.otel import LlamaIndexOpenTelemetry
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
    OTLPSpanExporter,
)

# define a custom span exporter
span_exporter = OTLPSpanExporter("http://0.0.0.0:4318/v1/traces")

# initialize the instrumentation object
instrumentor = LlamaIndexOpenTelemetry(
    service_name_or_resource="my.otel.service", span_exporter=span_exporter
)

if __name__ == "__main__":
    # start listening!
    instrumentor.start_registering()
    # register events
    documents = SimpleDirectoryReader(
        input_dir="./data/paul_graham/"
    ).load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    query_result = query_engine.query("Who is Paul?")
    # turn the events into a span and stream them to OpenTelemetry
    instrumentor.to_otel_traces()
    # register another batch of events
    query_result_one = query_engine.query("What did Paul do?")
    # turn the events into another span and stream them to OpenTelemetry
    instrumentor.to_otel_traces()

And with this you will have two spans containing 15 and 13 events, respectively 😁

@logan-markewich (Collaborator)

@AstraBert this is closer! But I think we can do even better and make it even more automatic 💪🏻

Think of it this way

  1. With the span handler, we know when each span starts and stops. Inside the framework, we are constantly making new spans, tracking the parent span ID, etc. Each span is mapped to a function call
  2. With the event handler, we know each event as it happens
  3. Put those two together, and we know
    a. Which event belongs to which span
    b. The entire hierarchy of spans

I think once a user runs instrumentor.start_registering(), every span llama-index creates after that should be sent automatically over OTel.

Basically, I think you can do something like

def new_span(...):
    ...
    otel_span = self._tracer.start_span(span_name, context=ctx)
    ...

# <similar handling for exit/drop_span>

And in the event handler

from opentelemetry import trace

def handle(...):
    ...
    current_span = trace.get_current_span()
    current_span.add_event(...)
    ...

In both cases, there is a lot of metadata we can attach to the otel spans and otel events. For example on the spans, we can set attributes for the function name and args. And for events, we can attach the event data (which we already do 💪🏻)

I hope this makes sense!
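A hedged sketch putting those hooks together (the class name, the id-to-span map, and the parent_span_id argument are illustrative, not this PR's exact API):

from typing import Dict, Optional

from opentelemetry import trace


class OTelSpanHandlerSketch:
    """Illustrative only: mirror llama-index spans as live OTel spans."""

    def __init__(self, tracer: trace.Tracer) -> None:
        self._tracer = tracer
        self._otel_spans: Dict[str, trace.Span] = {}  # llama-index span id -> otel span

    def new_span(self, id_: str, parent_span_id: Optional[str] = None) -> None:
        # parent the new otel span under the otel span of the llama-index parent
        ctx = None
        if parent_span_id is not None and parent_span_id in self._otel_spans:
            ctx = trace.set_span_in_context(self._otel_spans[parent_span_id])
        # llama-index span ids typically look like "<qualified name>-<uuid>"
        name = id_.partition("-")[0] or id_
        self._otel_spans[id_] = self._tracer.start_span(name, context=ctx)

    def prepare_to_exit_span(self, id_: str) -> None:
        # end the mirrored otel span when the llama-index span exits
        span = self._otel_spans.pop(id_, None)
        if span is not None:
            span.end()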

@AstraBert (Member, Author)

Hey @logan-markewich, this should now finally work as we wanted ;)

You can try out this code, which pipes the traces into Jaeger:

from llama_index.observability.otel import LlamaIndexOpenTelemetry
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
    OTLPSpanExporter,
)
from llama_index.core.llms import MockLLM
from llama_index.core.embeddings import MockEmbedding
from llama_index.core import Settings

# define a custom span exporter
span_exporter = OTLPSpanExporter("http://0.0.0.0:4318/v1/traces")

# initialize the instrumentation object
instrumentor = LlamaIndexOpenTelemetry(
    service_name="my.test.service.1",
    span_exporter=span_exporter,
    debug=True,
    dispatcher_name="my.dispatcher.name",
)

if __name__ == "__main__":
    embed_model = MockEmbedding(embed_dim=256)
    llm = MockLLM()
    Settings.embed_model = embed_model
    # start listening!
    instrumentor.start_registering()
    # register events
    documents = SimpleDirectoryReader(
        input_dir="./data/paul_graham/"
    ).load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine(llm=llm)
    query_result = query_engine.query("Who is Paul?")
    query_result_one = query_engine.query("What did Paul do?")

span = self.all_spans[id_]
for event in self.all_events:
    span.add_event(name=event.name, attributes=event.attributes)
self.all_events.clear()
@logan-markewich (Collaborator) commented:

hmm, I guess one issue with this approach is if I do something like await asyncio.gather(index.aquery(...), index.aquery(...)) -- events from both queries would get mixed together (I think, I couldn't quite test this, see other comment 👍🏻)
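One hedged way around that interleaving, sketched with illustrative hook names (it assumes each llama-index event exposes the span_id it fired under, which BaseEvent does):

from collections import defaultdict

# illustrative: bucket events by the span they fired under, instead of
# keeping one shared list, so concurrent queries cannot mix their events
events_by_span = defaultdict(list)

def on_event(event):
    # BaseEvent carries the id of the span it was emitted under
    events_by_span[event.span_id].append(event)

def on_span_exit(id_, otel_span):
    # attach only this span's events, then drop the bucket
    for event in events_by_span.pop(id_, []):
        otel_span.add_event(name=event.class_name(), attributes={"data": event.json()})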

@logan-markewich (Collaborator) commented on May 19, 2025:

@AstraBert hmm, looks like I still get traces with only one span

Here's the code

from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
    OTLPSpanExporter,
)
from llama_index.observability.otel import LlamaIndexOpenTelemetry
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# define a custom span exporter
span_exporter = OTLPSpanExporter("http://0.0.0.0:4318/v1/traces")

# initialize the instrumentation object
instrumentor = LlamaIndexOpenTelemetry(
    service_name_or_resource="my.otel.service.testv2", 
    span_exporter=span_exporter
)

if __name__ == "__main__":
    # try it out with a simple RAG example!
    instrumentor.start_registering()

    documents = SimpleDirectoryReader(
        input_dir="./docs/docs/examples/data/paul_graham/"
    ).load_data()
    index = VectorStoreIndex.from_documents(documents)

    query_engine = index.as_query_engine()
    query_result = query_engine.query("Who is Paul?")
    print(query_result)

From this code, I think I would expect only two top-level traces

  • one trace for VectorStoreIndex.from_documents(...) -- this would contain some splitting and embedding events
  • one trace for query_engine.query() -- this would include events from retrieval, embedding, synthesis, calling the llm, etc. (quite a few here)

Where each trace has the hierarchy of spans

Here's what I currently get (let me know if you get something else locally! I hope I installed this branch properly lol)
[screenshot: traces each containing only one span]

@AstraBert (Member, Author)

So actually I got similar results, but:

  • could you please check that the spans are correctly labelled as "ok"? Because from the screenshot it doesn't seem so :(
  • most of these spans are empty, maybe it would be easier if we only registered non-empty spans

I'll work on it :)
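For illustration, one way to export only non-empty spans could be a filtering span processor (a sketch under assumptions, not what this PR ships; dropping a parent span would orphan its children in the trace view, so a real implementation needs more care):

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import BatchSpanProcessor


class NonEmptySpanProcessor(BatchSpanProcessor):
    """Forward only spans that actually carry events to the exporter."""

    def on_end(self, span: ReadableSpan) -> None:
        if span.events:
            super().on_end(span)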

@logan-markewich (Collaborator)

@AstraBert It works! 🎉
[screenshot]

I think the only remaining thing is that the events don't quite get attached to the proper spans?

[screenshot]

I would expect LLM.apredict to have LLMPredictStart/End, while the OpenAI.chat call would have LLMChatStart/End 🤔 Will take a peek at the code and see if there is an easy fix

@AstraBert (Member, Author)

Yeah, that's actually weird and unexpected, but I might have a hint: the way we handle events is simply to append them to a list inside the SpanHandler. When a span is preparing to exit, all the events in the list are added to that span and the span is ended; the list is then cleared and refilled by new events, which go into the following span, and so on... This might be too simplistic for the way we emit spans/events, and that could be why they end up mixed.
I can also take a deeper look into it tomorrow, if needed :))

@logan-markewich (Collaborator)

@AstraBert I fixed it! Yes, your suspicion was right! There was also a hidden issue where ChatResponse.raw was sometimes hitting a serialization error (it's apparently a well-known problem when using OpenAI's SDK; the way they use pydantic is janky).

I fixed both issues :)
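For illustration, the defensive serialization for such a fix might look like this (a hedged sketch with a hypothetical helper name, not the actual change that landed):

import json

def safe_event_payload(obj) -> str:
    """Best-effort serialization for payloads whose raw objects may not serialize."""
    try:
        return obj.json()  # pydantic models usually serialize cleanly
    except Exception:
        try:
            return json.dumps(obj, default=str)  # fall back to stringified fields
        except Exception:
            return str(obj)  # last resort: lossy, but never raises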

@logan-markewich merged commit 632b0d3 into main on May 21, 2025 (6 of 10 checks passed)
@logan-markewich deleted the clelia/opentelemetry-integration branch on May 21, 2025 at 18:19
@colca mentioned this pull request on Jun 9, 2025