Add feature: OpenTelemetry integration for observability #18744
Conversation
Solving the conflicts, will be ready to be tested soon :)
logan-markewich left a comment
This works! But, I think we can make it better 💪🏻
Right now, we create a new span for each event, so you get something like this

I think if we combine a span handler and event handler, we can get more accurate traces that better represent the execution paths inside llama-index
Thoughts? Do you think it's possible?
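To make the suggestion concrete, here is a minimal sketch (not this PR's code; the span and event names are only illustrative) of the difference between opening a new otel span per llama-index event and attaching events to one span per traced call:

```python
from opentelemetry import trace

tracer = trace.get_tracer("llama-index.sketch")

# current behaviour: every instrumentation event becomes its own span
with tracer.start_as_current_span("LLMChatStartEvent"):
    pass
with tracer.start_as_current_span("LLMChatEndEvent"):
    pass

# combined span handler + event handler: one span per traced call,
# with the events attached to it, so the trace mirrors the execution path
with tracer.start_as_current_span("llm.chat") as span:
    span.add_event("LLMChatStartEvent")
    span.add_event("LLMChatEndEvent")
```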
(outdated, resolved review thread on ...grations/observability/llama-index-observability-otel/llama_index/observability/otel/base.py)
@logan-markewich I think it is possible to do this, I'll have a look later :))
Hey @logan-markewich, I added some more logic so that the spans emitted from the OpenTelemetry integration can be traced back to one another (we have parent-child relationships now!)... Let me know if this is aligned with what you thought :)
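For reference, parent-child links in OpenTelemetry are expressed by starting the child span inside the parent's context; a minimal self-contained sketch (span names are illustrative, not this PR's implementation):

```python
from opentelemetry import trace
from opentelemetry.trace import set_span_in_context

tracer = trace.get_tracer("llama-index.sketch")

parent = tracer.start_span("query")                 # top-level span
ctx = set_span_in_context(parent)                   # context carrying the parent
child = tracer.start_span("retrieve", context=ctx)  # child linked to the parent

child.end()
parent.end()
```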
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

span_handler = OpenTelemetrySpanHandler()
event_handler = OpenTelemetryEventHandler(span_handler=span_handler)
```
Just for ergonomics, I wonder if we should instead define some parent `instrument_otel` function that automatically sets up the span and event handlers.
Some helper function like:
```python
def instrument_otel(tracer_operator=None, dispatcher=None):
    dispatcher = dispatcher or instrument.get_dispatcher()
    span_handler = OpenTelemetrySpanHandler(tracer_operator=tracer_operator)
    event_handler = OpenTelemetryEventHandler(span_handler=span_handler)
    dispatcher.add_event_handler(event_handler)
    dispatcher.add_span_handler(span_handler)
```
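As a follow-up, a hedged usage sketch of how such a helper could be called end to end (it assumes the hypothetical `instrument_otel` above together with the PR's `OpenTelemetrySpanHandler`/`OpenTelemetryEventHandler`; mock models are used so the snippet runs without API keys):

```python
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.core.embeddings import MockEmbedding
from llama_index.core.llms import MockLLM

# mock models so the example is self-contained
Settings.embed_model = MockEmbedding(embed_dim=256)
Settings.llm = MockLLM()

# hypothetical helper from the comment above: wires the span/event handlers
# into llama-index's instrumentation dispatcher once, up front
instrument_otel()

index = VectorStoreIndex.from_documents([Document(text="Paul wrote essays.")])
query_engine = index.as_query_engine()
query_engine.query("Who is Paul?")  # this call is now traced automatically
```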
Yes, I am working exactly on that😁
(outdated, resolved review thread on ...grations/observability/llama-index-observability-otel/llama_index/observability/otel/base.py)
Ok, with this version you can create your own span exporter. This is more or less the code:

```python
from llama_index.observability.otel import LlamaIndexOpenTelemetry
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
    OTLPSpanExporter,
)

# define a custom span exporter
span_exporter = OTLPSpanExporter("http://0.0.0.0:4318/v1/traces")

# initialize the instrumentation object
instrumentor = LlamaIndexOpenTelemetry(
    service_name_or_resource="my.otel.service", span_exporter=span_exporter
)

if __name__ == "__main__":
    # start listening!
    instrumentor.start_registering()

    # register events
    documents = SimpleDirectoryReader(
        input_dir="./data/paul_graham/"
    ).load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    query_result = query_engine.query("Who is Paul?")

    # turn the events into a span and streamline them to OpenTelemetry
    instrumentor.to_otel_traces()

    # register another batch of events
    query_result_one = query_engine.query("What did Paul do?")

    # turn the events into another span and streamline them to OpenTelemetry
    instrumentor.to_otel_traces()
```

And with this you will have two spans containing respectively 15 and 13 events 😁
@AstraBert this is closer! But I think we can do even better and make it even more automatic 💪🏻 Think of it this way: I think once a user runs the instrumentor, everything should be traced and exported automatically, with no extra calls needed.

Basically, I think you can do something like

```python
def new_span(...):
    ...
    otel_span = self._tracer.start_span(span_name, context=ctx)
    ...

# <similar handling for exit/drop_span>
```

And in the event handler

```python
from opentelemetry import trace

def handle(...):
    ...
    current_span = trace.get_current_span()
    current_span.add_event(...)
    ...
```

In both cases, there is a lot of metadata we can attach to the otel spans and otel events. For example on the spans, we can set attributes for the function name and args. And for events, we can attach the event data (which we already do 💪🏻)

I hope this makes sense!
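For reference, a self-contained sketch of the two OpenTelemetry calls this suggestion relies on; note that `trace.get_current_span()` only returns a span that has been made current (attached to the context), which is easiest to guarantee with `start_as_current_span` (exporter, span, and event names below are illustrative):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# console exporter just for illustration
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llama-index.sketch")

# span-handler side: open a span and make it current for the duration of the call
with tracer.start_as_current_span("RetrieverQueryEngine.query") as span:
    span.set_attribute("query_str", "Who is Paul?")

    # event-handler side: attach events to whatever span is current right now
    current = trace.get_current_span()
    current.add_event("RetrievalStartEvent", attributes={"top_k": 2})
```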
Hey @logan-markewich, this should now finally work as we wanted ;) You can try out this code that pipes the traces into Jaeger:

```python
from llama_index.observability.otel import LlamaIndexOpenTelemetry
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
    OTLPSpanExporter,
)
from llama_index.core.llms import MockLLM
from llama_index.core.embeddings import MockEmbedding
from llama_index.core import Settings

# define a custom span exporter
span_exporter = OTLPSpanExporter("http://0.0.0.0:4318/v1/traces")

# initialize the instrumentation object
instrumentor = LlamaIndexOpenTelemetry(
    service_name="my.test.service.1",
    span_exporter=span_exporter,
    debug=True,
    dispatcher_name="my.dispatcher.name",
)

if __name__ == "__main__":
    embed_model = MockEmbedding(embed_dim=256)
    llm = MockLLM()
    Settings.embed_model = embed_model

    # start listening!
    instrumentor.start_registering()

    # register events
    documents = SimpleDirectoryReader(
        input_dir="./data/paul_graham/"
    ).load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine(llm=llm)
    query_result = query_engine.query("Who is Paul?")
    query_result_one = query_engine.query("What did Paul do?")
```
```python
span = self.all_spans[id_]
for event in self.all_events:
    span.add_event(name=event.name, attributes=event.attributes)
self.all_events.clear()
```
hmm, I guess one issue with this approach is if I do something like `await asyncio.gather(index.aquery(...), index.aquery(...))` -- events from both queries would get mixed together (I think, I couldn't quite test this, see other comment 👍🏻)
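To illustrate the concern, a small self-contained sketch (toy code, not the handler itself) of how a single shared event list interleaves entries when two async queries run concurrently:

```python
import asyncio

shared_events: list[str] = []  # analogous to a single all_events list on the handler

async def fake_query(name: str) -> None:
    for step in ("retrieve", "synthesize"):
        shared_events.append(f"{name}:{step}")
        await asyncio.sleep(0)  # yield control, as real I/O would

async def main() -> None:
    await asyncio.gather(fake_query("query-1"), fake_query("query-2"))
    # steps from both queries end up interleaved in one list
    print(shared_events)

asyncio.run(main())
```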
@AstraBert hmm, looks like I still get traces with only one span. Here's the code. From this code, I think I would expect only two top-level traces, where each trace has the hierarchy of spans. Here's what I currently get (let me know if you get something else locally! I hope I installed this branch properly lol)
So actually I got similar results, but:

I'll work on it :)
@AstraBert It works! 🎉 I think the only remaining thing is the events don't quite get attached to the proper spans, I think? I would expect...
Yeah, that's actually weird and unexpected, but I might have a hint here: the way we handle events is just to add them into a list within the `SpanHandler` - when a span is preparing to exit, all the events in the list are added to the span and then the span is ended. After that, the event list is cleared and filled with the new events, which will go into the following span, and so on... Might be that this is too simplistic for the way we emit spans/events, and that's why they end up getting mixed up.
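One way to avoid the shared-buffer problem, as a hedged sketch of one possible approach (not necessarily what the PR ended up doing), is to key the buffered events by the span they belong to instead of keeping one global list:

```python
from collections import defaultdict
from typing import Any

# buffer events per llama-index span id instead of in one shared list
events_by_span: dict[str, list[Any]] = defaultdict(list)

def buffer_event(span_id: str, event: Any) -> None:
    events_by_span[span_id].append(event)

def exit_span(span_id: str, otel_span: Any) -> None:
    # attach only the events that were recorded for this particular span
    for event in events_by_span.pop(span_id, []):
        otel_span.add_event(name=event.name, attributes=event.attributes)
    otel_span.end()
```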
@AstraBert I fixed it! Yes, your suspicion was right! There was also a hidden issue where sometimes... I fixed both issues :)




Description
Added an integration for OpenTelemetry with a custom `EventHandler` that is able to trace all events and log their details. Highly customizable but easy-to-set-up interface.
New Package?
Did I fill in the `tool.llamahub` section in the `pyproject.toml` and provide a detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the `pyproject.toml` file of the package I am updating? (Except for the `llama-index-core` package)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.
Suggested Checklist:
`uv run make format; uv run make lint` to appease the lint gods