Document span events <-> log-based events conversion #4393
Comments
Coming here from #376 (comment) A couple points I made in that thread:
In other words, I would have made the following changes to the traces spec:
This has the following benefits:
|
I'm happy to bring up these points on a call, wherever is appropriate if not here. |
The motivation to use logs for something goes way beyond span limitations. I think @open-telemetry/android-approvers @open-telemetry/semconv-mobile-approvers and Client SIG can write volumes about their journey to writing-event-as-zero-duration-spans and back.
Some examples of logs that are not spans and shouldn't be on spans:
So the OTel community is moving towards leveraging logs to record data that does not fit into a tree-like trace structure.
If you want to discuss using spans to record logs, there were a lot of discussions in the past. I can find these two: #2125, #4123. Maybe @breedx-splk or @bidetofevil have some good pointers on how spans are not a replacement for events.
On span events -> events: the Logs SIG or the spec is a good place to discuss. Check out #3406.
On logs as span attributes: see above. I don't think it's a viable approach (lack of severity, filtering, non-tracing use cases, the existing logging world, etc.).
On complex attributes: this has been actively discussed recently, and they are allowed on logs (events). You can find a lot of context in #2888, #3858, open-telemetry/semantic-conventions#1669. The TL;DR: allowing them on spans is not out of the question, but it is quite a contentious topic with no consensus in sight. You're welcome to drive it in the Spec call, but I recommend first researching the history and understanding the concerns. |
The TL;DR of my points below is that I still think a couple of small backward-compatible tweaks to the tracing signal could have enabled a lot of the use cases that have moved to logs, with far less downside than adding a new signal, let alone a new signal that is encumbered by being designed for backward compatibility with legacy systems.
I'm not sure what the point is here. Are you saying there are things which need to emit logs but don't need to emit spans? What if they only emit logs that go via the span machinery? I just don't understand what the advantage of instrumenting something with OTEL logging is vs. instrumenting it with OTEL tracing, assuming you emit the same amount of telemetry and aren't comparing doing nothing with logging with auto-instrumenting every dependency via tracing.
My whole point is that it wouldn't be that hard to expand the definition of a span to encapsulate things that have no duration. Call it zero duration spans, or call it logs if that sounds better. As you say that was asked in #2125 and it got no response. I agree with your opinion in #4123 (comment): I would like spans to have been re-used for this purpose, I don't even think we'd need to make it a "new thing" without a duration or status, just add a new field that marks that span as a "log". This has the advantage of being completely backward compatible down to the wire protocol level. Old backends will support it (probably pretty well) and new or updated backends can do something fancier than displaying zero duration (e.g. not display a duration at all) based on the new field/flag. We can give these things a semantic meaning and semantic conventions without changing the wire protocol or requiring backends to do something new.
As you say there are examples where you don't want to propagate the existing context. Some examples that I've encountered and come up in the linked discussions are:
But logs can also have context! And you can make a contextless span by creating a new trace. What you want in the tree and what you don't is somewhat arbitrary, and I think OTEL should have guidance and APIs to make these things easier, but I don't think a new signal is necessary, and I see no reason why this can't and shouldn't be modeled under the existing paradigms. Performance-wise, it seems to me that at the SDK level the machinery is very similar between logs and tracing and thus would have similar performance. I think creating 16 bytes of randomness is a drop in the bucket compared to the overall overhead of instrumenting with OTEL, at least in Python or Rust. I agree that systems that emit logs to a file with no context or buffering probably have less overhead, but that's a completely different topic, more akin to piping data from a legacy system into an OTEL system, which has existing solutions.
I think the place where we are not agreeing is on what is a span. I'm proposing you make a new thing that's the same as a span (same internal implementation, same wire protocol, etc.) except that it semantically has no duration and no status. Physically it can have zero duration and an unset status, that doesn't impact the semantic meaning. Once you have that thing, all of your examples fit within it nicely.
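To make the idea concrete, here is a minimal sketch of a "span that is semantically a log". It assumes a plain in-memory model: the `Span` dataclass, the `is_log` marker field, and the `new_log_span` and `render_duration` helpers are all hypothetical names for illustration, not part of the OpenTelemetry API, SDK, or wire protocol.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Span:
    """Hypothetical in-memory span model (not the OpenTelemetry SDK type)."""
    name: str
    trace_id: bytes               # 16 bytes, as in a regular trace ID
    span_id: bytes                # 8 bytes, as in a regular span ID
    start_ns: int
    end_ns: int
    parent_span_id: Optional[bytes] = None
    attributes: Dict[str, Any] = field(default_factory=dict)
    is_log: bool = False          # hypothetical marker: "this span is a log"


def new_log_span(name: str,
                 attributes: Optional[Dict[str, Any]] = None,
                 parent: Optional[Span] = None,
                 now_ns: Optional[int] = None) -> Span:
    """Create a zero-duration span flagged as a log.

    With a parent, the record joins the parent's trace. Without one, it
    starts a fresh trace (16 random bytes), which makes it contextless.
    """
    ts = time.time_ns() if now_ns is None else now_ns
    return Span(
        name=name,
        trace_id=parent.trace_id if parent else os.urandom(16),
        span_id=os.urandom(8),
        start_ns=ts,
        end_ns=ts,  # physically zero duration; semantically "no duration"
        parent_span_id=parent.span_id if parent else None,
        attributes=dict(attributes or {}),
        is_log=True,
    )


def render_duration(span: Span) -> Optional[int]:
    """A flag-aware backend could hide the duration; an old one would show 0."""
    if span.is_log:
        return None
    return span.end_ns - span.start_ns
```

A backend that ignores the flag still receives an ordinary zero-duration span, which is the backward-compatibility argument made above.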
I've been around the block on that one as well... I've been banging the table for a single attributes model that is a superset of JSON for years. As per above, I suspect a lot of uses of logs are actually just uses of complex attributes, and logs happen to support them for legacy system support reasons. Somehow we've now gotten ourselves into this weird place where a backend is basically forced to support logs (which have complex attributes), but we are also saying that we don't want to add them to spans because it might break backends that don't support complex attributes. I know the proposal of "logs" via the tracing machinery and complex attributes works and is useful to users because:
|
hi @adriangb, The OTEP which defines the vision for OpenTelemetry Events was widely discussed and approved. I understand your concerns and needs, but the decisions around logs and events have been a huge balancing act among lots of concerns, lots of needs, lots of people, and lots of ecosystems. We are fully committed to supporting users who want to funnel their events to span events, but it will (eventually) require some kind of opt-in strategy (could potentially be as simple as a declarative configuration property). It will be hard to change this direction given the amount of discussions and approvals that went into it, but if you'd like to try I'd suggest adding a topic to the Specification SIG meeting agenda and raising it there. |
@adriangb To add to what @trask said, if the core need for you is to have complex attributes on spans - I think you can find a lot of support in the community (myself included). There are a lot of valid points both in favor and against it in #2888, #3858 and linked discussions. It needs a champion to drive it and reach consensus |
Yeah, I think what I'm bringing up are orthogonal concerns in some sense, although they are related: Reading discussions and motivations, it feels like (1) and (2) are related insofar as things have biased toward logs because they have complex attributes that make it easier to support use cases like LLM chats. And now we've ended up with the worst of all worlds, where as a backend developer I have to support complex attributes because I have to support logs (which also have a lot of other complexity, e.g. what to do with the log body), but I can't benefit from the power of complex attributes in spans. I'll try to join the next SIG meeting to campaign for complex attributes at least. |
I'll make a note on this. We rolled this out then had to roll it back because currently every span an exception goes through generates a span event and thus a log event if you treat them the same during ingestion. This is very noisy and is not the same as the intention of #4333 which specifically tries to address this:
But there is no such guidance / recommendation for span events. So if you treat them similarly things get quite messy. IMO a reasonable compromise would be to make a change to span events / recording of exceptions on spans to:
Then if span events are deprecated in favor of logs the semantics at least match up better and the transition is easier. |
The long term vision for events is to leverage log based events instead of span events.
There are different (not mutually-exclusive) migration strategies we could use:
...
span event -> log event conversion is as trivial as:
With the only challenge being getting access to Logs SDK pipeline from Trace SDK
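A minimal sketch of what such a conversion could look like, assuming a one-to-one mapping of event name, timestamp, and attributes, with the trace context taken from the span that owns the event. That mapping is an assumption made for illustration. `SpanEvent`, `SpanContext`, `LogEvent`, and `span_event_to_log_event` are hypothetical stand-ins, not OpenTelemetry SDK types, and the sketch does not address getting access to the Logs SDK pipeline from the Trace SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class SpanEvent:
    """Hypothetical model of an event recorded on a span."""
    name: str
    timestamp_ns: int
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical model of the owning span's context."""
    trace_id: int
    span_id: int
    trace_flags: int = 0


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical model of a log-based event."""
    event_name: str
    timestamp_ns: int
    attributes: Dict[str, Any]
    trace_id: int
    span_id: int
    trace_flags: int


def span_event_to_log_event(event: SpanEvent, ctx: SpanContext) -> LogEvent:
    """Assumed mapping: copy name, time, and attributes; attach span context."""
    return LogEvent(
        event_name=event.name,
        timestamp_ns=event.timestamp_ns,
        attributes=dict(event.attributes),  # copy so the two stay independent
        trace_id=ctx.trace_id,
        span_id=ctx.span_id,
        trace_flags=ctx.trace_flags,
    )
```

Severity and body are left out of the log event here because a span event has no equivalent fields to populate them from.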
log events -> span events is more complicated.
The motivation to define this conversion is to support
otel.log
log.record.severity.number
log.record.severity.text
log.record.body
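One way the reverse direction might work, sketched under stated assumptions: the log-only fields are carried as span event attributes named `log.record.severity.number`, `log.record.severity.text`, and `log.record.body`, and `otel.log` is used as a fallback event name when the log record has none. The attribute names come from the text above, but how they are meant to be used is an assumption here. `LogRecord`, `SpanEvent`, and `log_event_to_span_event` are hypothetical, not OpenTelemetry SDK types.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical model of a log-based event."""
    timestamp_ns: int
    event_name: Optional[str] = None
    severity_number: Optional[int] = None
    severity_text: Optional[str] = None
    body: Any = None
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class SpanEvent:
    """Hypothetical model of an event recorded on a span."""
    name: str
    timestamp_ns: int
    attributes: Dict[str, Any]


def log_event_to_span_event(record: LogRecord) -> SpanEvent:
    """Assumed mapping: fold severity and body into span event attributes."""
    attrs = dict(record.attributes)
    if record.severity_number is not None:
        attrs["log.record.severity.number"] = record.severity_number
    if record.severity_text is not None:
        attrs["log.record.severity.text"] = record.severity_text
    if record.body is not None:
        body = record.body
        # Assumption: span attributes cannot hold complex values, so
        # non-primitive bodies are serialized to a JSON string.
        if not isinstance(body, (str, bool, int, float)):
            body = json.dumps(body, sort_keys=True)
        attrs["log.record.body"] = body
    return SpanEvent(
        name=record.event_name or "otel.log",  # assumed fallback name
        timestamp_ns=record.timestamp_ns,
        attributes=attrs,
    )
```

Serializing the body to JSON is a design choice of this sketch. It loses structure, which is one reason this direction is harder than span event -> log event.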
We should document translation rules to have consistency across languages. Implementation would remain optional and done as a part of contrib repos, but this might change depending on the migration path.
Related: