New component: logtospanconnector #23182
Comments
This is a cool use case, exactly the kind for which connectors were designed. It does seem as though the biggest challenge here is bridging the gap between the data models.
Is there any set of sane defaults for these that would allow the connector to work "out of the box"? i.e. with no pre-processing necessary? If there is, I assume there would be some limitations on the usefulness of the output, but still it would be nice if users could drop it in and then opt in to more refined capabilities.
Since these aren't required fields in the log data model, what would be the behavior if they are empty? Dropping such logs seems reasonable to me in this case but I'm curious if there is any alternative.
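To make the "drop such logs" option concrete, here is a minimal sketch in Python (not the collector's Go code). It assumes log records are plain dicts with `trace_id` and `span_id` byte fields; the helper names `has_trace_context` and `partition_logs` are hypothetical. It relies only on the OpenTelemetry rule that a trace ID is 16 bytes, a span ID is 8 bytes, and all-zero IDs are invalid.

```python
TRACE_ID_LEN = 16  # bytes, per the OpenTelemetry data model
SPAN_ID_LEN = 8    # bytes


def has_trace_context(trace_id: bytes, span_id: bytes) -> bool:
    """Return True only if both IDs have the right length and are not all zeros."""
    return (
        len(trace_id) == TRACE_ID_LEN
        and len(span_id) == SPAN_ID_LEN
        and any(trace_id)  # any() is False for empty or all-zero bytes
        and any(span_id)
    )


def partition_logs(records):
    """Split records into (convertible, dropped) based on their trace context.

    `records` is assumed to be an iterable of dicts with 'trace_id' and
    'span_id' byte values; this shape is illustrative only.
    """
    convertible, dropped = [], []
    for rec in records:
        if has_trace_context(rec.get("trace_id", b""), rec.get("span_id", b"")):
            convertible.append(rec)
        else:
            dropped.append(rec)
    return convertible, dropped
```

Returning the dropped records rather than discarding them silently leaves room for an alternative, such as counting them in a metric or logging them for debugging.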
If the connector could work "out of the box", even in a limited form, this option might make sense because the user could start with just the connector, and then put in a pre-processor and iterate on it to take advantage of additional mapping capabilities. However, it seems the connector would be extremely limited. Depending on a tight coupling with another component is not a great approach.
I like where you're going with this, but it may be overly ambitious to extend/recreate the necessary grammar. There might be a middle road that allows the connector to be more self-contained but also minimizes complexity and overlap with processors. Rather than a generalized set of "statements", could the configuration directly specify how the mapping should work? I think we would still need to borrow OTTL's syntax for referring to fields, and there would be some cases where the user needs to pre-process.

    # keys are span fields. values are log fields
    logtospan:
      # defaults to 'trace_id'
      trace_id: trace_id
      # defaults to 'span_id'
      span_id: span_id
      # defaults to all attributes. If user lists any attributes, only those are copied
      copy_attributes: []
      # - attributes["foo"]
      # - attributes["bar"]
      # specify exactly 2 of 'start_time_unix_nano', 'end_time_unix_nano', 'duration'
      # 'start_time_unix_nano' = 'end_time_unix_nano' - duration
      # 'end_time_unix_nano' = 'start_time_unix_nano' + duration
      # start_time_unix_nano:
      end_time_unix_nano: time_unix_nano
      duration: # could support some basic parsing
        parse_from: attributes["latencyMs"]
        unit: ms
      # anything fancy here probably requires preprocessing, but maybe there's a creative solution?
      parent_span_id.string: attributes["parent_span_id"]
      # maybe a reasonable default?
      name: attributes["event.name"]
      # defaults to empty list. Applied after all other mapping is complete
      drop_attributes: []
      # - attributes["parent_span_id"]
      # - attributes["latencyMs"]

    # Same as above, without comments and defaults
    logtospan/min_config:
      end_time_unix_nano: time_unix_nano
      duration: # could support some basic parsing
        parse_from: attributes["latencyMs"]
        unit: ms
      parent_span_id.string: attributes["parent_span_id"]

What do you think?
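The "specify exactly 2 of start, end, duration" rule from the proposed config can be sketched as follows. This is an illustrative Python sketch, not connector code: the function names and the set of supported units are assumptions, and `parse_duration_nanos` stands in for whatever "basic parsing" the `duration.parse_from` / `unit` options would end up doing.

```python
from typing import Optional, Tuple

# Assumed unit table; the proposal only shows 'ms'.
UNIT_TO_NANOS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def parse_duration_nanos(raw: str, unit: str) -> int:
    """Parse a numeric string such as an attributes["latencyMs"] value into nanoseconds."""
    if unit not in UNIT_TO_NANOS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return round(float(raw) * UNIT_TO_NANOS[unit])


def resolve_span_times(
    start: Optional[int] = None,
    end: Optional[int] = None,
    duration: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (start_time_unix_nano, end_time_unix_nano) from exactly two inputs."""
    provided = sum(v is not None for v in (start, end, duration))
    if provided != 2:
        raise ValueError("specify exactly 2 of start, end, duration")
    if start is None:
        start = end - duration      # start = end - duration
    elif end is None:
        end = start + duration      # end = start + duration
    if end < start:
        raise ValueError("end time precedes start time")
    return start, end
```

With the `logtospan/min_config` example, the log's `time_unix_nano` would be passed as `end` and the parsed `latencyMs` attribute as `duration`, for instance `resolve_span_times(end=ts, duration=parse_duration_nanos("250", "ms"))`.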
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I don't think there can be, unless we expect this to be used primarily for a root span. Even then we'd ideally want to get the duration or end time from somewhere.
I hackishly implemented it this way a while back, though didn't get around to setting it up in our pipeline. If you're interested I could rebase / tidy up and push that branch up as a draft PR. My concern with the middle road approach is that it's recreating a non-trivial amount of the OTTL transform, but in an ad hoc way.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
The purpose and use-cases of the new component
The logtospanconnector would emit spans based on logs. This can be useful if a program doesn't emit spans natively, or if its spans cannot be customised as much as its logs can. For example, many HTTP servers either do not include tracing, or do not offer as much customisation of attributes as they do for access logs. Similarly, database servers can be configured to log statements, where one can "smuggle" trace context via something like sqlcommenter or marginalia-style comments.
Running the OpenTelemetry collector to process these logs into spans would make it possible to meet those programs where they are, and improve observability overall.
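As an illustration of the "smuggled" trace context case, here is a rough Python sketch that pulls a W3C `traceparent` value out of a sqlcommenter-style comment in a logged SQL statement. The function name `extract_trace_context` is hypothetical, and the parsing is deliberately simplified (it assumes `key='value'` pairs inside a `/* ... */` comment with URL-encoded values, and ignores other sqlcommenter details). Note that the third field of `traceparent` is the caller's span ID, so for a span built from the database log it would serve as the parent span ID.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote

_COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)
_PAIR = re.compile(r"\s*([^=,\s]+)\s*=\s*'((?:[^'\\]|\\.)*)'")
# W3C Trace Context: version-traceid-parentid-flags, lowercase hex
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def extract_trace_context(statement: str) -> Optional[Tuple[str, str]]:
    """Return (trace_id_hex, parent_span_id_hex) from a commented SQL statement.

    Returns None if no valid traceparent is found.
    """
    for comment in _COMMENT.findall(statement):
        for key, value in _PAIR.findall(comment):
            if unquote(key) != "traceparent":
                continue
            match = _TRACEPARENT.match(unquote(value))
            if not match:
                continue
            trace_id, parent_id = match.group(2), match.group(3)
            # All-zero IDs are invalid per the W3C spec.
            if int(trace_id, 16) == 0 or int(parent_id, 16) == 0:
                continue
            return trace_id, parent_id
    return None
```

For example, a logged statement ending in `/*traceparent='00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'*/` would yield the trace ID `4bf92f3577b34da6a3ce929d0e0e4736` and the parent span ID `00f067aa0ba902b7`.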
Example configuration for the component
There are a few main things that you need to extract from the log record that aren't part of the logs data model:
- ParentSpanID
- StartTime and EndTime (one of these is likely the log record's Timestamp though, probably EndTime)
- Name
The span's trace context can be assumed to live in the log record's TraceID and SpanID fields.
I've gone back and forth on configuration style between:
- cache scratch space for work like computing the end time from a duration field
- MutatesData: false
- set on log record fields in the context would be disallowed at startup
Assuming the more featureful version, then for a sidecar collector to an HTTP server you could have something like:
Telemetry data types supported
This is a connector that accepts logs and emits traces.
Is this a vendor-specific component?
Sponsor (optional)
No response
Additional context
We've been processing Google Cloud Load Balancer logs into spans with a custom pubsub client for about a year. I have been working on a replacement using a custom build of the OpenTelemetry Collector with a connector like this, and realised it may be interesting to upstream it.
If this is interesting enough to upstream, I'll need to workshop the config style a bit. As mentioned above, I lean towards the transform-style, which is what I've got implemented so far.