[processor/enrichmentprocessor] first version of enrichmentprocessor #42056

Closed
kyo-ke wants to merge 24 commits into open-telemetry:main from kyo-ke:enrichmentprocessor

@kyo-ke kyo-ke commented Aug 17, 2025

Contributor

Description

New component: enrichmentprocessor.
This processor enriches attributes on all three signals (metrics, traces, and logs) using external data loaded from a file or an HTTP endpoint (CSV or JSON).
The component continuously monitors the file or endpoint.

Link to tracking issue

#41816
#40526

Testing

Unit tests are added, with 79% coverage.

Documentation

Each enrichment lookup for a data point, span, or log record runs in constant time, O(1).
Internally, the processor holds each CSV line or JSON object as an array entry and builds an index when it loads the data (see lookup.go).
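
To illustrate the load-time index described above, here is a minimal Python sketch. It is not the code in lookup.go (the processor itself is written in Go); it only assumes that rows are kept in a list and that a dictionary maps each key value to its row position, so a lookup is a single hash access. The function names and the sample columns (`user_id`, `user_name`, `team`) are hypothetical.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dict rows, one entry per data line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_index(rows, key_column):
    """Map each value of key_column to the position of its row.

    Built once at load time; if a key repeats, the last row wins.
    """
    return {row[key_column]: pos for pos, row in enumerate(rows)}


def enrich(attributes, rows, index, lookup_attr, fields):
    """Return a copy of attributes with the chosen fields of the matching row added."""
    enriched = dict(attributes)
    pos = index.get(attributes.get(lookup_attr))
    if pos is not None:
        for field in fields:
            enriched[field] = rows[pos][field]
    return enriched


# Hypothetical sample data, for illustration only.
SAMPLE = "user_id,user_name,team\nu-1,alice,payments\nu-2,bob,search\n"
rows = load_rows(SAMPLE)
index = build_index(rows, "user_id")
print(enrich({"user_id": "u-2"}, rows, index, "user_id", ["user_name"]))
```

In this sketch an attribute set without a match is returned unchanged, which is one possible policy; the PR description does not say how misses are handled.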

@kyo-ke kyo-ke requested a review from a team as a code owner August 17, 2025 09:44
@kyo-ke kyo-ke requested a review from crobert-1 August 17, 2025 09:44
@kyo-ke kyo-ke changed the title from "Enrichmentprocessor" to "[processor/enrichmentprocessor] first version of enrichmentprocessor" Aug 17, 2025
@atoulme atoulme added the Sponsor Needed (New component seeking sponsor) label Aug 18, 2025
@atoulme

atoulme commented Aug 18, 2025

Contributor

Pushing this code is premature. We need more discussion on the merits of this approach over adding a detector to resourcedetectionprocessor.

@kyo-ke

kyo-ke commented Aug 19, 2025

Contributor Author

@atoulme Thank you for taking a look at this PR!
As I understand it, resourcedetectionprocessor focuses on collecting and enriching resource-level attributes.

This processor enriches attributes at the level of each log record, metric data point, and span using an external data source.
That is why we need lookup functionality.

If there is a plan to expand resourcedetectionprocessor to enrich individual metrics, spans, and logs, I fully agree with adding this functionality to resourcedetectionprocessor.

Is there any plan for this?

@atoulme atoulme added the discussion needed (Community discussion needed) label Aug 19, 2025
@atoulme

atoulme commented Aug 19, 2025

Contributor

Can you please join a SIG meeting and discuss this with the community? That would help move things forward. Thanks!

@kyo-ke

kyo-ke commented Aug 27, 2025

Contributor Author

At the SIG meeting (Aug 26, 2025), I got two pieces of feedback:

  1. Why is enrichment at the data point/log record/span level required?
  2. We need to use configsource to avoid duplication.

@kyo-ke

kyo-ke commented Aug 27, 2025

Contributor Author

I have some thoughts on point 1 (why enrichment at the data point/log record/span level is required).
In the OpenTelemetry specification, db.namespace is not described in the Entity section.
https://opentelemetry.io/docs/specs/semconv/registry/attributes/db/#db-namespace
These attributes are also consumed as data point-level attributes by the Prometheus receiver:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/internal/prom_to_otlp.go#L45

I think it is fair to assume there are situations where one ResourceMetrics contains multiple db.namespace values and the user wants to enrich data point-level attributes with the owner.

Maybe the user would need to use groupbyattrsprocessor in this case.

Another case where we need to enrich data point-level attributes is adding a username based on a user ID.
This happens when an organization manages resources by creating unique IDs, and the ID is part of the attributes, but users want to query by the actual name.
In this case, these attributes are not entity-level.

So I think there is a use case for enriching attributes at the data point/log/span level.

@jsvd

jsvd commented Sep 3, 2025

Contributor

@kyo-ke 👋 I'm back from vacation and noticed this PR and its movement and discussion in a past SIG, would love to chat about this and see if we can come up with a joint proposal that can be sponsored so we can move on with implementation. I'm on the community slack as João Duarte (EU timezone).

@kyo-ke

kyo-ke commented Sep 7, 2025

Contributor Author

> @kyo-ke 👋 I'm back from vacation and noticed this PR and its movement and discussion in a past SIG, would love to chat about this and see if we can come up with a joint proposal that can be sponsored so we can move on with implementation. I'm on the community slack as João Duarte (EU timezone).

@jsvd Thank you for taking a look at this.
I will join the SIG meeting in the EU timezone, but before that let me share my opinion here.
In the last Asia-timezone SIG meeting, I got some feedback. One thing we need to clarify is
whether we need attribute enrichment at the DataPoint/LogRecord/Span level.

To answer this question, we need to clarify two things:

  1. Do we need enrichment of non-entity attributes?
  2. How should we treat entity-related attributes that are generated at the DataPoint/LogRecord/Span level?

For 1:
IMO we sometimes want to enrich based on client-side data (not entity-side). For example, we want to monitor telemetry by user name rather than user ID.

For 2:
Sometimes entity-level attributes are generated as DataPoint/LogRecord/Span-level attributes.
One of our use cases is monitoring a database server.
The database namespace is treated as a DataPoint attribute when we use query-exporter (https://github.com/albertodonato/query-exporter), and I think it should be an entity-level attribute.
If we focus only on resource-level attribute enrichment, we need to promote these attributes to resource attributes with groupbyattrsprocessor before we can enrich based on them.

IMO always regrouping for this type of attribute is too much.

Correct me if I'm misunderstanding the concept.

It would be helpful if you could share your opinion or use case.

Thanks!

@jsvd

jsvd commented Sep 15, 2025

Contributor

> Do we need enrichment of non-entity attributes?

I do think enrichment should be possible across the entire set of attributes of each signal:

      - Traces: resource, span, span event
      - Metrics: resource and every data point type
      - Logs: resource and log record

This is particularly important for logs, where log records can contain events with properties related to entities that can be mapped to other values.

> How should we treat entity-related attributes that are generated at the DataPoint/LogRecord/Span level?

IMO the processor should iterate over the maps and perform the lookup whenever the attributes are found. A performance improvement could be done to focus the traversal on resource attributes only if a configuration such as scope: [resource] or similar is enabled.
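
The traversal suggested above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: logs are modeled as plain nested dictionaries rather than the Collector's pdata types, and the `scope` parameter, the `"log_record"` scope value, and all function names are hypothetical stand-ins for whatever configuration the processor ends up with.

```python
def enrich_map(attrs, table, lookup_attr, target_attr):
    """Add target_attr to attrs in place when lookup_attr has a match in table."""
    key = attrs.get(lookup_attr)
    if key in table:
        attrs[target_attr] = table[key]


def enrich_logs(resource_logs, table, lookup_attr, target_attr,
                scope=("resource", "log_record")):
    """Walk every attribute map that the configured scope allows."""
    for entry in resource_logs:
        if "resource" in scope:
            enrich_map(entry["resource"], table, lookup_attr, target_attr)
        # With scope=("resource",) the per-record loop is skipped entirely,
        # which is the performance shortcut mentioned in the comment above.
        if "log_record" in scope:
            for record in entry["log_records"]:
                enrich_map(record, table, lookup_attr, target_attr)
    return resource_logs


def sample():
    """Hypothetical input: one resource holding two log records."""
    return [{
        "resource": {"db.namespace": "orders"},
        "log_records": [{"db.namespace": "orders"}, {"db.namespace": "users"}],
    }]


OWNERS = {"orders": "team-a"}  # hypothetical lookup table
print(enrich_logs(sample(), OWNERS, "db.namespace", "owner"))
print(enrich_logs(sample(), OWNERS, "db.namespace", "owner", scope=("resource",)))
```

The same pattern would extend to spans, span events, and metric data points by adding one branch per level.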

@github-actions

Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions Bot added the Stale label Sep 30, 2025
@github-actions github-actions Bot removed the Stale label Oct 1, 2025
@kyo-ke

kyo-ke commented Oct 19, 2025

Contributor Author

Thank you @jsvd for the comment.
I will update this PR to skip the non-resource traversal if the scope is resource.

Do you think we can get a sponsor for this by bringing it up in a SIG meeting?

@jsvd

jsvd commented Oct 20, 2025

Contributor

Hi @kyo-ke, I had a parallel effort to add lookup-based enrichment to the Collector by following the recommended contribution process as closely as possible, which started with a "formal" proposal: #41816.

This proposal suggests the creation of a processor that performs enrichment on signals, and the sources for the data (or metadata) come from extensions to the processor. These extensions would have a common interface and behave in a similar manner (e.g. caching, error handling, metrics, etc.). Example lookup extensions could be: HTTP lookups, lookups in CSV/JSON/YAML, lookups in databases, etc.
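
As a rough picture of what "a common interface" for lookup sources might mean, here is a small Python sketch. It does not reproduce the interfaces in #41816 or #43120 (which would be Go); `LookupSource`, `DictSource`, and `CachingSource` are hypothetical names, and the cache-everything policy is just one possible choice.

```python
from typing import Dict, Optional, Protocol


class LookupSource(Protocol):
    """Anything that can resolve a key to a value, or None on a miss."""

    def lookup(self, key: str) -> Optional[str]: ...


class DictSource:
    """Stand-in for a file-backed source (CSV/JSON/YAML already parsed)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.calls = 0  # counts how often the backing data is consulted

    def lookup(self, key: str) -> Optional[str]:
        self.calls += 1
        return self._data.get(key)


class CachingSource:
    """Wraps any LookupSource with an in-memory cache; misses are cached too."""

    def __init__(self, inner: LookupSource) -> None:
        self._inner = inner
        self._cache: Dict[str, Optional[str]] = {}

    def lookup(self, key: str) -> Optional[str]:
        if key not in self._cache:
            self._cache[key] = self._inner.lookup(key)
        return self._cache[key]


backend = DictSource({"u-1": "alice"})
source = CachingSource(backend)
print(source.lookup("u-1"), source.lookup("u-1"), backend.calls)
```

Because the processor would only depend on `lookup`, an HTTP or database source could be swapped in without changing the enrichment logic, and shared behavior such as caching could live in one wrapper.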

IMO #41816 has several similarities to this effort and it'd be nice to work together on delivering this feature.

I haven't put up a "meaty" PR yet as that proposal doesn't have a sponsor either but I added a skeleton PR #43120 to show how the interfaces would work together.

The proposal has been brought up in a couple of SIG meetings looking for sponsorship, and while it has been collecting +1s and ❤️s, I haven't had any luck either, but not losing hope :)

Sorry for the direct ping here @atoulme, but I'm wondering if you'd be interested in getting involved in this effort, either through @kyo-ke's or my proposal (or a joint one). The proposal at #41816 is the 2nd/3rd most voted open proposal looking for sponsorship at this point out of nearly 30. This demonstrates lookup-based enrichment is a sought-after feature in the Collector, and I'd like to avoid having a single person (or company) own the feature. Happy to discuss this in any channel or medium.

@github-actions

github-actions Bot commented Nov 4, 2025

Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions Bot added the Stale label Nov 4, 2025
@github-actions

Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions Bot closed this Nov 18, 2025

Labels

discussion needed (Community discussion needed), Sponsor Needed (New component seeking sponsor), Stale

5 participants