-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consistent config interface for filtering external datasets #25161
Comments
OTTL should definitely be the de facto solution when making decisions on data that has already be turned into pdata, but for situations where the filter configuration is modifying how the receiver collects data in the first place it gets trickier. If we want a truly uniform filtering experience across all components making decision at any moment I believe the best solution is OTTL. It will be the most robust, allowing users to build any type of condition they need. If we use OTTL we won't need to worry about feature parity between 2 condition systems. We have added a lot of plumbing to OTTL that will treat But taking a step back, I believe we can separate the configuration API itself from the underlying implementation details. I am not interested in changing the glob pattern matching that filelogreceiver uses - it is a very natural and appropriate form of identifying file paths to collect. Also, I don't fully comprehend situation 2. I don't think receivers should be using a filter configuration to determine what data to grab and how to modify it at the same time. I'd rather the modification of any data be handled in the transformprocessor. @dmitryax I think this would be a good topic for the SIG meeting. |
Sometimes it needs to be done in the same component. For example, we need to grab key/value pairs from k8s pod annotations AND labels making sure that key collisions are resolved. k8sattributes processor sets them with defferent prefies: |
Reading through the k8sattributes config I now see what you mean. It make sense why it allows setting extract:
labels:
- tag_name: $$1
key_regex: kubernetes.io/(.*) as something like extract:
labels:
# IDK what the actual syntax would look like for this as we havent written an extract function
# We are adding named parameters tho which ensure clarity
- extract(attributes, "$$1", "kubernetes.io/(.*)") But again, I don't want to add overcomplexity where it isn't needed. Introducing OTTL functions and conditions gives a lot of flexibility, but I don't want the added complexity if it isn't needed. Behind the scenes I still think it would be good to use OTTL, that way we aren't managing multiple filter frameworks. |
Considering non-OTTL options I lean towards option B, with the addition of a |
As said in the meeting, please keep I favor option B as well. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I also like option B), it's pretty convenient and readable, you don't need to learn new syntax of Though how would we support excluding elements? Two different fields with the same config structure, named For example including certain resource attribute values:
Example of excluding certain attribute values:
Another interesting complication from filtering resource attributes, is that they have a type: String, Slice, Map, Bytes. Would this work with maps / slice? |
Sounds good to me. We should clearly document the precedence logic that is applied between them.
I think for other types, we should only support the |
I also like option B. It's concise - brief enough, but comprehensible and unequivocal at the same time. I would change the term |
Problem
We use many different filtering interfaces in the collector configuration where we need to provide users with an option to limit an external set of data. The interfaces differ pretty significantly, making it confusing for the end user. Moreover, developers may find it unclear which interface to adopt for their components moving forward.
Here we are talking only about filtering external sets, not pdata objects which are supposed to be fully handled by OTTL.
There are two main use cases:
1. Just filter external data sets prior to further processing.
This scenario arises when users need to limit the external dataset before the collector proceeds with specific processing.
1.1 List of glob file paths:
Example: pkg/stanza/fileconsumer/matcher.Include/Exclude taking slices of glob strings to match against files to watch
While effective for file paths, this method might not be universally applicable to other data sources.
1.2 Filtering by strict/regex rules provided explicitly
This method is widely employed in the filtering processor, soon to be substituted by OTTL. Additionally, it has been implemented in the hostmetrics receiver with this structure:
It was introduced in open-telemetry/opentelemetry-collector#522 not to break the interfaces that were already built by that time. This will be fully replaced by OTTL in the filter processor and by #25134 in hostmetrics receiver.
1.3 Dynamically deducting regexp/glob/strict type from the string
This approach is used in dockerstats receiver
exclude_images
Strings wrapped with
/
represent regex, strings prefixed with!
represent negation, glob strings are automatically deducted if any the*?[]{}!
characters are found in the string.This approach is the least verbose but requires adding some additional escaping rules, e.g. if we want to do a strict match against
/some-string/
or!string
. Also, the unclear distinction between glob and regular strings can cause confusion.2. Convert external maps into a new set of attributes
This functionality is needed when users desire a configuration interface to map external maps into resource attributes. Examples include:
The filtering part of this is the same as in the use case (1), but we also need to provide a way for users to remap keys from the original map to the attribute keys. This cannot be done in the downstream component because different sources (like pod annotations and labels) can have the same key, so users need a way to avoid keys from one map being overridden by another map. The mapping part can also be solved as a separate configuration option, but it's nice to do it at once, especially when we need to map regex-mapped groups. For example, k8sattributes processor currently provides the following interface for this:
Goal
Resolving this issue does not require changing all the existing filtering interfaces. The goal is to establish a prescribed way to define the filtering interfaces for external datasets for any future use cases. This issue would also be a prerequisite for resolving the following ones:
Possible solutions
Option A
Keep using approach 1.3 "Dynamically deducting regexp/glob/strict type from the string".
Pros
For the use case (2), we cannot use a map because we need to preserve the order of the filtering/mapping rules evaluation, so we need to keep it as a list of structs with two fields (
filter_and_map
is just a placeholder):Cons
/
for regexp and the!
negation prefix. Potentially, we can avoid supporting the negation!
sign because it can always be handled by a regexp. Also, we can recommend using regex for strings wrapped with/
's, e.g./\/...\//
.?
sign as part of the string they want to match, but it'll match any character. Potentially we can remove glob expression support and keep only/regex/
andstrict-string
matchingOption B
Make every filtering item a struct that would include a matching type as a field name:
strict
,glob
regexp
. Setting more than one of them shouldn't pass validation. To support use case (2), we can simply add another optional field,target_key
, for example (note thatfilter
andfilter_and_map
are just placeholders, they can be anything else likeinclude_container_values
):Pros
Cons
Option C
Utilize OTTLP for the external datasets additionally to OTLP data. This would require introducing additional identifiers for the external items. It needs more discussion to determine its pros and cons
The text was updated successfully, but these errors were encountered: