Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/tailsampling] fix InvertNotSampled decision precedence when inside and sub policy #33671

Merged

Conversation

jamesrwhite
Copy link
Contributor

Description:

This fixes the handling of AND policies that contain a sub-policy with invert_match=true. Previously if the decision from a policy evaluation was NotSampled or InvertNotSampled it would return a NotSampled decision regardless, effectively downgrading the result.

This was breaking the documented behaviour that inverted decisions should take precedence over all others.

This is related to the changes made in #9768 that introduced support for using invert_match within and sub policies.

Link to tracking Issue: #33656

Testing:

I tested manually that this fixes the issue described in #33656 and also updated the tests. If you have any suggestions for more tests we could add let me know.

Documentation:

This fixes the handling of AND policies that contain a sub-policy
with invert_match=true. Previously if the decision from a policy
evaluation was NotSampled or InvertNotSampled it would return a
NotSampled decision regardless, effectively downgrading the result.

This was breaking the documented behaviour that inverted decisions
should take precedence over all others.

This is related to the changes made in open-telemetry#9768
that introduced support for using invert_match within and sub policies.
@jamesrwhite jamesrwhite requested a review from jpkrohling as a code owner June 20, 2024 10:54
@jamesrwhite jamesrwhite requested a review from a team June 20, 2024 10:54
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Jun 20, 2024
@jamesrwhite jamesrwhite changed the title [processor/tailsampling] fix inverted not sampled AND policy [processor/tailsampling] fix InvertNotSampled decision precedence when inside and sub policy Jun 20, 2024
@jpkrohling jpkrohling assigned jpkrohling and unassigned crobert-1 Jun 20, 2024
@jpkrohling
Copy link
Member

Thank you for the PR! This might need a changelog entry:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#adding-a-changelog-entry

@jamesrwhite
Copy link
Contributor Author

@jpkrohling I've added a changelog entry but I wasn't sure whether to classify this as a breaking change or not? The behaviour now matches the documentation with this change but it could be a breaking change for people if they were relying on the old behaviour.

@jpkrohling
Copy link
Member

I believe this is a bug fix.

@jpkrohling jpkrohling merged commit e2fda02 into open-telemetry:main Jun 20, 2024
154 checks passed
@github-actions github-actions bot added this to the next release milestone Jun 20, 2024
@jamesrwhite jamesrwhite deleted the fix-and-policy-invert-not-sampled branch July 2, 2024 17:37
hyang023 pushed a commit to hyang023/opentelemetry-collector-contrib that referenced this pull request Jul 16, 2024
…edence when inside and sub policy (open-telemetry#33671)"

This reverts commit e2fda02.
@karinhawk
Copy link

karinhawk commented Nov 5, 2024

update: didn't notice this was already covered in this bug.

We implemented tail sampling using v0.100.0 of the collector, and want to upgrade to the latest version, but have encountered being unable to replicate behaviour we took from the documentation examples of this repo (namely, the backwards compatibility policy) since that upgrade.

It seems that since v0.104.0 when this bug fix was merged that behaviour is no longer possible. I believe that the documentation examples are now not representative of the possible sampling decisions, which is misleading.

We liked the pattern of being able to sample everything but an array of services, which you would then create policies on what traces to keep from those services, as it made for a lean set of policies.

Our sampling used to work with the following, but now does not, as the "inverted not sample decision" is taking precedence over every subsequent "sample" or "not sample" policy made for those services - meaning those services will always NOT be sampled:

      {
        name: 'backwards-compatibility-policy',
        type: 'and',
        and: {
          and_sub_policy: [
            {
              name: 'services-using-tail_sampling-policy',
              type: 'string_attribute',
              string_attribute: {
                key: 'service.name',
                values: [
                  'otel-demo*',
                ],
                invert_match: true,
                enabled_regex_matching: true,
              },
            },
            {
              name: 'sample-all-policy',
              type: 'always_sample',
            },
          ],
        },
      },
       ...subsequent policies sampling the 'otel-demo' services

Keeping backwards compatibility with sampling services not ready to implement tail sampling at 100% is extremely important to us as we have such a large estate.

I understand this bug fix was needed, but would love some help to see if this behaviour can in fact be replicated after this change - it was working so well for us!

@jpkrohling
Copy link
Member

@karinhawk, we have another issue covering this, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processor/tailsampling Tail sampling processor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants