Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: exclude_matching_metrics is excluding non-ProcessSamples #1943

Merged
merged 8 commits into from
Nov 6, 2024

Conversation

DavSanchez
Copy link
Contributor

@DavSanchez DavSanchez commented Oct 29, 2024

This fixes NR-334472.

Due to how the logic for the exclude_matching_metrics option was implemented for 1.57.2, when this setting is present all samples that are not ProcessSamples are filtered out and not sent.

This reworks the matcher logic to cover against this and add relevant unit tests to check for the inclusion or exclusion of non-ProcessSample events.

@DavSanchez DavSanchez force-pushed the fix/exclude_matching_process_metrics branch 2 times, most recently from f6aeccf to d22da5f Compare October 29, 2024 22:41
@DavSanchez
Copy link
Contributor Author

DavSanchez commented Oct 29, 2024

Commit bbe25a6 touches pretty ancient code to remove a check that we were performing in different sections of the logic and in a different way each time: if the sample is a ProcessSample or not.

These duplicate checks were removed at the cost of some duplication regarding the include/exclude behavior, though the code might be more explicit this way and thus easier to follow.

I have doubts regarding behavior when feature flags are involved. To clarify these doubts I am counting on:

  • Knowing exactly what would be the behavior when FF is enabled
  • Debug with an account that uses this FF

That issue aside, I'll be doing more manual tests for this and probably involve someone else to test live and caught potential issues. We need more eyes!

@DavSanchez DavSanchez marked this pull request as ready for review October 29, 2024 23:01
@DavSanchez DavSanchez requested a review from a team October 29, 2024 23:01
@DavSanchez DavSanchez force-pushed the fix/exclude_matching_process_metrics branch from d22da5f to bbe25a6 Compare October 30, 2024 10:38
@DavSanchez
Copy link
Contributor Author

DavSanchez commented Oct 30, 2024

Last commit (92d3411) removed redundant checks that were made inside the NewExcludeSampleMatchFn function. Now it does not need to receive the enableProcessMetrics config value nor the ffRetriever, so it looks less similar to NewIncludeSampleMatchFn and thus more clear, less prone to include/exclude confusions.

Tested manually (both commits bbe25a6 and 92d3411) with all the combinations of:

  • Include/exclude matching metrics:
    • No rules
    • Include rules with include_matching_metrics
    • Exclude rules with exclude_matching_metrics
    • Both include and exclude rules with include_matching_metrics + exclude_matching_metrics
  • Presence of the enable_process_metrics setting:
    • Unset (nil)
    • false
    • true

The behavior is the same in both and the expected behavior.

enable_process_metrics include_matching_metrics exclude_matching_metrics ProcessSample
Unset (nil) No No Not emitted
Unset (nil) regex systemd No *systemd* only
Unset (nil) No regex systemd Not emitted
Unset (nil) regex systemd regex systemd *systemd* only
false No No Not emitted
false regex systemd No Not emitted
false No regex systemd Not emitted
false regex systemd regex systemd Not emitted
true No No Emitted (unfiltered)
true regex systemd No *systemd* only
true No regex systemd non-*systemd* only
true regex systemd regex systemd *systemd* only

Non-ProcessSamples are always emitted (unfiltered).

Queries used in tests for review and future reference: https://onenr.io/08jqegodOjl

Ready for review!

alvarocabanas
alvarocabanas previously approved these changes Oct 31, 2024
Copy link
Contributor

@alvarocabanas alvarocabanas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just left a NIT that is to add a comment pointing out the matchers only apply to ProcessSample, but overall I think now it's well tested and does what it's supposed to. This Matcher function was designed to be generic but is only applied to ProcessSample, and that has given an obfuscation that was error prone. I think your change adding the check explicitly add the beginning of the functions makes it much clearer, and the tests you added avoid this to happen again.

pkg/metrics/sampler/matcher_test.go Show resolved Hide resolved
pkg/metrics/sampler/matcher.go Outdated Show resolved Hide resolved
return matcher
// NewIncludeSampleMatchFn returns a function `func(sample) bool` that determinesif the sample
// should be included (true) as an event or not (false). Note that this is NOT the negation
// of `NewExcludeSampleMatchFn`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can also add a comment that it only works for ProcessSample.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 1a9f8d8!

gsanchezgavier
gsanchezgavier previously approved these changes Nov 5, 2024
Copy link
Contributor

@gsanchezgavier gsanchezgavier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic looks solid to me . I preferred your first approach to apply the sample filter on the Include fn rather than the matcher but is just a nit

Comment on lines 1212 to 1224
// Decides wether an event will be included or not.
// This kind of filtering only applies to ProcessSamples,
// so that's what we check here.
func (c *context) IncludeEvent(event any) bool {
shouldInclude := c.shouldIncludeEvent(event)
shouldExclude := c.shouldExcludeEvent(event)

return shouldInclude || !shouldExclude
switch event.(type) {
// rule is applied to process samples
case *process_sample_types.ProcessSample, *process_sample_types.FlatProcessSample:
shouldInclude := c.shouldIncludeEvent(event)
shouldExclude := c.shouldExcludeEvent(event)

return shouldInclude || !shouldExclude
default:
// other samples are included
return true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I actually liked this approach to evaluate the type of sample even before calling the matcher functions.

Copy link
Contributor Author

@DavSanchez DavSanchez Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have retrieved it in d386f89! LMKWYT


mlog.
WithField(config.TracesFieldName, config.FeatureTrace).
Trace("EnableProcessMetrics is TRUE and rules are NOT defined, ALL process metrics will be ENABLED")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quite a minor thing but this log will be printed en case also when exclude matchers are added and no include ones.

Copy link
Contributor Author

@DavSanchez DavSanchez Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the latest commit (d386f89) solve this? I reworded a bit.

DavSanchez and others added 8 commits November 5, 2024 21:07
This should make the current tests fail and uncover the source of the issue
There were many different places where it was checked that the sample was a process sample or not, and the implementations were different each time. At the cost of a little function duplication I have removed the redundant checks and (hopefully) streamlined code

test: remove redundant test

tested function will never be applied to non-processsamples

fix: do not exclude when no exclusion rules are defined

docs: improve function names and state purpose

style: remove redundant function
style: print rich representation of excluded sample
the removed tests were checking the non-processsample exclusion + FF enabled behavior. Given the non-processsample check is now done elsewhere, these tests are now invalid
@DavSanchez DavSanchez force-pushed the fix/exclude_matching_process_metrics branch from 69333cc to 303b9bf Compare November 5, 2024 21:07
Copy link
Contributor

@gsanchezgavier gsanchezgavier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

internal/agent/agent.go Show resolved Hide resolved
@DavSanchez DavSanchez merged commit 01ed9a4 into master Nov 6, 2024
25 checks passed
@DavSanchez DavSanchez deleted the fix/exclude_matching_process_metrics branch November 6, 2024 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants