Add ability to sample logs #9118
Conversation
I like adding this capability, but I have some concerns about consistency with trace sampling. I understand that log sampling is not part of the specification, but I believe we should do our best to treat all signals consistently, if possible.

- When trace ID information is present in a log, the log should be sampled consistently with the trace. This might be non-trivial to implement, but perhaps we could at least have an option to sample logs based on the trace ID, if present. Ideally, this could be more sophisticated and consider both severity-based and trace-ID-based sampling when making decisions.
- There's no way to override the sampling decision for logs, unlike for spans. I think we could consider supporting `sampling.priority` just as it is supported for tracing. This is not part of the specification right now, but I think this mechanism could be useful here as well.

Also, I think the specification should be extended with expectations on log sampling, though that's a separate track.

I am curious to hear others' thoughts on this.
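To make the first point concrete, here is a minimal sketch of what trace-ID-consistent log sampling could look like. The function name and the hashing scheme are assumptions for illustration only, not the processor's actual implementation:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// sampleByTraceID makes the same keep/drop decision for every signal that
// carries the same trace ID, so sampled logs line up with sampled spans.
// hashSeed and percentage mirror the processor's existing trace options.
// (Illustrative sketch; the real processor's hashing may differ.)
func sampleByTraceID(traceID [16]byte, hashSeed uint32, percentage float64) bool {
	h := fnv.New32a()
	seed := make([]byte, 4)
	binary.LittleEndian.PutUint32(seed, hashSeed)
	h.Write(seed)
	h.Write(traceID[:])
	// Map the hash onto [0, 100) and keep the record if it falls below the
	// configured percentage; identical trace IDs always land on the same side.
	bucket := float64(h.Sum32()%10000) / 100.0
	return bucket < percentage
}

func main() {
	traceID := [16]byte{0xde, 0xad, 0xbe, 0xef, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
	fmt.Println(sampleByTraceID(traceID, 22, 15.3))
}
```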
I agree with this sentiment and think that we need to work through how to ensure that any log sampling is well defined in the specification and can have consistent interoperability before we start implementation. The work @jmacd has done on consistent probability sampling can perhaps be a guide here.
I'll start a PR for log sampling. I like the ideas that @pmm-sumo puts forth, and I'll do my best to transcribe them into a spec. I will change the PR to adopt those ideas.
Thank you @atoulme, I think this PR is a great start. One idea I had in the meantime (which would solve the …
The work has started here: open-telemetry/opentelemetry-specification#2482. FWIW, given the feedback, I think I'll drop all severity-related sampling. Instead, we can use the overriding `sampling.priority` attribute.
So I think we have two directions here. Either keep the current, simpler approach. The second direction is to extend the capabilities a bit. I wanted to run this idea by you @atoulme @jpkrohling: I think the sampling for logs could be made more powerful and generic by applying part of the previous approach you had for severity. Logs are not troubled by the need to collect spans into traces, so we have more freedom here. I'm specifically thinking about adding a list of rules (attributes) to match and assign different sampling priorities, e.g. perhaps being able to use something akin to the following config:

```yaml
processors:
  probabilistic_sampler/logs:
    policies:
      - key: sampling.priority
        value: 1
        sampling_percentage: 100
      - key: deployment.environment
        value: test
        sampling_percentage: 20
      - key: deployment.environment
        value: production
        sampling_percentage: 100
```
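If it helps the discussion, here is a rough sketch of how such a policy list could be evaluated against a log record's attributes, falling back to a default percentage when nothing matches. The types and function names here are hypothetical, not an actual implementation:

```go
package main

import "fmt"

// policy mirrors one entry of the proposed `policies` list: match a single
// attribute key/value pair and apply a dedicated sampling percentage.
type policy struct {
	Key                string
	Value              string
	SamplingPercentage float64
}

// effectivePercentage returns the percentage of the first matching policy,
// or the default when no policy matches the record's attributes.
func effectivePercentage(attrs map[string]string, policies []policy, defaultPct float64) float64 {
	for _, p := range policies {
		if v, ok := attrs[p.Key]; ok && v == p.Value {
			return p.SamplingPercentage
		}
	}
	return defaultPct
}

func main() {
	policies := []policy{
		{Key: "sampling.priority", Value: "1", SamplingPercentage: 100},
		{Key: "deployment.environment", Value: "test", SamplingPercentage: 20},
		{Key: "deployment.environment", Value: "production", SamplingPercentage: 100},
	}
	attrs := map[string]string{"deployment.environment": "test"}
	fmt.Println(effectivePercentage(attrs, policies, 15.3)) // prints 20
}
```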
@@ -29,5 +29,51 @@ processors:
    sampling_percentage: 15.3

The probabilistic sampler supports sampling logs according to their trace ID, or by a specific log record attribute.
So, not by resource attributes?
No?
The following configuration options can be modified:
- `hash_seed` (no default, optional): An integer used to compute the hash algorithm. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
- `sampling_percentage` (default = 0, required): Percentage at which logs are sampled; >= 100 samples all logs.
What does 0 mean here? Nothing is going to be sampled? If it's required, what does it mean to have a default?
- `trace_id_sampling` (default = true, optional): Whether to use the log record trace ID to sample the log record.
What happens for entries without a trace ID?
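Putting the options from this excerpt together, a minimal logs-pipeline configuration might look like the following. This is a sketch for illustration: the receiver and exporter names are assumptions, and the defaults are whatever the README above states:

```yaml
processors:
  probabilistic_sampler:
    hash_seed: 22
    sampling_percentage: 15.3
    trace_id_sampling: true

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlp]
```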
```go
}

var _ config.Processor = (*Config)(nil)

// Validate checks if the processor configuration is valid
func (cfg *Config) Validate() error {
	if cfg.SamplingPercentage < 0 {
```
Are there conflicting options? For instance, what does it mean to have a sampling source when using this processor in a traces pipeline?
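For reference, a minimal sketch of how the `Validate` method in the excerpt above might continue. Everything beyond the `if` line shown in the diff (the error message and the standalone `Config` type here) is an assumption, not the PR's actual code:

```go
package main

import "fmt"

// Config holds only the field referenced in the diff excerpt (sketch).
type Config struct {
	SamplingPercentage float64
}

// Validate rejects percentages that cannot be interpreted as probabilities.
func (cfg *Config) Validate() error {
	if cfg.SamplingPercentage < 0 {
		return fmt.Errorf("negative sampling rate: %.2f", cfg.SamplingPercentage)
	}
	return nil
}

func main() {
	fmt.Println((&Config{SamplingPercentage: -5}).Validate())
}
```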
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Description:
Add ability to sample logs probabilistically using a log ID derived from the log timestamp and severity level. This allows setting different sampling rates based on severity.
Link to tracking Issue:
#9117
Testing:
Unit tests
Documentation:
README and testdata content.
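A rough sketch of what deriving a sampling decision from a record's timestamp and severity could look like, as described in the summary above. The function below is an illustration under that description, not the PR's actual code:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"time"
)

// logSamplingBucket derives a pseudo "log ID" from the record's timestamp and
// severity number, then maps it onto [0, 100) so it can be compared against
// sampling_percentage. (Illustrative; field layout and hash are assumptions.)
func logSamplingBucket(ts time.Time, severityNumber int32, hashSeed uint32) float64 {
	buf := make([]byte, 16)
	binary.LittleEndian.PutUint32(buf[0:4], hashSeed)
	binary.LittleEndian.PutUint64(buf[4:12], uint64(ts.UnixNano()))
	binary.LittleEndian.PutUint32(buf[12:16], uint32(severityNumber))

	h := fnv.New32a()
	h.Write(buf)
	return float64(h.Sum32()%10000) / 100.0
}

func main() {
	bucket := logSamplingBucket(time.Now(), 9 /* INFO */, 22)
	fmt.Printf("record falls into bucket %.2f; kept if below sampling_percentage\n", bucket)
}
```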