
Add ability to sample logs #9118

Closed · wants to merge 8 commits

Conversation

@atoulme (Contributor) commented Apr 8, 2022

Description:
Add the ability to sample logs probabilistically, using a log ID derived from the log timestamp and severity level. This allows setting different sampling rates based on severity.

Link to tracking Issue:
#9117

Testing:
Unit tests

Documentation:
README and testdata content.
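
A minimal sketch of the sampling approach described above, assuming the log ID is an FNV hash over the seed, timestamp, and severity text. The function and parameter names here are hypothetical illustrations, not the PR's actual code:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"time"
)

// sampleLog derives a pseudo-random "log ID" by hashing the seed,
// timestamp, and severity, then keeps the record if the hash falls
// below the threshold implied by the sampling percentage.
func sampleLog(ts time.Time, severity string, percentage float64, seed uint32) bool {
	h := fnv.New32a()
	var seedBuf [4]byte
	binary.LittleEndian.PutUint32(seedBuf[:], seed)
	h.Write(seedBuf[:])
	var tsBuf [8]byte
	binary.LittleEndian.PutUint64(tsBuf[:], uint64(ts.UnixNano()))
	h.Write(tsBuf[:])
	h.Write([]byte(severity))
	threshold := uint32(percentage / 100 * float64(1<<32-1))
	return h.Sum32() < threshold
}

func main() {
	// Keep roughly 15.3% of records, consistently across collectors
	// that share the same seed.
	fmt.Println(sampleLog(time.Now(), "ERROR", 15.3, 22))
}
```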

@atoulme atoulme requested a review from a team April 8, 2022 07:00
@atoulme atoulme requested a review from jpkrohling as a code owner April 8, 2022 07:00
@pmm-sumo (Contributor) left a comment

I like adding this capability, but I have some concerns about consistency with trace sampling. I understand that log sampling is not part of the specification, but I believe we should do our best to ensure all signals are treated consistently, if possible.

  1. When Trace ID information is present in a log record, the log should be sampled consistently with the trace.
    This one might be non-trivial to implement, but perhaps we could at least have an option to sample logs based on the Trace ID, if present. Ideally, this could be more sophisticated and consider both severity-based and trace-ID-based sampling when making decisions.

  2. There's no way to override the sampling decision for logs, unlike for spans. I think we could consider supporting sampling.priority, just as it is supported for tracing. This is not part of the specification right now, but I think the mechanism could be useful here as well (a rough sketch combining both ideas follows at the end of this comment).

Also, I think the specification should be extended with expectations for log sampling, though that's a separate track.

I am curious to hear others' thoughts on this.
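
For illustration, a hedged sketch of the decision order these two points suggest: a `sampling.priority` override checked first, then a trace-ID-based hash so log records stay consistent with trace sampling. All names here are hypothetical and not part of this PR:

```go
package sampler

import (
	"encoding/binary"
	"hash/fnv"
)

// decide applies a hypothetical sampling.priority override first, then
// falls back to hashing the trace ID so the log record's fate matches
// a trace sampler using the same seed and rate.
func decide(attrs map[string]interface{}, traceID [16]byte, rate float64, seed uint32) bool {
	if p, ok := attrs["sampling.priority"]; ok {
		if v, ok := p.(int64); ok {
			return v > 0 // non-zero priority forces the record to be kept
		}
	}
	var empty [16]byte
	if traceID != empty {
		h := fnv.New32a()
		var seedBuf [4]byte
		binary.LittleEndian.PutUint32(seedBuf[:], seed)
		h.Write(seedBuf[:])
		h.Write(traceID[:])
		return h.Sum32() < uint32(rate/100*float64(1<<32-1))
	}
	return true // no trace ID: keep the record (or use another sampling source)
}
```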

@Aneurysm9 (Member) commented

> I like adding this capability, but I have some concerns about consistency with trace sampling. I understand that log sampling is not part of the specification, but I believe we should do our best to ensure all signals are treated consistently, if possible.

I agree with this sentiment and think we need to work through how to ensure that any log sampling is well defined in the specification and can interoperate consistently before we start implementation. The work @jmacd has done on consistent probability sampling can perhaps be a guide here.

@atoulme (Contributor, Author) commented Apr 9, 2022

I'll start a PR for a log sampling spec. I like the ideas that @pmm-sumo put forth, and I'll do my best to transcribe them into a spec. I will change this PR to adopt those ideas.

@pmm-sumo (Contributor) commented Apr 9, 2022

Thank you @atoulme, I think this PR is a great start.

One idea I had in the meantime (which would solve the sampling.priority concern I described earlier) is adding a capability to provide sampling rates by attribute as well. So, not only by severity, but also by any specified record-level or resource-level attribute key/value pair. This would significantly extend the use cases for log sampling (since it would then also be possible to have separate rates per service, etc.).

@atoulme (Contributor, Author) commented Apr 9, 2022

The work has started here: open-telemetry/opentelemetry-specification#2482

FWIW, given the feedback, I think I'll drop all severity-related sampling. Instead, we can use the sampling.priority override approach mentioned here.

@atoulme requested a review from pmm-sumo April 9, 2022 22:39
@pmm-sumo dismissed their stale review April 11, 2022 19:29

Changes addressed

@pmm-sumo (Contributor) commented
So I think we have two directions here. The first is to keep the probabilistic sampler processor very basic/simple for logs as well (which has its merits); I think the current version is pretty close to that.

The second direction is to extend the capabilities a bit. I wanted to run this idea by you, @atoulme @jpkrohling.

I think the sampling for logs could be made more powerful and generic by applying part of the previous approach you had for severity. Logs are not troubled by the need to collect spans into traces, so we have more freedom here without resorting to groupbytrace and the like.

I'm specifically thinking about adding a list of rules (attributes) to match and assign different sampling priorities, e.g. perhaps being able to use something akin to the following config:

```yaml
processors:
  probabilistic_sampler/logs:
    policies:
      - key: sampling.priority
        value: 1
        sampling_percentage: 100
      - key: deployment.environment
        value: test
        sampling_percentage: 20
      - key: deployment.environment
        value: production
        sampling_percentage: 100
```
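
A minimal sketch, in Go, of how such a policy list might be evaluated: first match wins, falling back to the processor's base rate. The types and names are hypothetical, not an actual implementation:

```go
package sampler

// policy mirrors one entry under `policies` in the config sketch above.
type policy struct {
	key, value         string
	samplingPercentage float64
}

// effectiveRate returns the sampling percentage of the first policy whose
// key/value pair matches the record's attributes, or the default rate.
func effectiveRate(attrs map[string]string, policies []policy, defaultRate float64) float64 {
	for _, p := range policies {
		if attrs[p.key] == p.value {
			return p.samplingPercentage
		}
	}
	return defaultRate
}
```

First-match-wins ordering would let a sampling.priority rule take precedence over the broader environment-based rules, as in the config above.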

@atoulme (Contributor, Author) commented Apr 11, 2022

@pmm-sumo this is why I'm working on adding severity support for the attributesprocessor (#9132): so I can add a priority attribute to the log record based on severity, which would then be interpreted by this sampler.

> @@ -29,5 +29,51 @@ processors:
>       sampling_percentage: 15.3

The probabilistic sampler supports sampling logs according to their trace ID, or by a specific log record attribute.
A Member commented on this line:
@atoulme (Contributor, Author) replied:

No?


The following configuration options can be modified:
- `hash_seed` (no default, optional): An integer used to seed the hash computation. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
- `sampling_percentage` (default = 0, required): Percentage at which logs are sampled; >= 100 samples all logs
A Member commented:
What does 0 mean here? Nothing is going to be sampled? If it's required, what does it mean to have a default?

- `trace_id_sampling` (default = true, optional): Whether to use the log record trace ID to sample the log record.
A Member commented:
What happens for entries without a trace ID?
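
For illustration only, one way a processor could pick a sampling source when the trace ID is absent. These are hypothetical names, not necessarily how this PR answers the question above:

```go
package sampler

// samplingSource returns the bytes to hash for a record: the trace ID if
// present, otherwise a configured attribute, otherwise nil (the caller
// decides, e.g. always keep records with no usable sampling source).
func samplingSource(traceID [16]byte, attrs map[string]string, attrKey string) []byte {
	var empty [16]byte
	if traceID != empty {
		return traceID[:]
	}
	if v, ok := attrs[attrKey]; ok && v != "" {
		return []byte(v)
	}
	return nil
}
```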

```go
}

var _ config.Processor = (*Config)(nil)

// Validate checks if the processor configuration is valid
func (cfg *Config) Validate() error {
	if cfg.SamplingPercentage < 0 {
		return fmt.Errorf("negative sampling rate: %.2f", cfg.SamplingPercentage)
	}
	return nil
}
```
A Member commented:
Are there conflicting options? For instance, what does it mean to have a sampling source when using this processor in a traces pipeline?

@github-actions bot commented May 3, 2022

This PR was marked stale due to lack of activity. It will be closed in 14 days.

The github-actions bot added the Stale label May 3, 2022
@github-actions bot commented
Closed as inactive. Feel free to reopen if this PR is still being worked on.
