
Add ability to sample logs #9118

Closed · wants to merge 8 commits

Conversation

@atoulme (Contributor) commented Apr 8, 2022

Description:
Add the ability to sample logs probabilistically, using a log ID derived from the log timestamp and severity level. This allows setting different sampling rates based on severity.

Link to tracking Issue:
#9117

Testing:
Unit tests

Documentation:
README and testdata content.
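
A minimal sketch of the sampling approach described above, assuming the log ID is an FNV hash over the seed, timestamp, and severity text. The function and parameter names here are hypothetical illustrations, not the PR's actual code:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"time"
)

// sampleLog derives a pseudo-random "log ID" by hashing the seed,
// timestamp, and severity, then keeps the record if the hash falls
// below the threshold implied by the sampling percentage.
func sampleLog(ts time.Time, severity string, percentage float64, seed uint32) bool {
	h := fnv.New32a()
	var seedBuf [4]byte
	binary.LittleEndian.PutUint32(seedBuf[:], seed)
	h.Write(seedBuf[:])
	var tsBuf [8]byte
	binary.LittleEndian.PutUint64(tsBuf[:], uint64(ts.UnixNano()))
	h.Write(tsBuf[:])
	h.Write([]byte(severity))
	threshold := uint32(percentage / 100 * float64(1<<32-1))
	return h.Sum32() < threshold
}

func main() {
	// Keep roughly 15.3% of records, consistently across collectors
	// that share the same seed.
	fmt.Println(sampleLog(time.Now(), "ERROR", 15.3, 22))
}
```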

@atoulme atoulme requested a review from a team April 8, 2022 07:00
@atoulme atoulme requested a review from jpkrohling as a code owner April 8, 2022 07:00
@pmm-sumo (Contributor) left a comment

I like adding this capability, but I have some concerns about consistency with trace sampling. I understand that log sampling is not part of the specification, but I believe we should do our best to ensure all signals are treated consistently, if possible.

  1. When Trace ID information is present in a log record, the log should be sampled consistently with the trace.
    This one might be non-trivial to implement, but perhaps we could at least have an option to sample logs based on the Trace ID, if present. Ideally, this could be more sophisticated and consider both severity-based and trace-ID-based sampling when making decisions.

  2. There's no way to override the sampling decision for logs, unlike for spans. I think we could consider supporting sampling.priority, just as it is supported for tracing. This is not part of the specification right now, but I think the mechanism could be useful here as well (a rough sketch combining both ideas follows at the end of this comment).

Also, I think the specification should be extended with expectations for log sampling, though that's a separate track.

I am curious to hear others' thoughts on this.
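
For illustration, a hedged sketch of the decision order these two points suggest: a `sampling.priority` override checked first, then a trace-ID-based hash so log records stay consistent with trace sampling. All names here are hypothetical and not part of this PR:

```go
package sampler

import (
	"encoding/binary"
	"hash/fnv"
)

// decide applies a hypothetical sampling.priority override first, then
// falls back to hashing the trace ID so the log record's fate matches
// a trace sampler using the same seed and rate.
func decide(attrs map[string]interface{}, traceID [16]byte, rate float64, seed uint32) bool {
	if p, ok := attrs["sampling.priority"]; ok {
		if v, ok := p.(int64); ok {
			return v > 0 // non-zero priority forces the record to be kept
		}
	}
	var empty [16]byte
	if traceID != empty {
		h := fnv.New32a()
		var seedBuf [4]byte
		binary.LittleEndian.PutUint32(seedBuf[:], seed)
		h.Write(seedBuf[:])
		h.Write(traceID[:])
		return h.Sum32() < uint32(rate/100*float64(1<<32-1))
	}
	return true // no trace ID: keep the record (or use another sampling source)
}
```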

@Aneurysm9 (Member) commented

> I like adding this capability, but I have some concerns about consistency with trace sampling. I understand that log sampling is not part of the specification, but I believe we should do our best to ensure all signals are treated consistently, if possible.

I agree with this sentiment and think we need to work through how to ensure that any log sampling is well defined in the specification and can interoperate consistently before we start implementation. The work @jmacd has done on consistent probability sampling can perhaps be a guide here.

@atoulme (Contributor, Author) commented Apr 9, 2022

I'll start a PR for a log sampling spec. I like the ideas that @pmm-sumo put forth, and I'll do my best to transcribe them into a spec. I will change this PR to adopt those ideas.

@pmm-sumo (Contributor) commented Apr 9, 2022

Thank you @atoulme, I think this PR is a great start.

One idea I had in the meantime (which would solve the sampling.priority concern I described earlier) is adding a capability to provide sampling rates by attribute as well. So, not only by severity, but also by any specified record-level or resource-level attribute key/value pair. This would significantly extend the use cases for log sampling (since it would then also be possible to have separate rates per service, etc.).

@atoulme (Contributor, Author) commented Apr 9, 2022

The work has started here: open-telemetry/opentelemetry-specification#2482

FWIW, given the feedback, I think I'll drop all severity-related sampling. Instead, we can use the sampling.priority override approach mentioned here.

@atoulme requested a review from pmm-sumo April 9, 2022 22:39
@pmm-sumo dismissed their stale review April 11, 2022 19:29

Changes addressed

@pmm-sumo (Contributor) commented
So I think we have two directions here. The first is to keep the probabilistic sampler processor very basic/simple for logs as well (which has its merits); I think the current version is pretty close to that.

The second direction is to extend the capabilities a bit. I wanted to run this idea by you, @atoulme @jpkrohling.

I think the sampling for logs could be made more powerful and generic by applying part of the previous approach you had for severity. Logs are not troubled by the need to collect spans into traces, so we have more freedom here without resorting to groupbytrace and the like.

I'm specifically thinking about adding a list of rules (attributes) to match and assign different sampling priorities, e.g. perhaps being able to use something akin to the following config:

```yaml
processors:
  probabilistic_sampler/logs:
    policies:
      - key: sampling.priority
        value: 1
        sampling_percentage: 100
      - key: deployment.environment
        value: test
        sampling_percentage: 20
      - key: deployment.environment
        value: production
        sampling_percentage: 100
```
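
A minimal sketch, in Go, of how such a policy list might be evaluated: first match wins, falling back to the processor's base rate. The types and names are hypothetical, not an actual implementation:

```go
package sampler

// policy mirrors one entry under `policies` in the config sketch above.
type policy struct {
	key, value         string
	samplingPercentage float64
}

// effectiveRate returns the sampling percentage of the first policy whose
// key/value pair matches the record's attributes, or the default rate.
func effectiveRate(attrs map[string]string, policies []policy, defaultRate float64) float64 {
	for _, p := range policies {
		if attrs[p.key] == p.value {
			return p.samplingPercentage
		}
	}
	return defaultRate
}
```

First-match-wins ordering would let a sampling.priority rule take precedence over the broader environment-based rules, as in the config above.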

@atoulme (Contributor, Author) commented Apr 11, 2022

@pmm-sumo this is why I'm working on adding severity support for the attributesprocessor (#9132): so I can add a priority attribute to the log record based on severity, which would then be interpreted by this sampler.

> @@ -29,5 +29,51 @@ processors:
>       sampling_percentage: 15.3

The probabilistic sampler supports sampling logs according to their trace ID, or by a specific log record attribute.
A Member commented on this line:
@atoulme (Contributor, Author) replied:

No?


The following configuration options can be modified:
- `hash_seed` (no default, optional): An integer used to seed the hash computation. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
- `sampling_percentage` (default = 0, required): Percentage at which logs are sampled; >= 100 samples all logs
A Member commented:
What does 0 mean here? Nothing is going to be sampled? If it's required, what does it mean to have a default?

- `trace_id_sampling` (default = true, optional): Whether to use the log record trace ID to sample the log record.
A Member commented:
What happens for entries without a trace ID?
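
For illustration only, one way a processor could pick a sampling source when the trace ID is absent. These are hypothetical names, not necessarily how this PR answers the question above:

```go
package sampler

// samplingSource returns the bytes to hash for a record: the trace ID if
// present, otherwise a configured attribute, otherwise nil (the caller
// decides, e.g. always keep records with no usable sampling source).
func samplingSource(traceID [16]byte, attrs map[string]string, attrKey string) []byte {
	var empty [16]byte
	if traceID != empty {
		return traceID[:]
	}
	if v, ok := attrs[attrKey]; ok && v != "" {
		return []byte(v)
	}
	return nil
}
```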

```go
}

var _ config.Processor = (*Config)(nil)

// Validate checks if the processor configuration is valid
func (cfg *Config) Validate() error {
	if cfg.SamplingPercentage < 0 {
		return fmt.Errorf("negative sampling rate: %.2f", cfg.SamplingPercentage)
	}
	return nil
}
```
A Member commented:
Are there conflicting options? For instance, what does it mean to have a sampling source when using this processor in a traces pipeline?

@github-actions bot commented May 3, 2022

This PR was marked stale due to lack of activity. It will be closed in 14 days.

The github-actions bot added the Stale label May 3, 2022
@github-actions bot commented
Closed as inactive. Feel free to reopen if this PR is still being worked on.
