Stratified Sampling Policy for Tailsampling processor #41877
Changes from all commits
d2edd94
05b7187
96de03c
e14ef2a
b937a77
c7d8ad4
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| # Use this changelog template to create an entry for release notes. | ||
|
|
||
| # One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' | ||
| change_type: enhancement | ||
|
|
||
| # The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver) | ||
| component: processor/tailsamplingprocessor | ||
|
|
||
| # A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). | ||
| note: "Added stratified sampling policy to the tailsampling processor" | ||
|
|
||
| # Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. | ||
| issues: [40917] | ||
|
|
||
| # (Optional) One or more lines of additional information to render under the primary note. | ||
| # These lines will be padded with 2 spaces and then inserted directly into the document. | ||
| # Use pipe (|) for multiline entries. | ||
| subtext: The current implementation of the probabilistic sampling policy in the tail sampling processor in the OpenTelemetry Collector Contrib repository randomly samples a percentage of traces. This approach does not ensure that all the application service workflows associated with different transaction types get a representation in the sampled set of traces. A user can initiate any specific application functionality/operation, which subsequently triggers a corresponding subset of service components (or a workflow). For instance, in an e-commerce application designed using a microservices architecture, distinct operations, such as browsing, adding to a cart, and others will invoke different microservices. Each functionality will invoke service components in a defined order, with the invocation order representing a subgraph within the broader application workflow. Defining this subgraph of service components for servicing a request as the trajectory, for a sampled set of traces to truly represent an application and thus be of more value to the downstream tasks, all the trajectories must get at least one representation in the sampled set of traces for the given sampling interval. This new sampling policy, called the stratified sampling policy, samples a new trajectory whenever it is encountered for the first time within a sampling interval. If a trajectory has already been observed within that interval, the policy will revert to a probabilistic sampling approach, where trajectories are selected based on predefined probabilities. This ensures that newly encountered trajectories are prioritized for sampling while maintaining flexibility for previously seen trajectories. | ||
|
|
||
| # If your change doesn't affect end users or the exported elements of any package, | ||
| # you should instead start your pull request title with [chore] or use the "Skip Changelog" label. | ||
| # Optional: The change log or logs in which this entry should be included. | ||
| # e.g. '[user]' or '[user, api]' | ||
| # Include 'user' if the change is relevant to end users. | ||
| # Include 'api' if there is a change to a library API. | ||
| # Default: '[user]' | ||
| change_logs: [user] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,6 +27,8 @@ const ( | |
| // StringAttribute sample traces that an attribute, of type string, matching | ||
| // one of the listed values. | ||
| StringAttribute PolicyType = "string_attribute" | ||
| // StratifiedProbabilistic samples a given percentage of traces, also taking the trace trajectory into account. | ||
| StratifiedProbabilistic PolicyType = "stratified" | ||
| // RateLimiting allows all traces until the specified limits are satisfied. | ||
| RateLimiting PolicyType = "rate_limiting" | ||
| // Composite allows defining a composite policy, combining the other policies in one | ||
|
|
@@ -60,6 +62,8 @@ type sharedPolicyCfg struct { | |
| NumericAttributeCfg NumericAttributeCfg `mapstructure:"numeric_attribute"` | ||
| // Configs for probabilistic sampling policy evaluator. | ||
| ProbabilisticCfg ProbabilisticCfg `mapstructure:"probabilistic"` | ||
| // Configs for stratified probabilistic sampling policy evaluator. | ||
| StratifiedProbabilisticCfg StratifiedProbabilisticCfg `mapstructure:"stratified"` | ||
| // Configs for status code filter sampling policy evaluator. | ||
| StatusCodeCfg StatusCodeCfg `mapstructure:"status_code"` | ||
| // Configs for string attribute filter sampling policy evaluator. | ||
|
|
@@ -170,6 +174,18 @@ type ProbabilisticCfg struct { | |
| SamplingPercentage float64 `mapstructure:"sampling_percentage"` | ||
| } | ||
|
|
||
| // StratifiedProbabilisticCfg holds the configurable settings to create a stratified probabilistic | ||
| // sampling policy evaluator. | ||
| type StratifiedProbabilisticCfg struct { | ||
| // HashSalt allows one to configure the hashing salt. This is important in scenarios where multiple layers of collectors | ||
| // have different sampling rates: if they use the same salt, all traces passing one layer may pass the other even if the | ||
| // layers have different sampling rates; configuring different salts avoids that. | ||
| HashSalt string `mapstructure:"hash_salt"` | ||
|
Comment on lines +180 to +183

Contributor: Please consider using the `pkg/sampling` support in this repository instead of a hash-based approach. OpenTelemetry systems are expected to observe the W3C TraceContext Level 2 specification, which means there are 56 bits of randomness available in one of two ways implemented by that library. We do not encourage hash-based sampling; see the approach we've taken to upgrade the probabilisticsampler processor, which is also the subject of this (current) blog post draft: open-telemetry/opentelemetry.io#7735. Moreover, there are other probability samplers in this component's configuration: I would expect them all to use the same approach, whatever it is, and would prefer to keep this code as simple as possible.
||
| // SamplingPercentage is the percentage rate at which traces are going to be sampled. Defaults to zero, i.e.: no sample. | ||
| // Values greater than or equal to 100 are treated as "sample all traces". | ||
| SamplingPercentage float64 `mapstructure:"sampling_percentage"` | ||
| } | ||
|
|
||
| // StatusCodeCfg holds the configurable settings to create a status code filter sampling | ||
| // policy evaluator. | ||
| type StatusCodeCfg struct { | ||
|
|
||
From the description I read, I am not sure the term "Stratified" has been quite earned, though from the PR description of "at least once" sampling, there's something useful here. Note that the `composite` sampling policy of this component is similar to what you're proposing, too, except (IIUC) you're adding an at-least-once fallback instead of a default-bucket approach.

If, as I take it, what you're trying to achieve is not based specifically on this at-least-once principle, but instead you are aiming just to achieve good coverage across all values in a key-space, then I support it, but it leaves me with questions for this configuration struct. I would imagine wanting a rate-limited sampler that tries to achieve balance, which means estimating the most-frequent values in the set and assigning (somehow) the percentage to use for the remaining bunch. (This is what the `composite` sampler policy in this component does.) I believe, from looking into this problem, that the best answer would be to configure only a rate limit and nothing else; let the component figure out what sampling probabilities to use for which strata, and also let the component control the relative weight of the "other bunch", which is to say: how much weight of the distribution falls into the default bucket versus how much is explicitly managed with a fixed-size lookup table used to calculate the probability that will achieve the intended rate.

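One possible reading of the rate-limit-only suggestion above, sketched in Go. Everything here is an assumption made for illustration: the `allocation` type, the `balance` and `probability` helpers, the even split of the budget across buckets, and the use of the previous interval's counts as the frequency estimate. It is not the `composite` policy's algorithm, nor something proposed verbatim in this review.

```go
package main

import (
	"fmt"
	"sort"
)

// allocation holds one probability per key in the fixed-size table plus a
// shared probability for every key that falls into the default bucket.
type allocation struct {
	perKey   map[string]float64
	fallback float64
}

// probability returns the sampling probability that spends `share` sampled
// traces on `count` observed traces, capped at 1. Keys with no observations
// get probability 1 so that an unseen key is always sampled.
func probability(share float64, count int) float64 {
	if count <= 0 {
		return 1
	}
	if p := share / float64(count); p < 1 {
		return p
	}
	return 1
}

// balance splits targetPerInterval evenly between the tableSize most
// frequent keys and one default bucket that pools all remaining keys.
func balance(counts map[string]int, targetPerInterval float64, tableSize int) allocation {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	// Most frequent first; ties are broken by key so the result is deterministic.
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] > counts[keys[j]]
		}
		return keys[i] < keys[j]
	})
	if tableSize < 0 {
		tableSize = 0
	}
	if tableSize > len(keys) {
		tableSize = len(keys)
	}
	otherCount := 0
	for _, k := range keys[tableSize:] {
		otherCount += counts[k]
	}
	buckets := tableSize
	if otherCount > 0 {
		buckets++ // the default bucket only takes a share when it is non-empty
	}
	out := allocation{perKey: make(map[string]float64, tableSize), fallback: 1}
	if buckets == 0 {
		return out
	}
	share := targetPerInterval / float64(buckets)
	for _, k := range keys[:tableSize] {
		out.perKey[k] = probability(share, counts[k])
	}
	out.fallback = probability(share, otherCount)
	return out
}

func main() {
	// Hypothetical counts observed during the previous interval.
	counts := map[string]int{"browse": 900, "cart": 90, "checkout": 9, "refund": 1}
	a := balance(counts, 30, 2)
	fmt.Printf("browse=%.4f cart=%.4f other=%.4f\n",
		a.perKey["browse"], a.perKey["cart"], a.fallback)
}
```

With a budget of 30 traces and a two-entry table, each of the three buckets receives a share of 10, so rare keys in the default bucket are kept far more often than frequent ones.
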
See also https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/probabilisticsamplerprocessor/README.md, which describes two modes of sampling that do not use any "salt".
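
For context, a salt-free decision of the kind the reviewer refers to can be sketched with the standard library alone. This assumes the randomness is taken from the last 7 bytes (56 bits) of the trace ID and compared against a rejection threshold of (1 - p) * 2^56; the function names are hypothetical and this is deliberately not the `pkg/sampling` API, whose exact signatures should be checked in the repository.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

const randomnessBits = 56

// maxThreshold is 2^56, one more than the largest possible randomness value.
const maxThreshold = uint64(1) << randomnessBits

// randomnessFromTraceID returns the 56-bit value stored in the last 7 bytes
// of a 16-byte trace ID.
func randomnessFromTraceID(id [16]byte) uint64 {
	// Read the last 8 bytes big-endian, then mask off the top byte.
	return binary.BigEndian.Uint64(id[8:]) & (maxThreshold - 1)
}

// rejectionThreshold maps a sampling probability p in [0, 1] to the
// threshold T = (1 - p) * 2^56. Traces whose randomness is below T are
// rejected.
func rejectionThreshold(p float64) uint64 {
	if p >= 1 {
		return 0
	}
	if p <= 0 {
		return maxThreshold
	}
	return uint64(math.Round((1 - p) * float64(maxThreshold)))
}

// shouldSample makes the same decision for a given trace ID in every
// collector configured with the same probability, without any salt.
func shouldSample(id [16]byte, p float64) bool {
	return randomnessFromTraceID(id) >= rejectionThreshold(p)
}

func main() {
	var high, low [16]byte
	high[9] = 0x80 // randomness = 2^55, exactly at the 50% threshold
	low[9] = 0x7f  // randomness = 2^55 - 1 once the remaining bytes are 0xff
	for i := 10; i < 16; i++ {
		low[i] = 0xff
	}
	fmt.Println(shouldSample(high, 0.5), shouldSample(low, 0.5))
}
```

Because the decision depends only on the trace ID and the threshold, a collector layer with a lower probability keeps a subset of what a layer with a higher probability keeps, which is the consistency property that a salt would break.
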