[probabilistic sampling processor] encoded sampling probability (support OTEP 235) #31894
Conversation
…tor-contrib into jmacd/tvaluesampler
…tor-contrib into jmacd/tvaluesampler
…ype similar to configcompression.CompressionType
Apart from small details, I think this looks good. I would also ask you to think about which telemetry you'd want from this component in case of a bug report, and add metrics for them. Do we need to keep a histogram of Rs and Ts?
logger.Debug(description, zap.Error(err))
var se samplerError
if errors.As(err, &se) {
	logger.Info(description, zap.Error(err))
This is part of the hot path; I'd rather have a metric recording the number of errors that happened, with samplerError being a dimension of the metric (error=sampler, error=other).
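For illustration, a minimal sketch of the kind of counter being described here, using the OTel Go metric API; the meter name, instrument name, helper function, and the stand-in samplerError type are all hypothetical, not this component's actual telemetry:

```go
package sampler

import (
	"context"
	"errors"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// samplerError stands in for the component's error type shown in the diff above.
type samplerError struct{ msg string }

func (e samplerError) Error() string { return e.msg }

var (
	meter             = otel.Meter("probabilisticsampler")    // hypothetical scope name
	samplingErrors, _ = meter.Int64Counter("sampling_errors") // hypothetical instrument name
)

// recordSamplingError counts one error on the hot path, using the error kind
// as a metric dimension instead of emitting a log line per occurrence.
func recordSamplingError(ctx context.Context, err error) {
	kind := "other"
	var se samplerError
	if errors.As(err, &se) {
		kind = "sampler"
	}
	samplingErrors.Add(ctx, 1, metric.WithAttributes(attribute.String("error", kind)))
}
```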
Let me suggest something different: let them all be (debug) logs for this PR, but then convert the metrics for this component as a whole from OC to OTel, using mdatagen. On that PR, you could add the new metric for the error conditions.
I like the idea of making the logs into Debug() and Info()-level. I also do not expect any of the Info()-level log statements to be common, for they require a misbehaving SDK or some other form of data corruption, as opposed to misconfigured sampling. Because these messages signal corruption of some kind, I think they should be logs. My expectation is that logs are sampled at runtime (or they should be) to avoid impacting the hot path.
With that said, I feel that a new metric is not helpful -- it just means the customer has to monitor something new when we have metric signals already. There is a single metric output by this code specifically, which will have a "policy=missing_randomness" when these errors arise.
We also have (or at least desire) standard pipeline metrics, which ought to provide a standard way to count how many spans succeed or fail. If the sampler is configured with FailClosed=true and these missing_randomness conditions are happening, the result will be loss of spans. I do not want the user to have to discover a new metric for this, because there ought to be a standard metric for rejected items. All processors should be able to count the number of items that are rejected for malformed data.
if errors.Is(err, sampling.ErrInconsistentSampling) {
	// This is working-as-intended. You can't lower
	// the threshold, it's illogical.
	logger.Debug(description, zap.Error(err))
same comment here about the metric vs. log
To summarize: I think the existing metric's "policy=missing_randomness" is a useful-enough signal. Personally, I want standard pipeline metrics so that every component doesn't have to invent a bespoke metric definition.
	}
}
if err := carrier.reserialize(); err != nil {
	logger.Info(description, zap.Error(err))
and same here :-)
This is some sort of grievous corruption, and I personally do not want every component inventing new metrics to monitor for things we never expect to happen.
(partial)
@@ -18,6 +18,7 @@ require (
	go.opentelemetry.io/otel/metric v1.27.0
	go.opentelemetry.io/otel/trace v1.27.0
	go.uber.org/goleak v1.3.0
+	go.uber.org/multierr v1.11.0
Thanks, now using errors.Join. (95ecbae)
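As a side note, a tiny standalone illustration (not the PR's code) of what errors.Join from the Go standard library (1.20+) provides, which is why the go.uber.org/multierr dependency above became unnecessary:

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	// errors.Join aggregates any number of errors, dropping nils, and the
	// joined error still works with errors.Is / errors.As on each member.
	errA := errors.New("missing randomness")
	errB := errors.New("inconsistent sampling threshold")

	joined := errors.Join(errA, nil, errB)
	fmt.Println(joined)                  // prints both messages, newline-separated
	fmt.Println(errors.Is(joined, errA)) // true
}
```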
@jmacd, please let me know once this is ready for another round.
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
My personal opinion is that we should not invent new metrics to monitor for bugs; that is what logs are good at, and if we think logs do not perform well enough, we should reconsider -- metrics are not clearly better for performance, compared with sampled logs. Moreover, I want us to encourage use of standard pipeline metrics. Practically all real processors are going to encounter errors that would cause data to be dropped or malformed in some way, and we shouldn't need new metrics for every one of them.

For your question about histograms of R and T: there is something I would recommend, but not a histogram of T or R values (both of these could be high cardinality), and probably not at default verbosity level. What we do care about, and I'm open to suggestions, is that the sum of adjusted counts after sampling is expected to equal the sum of adjusted counts before sampling. This is a probabilistic argument, so the match is not exact. We should have a metric that counts items by their adjusted count. I would argue that such a metric should be standardized and comparable with standard pipeline metrics, so if

Another direction to take this question is that the span-to-metrics connector should be able to use the adjusted counts so that it counts the number of representative spans, not the number of actual spans. This sounds more useful to me than a pipeline metric of adjusted count, and anyway I do not prefer to use metrics as a debugging signal. If there are bugs, I would recommend users connect a debugging exporter and review the output data.
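To make the adjusted-count argument concrete, here is a small standalone sketch, illustrative only and not part of this PR: each item kept at probability p carries an adjusted count of 1/p, so the sum over kept items is a probabilistic estimate of the pre-sampling total.

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const total = 100_000
	const p = 0.25 // sampling probability

	kept := 0
	adjustedSum := 0.0
	for i := 0; i < total; i++ {
		if rand.Float64() < p {
			kept++
			adjustedSum += 1 / p // each kept item represents 1/p original items
		}
	}

	// adjustedSum estimates `total` only probabilistically; the match is
	// close but not exact, which is why a strict equality check is not useful.
	fmt.Printf("kept=%d adjusted-count sum=%.0f (population was %d)\n", kept, adjustedSum, total)
}
```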
These test failures are independent, #33520.
// Convert the accept threshold to a reject threshold,
// then shift it into 56-bit value.
reject := numHashBuckets - scaledSamplerate
reject56 := uint64(reject) << 42

threshold, _ := sampling.UnsignedToThreshold(reject56)

return &hashingSampler{
	tvalueThreshold: threshold,
	hashSeed:        cfg.HashSeed,

	// Logs specific:
	logsTraceIDEnabled:            cfg.AttributeSource == traceIDAttributeSource,
	logsRandomnessSourceAttribute: cfg.FromAttribute,
}
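For readers unfamiliar with the bit arithmetic, here is a small standalone sketch of the conversion quoted above, under the assumption of 2^14 hash buckets (the 14-bit hash described in this PR) and an illustrative 25% sampling rate; it does not call into pkg/sampling:

```go
package main

import "fmt"

func main() {
	const numHashBuckets = 1 << 14 // 16384 buckets for a 14-bit hash

	scaledSamplerate := 4096 // 25% of 16384 buckets accepted

	// Convert the accept threshold to a reject threshold,
	// then shift it into the 56-bit space (14 + 42 = 56 bits).
	reject := numHashBuckets - scaledSamplerate // 12288 buckets rejected
	reject56 := uint64(reject) << 42

	// 12288 << 42 == 0xc0000000000000: the bottom 75% of the 56-bit
	// randomness space is rejected, i.e. a sampling probability of 1/4.
	fmt.Printf("reject56 = %#x\n", reject56)
	fmt.Printf("implied probability = %v\n", 1-float64(reject56)/float64(uint64(1)<<56))
}
```

Under OTEP 235, an item is kept when its 56-bit randomness is at least the threshold, so a larger reject56 corresponds to a lower sampling probability.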
Sampler SIG, see here. Reference: open-telemetry/semantic-conventions#793 (comment)
### Equalizing

This mode uses the same randomness mechanism as the propotional
Suggested change:
- This mode uses the same randomness mechanism as the propotional
+ This mode uses the same randomness mechanism as the proportional
@@ -105,16 +106,16 @@ func Test_tracesamplerprocessor_SamplingPercentageRange(t *testing.T) {
	},
	numBatches: 1e5,
	numTracesPerBatch: 2,
	acceptableDelta: 0.01,
If these adjustments are to get a faster but less-flaky test, what do you think of the idea of adding a loop? You can try a few times to get a result in the acceptable range; it will finish as soon as it sees an acceptable result.
This is the idea behind the "Eventually" function in the stretchr/testify library (we don't use it here but the concept is sound).
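A sketch of that retry idea, hedged: this is not the PR's test code, and runOneSamplingTrial is a hypothetical stand-in for a single pass of the statistical check.

```go
package sampler_test

import (
	"math"
	"math/rand"
	"testing"
)

// runOneSamplingTrial stands in for one pass of the real statistical test;
// here it just simulates sampling at 10% so the example is runnable.
func runOneSamplingTrial() float64 {
	const n, p = 10000, 0.10
	kept := 0
	for i := 0; i < n; i++ {
		if rand.Float64() < p {
			kept++
		}
	}
	return float64(kept) / n
}

func TestSamplingPercentageWithRetry(t *testing.T) {
	const attempts = 5
	const expected, acceptableDelta = 0.10, 0.01

	for i := 0; i < attempts; i++ {
		if math.Abs(runOneSamplingTrial()-expected) <= acceptableDelta {
			return // pass as soon as one attempt lands in the acceptable range
		}
	}
	t.Fatalf("no attempt within +/-%v of %v", acceptableDelta, expected)
}
```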
- `sampling_percentage` (32-bit floating point, required): Percentage at which items are sampled; >= 100 samples all items, 0 rejects all items.
- `hash_seed` (32-bit unsigned integer, optional, default = 0): An integer used to compute the hash algorithm. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
- `fail_closed` (boolean, optional, default = true): Whether to reject items with sampling-related errors.
- `sampling_precision` (integer, optional, default = 4): Determines the number of hexadecimal digits used to encode the sampling threshold.
maybe include the range of valid values here (1-14)?
@@ -45,6 +75,14 @@ type Config struct {
	// despite errors using priority.
	FailClosed bool `mapstructure:"fail_closed"`

	// SamplingPrecision is how many hex digits of sampling
	// threshold will be encoded, from 1 up to 14. Default is 4.
	// 0 is treated as full precision.
Not according to invalid_zero.yaml.
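For illustration only, a sketch of the range check these two comments point at; the component's actual Validate logic and error wording may differ:

```go
package sampler

import "fmt"

// Config is trimmed to the one field under discussion; the real struct has more.
type Config struct {
	SamplingPrecision int `mapstructure:"sampling_precision"`
}

// Validate sketches the range check implied above: 1 through 14 is valid,
// and 0 is rejected rather than being treated as "full precision".
func (cfg *Config) Validate() error {
	if cfg.SamplingPrecision < 1 || cfg.SamplingPrecision > 14 {
		return fmt.Errorf("sampling_precision must be between 1 and 14, got %d", cfg.SamplingPrecision)
	}
	return nil
}
```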
@@ -230,46 +376,82 @@ func consistencyCheck(rnd randomnessNamer, _ samplingCarrier) error {
//
// Extending this logic, we round very small probabilities up to the
// minimum supported value(s) which varies according to sampler mode.
-func makeSampler(cfg *Config) dataSampler {
+func makeSampler(cfg *Config, isLogs bool) dataSampler {
please add an explanation of 'isLogs' to the comment
My comments are mainly cosmetic; in general, we need this; approving to start to push the train out of the station.
Now that older and newer collectors can co-exist and come to the same decisions, I think this is ready to go. The logs vs. metrics matter can be addressed later, and changed if we think it's easier to respond to possible bug reports.
Description: Creates new sampler modes named "equalizing" and "proportional". Preserves existing functionality under the mode named "hash_seed".
Fixes #31918
This is the final step in a sequence; the whole of this work was factored into 3+ PRs, including the new pkg/sampling package and the previous step, #31946. The two new Sampler modes enable mixing OTel sampling SDKs with Collectors in a consistent way.

The existing hash_seed mode is also a consistent sampling mode, which makes it possible to have a 1:1 mapping between its decisions and the OTEP 235 randomness and threshold values. Specifically, the 14-bit hash value and sampling probability are mapped into 56-bit R-value and T-value encodings, so that all sampling decisions in all modes include threshold information.

This implements the semantic conventions of open-telemetry/semantic-conventions#793, namely the sampling.randomness and sampling.threshold attributes used for logs where there is no tracestate.

The default sampling mode remains HashSeed. We consider a future change of default to Proportional to be desirable, because:
Link to tracking Issue:
Draft for open-telemetry/opentelemetry-specification#3602.
Previously #24811, see also open-telemetry/oteps#235
Part of #29738
Testing: New testing has been added.
Documentation: ✅