fix(tag_cardinality_limit transform): mitigate performance bottleneck#19281
Conversation
Your preview site for the VRL Playground will be ready in a few minutes; please allow time for it to build. Here's your preview link:

/ci-run-regression
```rust
let storage = match &mode {
    Mode::Exact => TagValueSetStorage::Set(HashSet::with_capacity(value_limit)),
    Mode::Probabilistic(config) => {
        let num_bits = config.cache_size_per_key / 8; // Convert bytes to bits
```
This conversion was wrong: `BloomFilter::with_size` takes the size in bytes, not bits, as an argument.
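A self-contained sketch of the arithmetic behind this comment (the real `BloomFilter::with_size` call lives in Vector's source; the `filter_bits` helper here is invented for illustration). Dividing by 8 before a constructor that already takes bytes shrinks the filter to one eighth of the configured memory:

```rust
/// Bits available in a filter backed by `bytes` bytes of memory.
const fn filter_bits(bytes: usize) -> usize {
    bytes * 8
}

fn main() {
    let cache_size_per_key = 5 * 1024; // configured in bytes (5 KiB)

    // Buggy sizing: the "bytes to bits" division, fed into a byte-sized
    // constructor, leaves only 1/8 of the configured memory.
    let buggy = filter_bits(cache_size_per_key / 8);
    // Fixed sizing: pass the configured byte count through unchanged.
    let fixed = filter_bits(cache_size_per_key);

    assert_eq!(buggy, 5_120);  // 640 bytes -> 5,120 bits
    assert_eq!(fixed, 40_960); // 5,120 bytes -> 40,960 bits
}
```

With fewer bits than intended, the probabilistic mode would report far more false positives than its configuration implies.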
Regression Detector Results

Run ID: c95e23d0-49e2-4044-b555-06f889b77ec5

Explanation: A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI". For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
Changes in experiment optimization goals with confidence ≥ 90.00% and |Δ mean %| ≥ 5.00%:
Experiments that were not declared erratic but were detected as being so, coefficient of variation cutoff 0.10:
Fine details of change detection per experiment.
```rust
/// Count of items inserted into the bloom filter.
/// We manually track this because `BloomFilter::count` has O(n) time complexity.
```
It also represents a different value: if the bloom filter is counting the bits set in the filter, it can only return the number of reduced items, which will frequently (usually?) be less than the number of items actually inserted into the filter.
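A toy illustration of this point: a count reconstructed from the filter's bits both costs O(n) over the bit array and undercounts, because duplicates (and hash collisions) map onto the same bits. `ToyBloom` and its single hash are invented for illustration; this is not the crate Vector uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal single-hash "bloom filter" used only to contrast a manually
/// tracked insertion count with a count derived from the filter's bits.
struct ToyBloom {
    bits: Vec<bool>,
    inserted: usize, // manually tracked, O(1) to read
}

impl ToyBloom {
    fn new(num_bits: usize) -> Self {
        Self { bits: vec![false; num_bits], inserted: 0 }
    }

    fn index<T: Hash>(&self, item: &T) -> usize {
        let mut h = DefaultHasher::new();
        item.hash(&mut h);
        (h.finish() as usize) % self.bits.len()
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        let i = self.index(item);
        self.bits[i] = true;
        self.inserted += 1;
    }

    /// Count recovered from the filter itself: O(n) over the bits, and it
    /// only sees "reduced" items -- duplicates and collisions share bits.
    fn count_from_bits(&self) -> usize {
        self.bits.iter().filter(|b| **b).count()
    }
}

fn main() {
    let mut f = ToyBloom::new(64);
    for v in ["a", "b", "a", "c", "a"] {
        f.insert(&v);
    }
    assert_eq!(f.inserted, 5);         // every insertion is counted
    assert!(f.count_from_bits() <= 3); // at most 3 distinct values set bits
}
```

So the manual counter and the bit-derived count answer different questions; the doc comment above describes only the complexity difference, while this reviewer notes the semantic one.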
```rust
}

#[cfg(test)]
mod tests {
```
```diff
 pub(crate) const fn default_cache_size() -> usize {
-    5000 * 1024 // 5KB
+    5 * 1024 // 5KB
```
This is a potential behavior change, in that it will cause more false positives for sites that have not explicitly configured the cache size.
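To see why shrinking the default raises false positives, the standard Bloom-filter estimate p ≈ (1 − e^(−k·n/m))^k can be evaluated for both defaults. The item count `n` and hash count `k` below are arbitrary placeholders, not values from the transform:

```rust
/// Standard Bloom-filter false-positive estimate:
/// p ≈ (1 - e^(-k*n/m))^k for k hash functions, n items, m bits.
fn false_positive_rate(m_bits: f64, n_items: f64, k_hashes: f64) -> f64 {
    (1.0 - (-k_hashes * n_items / m_bits).exp()).powf(k_hashes)
}

fn main() {
    let n = 500.0; // hypothetical distinct tag values per key
    let k = 4.0;   // hypothetical hash-function count

    // Old default (5000 * 1024 bytes, despite the "5KB" comment) vs the
    // new default (5 * 1024 bytes), both converted to bits.
    let old = false_positive_rate(5000.0 * 1024.0 * 8.0, n, k);
    let new = false_positive_rate(5.0 * 1024.0 * 8.0, n, k);

    // The smaller default strictly raises the false-positive rate.
    assert!(new > old);
}
```

Note the old value `5000 * 1024` is roughly 5 MB, so the original `// 5KB` comment matched the intent rather than the code; the new default makes the code match the comment at the cost of a higher false-positive rate.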
Datadog Report: Branch report ✅
Regression Detector Results (Run ID: 248f4a82-ca98-49e0-9ed4-c72d63176611) -- explanation and criteria identical to the report above.
Regression Detector Results (Run ID: f687d12f-ea75-4dc8-b89c-1c92d7b5cfac) -- explanation and criteria identical to the report above.
Regression Detector Results (Run ID: 9df2330e-d484-49f2-8f6f-ac8c16fabb60) -- explanation and criteria identical to the report above.
Regression Detector Results (Run ID: a9ace7b3-f1b2-47e9-bd95-462ace0cff85) -- explanation and criteria identical to the report above.
Regression Detector Results (Run ID: e618fae7-14d2-4e5f-abba-28ff37c68c13) -- explanation and criteria identical to the report above.
This is blocked from merging due to a bug in the regression detector that is flagging positive changes as blocking. I'll merge manually.
This avoids calling `BloomFilter::count`, which has a time complexity of O(n), where n is the number of bits in the filter.
Force-pushed from 127ab11 to 42683ef.