ES|QL query approximation: move sample correction to data node by jan-elastic · Pull Request #144005 · elastic/elasticsearch

jan-elastic · 2026-03-11T10:46:19Z

Currently, sample correction happens on the coordinator node. When a data node sends exact stats, it sends null buckets, which indicates that no sample correction is needed. This works if all data nodes send sampled stats or all send exact stats, but fails when some send sampled stats and others send exact stats. (The coordinator receives null and non-null buckets, aggregates them, gets a non-null total bucket, and corrects it for sampling.)

This is solved by moving the sample correction to the data nodes. In the case of exact stats, all buckets are set equal to the exact stats, which gives zero variance.
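To make the failure mode and the fix concrete, here is a minimal Java sketch under simplifying assumptions: each data node reports a best estimate plus a fixed number of bucket estimates, the coordinator merges nodes by summing element-wise, and sample correction is plain scaling by 1/p for a sampling probability p. All names (`BucketMergeSketch`, `NodeResult`, `sampled`, `exact`, `merge`, `oldCoordinatorCorrection`) are hypothetical; this is not the ES|QL implementation.

```java
import java.util.Arrays;

// Hypothetical model, not the ES|QL implementation: each data node reports a
// best estimate plus bucket estimates, and the coordinator sums them.
public class BucketMergeSketch {

    /** One data node's result: best estimate plus per-bucket estimates. */
    public record NodeResult(double bestEstimate, double[] buckets) {}

    /** Sampled node: apply the sample correction (scale by 1/p) locally. */
    public static NodeResult sampled(double rawBest, double[] rawBuckets, double p) {
        double[] corrected = new double[rawBuckets.length];
        for (int i = 0; i < rawBuckets.length; i++) {
            corrected[i] = rawBuckets[i] / p;
        }
        return new NodeResult(rawBest / p, corrected);
    }

    /** Exact node: every bucket equals the exact value, so its variance is zero. */
    public static NodeResult exact(double value, int numBuckets) {
        double[] buckets = new double[numBuckets];
        Arrays.fill(buckets, value);
        return new NodeResult(value, buckets);
    }

    /** Coordinator: sum best estimates and buckets element-wise; no correction here. */
    public static NodeResult merge(NodeResult a, NodeResult b) {
        double[] sum = new double[a.buckets().length];
        for (int i = 0; i < sum.length; i++) {
            sum[i] = a.buckets()[i] + b.buckets()[i];
        }
        return new NodeResult(a.bestEstimate() + b.bestEstimate(), sum);
    }

    /** Old scheme, simplified: correcting the merged total also rescales the exact part. */
    public static double oldCoordinatorCorrection(double exactValue, double rawSampled, double p) {
        return (exactValue + rawSampled) / p;
    }

    /** Sample variance of the bucket estimates (requires at least two buckets). */
    public static double variance(double[] xs) {
        double mean = Arrays.stream(xs).average().orElse(Double.NaN);
        double ss = 0;
        for (double x : xs) {
            ss += (x - mean) * (x - mean);
        }
        return ss / (xs.length - 1);
    }

    public static void main(String[] args) {
        NodeResult exactNode = exact(100, 4);
        NodeResult sampledNode = sampled(25, new double[] {24, 26, 25, 25}, 0.5);
        NodeResult total = merge(exactNode, sampledNode);

        System.out.println(total.bestEstimate());                   // 150.0
        System.out.println(oldCoordinatorCorrection(100, 25, 0.5)); // 250.0
        System.out.println(variance(exactNode.buckets()));          // 0.0
        System.out.println(variance(total.buckets()) == variance(sampledNode.buckets())); // true
    }
}
```

With one exact node (count 100) and one node sampled at p = 0.5 (raw count 25), correcting on the data nodes gives 150, whereas correcting the merged total on the coordinator also scales the exact contribution and gives 250. The exact node's constant buckets add nothing to the spread across buckets.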

~~At the moment, the rounding also happens on the data nodes, leading to round-off errors for some stats. This will be solved in a follow-up PR.~~

The rounding is still done on the coordinator node to minimize round-off errors. Therefore, a new aggregation, CountApproximate, is introduced, which keeps the count as a double rather than a long.
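A small worked example of why rounding is deferred to the coordinator, assuming each data node sends a sample-corrected count c/p as a double. The class and method names are hypothetical, and `Math.round` stands in for whatever rounding the coordinator actually applies.

```java
// Hypothetical illustration: rounding per data node vs. rounding once on the coordinator.
public class DeferredRoundingSketch {

    /** Sample-corrected count on a data node, kept as a double (not rounded). */
    public static double correctedCount(long sampledCount, double samplingProbability) {
        return sampledCount / samplingProbability;
    }

    /** Round on each data node (a long-valued count), then sum on the coordinator. */
    public static long roundThenSum(long[] sampledCounts, double p) {
        long total = 0;
        for (long c : sampledCounts) {
            total += Math.round(correctedCount(c, p));
        }
        return total;
    }

    /** Sum double-valued counts on the coordinator, then round once. */
    public static long sumThenRound(long[] sampledCounts, double p) {
        double total = 0;
        for (long c : sampledCounts) {
            total += correctedCount(c, p);
        }
        return Math.round(total);
    }

    public static void main(String[] args) {
        // Three data nodes, each with 1 sampled row at sampling probability 0.4.
        // Each corrected count is 2.5, so the unrounded total is 7.5.
        long[] counts = {1, 1, 1};
        System.out.println(roundThenSum(counts, 0.4)); // 9: each 2.5 rounded up to 3
        System.out.println(sumThenRound(counts, 0.4)); // 8: 7.5 rounded once
    }
}
```

Rounding each 2.5 up accumulates error across nodes (9), while rounding the sum once stays within 0.5 of the unrounded estimate (8 vs. 7.5). The sketch does not model the exact rounding rule used in ES|QL.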

elasticsearchmachine · 2026-03-11T10:47:01Z

Pinging @elastic/ml-core (Team:ML)

luigidellaquila

LGTM, thanks @jan-elastic

As for the offline discussion: if you have further fixes to add here, please do; I'll have a look.

x-pack/plugin/esql/qa/testFixtures/src/main/resources/approximation.csv-spec

luigidellaquila

LGTM, thanks @jan-elastic!

As a follow-up, I'd suggest adding approximation to the Generative tests. See how we do it for unmapped fields; it should be more or less the same.

luigidellaquila · 2026-03-13T12:33:23Z

.../org/elasticsearch/xpack/esql/expression/function/scalar/approximate/ConfidenceInterval.java

+        // so return null instead. TODO: this criterion is not ideal, and should be revisited.
+        // Allow a little bit of numerical imprecision in the consistency check, which can happen
+        // due to round-off errors when aggregating zero-variance stats (e.g. AVG(x) BY x).
+        if (lower - 1e-12 * Math.abs(lower) <= bestEstimate && bestEstimate <= upper + 1e-12 * Math.abs(upper)) {


Is there a specific calculation behind the choice of 1e-12 as tolerance? I guess we just consider it small enough?
Maybe we should track this TODO with an issue.

A double has 52 bits for the significand (53 with the implicit leading bit), so its relative precision is approximately 10^-16 (2^-52 ≈ 2.2·10^-16).

10^-12 is a very small number, but still much larger than this precision.
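A quick numerical check of this argument, as a Java sketch: `Math.ulp(1.0)` gives the spacing of doubles near 1, which is 2^-52, and a zero-variance group (all values 0.1, as in AVG(x) BY x) can produce a mean that differs from 0.1 by round-off. The `isConsistent` helper mirrors the condition in the diff above; the class name and the hard-coded zero-width interval are hypothetical.

```java
// Hypothetical sketch of the tolerant consistency check from ConfidenceInterval.java.
public class ToleranceSketch {

    /** Relative slack, as in the diff: far above double precision (~2.2e-16). */
    static final double REL_TOL = 1e-12;

    /** True if bestEstimate lies in [lower, upper], allowing a tiny relative slack. */
    public static boolean isConsistent(double lower, double bestEstimate, double upper) {
        return lower - REL_TOL * Math.abs(lower) <= bestEstimate
            && bestEstimate <= upper + REL_TOL * Math.abs(upper);
    }

    public static void main(String[] args) {
        // Spacing of doubles near 1.0 is 2^-52.
        System.out.println("ulp(1.0) = " + Math.ulp(1.0));

        // Zero-variance group (as in AVG(x) BY x): every value is 0.1.
        double[] xs = {0.1, 0.1, 0.1};
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        double best = sum / xs.length; // round-off: not exactly 0.1
        double lower = 0.1;            // hypothetical zero-width interval
        double upper = 0.1;

        boolean strict = lower <= best && best <= upper;
        System.out.println("strict = " + strict + ", tolerant = " + isConsistent(lower, best, upper));
    }
}
```

Here the strict check rejects the estimate while the tolerant one accepts it; a deviation of 1e-9, by contrast, still fails. The gap between 2.2·10^-16 and 10^-12 is a factor of several thousand, which leaves headroom for round-off accumulated over many operations.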

jan-elastic · 2026-03-13T12:51:00Z

.../src/main/java/org/elasticsearch/compute/aggregation/CountApproximateAggregatorFunction.java

+
+import java.util.List;
+
+public class CountApproximateAggregatorFunction implements AggregatorFunction {


This is basically CountAggregatorFunction with Long -> Double.

Do we need to prevent this code duplication somehow?

I don't have a strong opinion. Apart from countMasked() and blockIndex(), the rest of the code looks the same, but the types are different, so we won't save much with a refactoring.

OK, I'll leave it like this. The one thing I can see doing is generating both from a StringTemplate file, like some other classes are.

jan-elastic · 2026-03-13T15:45:45Z

The failing test is unrelated to my PR, and was already failing before this PR.

So merging (bypassing rules).

…elocations * upstream/main: (72 commits) [Test] Randomly disable sequence numbers in CcrTimeSeriesDataStreamsIT (elastic#143930) Fix AsyncSearchIndexServiceTests.testCircuitBreaker failure (elastic#144058) Refine GenerativeIT some more, this time with accounting for some added (elastic#144220) ESQL: Physical Planning on the Lookup Node (elastic#143707) Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:approximation.Approximate stats by with zero variance} elastic#144240 Trigger counter metrics in test for delta temporality measurements (elastic#144193) fix capabiltiy approximation_v3 (elastic#144230) [ci] Add PR pipeline for testing ipv6 and fix tests not working with ipv6 (elastic#140473) update (elastic#144095) Make from/to optional in TBUCKET when Kibana timestamp filter is present (elastic#144057) Extract reroute behavior from create-index request classes (elastic#144140) ESQL: Fix release build only failures (elastic#144122) ES|QL query approximation: move sample correction to data node (elastic#144005) Add indexing pressure tracking to OTLP endpoints (elastic#144009) Fix replica writes after _seq_no doc values are pruned (elastic#144180) allow tests to configure supportsLoadingConfig (elastic#144061) [ES|QL] Unmute testGiantTextFieldInSubqueryIntermediateResultsWithSort (elastic#144126) [ESQL][DOCS] Add CPS page (unpublished for moment) (elastic#144206) ESQL: Forbid "load" unmapped_fields for certain commands (elastic#144115) Add CCS Remote Views Detection (elastic#143384) ...


jan-elastic added 5 commits March 11, 2026 11:07

looser bounds

1eb413e

Move sample correction to data nodes.

ce9cfa8

Buckets equal best estimate for exact counts (-> zero-width CI).

8ae54ec

update capability

d66448e

ignore test

edabb55

jan-elastic requested a review from luigidellaquila March 11, 2026 10:46

jan-elastic added the >non-issue, :ml, Team:ML, and v9.4.0 labels Mar 11, 2026

jan-elastic added 2 commits March 13, 2026 09:28

Don't round before divisions.

7617263

CSV test with approximation=false

58b4ea6

luigidellaquila reviewed Mar 13, 2026


x-pack/plugin/esql/qa/testFixtures/src/main/resources/approximation.csv-spec (outdated, resolved)

x-pack/plugin/esql/qa/testFixtures/src/main/resources/approximation.csv-spec (outdated, resolved)

luigidellaquila approved these changes Mar 13, 2026


jan-elastic added 2 commits March 13, 2026 10:20

looser CSV test bounds

6170d37

Fix ApproximationTests

e84f3ae

jan-elastic requested a review from luigidellaquila March 13, 2026 12:20

luigidellaquila approved these changes Mar 13, 2026


jan-elastic commented Mar 13, 2026


small refactor / renaming

2bb5e65

jan-elastic merged commit f8a3f5e into main Mar 13, 2026
34 of 37 checks passed

jan-elastic deleted the esql-approximation-move-sample-correction-to-data-node branch March 13, 2026 15:46

jan-elastic merged 10 commits into main from esql-approximation-move-sample-correction-to-data-node


3 participants

