ES|QL query approximation: move sample correction to data node#144005

Merged
jan-elastic merged 10 commits into main from esql-approximation-move-sample-correction-to-data-node
Mar 13, 2026

Conversation

@jan-elastic
Contributor

@jan-elastic jan-elastic commented Mar 11, 2026

Currently, sample correction happens on the coordinator node. When a data node sends exact stats, it sends null buckets, which indicates that no sample correction is needed. This works when all data nodes send sampled stats or all send exact stats, but fails when some send sampled stats and others send exact stats. (The coordinator receives both null and non-null buckets, aggregates them, gets a non-null total bucket, and corrects the whole total for sampling, including the exact part.)

This is solved by moving the sample correction to the data nodes. For exact stats, a data node now sends buckets that are all equal to the exact value, which gives zero variance.
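To make the failure mode concrete, here is a minimal sketch of the two schemes for a COUNT split across one exact and one sampled data node. It is a simplified model, not the Elasticsearch implementation: the class `SampleCorrectionSketch`, the nested `PartialCount` type, and all method names are hypothetical, and it assumes a sampled count is corrected by dividing by the sample probability.

```java
import java.util.Arrays;
import java.util.List;

final class SampleCorrectionSketch {

    /** Hypothetical partial result of one data node: a raw count and whether it came from a sample. */
    static final class PartialCount {
        final long rawCount;
        final boolean sampled;

        PartialCount(long rawCount, boolean sampled) {
            this.rawCount = rawCount;
            this.sampled = sampled;
        }
    }

    /** Old scheme (simplified): sum the raw counts, then correct the whole total if any part was sampled. */
    static double correctOnCoordinator(List<PartialCount> parts, double sampleProbability) {
        long total = 0;
        boolean anySampled = false;
        for (PartialCount part : parts) {
            total += part.rawCount;
            anySampled |= part.sampled;
        }
        return anySampled ? total / sampleProbability : total;
    }

    /** New scheme (simplified): each data node corrects its own count before it is summed. */
    static double correctOnDataNodes(List<PartialCount> parts, double sampleProbability) {
        double total = 0.0;
        for (PartialCount part : parts) {
            total += part.sampled ? part.rawCount / sampleProbability : part.rawCount;
        }
        return total;
    }

    /** For exact stats, every bucket equals the exact value, so the buckets have zero variance. */
    static double[] exactBuckets(long exactCount, int bucketCount) {
        double[] buckets = new double[bucketCount];
        Arrays.fill(buckets, exactCount);
        return buckets;
    }

    public static void main(String[] args) {
        double p = 0.25; // assumed sample probability
        List<PartialCount> parts = List.of(
            new PartialCount(1000, false), // exact node: 1000 documents
            new PartialCount(125, true)    // sampled node: 125 sampled documents, about 500 real ones
        );
        System.out.println(correctOnCoordinator(parts, p)); // 4500.0, the exact part is wrongly scaled
        System.out.println(correctOnDataNodes(parts, p));   // 1500.0
    }
}
```

In this model the coordinator-side correction inflates the exact 1000 by a factor of 4, while the data-node-side correction leaves it untouched.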

At the moment, the rounding also happens on the data nodes, leading to round-off errors for some stats. This will be solved in a follow-up PR.

The rounding is still done on the coordinator node to minimize round-off errors. Therefore, a new aggregation CountApproximate is introduced.
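The effect of where rounding happens can be illustrated with a small sketch. It assumes, as a simplification, that each data node corrects its sampled count by dividing by the sample probability; the class and method names are hypothetical and are not taken from the PR. Keeping the per-node counts as doubles and rounding once at the end is the reason a double-valued count is useful here.

```java
final class RoundOnCoordinatorSketch {

    /** Each data node rounds its corrected count to a long before the counts are summed. */
    static long roundThenSum(long[] sampledCounts, double sampleProbability) {
        long total = 0;
        for (long count : sampledCounts) {
            total += Math.round(count / sampleProbability);
        }
        return total;
    }

    /** Data nodes send unrounded doubles; the coordinator sums them and rounds once. */
    static long sumThenRound(long[] sampledCounts, double sampleProbability) {
        double total = 0.0;
        for (long count : sampledCounts) {
            total += count / sampleProbability;
        }
        return Math.round(total);
    }

    public static void main(String[] args) {
        long[] sampledCounts = {1, 1, 1}; // three data nodes, one sampled document each
        double p = 0.75;                  // assumed sample probability
        System.out.println(roundThenSum(sampledCounts, p)); // 3: each 1.33... is rounded down to 1
        System.out.println(sumThenRound(sampledCounts, p)); // 4: 3 sampled documents / 0.75
    }
}
```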

@jan-elastic jan-elastic added >non-issue :ml Machine learning Team:ML Meta label for the ML team v9.4.0 labels Mar 11, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Contributor

@luigidellaquila luigidellaquila left a comment


LGTM, thanks @jan-elastic

As per our offline discussion, if you have further fixes to add here, please do; I'll have a look.

Contributor

@luigidellaquila luigidellaquila left a comment


LGTM, thanks @jan-elastic!

As a follow-up, I'd suggest adding approximation to the Generative tests. See how we do it for unmapped fields; it should be more or less the same.

// so return null instead. TODO: this criterion is not ideal, and should be revisited.
// Allow a little bit of numerical imprecision in the consistency check, which can happen
// due to round-off errors when aggregating zero-variance stats (e.g. AVG(x) BY x).
if (lower - 1e-12 * Math.abs(lower) <= bestEstimate && bestEstimate <= upper + 1e-12 * Math.abs(upper)) {
Contributor


Is there a specific calculation behind the choice of 1e-12 as tolerance? I guess we just consider it small enough?
Maybe we should track this TODO with an issue.

Contributor Author

@jan-elastic jan-elastic Mar 13, 2026


A double has 52 bits for the significand, so its relative precision is approximately 10^-16.

10^-12 is a very small number, but still a lot larger than this precision.
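The argument can be checked with a short sketch. The method `isConsistent` mirrors the condition quoted above, but the class name, the constant name, and the example values are assumptions made for illustration; the AVG-like computation is only a stand-in for a zero-variance stat.

```java
final class ToleranceSketch {

    static final double RELATIVE_TOLERANCE = 1e-12;

    /** Same shape as the check under review: bounds are widened by a small relative margin. */
    static boolean isConsistent(double lower, double bestEstimate, double upper) {
        return lower - RELATIVE_TOLERANCE * Math.abs(lower) <= bestEstimate
            && bestEstimate <= upper + RELATIVE_TOLERANCE * Math.abs(upper);
    }

    public static void main(String[] args) {
        // Spacing of doubles just above 1.0, i.e. 2^-52 (roughly 2.2e-16).
        System.out.println(Math.ulp(1.0));

        // Zero-variance case: every value is 0.1, so the interval collapses to [0.1, 0.1],
        // but averaging three copies of 0.1 can pick up a round-off error in the last bits.
        double bound = 0.1;
        double average = (0.1 + 0.1 + 0.1) / 3;

        System.out.println(bound <= average && average <= bound); // strict check, sensitive to round-off
        System.out.println(isConsistent(bound, average, bound));  // tolerant check absorbs the round-off
        System.out.println(isConsistent(bound, 0.2, bound));      // a genuinely inconsistent estimate still fails
    }
}
```

The tolerance is about four orders of magnitude above the spacing of doubles, so it absorbs a few accumulated rounding steps while remaining far too small to hide a real inconsistency.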


import java.util.List;

public class CountApproximateAggregatorFunction implements AggregatorFunction {
Contributor Author


This is basically CountAggregatorFunction with Long -> Double.

Do we need to prevent this code duplication somehow?

Contributor


I don't have a strong opinion. Apart from countMasked() and blockIndex(), the rest of the code looks the same but the types are different, so we won't save much with a refactoring.

Contributor Author


OK, I'll leave it like this. The one thing I can see doing is generating both from a StringTemplate file, like some other classes are.

@jan-elastic
Contributor Author

The failing test is unrelated to my PR, and was already failing before this PR.

So merging (bypassing rules).

@jan-elastic jan-elastic merged commit f8a3f5e into main Mar 13, 2026
34 of 37 checks passed
@jan-elastic jan-elastic deleted the esql-approximation-move-sample-correction-to-data-node branch March 13, 2026 15:46
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 13, 2026
…elocations

* upstream/main: (72 commits)
  [Test] Randomly disable sequence numbers in CcrTimeSeriesDataStreamsIT (elastic#143930)
  Fix AsyncSearchIndexServiceTests.testCircuitBreaker failure (elastic#144058)
  Refine GenerativeIT some more, this time with accounting for some added (elastic#144220)
  ESQL: Physical Planning on the Lookup Node (elastic#143707)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:approximation.Approximate stats by with zero variance} elastic#144240
  Trigger counter metrics in test for delta temporality measurements (elastic#144193)
  fix capabiltiy approximation_v3 (elastic#144230)
  [ci] Add PR pipeline for testing ipv6 and fix tests not working with ipv6 (elastic#140473)
  update (elastic#144095)
  Make from/to optional in TBUCKET when Kibana timestamp filter is present (elastic#144057)
  Extract reroute behavior from create-index request classes (elastic#144140)
  ESQL: Fix release build only failures (elastic#144122)
  ES|QL query approximation: move sample correction to data node (elastic#144005)
  Add indexing pressure tracking to OTLP endpoints (elastic#144009)
  Fix replica writes after _seq_no doc values are pruned (elastic#144180)
  allow tests to configure supportsLoadingConfig (elastic#144061)
  [ES|QL] Unmute testGiantTextFieldInSubqueryIntermediateResultsWithSort (elastic#144126)
  [ESQL][DOCS] Add CPS page (unpublished for moment) (elastic#144206)
  ESQL: Forbid "load" unmapped_fields for certain commands (elastic#144115)
  Add CCS Remote Views Detection (elastic#143384)
  ...
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Mar 16, 2026
…ic#144005)

* looser bounds

* Move sample correction to data nodes.

* Buckets equal best estimate for exact counts (-> zero-width CI).

* update capability

* ignore test

* Don't round before divisions.

* CSV test with approximation=false

* looser CSV test bounds

* Fix ApproximationTests

* small refactor / renaming
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
…ic#144005)


Labels

:ml Machine learning >non-issue Team:ML Meta label for the ML team v9.4.0


3 participants