Skip to content

Cleanup ES|QL T-Digest code duplication, add memory accounting#143662

Merged
JonasKunz merged 15 commits intoelastic:mainfrom
JonasKunz:esql-tdigest-cleanup
Mar 6, 2026
Merged

Cleanup ES|QL T-Digest code duplication, add memory accounting#143662
JonasKunz merged 15 commits intoelastic:mainfrom
JonasKunz:esql-tdigest-cleanup

Conversation

@JonasKunz
Copy link
Copy Markdown
Contributor

@JonasKunz JonasKunz commented Mar 5, 2026

  • removes the duplicate code around T-Digest to byte[] encoding/decoding
  • makes accesses to T-Digests stored in blocks allocation free by requiring a scratch input parameter (similar to how it works for BytesRefBlocks)
  • Adds the possibility to deep-copy T-Digests from blocks using a BreakingTDigestHolder with full memory accounting: This follows the pattern of BreakingBytesRefBuilder and will be used for the agg-state on first/last_over_time
  • Replaces the usages of TDigestState with TDigest: TDigestState is just a decorator with serialization which don't use and don't want to use

@JonasKunz JonasKunz added >non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL labels Mar 5, 2026
@elasticsearchmachine elasticsearchmachine added v9.4.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 5, 2026
@JonasKunz JonasKunz changed the title Cleanup ES|QL T-Digest code duplication, reduce allocations Cleanup ES|QL T-Digest code duplication, add memory accounting Mar 5, 2026
@JonasKunz JonasKunz marked this pull request as ready for review March 5, 2026 12:22
@JonasKunz JonasKunz requested a review from kkrik-es March 5, 2026 12:22
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

blocks[offset] = blockFactory.newConstantTDigestBlock(resultHolder, 1);
blocks[offset + 1] = blockFactory.newConstantBooleanBlockWith(true, 1);
try (var tempHolder = BreakingTDigestHolder.create(breaker)) {
tempHolder.set(merger, sum, min, max);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to count?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to remove it, because it is not needed and redundant. The count always is the sum of the counts of all centroids in the merged TDigest.

@kkrik-es
Copy link
Copy Markdown
Member

kkrik-es commented Mar 5, 2026

No reason to include everything in a single PR. I'll get back to reviewing once you move BreakingBytesRefBuilder to a separate one. Agents can do that in minutes.

@JonasKunz JonasKunz requested a review from kkrik-es March 5, 2026 14:43
return dateRangeToString(from, to);
};
case TDIGEST -> (block, offset, scratch) -> ((TDigestBlock) block).getTDigestHolder(offset);
case TDIGEST -> (block, offset, scratch) -> ((TDigestBlock) block).getTDigestHolder(offset, new TDigestHolder());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe wrap this in a helper that only takes offset as arg.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to, as that would encourage the non-allocation free usage.
This method with the scratch parameter is a common pattern for block access, e.g. see BytesRefBlock:
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/compute/src/main/generated-src/org/elasticsearch/compute/data/BytesRefBlock.java#L38

Copy link
Copy Markdown
Member

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments, approving to get you going.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 6, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🏷️ Required labels (at least one) (2)
  • Team:Delivery
  • Team:Search - Inference

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: a7767d6a-2cdc-433c-a70d-f393d866ff06

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@JonasKunz JonasKunz enabled auto-merge (squash) March 6, 2026 08:13
@JonasKunz JonasKunz merged commit cea9eb7 into elastic:main Mar 6, 2026
36 checks passed
@JonasKunz JonasKunz deleted the esql-tdigest-cleanup branch March 6, 2026 09:45
spinscale pushed a commit to spinscale/elasticsearch that referenced this pull request Mar 6, 2026
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 6, 2026
…locations

* upstream/main: (153 commits)
  ES|QL: Update docs for TOP_SNIPPETS and DECAY (elastic#143739)
  Correctly include endpoint id in log msg in AuthorizationPoller (elastic#143743)
  Bar searching or sorting on _seq_no when disabled (elastic#143600)
  Generalize `testClientCancellation` test (elastic#143586)
  JSON_EXTRACT: zero-copy byte slicing for object, array, and number extraction (elastic#143702)
  Track recycler pages in circuit breaker (elastic#143738)
  [ESQL] Enable distributed pipeline breakers for external sources via FragmentExec (elastic#143696)
  Adding 'mode' and 'codec' fields to ES monitoring template (elastic#143673)
  [ESQL] Columnar I/O and vectorized block conversion for external sources (elastic#143703)
  Fix flaky MMR diversification YAML tests (elastic#143706)
  ES|QL codegen: check builder arguments for vector support (elastic#143724)
  Add Views Security Model (elastic#141050)
  ESQL: Prevent pushdown of unmapped fields in filters and sorts (elastic#143460)
  Don't run seq_no pruning tests in release CI (elastic#143725)
  ESQL: Support intra-row field references in ROW command (elastic#140217)
  ES|QL: Remove implicit limit in FORK branches in CSV tests (elastic#143601)
  IndexRoutingTests with and without synthetic id (elastic#143566)
  Synthetic id upgrade test in serverless (elastic#142471)
  Disable "Review skipped" comments for PRs without specified labels (elastic#143728)
  Cleanup ES|QL T-Digest code duplication, add memory accounting (elastic#143662)
  ...
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue :StorageEngine/ES|QL Timeseries / metrics / PromQL / logsdb capabilities in ES|QL Team:StorageEngine v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants