@JonasKunz JonasKunz commented Aug 22, 2025

Adds a barebones ES|QL type and block type for exponential histograms.

To keep this PR as small as possible I've reduced the type to just storing the scale of a histogram and nothing else.
Everything else will be added in follow-up PRs.

Update: To give a better view of the bigger picture, the block has now been fully fleshed out with all sub-blocks in this PR.

In this PR I've marked the shortcuts and disabled tests that definitely need work before we can eventually put this into tech-preview with TODO(b/133393).
Note that I've also excluded the type from some tests (e.g. TopN) without a TODO(b/133393): these tests cover functionality that I think won't be needed, at least for a tech-preview. Please review them carefully and let me know if any of them cover functionality that should work, so that I can add the TODO(b/133393) there as well.

Initially this PR also included some CSV tests, but I decided to remove them for now, as they would require implementing a block loader, increasing the size of the PR unnecessarily. I'll add them back together with the block loader in a follow-up PR.

@elasticsearchmachine elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team v9.2.0 labels Aug 22, 2025
@JonasKunz JonasKunz force-pushed the exp-histo-esql branch 2 times, most recently from 1a4c296 to 65b3d0e Compare August 26, 2025 13:42
github-actions bot commented Sep 5, 2025

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

@JonasKunz JonasKunz changed the title PoC: ES|QL block type for exponential histograms ES|QL block type for exponential histograms Sep 5, 2025
@JonasKunz JonasKunz added :StorageEngine/ES|QL Timeseries / metrics / logsdb capabilities in ES|QL >feature labels Sep 5, 2025
@JonasKunz JonasKunz marked this pull request as ready for review September 5, 2025 09:54
@JonasKunz JonasKunz requested a review from kkrik-es September 5, 2025 09:54
@elasticsearchmachine

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@kkrik-es kkrik-es requested a review from dnhatn September 5, 2025 11:47

@kkrik-es kkrik-es left a comment

I'll leave it to the esql experts to approve.

# Conflicts:
#	benchmarks/src/main/java/org/elasticsearch/benchmark/exponentialhistogram/ExponentialHistogramMergeBench.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
#	x-pack/plugin/mapper-exponential-histogram/src/main/java/org/elasticsearch/xpack/exponentialhistogram/ExponentialHistogramFieldMapper.java
@JonasKunz JonasKunz force-pushed the exp-histo-esql branch 2 times, most recently from 14d8614 to f491a55 Compare October 23, 2025 12:28

@dnhatn dnhatn left a comment

Thank you for all the iterations. There are several things we need to harden for this block, but since it's guarded with the feature flag, let's get it in so you can start integrating. We can improve or fix these before removing the feature flag.


@Override
public int getValueCount(int position) {
    return isNull(position) ? 0 : 1;
Member

Can we delegate to encodedHistograms instead?

Contributor Author

Based on the mapping of positions -> valueIndices -> subBlock-positions (see #133393 (comment)), imo the current implementation is clearer, as it delegates the details of this mapping to the isNull method?

Why would you want to delegate to encodedHistograms here?


@Override
public int getFirstValueIndex(int position) {
    return position;
Member

Can we delegate to encodedHistograms instead? I think, eventually, we should throw UnsupportedOperationException for all these operations, but it's okay for now.

Contributor Author

I think delegating to encodedHistograms.getFirstValueIndex() would be wrong here. For example, in theory encodedHistograms could be a BytesRefOrdinalBlock returning the same value index for different positions. However, we need to be able to load the other components (e.g. min, max) given just a valueIndex, which would then become impossible.

The current implementation uses the following model/semantics for positions and value indices:

  • We use the sub-blocks as plain storage without multi-value support, but with nullability. That is, they are effectively one-dimensional arrays to us, where each position contains either a single value or no value.
  • We don't make any assumptions about the value indices of the sub-blocks. We always access them via subBlock.getValue(subBlock.getFirstValueIndex(subBlockPosition)). This means we adhere to the Block interface guarantees.
  • Given a valueIndex within an ExponentialHistogramBlock, we need to be able to load all components (= the values from the sub-blocks) for that valueIndex. We need to do an "array lookup" at the given index into the sub-blocks. We do so by using the valueIndex in the context of the ExponentialHistogramBlock as a position into the sub-blocks (= basically one-dimensional arrays).
  • For now, ExponentialHistogramBlocks don't support multi-values. Therefore we use the simple mapping in which the valueIndex of an ExponentialHistogramBlock is exactly the position. We omit the firstValueIndexes indirection that would otherwise be needed. We can later add multi-value support easily by adding such a firstValueIndexes array to the ExponentialHistogramBlock. Note that this won't change the fact that the sub-blocks are non-multivalued; we will continue to use them as one-dimensional arrays. We will just use firstValueIndexes to change the mapping from ExponentialHistogramBlock positions to valueIndexes. A valueIndex will still correspond to a position in a sub-block.
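
As an illustration only (this is not code from the PR), the model above can be sketched with toy classes. Everything here is an assumption made for the sketch: the names ExpHistoBlockSketch, Column, HistogramBlock, minAt and sumAt are hypothetical, the sub-blocks are reduced to nullable double arrays, and the offset of 100 exists only to show that a sub-block's value index need not equal its position.

```java
/** Toy model of the position -> valueIndex -> sub-block position mapping. Not Elasticsearch code. */
final class ExpHistoBlockSketch {

    /** Hypothetical single-valued, nullable sub-block: a one-dimensional array with gaps. */
    static final class Column {
        // Arbitrary offset so that valueIndex != position; callers must not assume they are equal.
        private static final int OFFSET = 100;
        private final Double[] values; // a null entry means "no value at this position"

        Column(Double... values) {
            this.values = values.clone();
        }

        boolean isNull(int position) {
            return values[position] == null;
        }

        int getFirstValueIndex(int position) {
            return position + OFFSET;
        }

        double getValue(int valueIndex) {
            return values[valueIndex - OFFSET];
        }
    }

    /** Hypothetical composite block built from two sub-blocks of equal length. */
    static final class HistogramBlock {
        private final Column sums;   // stand-in for the sub-block that decides nullness
        private final Column minima; // stand-in for one further component

        HistogramBlock(Column sums, Column minima) {
            this.sums = sums;
            this.minima = minima;
        }

        boolean isNull(int position) {
            return sums.isNull(position);
        }

        // No multi-value support: every position holds zero or one histogram.
        int getValueCount(int position) {
            return isNull(position) ? 0 : 1;
        }

        // Identity mapping; a firstValueIndexes array could replace this later.
        int getFirstValueIndex(int position) {
            return position;
        }

        // The composite's valueIndex is used as a position into the sub-block,
        // which is then read through the sub-block's own value index.
        double minAt(int valueIndex) {
            return minima.getValue(minima.getFirstValueIndex(valueIndex));
        }

        double sumAt(int valueIndex) {
            return sums.getValue(sums.getFirstValueIndex(valueIndex));
        }
    }

    public static void main(String[] args) {
        HistogramBlock block = new HistogramBlock(new Column(10.0, null, 30.0), new Column(1.0, null, 3.0));
        for (int position = 0; position < 3; position++) {
            if (block.getValueCount(position) == 0) {
                System.out.println(position + ": null");
            } else {
                int valueIndex = block.getFirstValueIndex(position);
                System.out.println(position + ": min=" + block.minAt(valueIndex) + ", sum=" + block.sumAt(valueIndex));
            }
        }
    }
}
```

In this sketch, adding multi-value support would only change getValueCount and getFirstValueIndex of the composite; the component lookups would stay as they are.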


@nik9000 nik9000 left a comment

LGTM.

Sorry to block you. Happy hunting on the next bits.

@JonasKunz JonasKunz merged commit ec0f171 into elastic:main Oct 28, 2025
34 checks passed
@JonasKunz JonasKunz deleted the exp-histo-esql branch October 28, 2025 16:05
Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue :StorageEngine/ES|QL Timeseries / metrics / logsdb capabilities in ES|QL Team:StorageEngine v9.3.0

7 participants