-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Add TopN support for exponential histograms #137313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f3a516b to
7ba2b30
Compare
7ba2b30 to
d2e8ffa
Compare
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
|
This should wait on #137368 |
# Conflicts: # x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/ExponentialHistogramArrayBlock.java # x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/ExponentialHistogramBlock.java # x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/data/ExponentialHistogramBlockAccessor.java
0a1ab00 to
bae17ea
Compare
|
#137368 is merged and conflicts have been fixed, so this one is ready for review again. |
dnhatn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some smaller comments, but LGTM. Thanks Jonas!
| } | ||
|
|
||
| @Override | ||
| public void serializeExponentialHistogram(int valueIndex, SerializedOutput out, BytesRef scratch) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valueIndex should be position instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intended usage pattern for serializeExponentialHistogram is just like for getters on blocks:
for (int i=0; i<block.getPositionCount(); i++) {
for (int j = 0; j < block.getValueCount(i); j++) {
block.serializeExponentialHistogram(block.getFirstValueIndex(i) + j, ...)
}
}
So valueIndex instead of position is correct here.
Right now it is true for exponential histogram blocks that we just use the positions directly as valueIndex, but that will change when we support multi-values.
See also this comment: #133393 (comment)
I'll add a comment in ExponentialHistogramArrayBlock explaining the mapping of positions and valueIndices to positions in the sub-blocks.
| @Override | ||
| public void serializeExponentialHistogram(int valueIndex, SerializedOutput out, BytesRef scratch) { | ||
| long valueCount = valueCounts.getLong(valueCounts.getFirstValueIndex(valueIndex)); | ||
| out.appendLong(valueCounts.getLong(valueCounts.getFirstValueIndex(valueIndex))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valueCounts.getLong(valueCounts.getFirstValueIndex(valueIndex)) -> valueCount?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are confusing this with the value count returned via getValueCount(position). This is a different value-count: It is the number of samples the histogram was generated from. I've added a comment to avoid this confusion.
| * @param out | ||
| * @param scratch | ||
| */ | ||
| void serializeExponentialHistogram(int valueIndex, SerializedOutput out, BytesRef scratch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
valueIndex -> position
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #137313 (comment).
...gin/esql/compute/src/main/java/org/elasticsearch/compute/data/ExponentialHistogramBlock.java
Outdated
Show resolved
Hide resolved
.../main/java/org/elasticsearch/compute/operator/topn/ResultBuilderForExponentialHistogram.java
Outdated
Show resolved
Hide resolved
| /** | ||
| * Abstraction to use for writing individual values via {@link #serializeExponentialHistogram(int, SerializedOutput, BytesRef)}. | ||
| */ | ||
| interface SerializedOutput { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not to have these interfaces, but they're okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't want to expose the zero-threshold from the block to prevent it becoming a maintenance nightmare if we do changes to the disk format. That's why I want to keep the knowledge of its existence local to the block implementation.
I though pulling in a direct dependency on the TopNEncoder would not be a good idea, that's why I added these interfaces in between. If you prefer it, I can directly use the TopNEncoder in serializeExponentialHistogram instead in a follow-up
…-json
* upstream/main:
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
[LTR] Fix feature display order when using explain. (elastic#137671)
Remove extra RemoteClusterService instances in unit test (elastic#137647)
Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
Update bundled JDK to 25.0.1 (elastic#137640)
resolve indices for prefixed _all expressions (elastic#137330)
ESQL: Add TopN support for exponential histograms (elastic#137313)
allows field caps to be cross project (elastic#137530)
ESQL: Add exponential histogram percentile function (elastic#137553)
Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
Tighten on when THROTTLE decision can be returned (elastic#136794)
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
Add a test for two little known conditional processor paths (elastic#137645)
Extract a common ORIGIN constant (elastic#137612)
Remove early phase failure in batched (elastic#136889)
Returning correct index mode from get data streams api (elastic#137646)
[ML] Manage AD results indices (elastic#136065)
…-json
* upstream/main:
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=false} elastic#137691
Mute org.elasticsearch.xpack.inference.action.filter.ShardBulkInferenceActionFilterBasicLicenseIT testLicenseInvalidForInference {p0=true} elastic#137690
[LTR] Fix feature display order when using explain. (elastic#137671)
Remove extra RemoteClusterService instances in unit test (elastic#137647)
Fix `ComponentTemplatesFileSettingsIT.testSettingsApplied` (elastic#137669)
Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (elastic#137233)
Update bundled JDK to 25.0.1 (elastic#137640)
resolve indices for prefixed _all expressions (elastic#137330)
ESQL: Add TopN support for exponential histograms (elastic#137313)
allows field caps to be cross project (elastic#137530)
ESQL: Add exponential histogram percentile function (elastic#137553)
Wait for nodes to have downloaded databases in `GeoIpDownloaderIT` (elastic#137636)
Tighten on when THROTTLE decision can be returned (elastic#136794)
Mute org.elasticsearch.xpack.esql.qa.single_node.GenerativeMetricsIT test elastic#137655
Add a test for two little known conditional processor paths (elastic#137645)
Extract a common ORIGIN constant (elastic#137612)
Remove early phase failure in batched (elastic#136889)
Returning correct index mode from get data streams api (elastic#137646)
[ML] Manage AD results indices (elastic#136065)
Adds a
ValueExtractorand aResultBuilderfor exponential histograms. Sorting on histograms is not support, as they don't have a natural order.I really want to keep the implementation detail of how the sub-components of exponential histograms are laid out (e.g. the
zeroThresholdbeing a separate column instead of part of the encodedBytesRef) local to the block, block-builder and block-loader. The layout comes from theFieldMapperand I don't want changes there to ripple through the entire ES|QL code base. For that reason I added functions to serialize/deserialize individual histograms to the block and builder and used those from the TopNValueExtractorandResultBuilder.