-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Add exponential histogram percentile function #137553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 4 commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
cb171b9
Implement histogram percentile scalar function
JonasKunz 4cb99b3
Add unit tests
JonasKunz c9a0c5c
Fix javadoc
JonasKunz 1bf6d3a
spotless
JonasKunz 44d3436
Remove left-over override
JonasKunz 6804bb6
Merge branch 'main' into esql-percentile-function
JonasKunz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
152 changes: 152 additions & 0 deletions
152
...csearch/xpack/esql/expression/function/scalar/histogram/HistogramPercentileEvaluator.java
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
131 changes: 131 additions & 0 deletions
131
...rg/elasticsearch/xpack/esql/expression/function/scalar/histogram/HistogramPercentile.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,131 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
|
|
||
| package org.elasticsearch.xpack.esql.expression.function.scalar.histogram; | ||
|
|
||
| import org.elasticsearch.common.io.stream.NamedWriteableRegistry; | ||
| import org.elasticsearch.common.io.stream.StreamInput; | ||
| import org.elasticsearch.common.io.stream.StreamOutput; | ||
| import org.elasticsearch.compute.ann.Evaluator; | ||
| import org.elasticsearch.compute.data.DoubleBlock; | ||
| import org.elasticsearch.compute.operator.EvalOperator; | ||
| import org.elasticsearch.exponentialhistogram.ExponentialHistogram; | ||
| import org.elasticsearch.exponentialhistogram.ExponentialHistogramQuantile; | ||
| import org.elasticsearch.xpack.esql.core.expression.Expression; | ||
| import org.elasticsearch.xpack.esql.core.tree.NodeInfo; | ||
| import org.elasticsearch.xpack.esql.core.tree.Source; | ||
| import org.elasticsearch.xpack.esql.core.type.DataType; | ||
| import org.elasticsearch.xpack.esql.expression.function.FunctionInfo; | ||
| import org.elasticsearch.xpack.esql.expression.function.Param; | ||
| import org.elasticsearch.xpack.esql.expression.function.scalar.EsqlScalarFunction; | ||
| import org.elasticsearch.xpack.esql.expression.function.scalar.math.Cast; | ||
| import org.elasticsearch.xpack.esql.io.stream.PlanStreamInput; | ||
|
|
||
| import java.io.IOException; | ||
| import java.util.List; | ||
|
|
||
| import static org.elasticsearch.xpack.esql.core.expression.TypeResolutions.ParamOrdinal.DEFAULT; | ||
| import static org.elasticsearch.xpack.esql.core.expression.TypeResolutions.isType; | ||
|
|
||
| /** | ||
| * Extracts a percentile value from a single histogram value. | ||
| * Note that this function is currently only intended for usage in surrogates and not available as a user-facing function. | ||
| * Therefore, it is intentionally not registered in {@link org.elasticsearch.xpack.esql.expression.function.EsqlFunctionRegistry}. | ||
| */ | ||
| public class HistogramPercentile extends EsqlScalarFunction { | ||
|
|
||
| public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry( | ||
| Expression.class, | ||
| "HistogramPercentile", | ||
| HistogramPercentile::new | ||
| ); | ||
|
|
||
| private final Expression histogram; | ||
| private final Expression percentile; | ||
|
|
||
| @FunctionInfo(returnType = { "double" }) | ||
| public HistogramPercentile( | ||
| Source source, | ||
| @Param(name = "histogram", type = { "exponential_histogram" }) Expression histogram, | ||
| @Param(name = "percentile", type = { "double", "integer", "long", "unsigned_long" }) Expression percentile | ||
| ) { | ||
| super(source, List.of(histogram, percentile)); | ||
| this.histogram = histogram; | ||
| this.percentile = percentile; | ||
| } | ||
|
|
||
| private HistogramPercentile(StreamInput in) throws IOException { | ||
| this(Source.readFrom((PlanStreamInput) in), in.readNamedWriteable(Expression.class), in.readNamedWriteable(Expression.class)); | ||
| } | ||
|
|
||
| Expression histogram() { | ||
| return histogram; | ||
| } | ||
|
|
||
| Expression percentile() { | ||
| return percentile; | ||
| } | ||
|
|
||
| @Override | ||
| protected TypeResolution resolveType() { | ||
| return isType(histogram, dt -> dt == DataType.EXPONENTIAL_HISTOGRAM, sourceText(), DEFAULT, "exponential_histogram").and( | ||
| isType(percentile, DataType::isNumeric, sourceText(), DEFAULT, "numeric types") | ||
| ); | ||
| } | ||
|
|
||
| @Override | ||
| public DataType dataType() { | ||
| return DataType.DOUBLE; | ||
| } | ||
|
|
||
| @Override | ||
| public Expression replaceChildren(List<Expression> newChildren) { | ||
| return new HistogramPercentile(source(), newChildren.get(0), newChildren.get(1)); | ||
| } | ||
|
|
||
| @Override | ||
| public boolean foldable() { | ||
| return histogram.foldable() && percentile.foldable(); | ||
| } | ||
|
|
||
| @Override | ||
| protected NodeInfo<? extends Expression> info() { | ||
| return NodeInfo.create(this, HistogramPercentile::new, histogram, percentile); | ||
| } | ||
|
|
||
| @Override | ||
| public String getWriteableName() { | ||
| return ENTRY.name; | ||
| } | ||
|
|
||
| @Override | ||
| public void writeTo(StreamOutput out) throws IOException { | ||
| source().writeTo(out); | ||
| out.writeNamedWriteable(histogram); | ||
| out.writeNamedWriteable(percentile); | ||
| } | ||
|
|
||
| @Evaluator(warnExceptions = ArithmeticException.class) | ||
| static void process(DoubleBlock.Builder resultBuilder, ExponentialHistogram value, double percentile) { | ||
| if (percentile < 0.0 || percentile > 100.0) { | ||
| throw new ArithmeticException("Percentile value must be in the range [0, 100], got: " + percentile); | ||
| } | ||
| double result = ExponentialHistogramQuantile.getQuantile(value, percentile / 100.0); | ||
| if (Double.isNaN(result)) { // can happen if the histogram is empty | ||
| resultBuilder.appendNull(); | ||
| } else { | ||
| resultBuilder.appendDouble(result); | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
| public EvalOperator.ExpressionEvaluator.Factory toEvaluator(ToEvaluator toEvaluator) { | ||
| var fieldEvaluator = toEvaluator.apply(histogram); | ||
| var percentileEvaluator = Cast.cast(source(), percentile.dataType(), DataType.DOUBLE, toEvaluator.apply(percentile)); | ||
| return new HistogramPercentileEvaluator.Factory(source(), fieldEvaluator, percentileEvaluator); | ||
| } | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
42 changes: 42 additions & 0 deletions
42
...search/xpack/esql/expression/function/scalar/histogram/HistogramPercentileErrorTests.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
|
|
||
| package org.elasticsearch.xpack.esql.expression.function.scalar.histogram; | ||
|
|
||
| import org.elasticsearch.xpack.esql.core.expression.Expression; | ||
| import org.elasticsearch.xpack.esql.core.tree.Source; | ||
| import org.elasticsearch.xpack.esql.core.type.DataType; | ||
| import org.elasticsearch.xpack.esql.expression.function.ErrorsForCasesWithoutExamplesTestCase; | ||
| import org.elasticsearch.xpack.esql.expression.function.TestCaseSupplier; | ||
| import org.hamcrest.Matcher; | ||
|
|
||
| import java.util.List; | ||
| import java.util.Set; | ||
|
|
||
| import static org.hamcrest.Matchers.equalTo; | ||
|
|
||
| public class HistogramPercentileErrorTests extends ErrorsForCasesWithoutExamplesTestCase { | ||
|
|
||
| @Override | ||
| protected List<TestCaseSupplier> cases() { | ||
| return paramsToSuppliers(HistogramPercentileTests.parameters()); | ||
| } | ||
|
|
||
| @Override | ||
| protected Expression build(Source source, List<Expression> args) { | ||
| return new HistogramPercentile(source, args.get(0), args.get(1)); | ||
| } | ||
|
|
||
| @Override | ||
| protected Matcher<String> expectedTypeErrorMatcher(List<Set<DataType>> validPerPosition, List<DataType> signature) { | ||
| return equalTo(typeErrorMessage(false, validPerPosition, signature, (v, p) -> switch (p) { | ||
| case 0 -> "exponential_histogram"; | ||
| case 1 -> "numeric types"; | ||
| default -> throw new IllegalArgumentException("Unexpected parameter position: " + p); | ||
| })); | ||
| } | ||
| } |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Is it a good or a bad practice to include the invalid value in the warning message? E.g.
Asindoesn't include the wrong value in its warnings.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the percentile is a constant with the integration, we should verify it and reject invalid values instead. But this should be okay now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC you mean that if someone writes a query like
STATS PERCENTILE(my_histo, 200)the query shouldn't even get to the execution phase, but we should reject it with an error.Am I understanding this correctly?
I unfortunately don't know what the correct callback / hook would be to perform this verification. I looked in the
PERCENTILEaggregation and thePOWfunction for examples, but they don't seem to be doing this kind of verification either.If you could provide me with an example on how to do this, I can add it in a follow-up.