ESQL: Strings support for MAX and MIN aggregations by ivancea · Pull Request #111544 · elastic/elasticsearch

ivancea · 2024-08-02T10:30:45Z

Support Version, Keyword and Text in Max and Min aggregations.

The current implementation of both max and min does:

For non-grouping:

Store a BytesRef
When a new max/min is found, copy it into the internal array, growing the array if needed
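As a rough illustration of the non-grouping case described above, here is a minimal sketch of a single-value state that copies each new maximum into a grow-only internal array. The class `MaxBytesStateSketch` and its methods are hypothetical names made up for this example, and it works on plain `byte[]` values rather than Lucene's `BytesRef`; it is not the code from this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical single-value MAX state over raw bytes; not the PR's actual class. */
final class MaxBytesStateSketch {
    private byte[] bytes = new byte[0]; // internal array; it only ever grows
    private int length = 0;             // number of valid bytes in the array
    private boolean seen = false;       // false until the first value arrives

    /** Offers a candidate; it is copied only if it beats the current maximum. */
    void add(byte[] candidate) {
        if (seen && Arrays.compareUnsigned(candidate, 0, candidate.length, bytes, 0, length) <= 0) {
            return; // not a new maximum, nothing to copy
        }
        if (bytes.length < candidate.length) {
            // Grow: the old contents are about to be overwritten, so no copy is needed.
            bytes = new byte[candidate.length];
        }
        System.arraycopy(candidate, 0, bytes, 0, candidate.length);
        length = candidate.length;
        seen = true;
    }

    /** Returns a copy of the current maximum, or null if nothing was added. */
    byte[] max() {
        return seen ? Arrays.copyOf(bytes, length) : null;
    }

    /** Size of the internal array, which may exceed the current value's length. */
    int capacity() {
        return bytes.length;
    }

    /** Convenience for the example: UTF-8 encode a string. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Copying matters because the caller may reuse the candidate's backing array for the next row; the state must own its bytes.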

For grouping:

Keep an array of BytesRef (null by default: there's no "initial/default value" here, as there's no "MAX" value for a string)
Each BytesRef stores its own array, which is grown as needed to hold a copy of the new max/min
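The grouping case can be sketched the same way: one buffer per group, with `null` standing for "no value seen yet". This is only an illustration under assumptions. `GroupedMinBytesSketch` is a hypothetical name, it tracks MIN over plain `byte[]` values, and it does not reproduce `BytesRefArrayState` from the PR.

```java
import java.util.Arrays;

/** Hypothetical per-group MIN state over raw bytes; not the PR's actual class. */
final class GroupedMinBytesSketch {
    private byte[][] values = new byte[0][]; // one buffer per group; null = nothing seen
    private int[] lengths = new int[0];      // valid bytes in each group's buffer

    /** Offers a candidate for a group; it is copied only if it is a new minimum. */
    void add(int groupId, byte[] candidate) {
        if (groupId >= values.length) {
            // Grow the per-group arrays; new slots start out as null / 0.
            int newSize = Math.max(groupId + 1, values.length * 2);
            values = Arrays.copyOf(values, newSize);
            lengths = Arrays.copyOf(lengths, newSize);
        }
        byte[] current = values[groupId];
        if (current != null
            && Arrays.compareUnsigned(candidate, 0, candidate.length, current, 0, lengths[groupId]) >= 0) {
            return; // not smaller than the current minimum
        }
        if (current == null || current.length < candidate.length) {
            // Each group grows its own buffer independently, and never shrinks it.
            current = new byte[candidate.length];
            values[groupId] = current;
        }
        System.arraycopy(candidate, 0, current, 0, candidate.length);
        lengths[groupId] = candidate.length;
    }

    /** True if the group has received at least one value (the buffer is non-null). */
    boolean hasValue(int groupId) {
        return groupId < values.length && values[groupId] != null;
    }

    /** Returns a copy of the group's minimum, or null if the group has no value. */
    byte[] min(int groupId) {
        return hasValue(groupId) ? Arrays.copyOf(values[groupId], lengths[groupId]) : null;
    }
}
```

Using `null` rather than an empty array as the "unset" marker keeps the empty string a legitimate value: a group whose only value is `""` still reports `hasValue` as true.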

Some notes:

It's not shrinking the arrays, to avoid having to copy them and potentially grow them again
It's using raw arrays. But maybe it should use BigArrays so the memory is accounted for in the circuit breaker?
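To make the circuit-breaker question in the last note concrete, the sketch below shows the general idea of accounting for a buffer's memory before allocating it. Everything here is an assumption for illustration: `ByteBudget` and `TrackedBufferSketch` are hypothetical, and they do not mirror the Elasticsearch `CircuitBreaker`, `BigArrays`, or `BreakingBytesRefBuilder` APIs.

```java
/** Hypothetical grow-only buffer whose allocations are charged to a byte budget. */
final class TrackedBufferSketch {

    /** Hypothetical stand-in for a circuit breaker: a simple byte counter with a limit. */
    static final class ByteBudget {
        private final long limit;
        private long used;

        ByteBudget(long limit) {
            this.limit = limit;
        }

        /** Reserves bytes, or throws if the limit would be exceeded. */
        void reserve(long bytes) {
            if (used + bytes > limit) {
                throw new IllegalStateException("would exceed budget of " + limit + " bytes");
            }
            used += bytes;
        }

        void release(long bytes) {
            used -= bytes;
        }

        long used() {
            return used;
        }
    }

    private final ByteBudget budget;
    private byte[] bytes = new byte[0];
    private int length;

    TrackedBufferSketch(ByteBudget budget) {
        this.budget = budget;
    }

    /** Copies the source into the buffer, growing (and accounting) only when needed. */
    void copyFrom(byte[] source) {
        if (bytes.length < source.length) {
            // Reserve the new array before allocating it, then give back the old one.
            budget.reserve(source.length);
            int oldCapacity = bytes.length;
            bytes = new byte[source.length];
            budget.release(oldCapacity);
        }
        System.arraycopy(source, 0, bytes, 0, source.length);
        length = source.length;
    }

    int length() {
        return length;
    }

    /** Releases the accounted memory when the state is no longer needed. */
    void close() {
        budget.release(bytes.length);
        bytes = new byte[0];
        length = 0;
    }
}
```

Reserving before allocating means an oversized value fails with an exception instead of exhausting the heap, and the buffer is left untouched when that happens.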

Part of #110346

…aggregations

# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Max.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Min.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/MaxTests.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/expression/function/aggregate/MinTests.java

github-actions · 2024-08-02T10:30:58Z

Documentation preview:

✨ Changed pages

elasticsearchmachine · 2024-08-02T10:31:08Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-08-02T10:31:09Z

Pinging @elastic/kibana-esql (ES|QL-ui)

elasticsearchmachine · 2024-08-02T10:31:31Z

Hi @ivancea, I've created a changelog YAML for you.

ivancea · 2024-08-02T10:47:55Z

@elasticmachine update branch

...gin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/BytesRefArrayState.java

nik9000 · 2024-08-07T16:02:04Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

+from apps
+| eval x = version
+| where id > 2
+| stats max(version), a = max(version), b = max(x), c = max(case(name == "iiiii", "100.0.0"::version, version));


Is this consistent with _search if you sort by the version field? Version goes through a lot to encode itself in a way where sorting the bytes does a nice semver sort and I don't recall precisely what we did about that in ESQL.

It looks like we preserve that sorting.

Yes, same as _search. It also uses the same logic as GreaterThan/LessThan/MvSort/SORT (the compareTo())
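The reason the version encoding matters can be seen with a small experiment: comparing the plain UTF-8 bytes of version strings does not give a semver-like order. The sketch below contrasts an unsigned byte-wise comparison with a naive numeric comparison. `VersionOrderSketch` and both methods are hypothetical, and the numeric comparison is a toy that only handles dotted integers; it is not the encoding Elasticsearch uses for the version field.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical comparison helpers; not Elasticsearch's version encoding. */
final class VersionOrderSketch {

    /** Unsigned lexicographic comparison of the UTF-8 bytes of two strings. */
    static int compareRawBytes(String a, String b) {
        return Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8)
        );
    }

    /** Toy comparison of dotted integer versions such as "1.10.0", part by part. */
    static int compareNumericParts(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            // Missing trailing parts are treated as 0 in this toy comparison.
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }
}
```

With these helpers, `compareRawBytes("1.10.0", "1.9.0")` is negative while `compareNumericParts("1.10.0", "1.9.0")` is positive, which is the mismatch a sortable byte encoding has to remove before a byte-wise MAX or MIN can be trusted on versions.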

nik9000 · 2024-08-07T16:04:07Z

.../esql/compute/src/main/java/org/elasticsearch/compute/aggregation/MaxBytesRefAggregator.java

+
+@Aggregator({ @IntermediateState(name = "max", type = "BYTES_REF"), @IntermediateState(name = "seen", type = "BOOLEAN") })
+@GroupingAggregator
+class MaxBytesRefAggregator {


I think it's worth a comment saying that we're comparing the raw bytes representation of the BytesRef. That should be a valid and good sort for most things because we try to represent them that way. But it's not always the kind of sort you want and it's worth calling it out in javadoc.

Added some comments to both aggregators, explaining that they use the bytes natural order


nik9000 · 2024-08-07T16:07:05Z

...lugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/Max.java

+            name = "field",
+            type = { "boolean", "double", "integer", "long", "date", "ip", "keyword", "text", "long", "version" }
+        ) Expression field
+    ) {


Do you think it's worth adding a NOTE to the docs that the MAX of a keyword and text field is the highest value, sorted by the utf-8 representation? That's the behavior we're committing to here, and I could see a world where folks will need collations. But the utf-8 one is a useful default.

Right now, other functions I checked use this same logic (the BytesRef compareTo), and behave the same as the SORT command.
So, I'm not sure adding this here would make much sense. If we want to explain it, I wonder if it would be better at ESQL level, instead of at function level
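For readers wondering what "sorted by the utf-8 representation" means in practice, the sketch below picks the maximum of a few strings twice: once by unsigned UTF-8 byte order and once with a locale-aware `java.text.Collator`. The class and method names are hypothetical and the example only illustrates the difference discussed in this thread; ES|QL itself is not shown here and is not claimed to offer a collation option.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helpers contrasting byte order with collation order. */
final class Utf8VersusCollationSketch {

    /** Maximum by unsigned comparison of UTF-8 bytes. */
    static String maxByUtf8(String... values) {
        String best = null;
        byte[] bestBytes = null;
        for (String value : values) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            if (best == null || Arrays.compareUnsigned(bytes, bestBytes) > 0) {
                best = value;
                bestBytes = bytes;
            }
        }
        return best;
    }

    /** Maximum according to the JDK collator for the given locale. */
    static String maxByCollator(Locale locale, String... values) {
        Collator collator = Collator.getInstance(locale);
        String best = null;
        for (String value : values) {
            if (best == null || collator.compare(value, best) > 0) {
                best = value;
            }
        }
        return best;
    }
}
```

For the inputs `"apple"` and `"Zebra"`, the byte-order maximum is `"apple"` because lowercase ASCII letters have higher byte values than uppercase ones, while an English collator picks `"Zebra"`.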

nik9000

LGTM. Though I'd modify the description to change the note about using raw arrays.

nik9000 · 2024-08-19T17:38:23Z

...gin/esql/compute/src/main/java/org/elasticsearch/compute/aggregation/AbstractArrayState.java

    }

-    final boolean hasValue(int groupId) {
+    boolean hasValue(int groupId) {


I think we wouldn't want to extend this and get seen if we're overriding this method.

Oh God. Fixed! Instead, I added a single boolean state, just to know whether to use a vector or a block in toBlock()

nik9000 · 2024-08-19T17:40:00Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec

+from apps
+| eval x = version
+| where id > 2
+| stats max(version), a = max(version), b = max(x), c = max(case(name == "iiiii", "100.0.0"::version, version));



ivancea added 15 commits July 31, 2024 13:48

Moved test case generators to AbstractFunctionTestCase and fixed for …

fb915d3

…aggregations

Max and Min tests, and fixes for testint folding and nulls

fe46b90

Updated AVG and WEIGHTED_AVG tests

743e010

Deprecated old DefaultChekcs method

cedb0f3

Remove repeated tests

efb188c

Fixed tests after fix in MIN/MAX type checks

87d90c3

Merge branch 'main' into aggregation-test-cases-types-nulls

e49f5ce

Updated after merge with validFunctionParameters change

59d9aca

Fixed VerifierTests

7e5521b

Added new available types to Max and Min, and function tests

8447df0

Added BytesRef state implementation

e446268

Minor fixes and added capability

f985f6b

CSV version tests

c4ada6f

CSV tests for max and min with strings

2ce4be6

ivancea added the >feature, Team:Analytics, :Analytics/ES|QL, and ES|QL-ui labels Aug 2, 2024

ivancea requested a review from nik9000 August 2, 2024 10:30

elasticsearchmachine added the v8.16.0 label Aug 2, 2024

ivancea and others added 2 commits August 2, 2024 12:31

Update docs/changelog/111544.yaml

1868f5f

Removed unused import

91762a6

elasticmachine and others added 2 commits August 2, 2024 20:47

Merge branch 'main' into max-min-aggs-bytesref

81647b5

Fixed tests using min and max

abd30fd

nik9000 reviewed Aug 7, 2024


ivancea added 4 commits August 12, 2024 13:24

Merge branch 'main' into max-min-aggs-bytesref

1bae79b

Use BreakingBytesRefBuilder in array state

1f96da2

Added copyBytes method

d205001

Added comments about BytesRef ordering to javadocs

81a6088

ivancea requested a review from nik9000 August 12, 2024 13:41

nik9000 approved these changes Aug 19, 2024


ivancea added 2 commits August 20, 2024 13:01

Merge branch 'main' into max-min-aggs-bytesref

55fd207

Removed seen array from BytesRefArrayState and add Max/Min aggregator…

59ebfed

…s tests

nik9000 approved these changes Aug 20, 2024


ivancea merged commit e3f378e into elastic:main Aug 20, 2024

ivancea deleted the max-min-aggs-bytesref branch August 20, 2024 13:25

drewdaemon mentioned this pull request Aug 22, 2024

[ES|QL] min and max allow strings elastic/kibana#191119

Merged

drewdaemon added a commit to elastic/kibana that referenced this pull request Aug 22, 2024

[ES|QL] min and max allow strings (#191119)

0c911c8

## Summary Close elastic/elasticsearch#111544 Follow-on to elastic/elasticsearch#111544

luigidellaquila mentioned this pull request May 23, 2025

[CI] EsqlClientYamlIT test {p0=esql/30_types/constant_keyword} failing #128341

Closed

luigidellaquila mentioned this pull request Oct 1, 2025

[ES|QL] Add support for MAX/MIN aggregation on version field type #102126

Closed


4 participants
