ESQL: Add support for multivalue fields in Arrow output #114774
Merged
swallez merged 11 commits into elastic:main on Oct 21, 2024
Collaborator
Pinging @elastic/es-analytical-engine (Team:Analytics)
Collaborator
Hi @swallez, I've created a changelog YAML for you.
alex-spies reviewed on Oct 18, 2024
alex-spies reviewed on Oct 21, 2024
alex-spies (Contributor) left a comment:
Ok, all done!
Looks nice! Only thing I think we should do before wrapping up is parameterizing the tests for multivalued fixed and variable length data types over all data types that we support right now. The converter code for the different data types is quite copy-pasty but distinct - so a multivalue test for integer doesn't prevent regressions for double etc.
alex-spies approved these changes on Oct 21, 2024
alex-spies (Contributor) left a comment:
Thanks for pointing me to testRandomTypesAndSize - indeed, looks like test coverage is lovely. Nice, LGTM!
Collaborator
💚 Backport successful
swallez added a commit to swallez/elasticsearch that referenced this pull request on Oct 21, 2024
elasticsearchmachine pushed a commit that referenced this pull request on Oct 22, 2024
swallez added a commit to swallez/elasticsearch that referenced this pull request on Oct 22, 2024
elasticsearchmachine pushed a commit that referenced this pull request on Oct 22, 2024
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request on Oct 25, 2024
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request on Nov 4, 2024
Adds support for multi-valued fields to the ES|QL output in Apache Arrow format.
Also reduces payload size by outputting "all true" validity vectors as an empty block, as allowed by the protocol.
This is a follow-up to PR #109873.
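As a rough illustration of the validity-vector optimization described above, here is a minimal Python sketch. It is not the code from this PR: the function name `pack_validity` is hypothetical, and the sketch only relies on two properties of the Arrow columnar format, namely that validity bitmaps use least-significant-bit numbering and that an array with no nulls may omit its validity bitmap. The buffer padding required by the IPC format is left out for brevity.

```python
def pack_validity(is_valid):
    """Pack per-position validity flags into an Arrow-style bitmap.

    Hypothetical helper for illustration only.
    Returns an empty buffer when every position is valid, which is the
    "all true" case that can be omitted from the payload.
    """
    if all(is_valid):
        # No nulls: the validity bitmap can be left out entirely.
        return b""

    # One bit per position, rounded up to whole bytes (padding omitted).
    bitmap = bytearray((len(is_valid) + 7) // 8)
    for i, valid in enumerate(is_valid):
        if valid:
            # Arrow uses least-significant-bit numbering within each byte.
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)
```

For a dense column the saving is one bit per position, so the gain grows with the number of rows and columns in the response.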
Tests
The unit tests already covered all data types and random combinations of fields and sparse/dense/empty vectors. This test harness has been updated to randomly choose whether a column is single- or multi-valued. The generated multi-valued columns can contain single-valued blocks (i.e. the block's firstValueIndexes stays null) to account for all possible combinations.
This PR has also been tested with the Rust Arrow implementation.
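To make the single-valued block case mentioned in the Tests section concrete, here is a small Python sketch of how a block could map to Arrow list offsets. It is an illustration, not the converter code from this PR. It assumes that a non-null firstValueIndexes holds positionCount + 1 entries, with position p covering the values from firstValueIndexes[p] up to (but excluding) firstValueIndexes[p + 1], and that a null firstValueIndexes means exactly one value per position. The names `list_offsets` and `to_lists` are hypothetical, and null positions are ignored.

```python
def list_offsets(position_count, first_value_indexes=None):
    """Compute Arrow-style list offsets for a block (hypothetical helper).

    Assumption: first_value_indexes, when present, has position_count + 1
    entries. When it is None, the block is single-valued and position p
    holds exactly the value at index p.
    """
    if first_value_indexes is None:
        # Single-valued block inside a multi-valued column:
        # every list has length 1, so offsets are 0, 1, 2, ...
        return list(range(position_count + 1))
    if len(first_value_indexes) != position_count + 1:
        raise ValueError("expected position_count + 1 offsets")
    return list(first_value_indexes)


def to_lists(values, offsets):
    """Slice a flat values array into per-position lists using offsets."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


# A single-valued block and a multi-valued block of the same column
# both decode to lists, so a reader sees one consistent column type.
single = to_lists([10, 20, 30], list_offsets(3))
multi = to_lists([1, 2, 3, 4], list_offsets(3, [0, 2, 3, 4]))
```

Under these assumptions, a column declared as multi-valued keeps the same list type across all blocks, whether or not a given block actually contains multiple values.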