
ESQL: Add support for multivalue fields in Arrow output #114774

Merged: swallez merged 11 commits into elastic:main from swallez:esql-arrow-multivalue on Oct 21, 2024

@swallez swallez commented Oct 14, 2024

Adds support for multi-valued fields to the ES|QL output in Apache Arrow format.

Also reduces payload size by outputting "all true" validity vectors as an empty block, as allowed by the protocol.
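
As a rough illustration of the two points above (not the code in this PR), the sketch below assumes a multi-valued column maps onto an Arrow-style variable-size list layout: an offsets buffer with one more entry than there are rows, a flat values buffer, and an optional validity vector. The function name `to_list_layout` and the use of plain Python lists are hypothetical and only serve the example.

```python
def to_list_layout(rows):
    """Flatten rows into (offsets, values, validity).

    Each row is either None (a null position) or a list of values.
    Hypothetical helper for illustration; not part of Elasticsearch or Arrow.
    """
    offsets = [0]
    values = []
    validity = []
    for row in rows:
        if row is not None:
            values.extend(row)
        validity.append(row is not None)
        # Row i spans values[offsets[i]:offsets[i + 1]].
        offsets.append(len(values))
    # An "all true" validity vector carries no information, so drop it.
    if all(validity):
        validity = None
    return offsets, values, validity
```

With no null rows the validity vector is omitted entirely, which is where the payload saving comes from; a reader then treats a missing validity vector as "every position is valid".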

This is a follow-up to PR #109873.

Tests

The unit tests already covered all data types and random combinations of fields and sparse/dense/empty vectors. The test harness has been updated to randomly choose whether a column is single- or multi-valued. The generated multi-valued columns can contain single-valued blocks (i.e. the block's firstValueIndexes stays null) to account for all possible combinations.
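
The randomization described above could look roughly like the following sketch. It is not the actual test harness: `random_block`, `random_column`, the value ranges, and the convention that `first_value_indexes` has one more entry than there are positions are all assumptions made for this example.

```python
import random


def random_block(rng, positions, multi_valued):
    """Return (values, first_value_indexes) for one block.

    first_value_indexes stays None when every position holds exactly one value.
    """
    if not multi_valued:
        return [rng.randint(0, 99) for _ in range(positions)], None
    values, first = [], [0]
    for _ in range(positions):
        count = rng.randint(1, 3)  # each position gets 1 to 3 values
        values.extend(rng.randint(0, 99) for _ in range(count))
        first.append(len(values))
    return values, first


def random_column(rng, blocks=4, positions=5):
    """Randomly pick single- or multi-valued, then build the column's blocks."""
    multi_valued_column = rng.random() < 0.5
    # Even in a multi-valued column, some blocks may stay single-valued.
    return multi_valued_column, [
        random_block(rng, positions, multi_valued_column and rng.random() < 0.5)
        for _ in range(blocks)
    ]
```

Seeding the generator, for example `random_column(random.Random(42))`, keeps a failing combination reproducible.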

This PR has also been tested with the Rust Arrow implementation.

@swallez swallez requested a review from a team as a code owner October 14, 2024 21:55
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) Oct 14, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the external-contributor label (Pull request authored by a developer outside the Elasticsearch team) Oct 14, 2024
@elasticsearchmachine
Collaborator

Hi @swallez, I've created a changelog YAML for you.

@astefan astefan requested a review from nik9000 October 18, 2024 09:41
Contributor

@alex-spies alex-spies left a comment

This is cool.

I started by having a look at the tests. Only minor remarks so far.

Thanks @swallez !

@alex-spies alex-spies self-requested a review October 18, 2024 12:10
Contributor

@alex-spies alex-spies left a comment

Ok, all done!

Looks nice! Only thing I think we should do before wrapping up is parameterizing the tests for multivalued fixed and variable length data types over all data types that we support right now. The converter code for the different data types is quite copy-pasty but distinct - so a multivalue test for integer doesn't prevent regressions for double etc.

Contributor

@alex-spies alex-spies left a comment

Thanks for pointing me to testRandomTypesAndSize - indeed, looks like test coverage is lovely. Nice, LGTM!

@alex-spies alex-spies added the auto-backport label (Automatically create backport pull requests when merged) Oct 21, 2024
@swallez swallez added v8.16.0 and removed v8.17.0 labels Oct 21, 2024
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.16

swallez added a commit to swallez/elasticsearch that referenced this pull request Oct 21, 2024
elasticsearchmachine pushed a commit that referenced this pull request Oct 22, 2024
…) (#115267)

* ESQL: Add support for multivalue fields in Arrow output (#114774)

* Do not use Java 21 APIs
swallez added a commit to swallez/elasticsearch that referenced this pull request Oct 22, 2024
elasticsearchmachine pushed a commit that referenced this pull request Oct 22, 2024
… (#115309)

* ESQL: Add support for multivalue fields in Arrow output (#114774)

* Do not use Java 21 APIs
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
@swallez swallez deleted the esql-arrow-multivalue branch November 4, 2024 11:29
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Nov 4, 2024

Labels

* :Analytics/ES|QL (AKA ESQL)
* auto-backport (Automatically create backport pull requests when merged)
* >enhancement
* external-contributor (Pull request authored by a developer outside the Elasticsearch team)
* Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
* v8.16.0
* v8.17.0
* v9.0.0

4 participants