Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Oct 14, 2020

This PR adds:

  1. Basic DictionaryArray coercion (not cast) support in DataFusion
  2. A test in sql.rs demonstrating basic operations using DataFusion on DictionaryArray arrays

Note that the performance operating on DictionaryArrays is likely to leave a lot to be desired -- specifically almost any operation will cause the DictionaryArray to get unpacked to a normal array reducing most/all of any performance gains.

I plan to add additional performance improvements over time -- but I felt getting queries to run was the first important step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just a refactor, it is not meant to change the semantics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the parameter names here to make this macro more consistent with its documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test demonstrates DictionaryArrays being used in DataFusion

@github-actions
Copy link

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this and LGTM. Thanks a lot, @alamb . Really sweet and easy to follow.

The comments on the tests helped a lot!

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, there's a comment from Jorge @alamb, let us know if you'd like to address it, otherwise we can merge this

@alamb alamb force-pushed the alamb/ARROW-10159-dictionary-array-coercion-take-3 branch from 434bd01 to cd319b0 Compare October 18, 2020 11:35
@alamb
Copy link
Contributor Author

alamb commented Oct 18, 2020

LGTM, there's a comment from Jorge @alamb, let us know if you'd like to address it, otherwise we can merge this

Thanks @nevi-me -- I responded to @jorgecarleitao and I have rebased this PR against master. I think it is good to go, but I will wait for @jorgecarleitao 's response

@nevi-me nevi-me closed this in 551ab49 Oct 18, 2020
@alamb alamb deleted the alamb/ARROW-10159-dictionary-array-coercion-take-3 branch October 19, 2020 09:56
kszucs pushed a commit that referenced this pull request Oct 19, 2020
This PR  adds:
1. Basic `DictionaryArray` coercion (not cast) support in DataFusion
2. A test in `sql.rs` demonstrating basic operations using DataFusion on `DictionaryArray` arrays

Note that the performance operating on `DictionaryArrays` is likely to leave a lot to be desired -- specifically almost any operation will cause the `DictionaryArray` to get unpacked to a normal array reducing most/all of any performance gains.

I plan to add additional performance improvements over time -- but I felt getting queries to run was the first important step.

Closes #8463 from alamb/alamb/ARROW-10159-dictionary-array-coercion-take-3

Authored-by: alamb <[email protected]>
Signed-off-by: Neville Dipale <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants