Optimize CASE expression to produce dictionary-encoded arrays in some cases #11605

Open
andygrove opened this issue Jul 22, 2024 · 1 comment
Labels
enhancement New feature or request performance Make DataFusion faster

Comments

@andygrove
Member

andygrove commented Jul 22, 2024

Is your feature request related to a problem or challenge?

Here is an example CASE expression that produces an array containing exactly two distinct values:

CASE WHEN expr THEN 1 ELSE 0 END

We currently produce an array containing ones and zeros (and nulls). It seems like it would be beneficial to produce a dictionary-encoded array in this case to reduce memory pressure. This could also make subsequent operations more efficient in some cases.
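To make the memory argument concrete, here is a minimal sketch in plain Python, not DataFusion code. The function names `case_plain`, `case_dictionary`, and `decode` are hypothetical and exist only for illustration. The sketch assumes standard SQL semantics in which a NULL predicate is not true and therefore takes the ELSE branch.

```python
from typing import List, Optional, Tuple


def case_plain(predicate: List[Optional[bool]], then_value: int, else_value: int) -> List[int]:
    """Plain output: one full-width value per row."""
    # A NULL (None) predicate is not true, so it falls through to ELSE.
    return [then_value if p is True else else_value for p in predicate]


def case_dictionary(
    predicate: List[Optional[bool]], then_value: int, else_value: int
) -> Tuple[List[int], List[int]]:
    """Dictionary-style output: a small key per row plus the distinct values."""
    values = [else_value, then_value]  # the dictionary holds only two entries
    keys = [1 if p is True else 0 for p in predicate]  # key 1 -> THEN, key 0 -> ELSE
    return keys, values


def decode(keys: List[int], values: List[int]) -> List[int]:
    """Expand the dictionary form back into a plain list of values."""
    return [values[k] for k in keys]


if __name__ == "__main__":
    predicate = [True, False, None, True]
    keys, values = case_dictionary(predicate, 100, -1)
    # Both forms describe the same logical column.
    assert decode(keys, values) == case_plain(predicate, 100, -1)
```

The saving only appears when the keys are narrower than the values they stand for, for example 8-bit keys in place of 64-bit integers or strings. For a column that already fits in one byte per row, the dictionary form would not be smaller.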

@alamb
Contributor

alamb commented Jul 23, 2024

I think implementing operations like this will be much easier if we are able to add LogicalTypes (#11513).

(I am thinking specifically that we could have expressions dynamically decide what encoding to use for their output rather than having it hard coded)
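As a rough illustration of what "dynamically decide what encoding to use" could look like, here is a hypothetical heuristic in plain Python. Nothing here is DataFusion API: the function `choose_encoding`, its parameters, and the thresholds are all assumptions made for the sketch.

```python
from typing import Hashable, Sequence


def choose_encoding(
    branch_values: Sequence[Hashable],
    value_width_bytes: int,
    key_width_bytes: int = 1,
) -> str:
    """Pick an output encoding for a CASE whose branches are all literals.

    Hypothetical rule: use a dictionary only when the distinct branch values
    fit in a signed key of the given width and a key is narrower than a value.
    """
    distinct = len(set(branch_values))
    max_entries = 2 ** (8 * key_width_bytes - 1)  # signed keys: 128 entries for 1 byte
    if distinct <= max_entries and key_width_bytes < value_width_bytes:
        return "dictionary"
    return "plain"


if __name__ == "__main__":
    # Two 64-bit literals: one-byte keys are narrower, so a dictionary pays off.
    print(choose_encoding([1, 0], value_width_bytes=8))
    # Values are already one byte wide: a dictionary would not save anything.
    print(choose_encoding([1, 0], value_width_bytes=1))
```

A real implementation would also need to weigh whether downstream operators can work on dictionary-encoded input directly, which is where a logical type separate from the physical encoding would help.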
