make_array doesn't support mixed types with dictionaries #841
@Kimahriman I've also been running into many issues around dictionary-encoded types. We have logic in |
Does it make sense to make an Expression equivalent of |
I believe this would be resolved by forwarding the
To me this seems like a bug in the upstream DataFusion implementation. Would it not be better to address the bug there, e.g., make that implementation behave correctly around dictionary-encoded types?
Seems like that would cause an unnecessary copy in the case of the |
I figured it would be simple, I can look into at least fixing that.
I agree this is mostly a DataFusion bug, but it's at least partially a Comet thing for choosing to make the dictionaries in the first place. Mostly I guess I'm asking whether it's worth coming up with a workaround, for this specific issue or more generally for any dictionary-related issue, that will let things work until DataFusion handles things correctly, which might be non-trivial to fix.
I agree that is the best-case scenario. Since doing the unpacking as part of creating the array data structure might be a non-trivial fix (at least for someone like me who's just learning about DataFusion), is it worth adding an opt-in workaround for Comet expressions that pre-unravels a dictionary before passing it into a DataFusion function, until a better, more permanent fix is figured out? |
Ok, I learned a little more about DataFusion, so I think I understand what the options are now. ScalarUDFs support type coercion, which automatically casts each expression to the right type, and that cast gets inserted during analysis. Because of this, there's nothing technically "wrong" or buggy in DataFusion. It's possibly just sub-optimal to do a cast instead of smartly handling mixed types. Obviously Comet isn't using the DataFusion analyzer, so that will never happen automatically here. And the type coercion theoretically handles the differences between dictionaries and non-dictionaries. Based on this, it seems like there are two options:
|
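To make the type-coercion idea above concrete, here is a minimal sketch of resolving one common argument type when some inputs are dictionary-encoded. It is only an illustration: `SimpleType`, `unpacked`, and `coerce_args` are hypothetical names invented for this example and are not DataFusion or Comet APIs. The real coercion rules cover many more types and cases.

```rust
/// Hypothetical, simplified stand-in for an Arrow data type.
/// This is NOT arrow-rs's `DataType`; it exists only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum SimpleType {
    Int32,
    Utf8,
    /// A dictionary-encoded column; the boxed type is its value type.
    Dictionary(Box<SimpleType>),
}

/// Strip any dictionary encoding and return the underlying value type.
fn unpacked(t: &SimpleType) -> SimpleType {
    match t {
        SimpleType::Dictionary(inner) => unpacked(inner),
        other => other.clone(),
    }
}

/// Pick a single type that every argument should be cast to:
/// the shared value type once dictionary encoding is ignored.
/// Returns `None` for an empty argument list or when value types differ.
fn coerce_args(args: &[SimpleType]) -> Option<SimpleType> {
    let mut value_types = args.iter().map(unpacked);
    let first = value_types.next()?;
    if value_types.all(|t| t == first) {
        Some(first)
    } else {
        None
    }
}
```

Under these assumptions, a `Dictionary(Utf8)` argument mixed with a plain `Utf8` argument resolves to `Utf8`, so a caller would insert a cast on the dictionary side before invoking the function.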
Describe the bug
Discovered this working on new array functions. DataFusion's make_array doesn't play nice with underlying Dictionary types. Kinda yet another issue related to apache/datafusion#11513 imo.
Steps to reproduce
Two separate cases that are easy to recreate with existing unit test setup:
produces
Letting DataFusion infer the return type instead of specifying it results in
which seems like an internal Comet issue? Haven't dug into this but presumably is fixable.
And doing a mixed dictionary/non-dictionary like
produces
in datafusion/functions-nested/src/make_array.rs:231
Expected behavior
Not sure what the expected behavior or fix is. Either implement this function from scratch with better dictionary handling, or add some wrapper around invoking the UDF to flatten dictionary-encoded arrays.
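As a rough illustration of the second idea (a wrapper that flattens dictionary-encoded inputs before calling the function), here is a self-contained sketch. It deliberately avoids the arrow-rs and DataFusion APIs: `Column`, `flatten`, `make_array_plain`, and `make_array_flattening` are hypothetical names made up for this example, and the columns are modeled with plain `Vec`s. In real code the unpacking step would more likely be a cast to the dictionary's value type.

```rust
/// Hypothetical, simplified column of nullable strings.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    /// One optional value per row.
    Plain(Vec<Option<String>>),
    /// Dictionary encoding: each key indexes into `values`;
    /// a `None` key is a null row.
    Dictionary {
        keys: Vec<Option<usize>>,
        values: Vec<String>,
    },
}

/// Materialize a column as plain rows, resolving dictionary keys.
/// Panics if a key is out of bounds for `values`.
fn flatten(col: &Column) -> Vec<Option<String>> {
    match col {
        Column::Plain(rows) => rows.clone(),
        Column::Dictionary { keys, values } => keys
            .iter()
            .map(|k| k.map(|i| values[i].clone()))
            .collect(),
    }
}

/// Stand-in for a make_array-like function that only understands
/// plain columns: builds one array per row from the input columns.
/// Assumes all columns have the same number of rows.
fn make_array_plain(cols: &[Vec<Option<String>>]) -> Vec<Vec<Option<String>>> {
    let rows = cols.first().map_or(0, |c| c.len());
    (0..rows)
        .map(|r| cols.iter().map(|c| c[r].clone()).collect())
        .collect()
}

/// The wrapper: flatten every input first, then call the plain function.
fn make_array_flattening(cols: &[Column]) -> Vec<Vec<Option<String>>> {
    let flat: Vec<Vec<Option<String>>> = cols.iter().map(flatten).collect();
    make_array_plain(&flat)
}
```

The trade-off in this sketch is the one raised in the discussion: flattening copies the dictionary values into every row, which is simple and uniform but gives up the memory savings of the encoding for that call.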
Additional context
No response