arrow-ipc: Default to not preserving dict IDs #6788
The integration tests seem to be legitimately failing on this.
Is this the appropriate docs page to look at for trying to reproduce this locally? https://github.com/apache/arrow-rs/tree/main/arrow-integration-testing
So diving in, it looks like ipc and c-data work fine; it's just flight. Surprisingly, even rust-to-rust seems broken, which is what I'm going to start with, by adding more tests to arrow-flight.
Previously the integration tests forced preserving dict IDs in some places and used the default in others. This worked because preserving dict IDs used to be the default, but it no longer is.
tustvold
left a comment
Makes sense to me. I've labelled it an API change so it is rendered as a breaking change in the changelog.
alamb
left a comment
This makes sense to me too.
Thank you @brancz 🙏
Thank you!!
* arrow-ipc: Default to not preserving dict IDs
* arrow-integration-testing: Adapt to using default settings
Previously the integration tests forced preserving dict IDs in some places and used the default in others. This worked fine previously because preserving dict IDs used to be the default, but it isn't anymore.
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Which issue does this PR close?
Related to #5981
Rationale for this change
This is the first step towards removing the `dict_id` field as discussed in #5981. With this patch the default behavior changes to what the behavior will be once the field is fully removed.
The previous behavior can still be restored by passing `with_preserve_dict_id(true)`; however, doing so is now deprecated and will be removed together with the `dict_id` in the next (March) DataFusion release.
What changes are included in this PR?
Default to not preserving the dict ID from the schema field `dict_id`.
Are there any user-facing changes?
Not a breaking change to an API, but the default behavior changes.
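To illustrate the default change described in this PR, here is a minimal, self-contained sketch. It is a toy model, not the arrow-ipc implementation: `DictField`, `WriteOptions`, and `assign_dict_ids` are hypothetical names invented for this example, and the assumption that non-preserved IDs are handed out sequentially starting at 0 is mine rather than something stated in this PR. Only the `with_preserve_dict_id(true)` opt-in and the new `false` default come from the description.

```rust
/// Toy stand-in for a dictionary-encoded schema field (hypothetical type).
#[derive(Debug, Clone)]
struct DictField {
    name: &'static str,
    /// ID declared on the schema field itself.
    dict_id: i64,
}

/// Toy stand-in for writer options (hypothetical; not the arrow-ipc type).
#[derive(Debug, Clone, Copy, Default)]
struct WriteOptions {
    /// `false` by default, mirroring the new default described in this PR.
    preserve_dict_id: bool,
}

impl WriteOptions {
    /// Opt back in to the previous behavior.
    fn with_preserve_dict_id(mut self, preserve: bool) -> Self {
        self.preserve_dict_id = preserve;
        self
    }
}

/// Returns the dictionary ID that would be written for each field, in field order.
fn assign_dict_ids(fields: &[DictField], options: WriteOptions) -> Vec<i64> {
    fields
        .iter()
        .enumerate()
        .map(|(position, field)| {
            if options.preserve_dict_id {
                // Previous default: reuse the ID from the schema field.
                field.dict_id
            } else {
                // Assumption: IDs are assigned sequentially starting at 0.
                position as i64
            }
        })
        .collect()
}

fn main() {
    let fields = [
        DictField { name: "city", dict_id: 42 },
        DictField { name: "country", dict_id: 7 },
    ];

    let default_ids = assign_dict_ids(&fields, WriteOptions::default());
    let preserved_ids =
        assign_dict_ids(&fields, WriteOptions::default().with_preserve_dict_id(true));

    for ((field, default_id), preserved_id) in
        fields.iter().zip(&default_ids).zip(&preserved_ids)
    {
        println!(
            "{}: default -> {}, preserved -> {}",
            field.name, default_id, preserved_id
        );
    }
}
```

In this model, code that relied on the schema's `dict_id` surviving a round trip would need the explicit opt-in, which is the kind of mismatch the integration-test fix in this PR addresses.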
@tustvold @alamb