-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10121: [C++] Fix emission of new dictionaries in IPC writer #8302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@wesm This is a first draft. It is functional (i.e. fixes the underlying issue) but doesn't try to detect deltas. |
cpp/src/arrow/ipc/writer.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this will populate last_dictionaries_ entries even when the length of the dictionary is zero? I recall there was some discussion about empty dictionaries (and whether they need to be written at all) at the start of a stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we can skip zero-length dictionaries explicitly, but won't it confuse the IPC reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if skipped dictionaries will work, but they are permitted by the format, so it should probably be investigated in a follow up issue
c414eec to
cd759d0
Compare
|
I still need to add tests for this (I'll do that once #8309 is merged). Detecting and writing delta dictionaries is not implemented, but that's a feature, not a bug, so it may also wait for another PR. |
cd759d0 to
3e5e6df
Compare
When a dictionary changes from the previous batch, we should re-emit it.
3e5e6df to
4e5dac1
Compare
|
I can take a final look this morning. |
cpp/src/arrow/ipc/writer.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if skipped dictionaries will work, but they are permitted by the format, so it should probably be investigated in a follow up issue
When a dictionary changes from the previous batch, it is emitted again in the IPC stream.
If this happens when writing the IPC file format, an error is returned.