[C++] Move "dictionary" member from DictionaryType to ArrayData to allow for changing dictionaries between Array chunks #19492

Closed
asfimport opened this issue Aug 29, 2018 · 4 comments


@asfimport
Collaborator

asfimport commented Aug 29, 2018

There are a couple of inter-related issues:

  • Cases where a system might send the schema without the dictionaries, and the user wishes to reason about the schema and its types without knowing the dictionary values.

  • Dictionaries that change over time, e.g. via delta dictionary messages.

arrow::DictionaryType has no "linkage" to any external object. I propose adding a "LinkedDictionaryType" or something similar (purely a C++ construct), functionally a subclass of DictionaryType, which would allow creating a type that obtains its dictionary later through some kind of "Dictionary provider" interface. Java already has something similar. This would allow a dictionary to evolve via delta dictionaries, or to be retrieved later, e.g. through an RPC or IPC layer.
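A minimal sketch of the provider idea in C++. Every name here (`DictionaryProvider`, `InMemoryDictionaryProvider`, `LinkedDictionaryType`, `Decode`) is hypothetical and not part of the Arrow C++ API, and dictionary values are modeled as a plain `std::vector<std::string>` rather than an Arrow array. The type stores only a dictionary id and resolves the values through the provider each time they are requested, so a delta that arrives later is visible to subsequent lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for dictionary values; a real implementation would
// hold an Arrow array, but plain strings keep the sketch self-contained.
using DictionaryValues = std::vector<std::string>;

// Hypothetical "Dictionary provider" interface (not Arrow's API):
// maps a dictionary id to the values received so far.
class DictionaryProvider {
 public:
  virtual ~DictionaryProvider() = default;
  // Returns nullptr if the dictionary has not arrived yet.
  virtual std::shared_ptr<const DictionaryValues> GetDictionary(std::int64_t id) const = 0;
};

// In-memory provider accepting an initial dictionary followed by deltas.
class InMemoryDictionaryProvider : public DictionaryProvider {
 public:
  void SetDictionary(std::int64_t id, DictionaryValues values) {
    dicts_[id] = std::make_shared<DictionaryValues>(std::move(values));
  }

  // A delta appends values, so existing indices keep their meaning. The old
  // vector is replaced rather than mutated (copy-on-write), so callers holding
  // an earlier snapshot keep seeing a stable dictionary.
  void AppendDelta(std::int64_t id, const DictionaryValues& delta) {
    auto it = dicts_.find(id);
    if (it == dicts_.end()) throw std::runtime_error("delta for unknown dictionary id");
    auto extended = std::make_shared<DictionaryValues>(*it->second);
    extended->insert(extended->end(), delta.begin(), delta.end());
    it->second = std::move(extended);
  }

  std::shared_ptr<const DictionaryValues> GetDictionary(std::int64_t id) const override {
    auto it = dicts_.find(id);
    if (it == dicts_.end()) return nullptr;
    return it->second;
  }

 private:
  std::map<std::int64_t, std::shared_ptr<const DictionaryValues>> dicts_;
};

// Hypothetical "linked" dictionary type: stores the index width and a
// dictionary id, and resolves the values through the provider on each call.
class LinkedDictionaryType {
 public:
  LinkedDictionaryType(int index_bit_width, std::int64_t dictionary_id,
                       std::shared_ptr<const DictionaryProvider> provider)
      : index_bit_width_(index_bit_width),
        dictionary_id_(dictionary_id),
        provider_(std::move(provider)) {
    assert(provider_ != nullptr);
  }

  int index_bit_width() const { return index_bit_width_; }
  std::int64_t dictionary_id() const { return dictionary_id_; }

  // May return nullptr if the dictionary has not been delivered yet.
  std::shared_ptr<const DictionaryValues> dictionary() const {
    return provider_->GetDictionary(dictionary_id_);
  }

 private:
  int index_bit_width_;
  std::int64_t dictionary_id_;
  std::shared_ptr<const DictionaryProvider> provider_;
};

// Maps indices to values; throws if the dictionary is missing or an index is
// out of range (negative indices wrap to a huge size_t and also throw).
std::vector<std::string> Decode(const LinkedDictionaryType& type,
                                const std::vector<std::int32_t>& indices) {
  auto dict = type.dictionary();
  if (!dict) throw std::runtime_error("dictionary not yet available");
  std::vector<std::string> out;
  out.reserve(indices.size());
  for (std::int32_t i : indices) out.push_back(dict->at(static_cast<std::size_t>(i)));
  return out;
}
```

In this sketch a `LinkedDictionaryType` can be constructed, and its schema reasoned about, before any dictionary exists; `dictionary()` simply returns `nullptr` until the provider has one. Whether copy-on-write snapshots match the semantics Arrow eventually chose is not asserted here.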

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-3144. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Wes McKinney / @wesm:
The context for this issue is the design of an Arrow-native RPC system (i.e., ARROW-249). A "get info" request may return the schema without the dictionaries (which could be large), and the dictionaries would be sent later when the dataset is actually requested. Without an improved solution at the C++ API level, we would be unable to deserialize the schema IPC message without the corresponding dictionary batches.
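The direction named in the issue title (moving the dictionary off the type and onto the array data) avoids the problem without deferred lookup on the type: if the type records only its index and value types, a schema can be decoded with no dictionary batches at all, and each chunk's data carries its own dictionary once it arrives. A simplified C++ sketch follows; `DictType`, `DictArrayData`, `DictChunkedArray`, and `ValueAt` are hypothetical stand-ins, not the real `arrow::DictionaryType` or `arrow::ArrayData`, and types are modeled as strings.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: describes the shape of the data only, with no dictionary
// values, so it can be built from a schema message alone.
struct DictType {
  std::string index_type;  // e.g. "int32"
  std::string value_type;  // e.g. "utf8"
  bool operator==(const DictType& other) const {
    return index_type == other.index_type && value_type == other.value_type;
  }
};

// Hypothetical per-chunk data: the dictionary lives here, not on the type,
// and may be null until the dictionary batch has been received.
struct DictArrayData {
  std::shared_ptr<const DictType> type;
  std::vector<std::int32_t> indices;
  std::shared_ptr<const std::vector<std::string>> dictionary;
};

// Hypothetical chunked column: every chunk must share the column's type,
// but each chunk may carry a different dictionary.
struct DictChunkedArray {
  std::shared_ptr<const DictType> type;
  std::vector<DictArrayData> chunks;

  void AddChunk(DictArrayData chunk) {
    if (!chunk.type || !(*chunk.type == *type)) {
      throw std::invalid_argument("chunk type does not match column type");
    }
    chunks.push_back(std::move(chunk));
  }
};

// Looks up the i-th value of a chunk through that chunk's own dictionary.
std::string ValueAt(const DictArrayData& data, std::size_t i) {
  if (!data.dictionary) throw std::runtime_error("dictionary not yet received");
  return data.dictionary->at(static_cast<std::size_t>(data.indices.at(i)));
}
```

Because `DictType` equality ignores dictionary contents, two chunks with different dictionaries still belong to the same column, which is the "changing dictionaries between Array chunks" case from the title.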

@asfimport
Collaborator Author

Wes McKinney / @wesm:
See the related discussion in a comment on the patch for ARROW-554 (#3165).

@asfimport
Collaborator Author

Wes McKinney / @wesm:
I suggest we handle dictionaries in Flight in the 0.14 release cycle.

@asfimport
Collaborator Author

Wes McKinney / @wesm:
Issue resolved by pull request #4316.
