Skip to content
This repository was archived by the owner on May 9, 2024. It is now read-only.

Replace dictionary proxies with nested dictionaries 01/N #661

Merged
merged 1 commit into from
Sep 18, 2023

Conversation

ienkovich
Copy link
Contributor

This patch series aims to remove the StringDictionaryProxy class by extending StringDictionary functionality to cover all use cases of the proxy class. The reason is that the dictionary encoding used for ResultSets is incompatible with the dictionary encoding used by ArrowStorage and expected by execution kernels. It means we simply cannot use dictionary-encoded columns from ResultSets and need to convert them (currently, ResultSetRegistry lacks such functionality, see #588). Instead of implementing such a conversion, I want to change the coding so that we wouldn't need to convert anything.

There are a few major changes required to achieve this:

  1. Stop using negative indices in dictionary-encoded columns in ResultSets. It is possible because proxies already use dictionary generations, i.e. max index value that can be used to address the base dictionary strings. Therefore we can simply use higher values to encode strings owned by the proxy.
  2. Enable deep nesting levels. Since we can use execution results as a query input, we need the ability to use an existing dictionary hierarchy as a base for a new dictionary. This means either proxy and dictionary should have the same API, or we simply use a single class (I chose to use StringDictionary only)
  3. Switch to a single API dictionary in storage and execution to smoothly run queries on execution results

I try to make changes in many small steps to simplify the review process. So, many patches are very trivial refactoring or a small part of the new functionality.

This patch removes unused StringDictionaryProxy code.

@@ -534,19 +519,6 @@ class StringLocalCallback : public StringDictionary::StringCallback {
}
};

// Union strings from both StringDictionaryProxies into *this as transients.
// Return id_map: sdp_rhs:string_id -> this:string_id for each string in sdp_rhs.
StringDictionaryProxy::IdMap StringDictionaryProxy::transientUnion(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised that this is unused.

@ienkovich ienkovich merged commit 54dbc06 into main Sep 18, 2023
@ienkovich ienkovich deleted the ienkovich/nested-dict-01 branch September 18, 2023 22:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants