[WIP] Add FilterCompiler to SPI #12697
Closed
mbasmanova wants to merge 66 commits into
We already use "<UNPARTITIONED>" for unpartitioned table input, and this commit unifies the representation of unpartitioned tables in the output as well.
Unify duplicate-finder-maven-plugin overrides
…tion name lookupFunction is mainly used by tests and optimization rules to quickly locate a known function. It should only work for internal functions. There's no need to use a QualifiedName. Other function namespaces should not have this interface.
Add support for readPreferenceTags, a configuration property that directs the mongodb connector to read from a specific sharded cluster. The property value is a list of tag sets separated by the character '&', and each tag set is a comma-separated list of colon-separated key-value pairs. For example, mongodb.read-preference-tags=dc:east,use:reporting&use:reporting
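A minimal sketch of how a value in this format could be split into tag sets. The class and method names (`ReadPreferenceTagsParser`, `parse`) are hypothetical and are not taken from the connector; only the '&', ',' and ':' separators come from the commit message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the connector's actual code.
final class ReadPreferenceTagsParser
{
    private ReadPreferenceTagsParser() {}

    // Parses e.g. "dc:east,use:reporting&use:reporting" into an ordered list of tag sets.
    static List<Map<String, String>> parse(String value)
    {
        List<Map<String, String>> tagSets = new ArrayList<>();
        // '&' separates tag sets
        for (String tagSet : value.split("&")) {
            Map<String, String> tags = new LinkedHashMap<>();
            // ',' separates the key:value pairs inside one tag set
            for (String tag : tagSet.split(",")) {
                int colon = tag.indexOf(':');
                if (colon <= 0 || colon == tag.length() - 1) {
                    throw new IllegalArgumentException("Invalid tag, expected key:value: " + tag);
                }
                tags.put(tag.substring(0, colon).trim(), tag.substring(colon + 1).trim());
            }
            tagSets.add(tags);
        }
        return tagSets;
    }
}
```

With the example value from the commit message, `parse` returns two tag sets: the first with the keys `dc` and `use`, the second with `use` only.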
The previous OrcFileWriter is not columnarized. Rename it to OrcRecordWriter. Abstract out FileWriter in preparation for introducing the new ORC writer.
The optimized ORC writer will serve as the rewriter for Raptor. However, it needs the schema of the table in order to read and write data. Encode the schema info in splits.
bindChannels is intended to replace toRowExpression in LocalExecutionPlanner. It generates channel info for expressions and also optimizes them.
Also, mark JDBC_ERROR as retryable.
Avoid using an anonymous row type, as this type cannot be serialized by the Hive connector that is used as a temporary table connector for exchange materialization.
line length reflecting PR feedback on wording
Presto builds the hashtable for a MapBlock eagerly when constructing the MapBlock, even when it's not needed in the query. Building a hashtable could take up to 30% of the CPU cost of scanning a map column. This commit defers the hashtable build to the time it's needed in SeekKey(). Note that we only do this for the MapBlock, not the MapBlockBuilder, to avoid complex synchronization problems. The MapBlockBuilder will always build the hashtable.
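The deferred build can be illustrated with a simplified, single-threaded sketch. `LazyLongMap` and its methods are hypothetical names and do not mirror the actual MapBlock code; the sketch only shows the idea of constructing the hashtable on the first key lookup instead of in the constructor.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a map with long keys; not Presto's MapBlock.
// Not thread-safe: a real implementation has to handle concurrent readers.
final class LazyLongMap
{
    private final long[] keys;
    private final long[] values;
    // Open-addressing table of positions into keys; null until the first lookup.
    private int[] hashTable;

    LazyLongMap(long[] keys, long[] values)
    {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        // No hashtable work here: construction stays cheap for scans that never look up a key.
        this.keys = keys.clone();
        this.values = values.clone();
    }

    boolean isHashTableBuilt()
    {
        return hashTable != null;
    }

    // Returns the position of the key, or -1 if the key is absent.
    int seekKey(long key)
    {
        ensureHashTableBuilt();
        int mask = hashTable.length - 1;
        int slot = hash(key) & mask;
        while (hashTable[slot] != -1) {
            if (keys[hashTable[slot]] == key) {
                return hashTable[slot];
            }
            slot = (slot + 1) & mask;
        }
        return -1;
    }

    long getValue(int position)
    {
        return values[position];
    }

    private void ensureHashTableBuilt()
    {
        if (hashTable != null) {
            return;
        }
        // Power-of-two size of at least twice the entry count, so empty slots always exist.
        int size = Integer.highestOneBit(Math.max(keys.length * 2, 2) - 1) << 1;
        int[] table = new int[size];
        Arrays.fill(table, -1);
        for (int i = 0; i < keys.length; i++) {
            int slot = hash(keys[i]) & (size - 1);
            while (table[slot] != -1) {
                slot = (slot + 1) & (size - 1);
            }
            table[slot] = i;
        }
        hashTable = table;
    }

    private static int hash(long key)
    {
        return Long.hashCode(key * 0x9E3779B97F4A7C15L);
    }
}
```

A query that only scans the keys and values never pays for the table; the cost is incurred once, on the first `seekKey` call.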
Subfield represents a column or a nested subfield of an array, map or struct. A Subfield consists of a column name and a path to the subfield. Each element of the path is either a field name of a struct or a subscript of an array or a map. Here are some examples: a, a[2], a["xxx"], a.b.c, a.b[4].c. Subfield will be used to specify the set of subfields that are referenced by the query, allowing the connector to prune unused subfields. Subfield will also be used by the Hive connector to extract simple (TupleDomain) filters (e.g. a[2] > 10) that can be applied while decoding ORC data.
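A rough sketch of how the example strings could be split into a column name and a path. `SubfieldPath`, `parse`, and the `field:`/`index:`/`key:` prefixes are hypothetical and are not the Subfield API from this commit. The sketch assumes that names contain no '.' or '[' and that quoted keys contain no ']'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names and representation are not the actual Subfield class.
final class SubfieldPath
{
    final String column;
    // Path elements rendered as "field:<name>", "index:<number>" or "key:<string>".
    final List<String> path;

    private SubfieldPath(String column, List<String> path)
    {
        this.column = column;
        this.path = path;
    }

    static SubfieldPath parse(String text)
    {
        int end = nameEnd(text, 0);
        if (end == 0) {
            throw new IllegalArgumentException("Missing column name: " + text);
        }
        String column = text.substring(0, end);
        List<String> path = new ArrayList<>();
        int i = end;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '.') {
                // Struct field access, e.g. ".b"
                int fieldEnd = nameEnd(text, i + 1);
                if (fieldEnd == i + 1) {
                    throw new IllegalArgumentException("Empty field name in: " + text);
                }
                path.add("field:" + text.substring(i + 1, fieldEnd));
                i = fieldEnd;
            }
            else if (c == '[') {
                // Array or map subscript, e.g. "[4]" or "[\"xxx\"]"
                int close = text.indexOf(']', i);
                if (close < 0 || close == i + 1) {
                    throw new IllegalArgumentException("Invalid subscript in: " + text);
                }
                String subscript = text.substring(i + 1, close);
                if (subscript.length() >= 2 && subscript.startsWith("\"") && subscript.endsWith("\"")) {
                    path.add("key:" + subscript.substring(1, subscript.length() - 1));
                }
                else {
                    // Throws NumberFormatException for non-numeric, unquoted subscripts.
                    path.add("index:" + Long.parseLong(subscript));
                }
                i = close + 1;
            }
            else {
                throw new IllegalArgumentException("Unexpected character at " + i + " in: " + text);
            }
        }
        return new SubfieldPath(column, path);
    }

    // Returns the index of the first '.' or '[' at or after start, or the end of the text.
    private static int nameEnd(String text, int start)
    {
        int i = start;
        while (i < text.length() && text.charAt(i) != '.' && text.charAt(i) != '[') {
            i++;
        }
        return i;
    }
}
```

For `a.b[4].c` this yields the column `a` and the path `field:b`, `index:4`, `field:c`; a connector could compare such paths against the fields of its file schema to decide what to skip.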
38caf92 to f901c5a
f901c5a to 602332e
602332e to eccfc9e
Superseded by #12729
No description provided.