
[WIP] Add FilterCompiler to SPI #12697

Closed
mbasmanova wants to merge 66 commits into prestodb:aria-scan-research from mbasmanova:filter-compiler

Conversation

@mbasmanova
Contributor

No description provided.

shixuan-fan and others added 27 commits April 17, 2019 13:38
We already use "<UNPARTITIONED>" for unpartitioned table input, and
this commit would unify the representation of unpartitioned table in
output as well.
Unify duplicate-finder-maven-plugin overrides
…tion name

lookupFunction is mainly used by tests and optimization rules to quickly
locate a known function. It should only work for internal functions.
There's no need to use a QualifiedName. Other function namespaces should
not have this interface.
Add support for a readPreferenceTags configuration property that directs the MongoDB connector to read from a specific sharded cluster. Tag sets are separated by the character '&', and each tag set is a comma-separated list of colon-separated key-value pairs. For example, mongodb.read-preference-tags=dc:east,use:reporting&use:reporting
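
The property format described in this commit message can be sketched as a small parser. This is only an illustration of the string layout ('&' between tag sets, ',' between pairs, ':' between key and value); the class and method names are hypothetical and this is not the connector's actual parsing code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the MongoDB connector.
final class ReadPreferenceTagsSketch
{
    private ReadPreferenceTagsSketch() {}

    // Splits "k:v,k:v&k:v" into an ordered list of tag sets.
    static List<Map<String, String>> parseTagSets(String value)
    {
        List<Map<String, String>> tagSets = new ArrayList<>();
        for (String tagSet : value.split("&")) {
            // LinkedHashMap keeps the keys in the order they were written.
            Map<String, String> tags = new LinkedHashMap<>();
            for (String pair : tagSet.split(",")) {
                int colon = pair.indexOf(':');
                if (colon <= 0 || colon == pair.length() - 1) {
                    throw new IllegalArgumentException("Expected key:value but got: " + pair);
                }
                tags.put(pair.substring(0, colon).trim(), pair.substring(colon + 1).trim());
            }
            tagSets.add(tags);
        }
        return tagSets;
    }
}
```

For the example value above, the sketch yields two tag sets: one with the keys dc and use, and one with the key use only.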
Previous OrcFileWriter is not columnarized. Rename it to
OrcRecordWriter. Abstract out FileWriter for preparation of introducing
new Orc writer.
The optimized ORC writer will serve as rewriter for Raptor. However it
needs the schema of the table in order to read/write data. Encode the
schema info in splits.
@mbasmanova mbasmanova added the aria Presto Aria performance improvements label Apr 19, 2019
James Sun and others added 22 commits April 23, 2019 00:55
bindChannels is to replace toRowExpression in LocalExecutionPlanner.
It generates channel info for expressions as well as optimizes them.
Also, mark JDBC_ERROR as retryable.
Avoid using anonymous row type, as this type cannot be serialized by the
Hive connector that is used as a temporary table connector for exchange
materialization
line length

reflecting PR feedback on wording
Presto builds the hashtable for a MapBlock eagerly when constructing the
MapBlock, even if it's not needed in the query. Building a hashtable could
take up to 30% CPU of the scan cost on a map column. This commit defers
the hashtable build to the time it's needed in SeekKey(). Note that we
only do this for the MapBlock, not the MapBlockBuilder, to avoid complex
synchronization problems. The MapBlockBuilder will always build the
hashtable.
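
The deferred build described in this commit message can be sketched as follows. This is a minimal, single-threaded illustration of lazy index construction; the class, its fields, and the build counter are hypothetical and do not reflect the actual MapBlock implementation or its synchronization.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a block that holds map keys; not Presto's MapBlock.
final class LazyKeyIndexSketch
{
    private final String[] keys;
    private Map<String, Integer> positions; // stays null until the first lookup
    private int buildCount;                 // only here to make the laziness observable

    LazyKeyIndexSketch(String[] keys)
    {
        this.keys = keys.clone();
    }

    // Returns the position of the key, or -1 if it is absent.
    int seekKey(String key)
    {
        if (positions == null) {
            positions = buildIndex(); // cost is paid only if a lookup happens
        }
        return positions.getOrDefault(key, -1);
    }

    int getBuildCount()
    {
        return buildCount;
    }

    private Map<String, Integer> buildIndex()
    {
        buildCount++;
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            index.putIfAbsent(keys[i], i);
        }
        return index;
    }
}
```

A scan that never calls seekKey never pays for the index, and repeated lookups reuse the index built by the first one.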
Subfield represents a column or a nested subfield of an array, map or struct.
Subfield consists of a column name and a path to the subfield. Each element
of a path is either a field name of a struct or a subscript of an array or
a map. Here are some examples: a, a[2], a["xxx"], a.b.c, a.b[4].c.

Subfield will be used to specify a set of subfields that are referenced by the
query to allow the connector to prune unused subfields. Subfield will also be used
by the Hive connector to extract simple (TupleDomain) filters (e.g. a[2] > 10)
that can be applied while decoding ORC data.
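
The structure described in this commit message, a column name plus a path of field names and subscripts, can be sketched roughly as below. All names here are hypothetical and chosen for illustration; the Subfield class added by the commit may be shaped differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "column name + path"; not the SPI class itself.
final class SubfieldSketch
{
    private final String column;
    private final List<String> path = new ArrayList<>();

    SubfieldSketch(String column)
    {
        this.column = column;
    }

    // Struct field access, e.g. a.b
    SubfieldSketch field(String name)
    {
        path.add("." + name);
        return this;
    }

    // Array (or integer-keyed map) subscript, e.g. a[2]
    SubfieldSketch index(long subscript)
    {
        path.add("[" + subscript + "]");
        return this;
    }

    // String-keyed map subscript, e.g. a["xxx"]
    SubfieldSketch key(String subscript)
    {
        path.add("[\"" + subscript + "\"]");
        return this;
    }

    // True when this subfield is the same as, or an ancestor of, the other one.
    boolean isPrefixOf(SubfieldSketch other)
    {
        return column.equals(other.column)
                && path.size() <= other.path.size()
                && path.equals(other.path.subList(0, path.size()));
    }

    @Override
    public String toString()
    {
        return column + String.join("", path);
    }
}
```

A prefix check of this kind is one way a reader could decide that a.b[4].c is already covered once a.b is requested, which is the sort of question subfield pruning has to answer.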
@mbasmanova
Contributor Author

Superseded by #12729

@mbasmanova mbasmanova closed this Apr 26, 2019
@mbasmanova mbasmanova deleted the filter-compiler branch May 24, 2019 15:56
10 participants