[WIP] Add FilterCompiler to SPI by mbasmanova · Pull Request #12697 · prestodb/presto

mbasmanova · 2019-04-19T03:46:42Z

No description provided.

…tion name lookupFunction is mainly used by tests and optimization rules to quickly locate a known function. It should only work for internal functions. There's no need to use a QualifiedName. Other function namespaces should not have this interface.

Add a readPreferenceTags configuration property that directs the MongoDB connector to read from a specific sharded cluster. Tag sets are separated by the '&' character, and each tag set is a comma-separated list of colon-separated key-value pairs. For example, mongodb.read-preference-tags=dc:east,use:reporting&use:reporting
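The property format described above can be illustrated with a small parser. This is only a sketch of the format: the class name ReadPreferenceTagsSketch and the method parseTagSets are hypothetical and are not taken from the connector's code, which may parse the value differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the property format; not the connector's actual code.
public class ReadPreferenceTagsSketch
{
    // Splits the value into tag sets on '&', then each tag set into key:value pairs on ','.
    public static List<Map<String, String>> parseTagSets(String value)
    {
        List<Map<String, String>> tagSets = new ArrayList<>();
        for (String tagSet : value.split("&")) {
            Map<String, String> tags = new LinkedHashMap<>();
            for (String pair : tagSet.split(",")) {
                int colon = pair.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("Expected key:value but got: " + pair);
                }
                tags.put(pair.substring(0, colon).trim(), pair.substring(colon + 1).trim());
            }
            tagSets.add(tags);
        }
        return tagSets;
    }

    public static void main(String[] args)
    {
        // Two tag sets: {dc=east, use=reporting} and {use=reporting}
        System.out.println(parseTagSets("dc:east,use:reporting&use:reporting"));
    }
}
```

The tag sets are kept in order because MongoDB read preferences try tag sets in the order given.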

Presto builds the hashtable for a MapBlock eagerly when constructing the MapBlock, even when the query does not need it. Building the hashtable can take up to 30% of the CPU cost of scanning a map column. This commit defers the hashtable build until it is needed in SeekKey(). Note that we only do this for MapBlock, not MapBlockBuilder, to avoid complex synchronization problems. The MapBlockBuilder will always build the hashtable.
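The deferred build can be sketched as a lazily initialized lookup table. This is a simplified illustration, not the MapBlock code: the class LazyHashTableSketch, its String keys, and the use of java.util.HashMap are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a lazily built key index; not the actual MapBlock implementation.
public class LazyHashTableSketch
{
    private final String[] keys;
    // Null until the first lookup; volatile so a fully built table is safely published.
    private volatile Map<String, Integer> positions;
    private int buildCount;

    public LazyHashTableSketch(String[] keys)
    {
        this.keys = keys.clone();
    }

    // Returns the position of the key, or -1 if absent. Builds the table on first use.
    public int seekKey(String key)
    {
        Map<String, Integer> table = positions;
        if (table == null) {
            table = ensureBuilt();
        }
        Integer position = table.get(key);
        return position == null ? -1 : position;
    }

    private synchronized Map<String, Integer> ensureBuilt()
    {
        if (positions == null) {
            Map<String, Integer> table = new HashMap<>();
            for (int i = 0; i < keys.length; i++) {
                table.putIfAbsent(keys[i], i);
            }
            buildCount++;
            positions = table;
        }
        return positions;
    }

    // Exposed only so the example can show that the table is built at most once.
    public synchronized int getBuildCount()
    {
        return buildCount;
    }
}
```

A block that is scanned but never probed by key pays nothing for the table, which is the saving the commit message describes.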

Subfield represents a column or a nested subfield of an array, map or struct. A Subfield consists of a column name and a path to the subfield. Each element of the path is either a field name of a struct or a subscript of an array or a map. Here are some examples: a, a[2], a["xxx"], a.b.c, a.b[4].c. Subfield will be used to specify the set of subfields referenced by the query, allowing the connector to prune unused subfields. Subfield will also be used by the Hive connector to extract simple (TupleDomain) filters (e.g. a[2] > 10) that can be applied while decoding ORC data.
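The column-plus-path structure can be sketched as follows. This is a minimal illustration of the idea only: the class SubfieldSketch and its builder methods field, index, and key are hypothetical and do not reflect the actual Subfield API. Escaping of quotes inside map keys is not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a column name plus a path; not the actual Subfield class.
public class SubfieldSketch
{
    // One step in the path: a struct field, an array subscript, or a map key.
    interface PathElement {}

    static final class Field implements PathElement
    {
        final String name;
        Field(String name) { this.name = name; }
        @Override public String toString() { return "." + name; }
    }

    static final class Index implements PathElement
    {
        final long index;
        Index(long index) { this.index = index; }
        @Override public String toString() { return "[" + index + "]"; }
    }

    static final class Key implements PathElement
    {
        final String key;
        Key(String key) { this.key = key; }
        @Override public String toString() { return "[\"" + key + "\"]"; }
    }

    private final String column;
    private final List<PathElement> path = new ArrayList<>();

    public SubfieldSketch(String column) { this.column = column; }

    public SubfieldSketch field(String name) { path.add(new Field(name)); return this; }
    public SubfieldSketch index(long index) { path.add(new Index(index)); return this; }
    public SubfieldSketch key(String key) { path.add(new Key(key)); return this; }

    // Serializes to the notation used in the examples, e.g. a.b[4].c
    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder(column);
        for (PathElement element : path) {
            builder.append(element);
        }
        return builder.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(new SubfieldSketch("a").field("b").index(4).field("c"));
        System.out.println(new SubfieldSketch("a").key("xxx"));
    }
}
```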

mbasmanova · 2019-04-26T16:21:00Z

Superseded by #12729

shixuan-fan and others added 27 commits April 17, 2019 13:38

Remove redundant cast

3d9ec1c

Mark unpartitioned table in HiveWrittenPartitions

1f00cf2

We already use "<UNPARTITIONED>" for unpartitioned table input. This commit unifies the representation of unpartitioned tables in the output as well.

Fix incorrect condition in ExchangeNode

95cbe76

Add DiscardingOutputBuffer

e6052c9

Postpone temporary tables split enumeration

db4b9a1

Support dependent stages in SqlQueryScheduler

e2608cd

Add exchange materialization tests

73a7cd5

Refactor presto-hive pom

fa3a615

Unify duplicate-finder-maven-plugin overrides

Run exchange materialization tests in a separate build

c386452

Reduce usage of resolveFunction

4727843

Add FunctionHandleResolver

9cbe203

Add abstraction to FunctionHandle

8634f9c

Rename PrestoNode to InternalNode

44a9c77

Extracted-From: https://github.com/prestosql/presto

Replace all usages of spi Node with InternalNode

869decb

Extracted-From: https://github.com/prestosql/presto

Deprecate Node.getHttpUri and reduce usages of Node.getHostAndPort

2e199b0

Extracted-From: https://github.com/prestosql/presto

Auto-resolve EXCEEDED_TIME_LIMIT regardless of CPU time increase

56722ff

Remove Raptor OrcStorageManager::appendRow

91599cf

Rename Raptor OrcFileWriter to OrcRecordWriter

41ee519

The previous OrcFileWriter is not columnar. Rename it to OrcRecordWriter. Abstract out FileWriter in preparation for introducing a new ORC writer.

Use single source of truth for ORC timezone

7bbd393

Add optimized ORC writer to Raptor FileWriter

8ab71a2

Abstract out Raptor FileRewriter

c070f5b

Add column types to Raptor split

28709c8

The optimized ORC writer will serve as the rewriter for Raptor. However, it needs the table schema in order to read and write data. Encode the schema info in splits.

Introduce OrcPageFileRewriter

8d643f4

Add optimized writer integration tests

5b3c7d0

Add testDelete for Raptor smoke tests

6727da1

mbasmanova added the aria Presto Aria performance improvements label Apr 19, 2019

mbasmanova requested review from elonazoulay and highker April 19, 2019 03:46

James Sun and others added 22 commits April 23, 2019 00:55

Extract bindChannels helper in LocalExecutionPlanner

abbac15

bindChannels is intended to replace toRowExpression in LocalExecutionPlanner. It generates channel info for expressions and also optimizes them.

Support JdbcErrorCode to Verifier exception classifier

220c0fe

Also, mark JDBC_ERROR as retryable.

Add compression kind config for Raptor

87e0cca

Allow Raptor to compress data with ZSTD

eaf7e7b

Change approx_distinct aggregation intermediate type

3bd2160

Change min_by, max_by aggregations intermediate type

28e45a9

Avoid using an anonymous row type, as this type cannot be serialized by the Hive connector, which is used as the temporary table connector for exchange materialization.

Change qdigest aggregations intermediate type

aa20c07

Move presto-server to assembly plugin

3e23829

Use nexus-staging-maven-plugin

2d900d0

Introduce RowExpressionRewriter

a32ceb5

Introduce RowExpressionNodeInliner

397c53c

Add partsupplier to H2QueryRunner

46f3605

Allow materializing partitioning inferred by join

0b5ce5c

Closes prestodb#12708 -- Clarify split_to_map documentation

44a23c9

line length reflecting PR feedback on wording

Add release notes for 0.219

e7077bf

Detach Node from LateralJoinNode and ApplyNode

5f72687

Handle null constants for FilterStatsCalculator

0237e98

Add isCastFunction to StandardFunctionResolution

34d8c22

Allow custom expressions in DomainTranslator

0fe7da1

Introduce SubfieldExtractor

9dc505d

mbasmanova force-pushed the filter-compiler branch from 38caf92 to f901c5a Compare April 26, 2019 16:16

Introduce RowExpressionPredicateCompiler

da5ecd6

mbasmanova force-pushed the filter-compiler branch from f901c5a to 602332e Compare April 26, 2019 16:17

Add PredicateCompiler to ConnectorContext

eccfc9e

mbasmanova force-pushed the filter-compiler branch from 602332e to eccfc9e Compare April 26, 2019 16:18

mbasmanova closed this Apr 26, 2019

mbasmanova deleted the filter-compiler branch May 24, 2019 15:56

mbasmanova wants to merge 66 commits into prestodb:aria-scan-research from mbasmanova:filter-compiler

10 participants
