Adding presto-pinot with pushdown working#13504
Conversation
9ebfaa7 to
ae9bb18
Compare
ab074c2 to
563b1a5
Compare
|
I think this PR is ready enough to start to be reviewed. It depends on #13521. It also subsumes #13413 and has implemented all of the changes suggested in there. (cc: @haibow) I know it is kind of big .. it has almost no changes to the presto-main except:
The build will fail until https://github.com/agrawaldevesh/presto-pinot-driver can be pushed to mavencentral and the presto-pinot-driver version upgraded to 0.1.0. I am working with @wenleix on that. I am also happy to meet in person and give a code overview. But a good place to start is from the PinotConnectorPlanOptimizer.java file. |
highker
left a comment
There was a problem hiding this comment.
High-level comments only. The dependency looks much cleaner!
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorId.java
Outdated
Show resolved
Hide resolved
...ot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
...ot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-plugin/src/main/java/com/facebook/presto/pinot/PinotPlugin.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnector.java
Outdated
Show resolved
Hide resolved
...-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotPageSourceProvider.java
Outdated
Show resolved
Hide resolved
...-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotPageSourceProvider.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotSplit.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConfig.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnection.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnection.java
Outdated
Show resolved
Hide resolved
...to-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorFactory.java
Outdated
Show resolved
Hide resolved
...to-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorFactory.java
Outdated
Show resolved
Hide resolved
563b1a5 to
de579bf
Compare
|
It is ok to have many top-level modules |
de579bf to
0bca794
Compare
sachdevs
left a comment
There was a problem hiding this comment.
Thanks for the contribution @agrawaldevesh! A couple of high level comments to get started:
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Outdated
Show resolved
Hide resolved
0bca794 to
284b4de
Compare
highker
left a comment
There was a problem hiding this comment.
Finished reviewing classes PinotExpression, PinotFilterExpressionConverter, PinotProjectExpressionConverter, and PinotQueryGenerator
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotExpression.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGenerator.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGenerator.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGenerator.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGenerator.java
Outdated
Show resolved
Hide resolved
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotSegmentPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotSessionProperties.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotConfig.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/test/java/com/facebook/presto/pinot/MetadataUtil.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PushdownUtils.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PushdownUtils.java
Outdated
Show resolved
Hide resolved
284b4de to
34e8204
Compare
highker
left a comment
There was a problem hiding this comment.
@agrawaldevesh, there are still some amount of comments not addressed? Could you go through those in https://github.com/prestodb/presto/pull/13504/files ? Make sure to open each file and exam the comment.
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotExpression.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
highker
left a comment
There was a problem hiding this comment.
There are many nits and small coding style to be fixed. I only skimmed through the files rather than digging into each individual. @agrawaldevesh, could do a thorough exam on all the files to make sure the coding style is right based on the existing comments? Thanks!
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotMetricsStat.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotSessionProperties.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Hmm ... something to discuss about: I want to use these for presto-aresdb connector too. We originally had them in presto-spi as a part of our fork. Is there a designated common module that connectors can share code in ? Or I can always have the presto-aresdb-toolkit depend on presto-pinot-toolkit but that would look weird.
It would be funny to create another module named presto-pinot-aresdb-utils to house these :)
But I can rename them to something pinot specific for now but this is generic.
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PushdownUtils.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PushdownUtils.java
Outdated
Show resolved
Hide resolved
9661d49 to
1830602
Compare
|
@highker ... I just did a pass over the entire patch and I think you will find that a lot of your common nits have been addressed :-) |
hellium01
left a comment
There was a problem hiding this comment.
Some initial comments.
There was a problem hiding this comment.
Another way is to add a getStage() method in ConnectorPlanOptimizer but current design is fine given we won't (probably shouldn't) have more than two stages.
...t-toolkit/src/test/java/com/facebook/presto/pinot/query/TestPinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
This comment might not be necessary?
...t-toolkit/src/test/java/com/facebook/presto/pinot/query/TestPinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
testPartialPredicatePushdown?
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotErrorCode.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Do we want to put parenthesis around sub clauses:
format("( (%s) %s (%s))", ...)
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
| int aggregations = 0; | ||
| boolean groupByExists = false; | ||
|
|
||
| for (AggregationColumnNode expression : computeAggregationNodes(node).orElseThrow(() -> new PinotException(PINOT_UNSUPPORTED_EXPRESSION, Optional.empty(), "Unsupported aggregation node " + node))) { |
There was a problem hiding this comment.
nit: instead of lengthy computeAggregationNodes orElseThrow, can we just throw inside computeAggregationNode?
highker
left a comment
There was a problem hiding this comment.
The failure is related. Could you fix it really quick?
highker
left a comment
There was a problem hiding this comment.
@agrawaldevesh, could you update your commit message body with the following text? (Basically I just wrap your words into 70 chars per line and removed some out-of-date statement)
Adding presto-pinot with pushdown
This presto-pinot connector works as follows:
* It builds upon the new plan connector optimizer framework to push the
maximal query to the pinot system.
* By default, aggregation and limit queries are send to the pinot broker,
while just filter queries are send to the servers parallelly. This is
useful when joining against another (say hive) table wherein the join
needs to be parallelized and segments from pinot (fetched directly from
the pinot servers) need to be joined with hive splits. If you wish to
not send any queries to the broker, please set the catalog session
property prefer_broker_queries to false.
* You can disable parallel queries (useful for low latency deployment)
we do at Uber, by setting the catalog property
forbid_segmnt_queries=false.
* The connector needs a way to talk to the controller (to fetch the
brokers and the table schema). You can either provide the controller
URI or the URI of the Muttley proxy (an RPC proxy mechanism we use at
Uber). It is also possible to use the pinot-rest-proxy instead of
going to the pinot-broker (again something we use for our test pinot
clusters at Uber).
In addition the pinot connector would split the filter expressions such
that expressions that can be pushed down to pinot are done so, and those
that aren't are handled in presto.
It handles the full range of pinot PQL and respects pinot's weirdnesses
to guarantee that all results are still in the expected SQL semantics.
Usage:
Place the catalog file like so in the etc/catalog directory. The schema
name is immaterial and it can be anything. (since pinot does not have a
notion of schema). Example catalog files can be found below:
$ cat local_run/etc/catalog/pinotnotparallel.properties
connector.name=pinot
pinot.controller-url=controller_host:9000
pinot.scan-parallelism-enabled=false
$ cat local_run/etc/catalog/pinotparallel.properties
connector.name=pinot
pinot.controller-url=controller_host:9000
pinot.scan-parallelism-enabled=true
1830602 to
e8b3174
Compare
e8b3174 to
af6bb6a
Compare
This presto-pinot connector works as follows: * It builds upon the new plan connector optimizer framework to push the maximal query to the pinot system. * By default, aggregation and limit queries are send to the pinot broker, while just filter queries are send to the servers parallelly. This is useful when joining against another (say hive) table wherein the join needs to be parallelized and segments from pinot (fetched directly from the pinot servers) need to be joined with hive splits. If you wish to not send any queries to the broker, please set the catalog session property prefer_broker_queries to false. * You can disable parallel queries (useful for low latency deployment) we do at Uber, by setting the catalog property forbid_segment_queries=false. * The connector needs a way to talk to the controller (to fetch the brokers and the table schema). You can either provide the controller URI or the URI of the Muttley proxy (an RPC proxy mechanism we use at Uber). It is also possible to use the pinot-rest-proxy instead of going to the pinot-broker (again something we use for our test pinot clusters at Uber). In addition the pinot connector would split the filter expressions such that expressions that can be pushed down to pinot are done so, and those that aren't are handled in presto. It handles the full range of pinot PQL and respects pinot's weirdnesses to guarantee that all results are still in the expected SQL semantics. Usage: See the newly added connector/pinot.rst documentation file.
af6bb6a to
95b935b
Compare
|
Tested and fixed. Good to be merged in. |
…o's date_trunc This will only be called by Presto connector (prestodb/presto#13504) and it has identically semantics to presto's SQL date_trunc, albeit with a timezone specialization. Please see details on presto's date_trunc function here: https://prestodb.github.io/docs/current/functions/datetime.html This is needed so that the presto's date_trunc invocations can be faithfully translated as is to this new function. Its a new function so that it is trivial to roll out to harmlessly without a lot of regression testing. Without this function, we cannot handle timezones nor week truncations in the existing pinot's dateTimeConvert function. It basically copies the PrestoDB code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java, (specializations of date_trunc for TIMESTAMP and TIMESTAMP_WITH_TIME_ZONE I am even checking in the zone-index.properties used by presto to ensure that even the time zones are 1:1 b/w presto and this function. (sync'd to the latest prestodb/presto repo) Understanding this UDF requires knowledge of the joda-time API. I am not documenting this heavily since it is a copy of the Presto UDF.
…o's date_trunc (#4740) This will only be called by Presto connector (prestodb/presto#13504) and it has identically semantics to presto's SQL date_trunc, albeit with a timezone specialization. Please see details on presto's date_trunc function here: https://prestodb.github.io/docs/current/functions/datetime.html This is needed so that the presto's date_trunc invocations can be faithfully translated as is to this new function. Its a new function so that it is trivial to roll out to harmlessly without a lot of regression testing. Without this function, we cannot handle timezones nor week truncations in the existing pinot's dateTimeConvert function. It basically copies the PrestoDB code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java, (specializations of date_trunc for TIMESTAMP and TIMESTAMP_WITH_TIME_ZONE I am even checking in the zone-index.properties used by presto to ensure that even the time zones are 1:1 b/w presto and this function. (sync'd to the latest prestodb/presto repo) Understanding this UDF requires knowledge of the joda-time API. I am not documenting this heavily since it is a copy of the Presto UDF.
…o's date_trunc (apache#4740) This will only be called by Presto connector (prestodb/presto#13504) and it has identically semantics to presto's SQL date_trunc, albeit with a timezone specialization. Please see details on presto's date_trunc function here: https://prestodb.github.io/docs/current/functions/datetime.html This is needed so that the presto's date_trunc invocations can be faithfully translated as is to this new function. Its a new function so that it is trivial to roll out to harmlessly without a lot of regression testing. Without this function, we cannot handle timezones nor week truncations in the existing pinot's dateTimeConvert function. It basically copies the PrestoDB code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java, (specializations of date_trunc for TIMESTAMP and TIMESTAMP_WITH_TIME_ZONE I am even checking in the zone-index.properties used by presto to ensure that even the time zones are 1:1 b/w presto and this function. (sync'd to the latest prestodb/presto repo) Understanding this UDF requires knowledge of the joda-time API. I am not documenting this heavily since it is a copy of the Presto UDF.
…o's date_trunc (apache#4740) This will only be called by Presto connector (prestodb/presto#13504) and it has identically semantics to presto's SQL date_trunc, albeit with a timezone specialization. Please see details on presto's date_trunc function here: https://prestodb.github.io/docs/current/functions/datetime.html This is needed so that the presto's date_trunc invocations can be faithfully translated as is to this new function. Its a new function so that it is trivial to roll out to harmlessly without a lot of regression testing. Without this function, we cannot handle timezones nor week truncations in the existing pinot's dateTimeConvert function. It basically copies the PrestoDB code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java, (specializations of date_trunc for TIMESTAMP and TIMESTAMP_WITH_TIME_ZONE I am even checking in the zone-index.properties used by presto to ensure that even the time zones are 1:1 b/w presto and this function. (sync'd to the latest prestodb/presto repo) Understanding this UDF requires knowledge of the joda-time API. I am not documenting this heavily since it is a copy of the Presto UDF.
Adding presto-pinot with pushdown
This presto-pinot connector works as follows:
maximal query to the pinot system.
while just filter queries are send to the servers parallelly. This is
useful when joining against another (say hive) table wherein the join
needs to be parallelized and segments from pinot (fetched directly from
the pinot servers) need to be joined with hive splits. If you wish to
not send any queries to the broker, please set the catalog session
property prefer_broker_queries to false.
we do at Uber, by setting the catalog property
forbid_segment_queries=false.
brokers and the table schema). You can either provide the controller
URI or the URI of the Muttley proxy (an RPC proxy mechanism we use at
Uber). It is also possible to use the pinot-rest-proxy instead of
going to the pinot-broker (again something we use for our test pinot
clusters at Uber).
In addition the pinot connector would split the filter expressions such
that expressions that can be pushed down to pinot are done so, and those
that aren't are handled in presto.
It handles the full range of pinot PQL and respects pinot's weirdnesses
to guarantee that all results are still in the expected SQL semantics.
Usage: See the newly added connector/pinot.rst documentation file.