Skip to content

Adding Pinot SQL endpoint support#14704

Merged
highker merged 1 commit intoprestodb:masterfrom
xiangfu0:pinot-connector-new-sql
Jul 17, 2020
Merged

Adding Pinot SQL endpoint support#14704
highker merged 1 commit intoprestodb:masterfrom
xiangfu0:pinot-connector-new-sql

Conversation

@xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Jun 23, 2020

The motivation of this PR to add Pinot SQL endpoint support is that Pinot Community is moving to SQL syntax/endpoint on query side to support more features and also plan to deprecate PQL queries.

This PR:

  • Support Pinot new SQL endpoint for broker queries.
  • Enable this feature by settingpinot.use-pinot-sql-for-broker-queries=true.
  • Support aggregation/group by/oder by pushdown to query Pinot broker.
== RELEASE NOTES ==

Pinot Changes
* Add Pinot SQL endpoint support.

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch 5 times, most recently from 130a8e7 to 5671504 Compare June 24, 2020 04:59
@agrawaldevesh
Copy link
Contributor

Awesome improvement ! Can you please add some more details about the design/approach in the PR description. It would help with reviewing since the PR is rather large. Thanks.

Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't looked at the tests

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch from c183758 to bc33a1a Compare June 25, 2020 10:23
@xiangfu0 xiangfu0 requested a review from agrawaldevesh June 25, 2020 10:33
Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of the comments are still yet to be addressed ? Also, it would be great to call out in the PR description what the general approach was and what parts are like semantically new vs just code movement.

Thanks ! (Getting really really close !)

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch from e86d1b2 to a1ce7cf Compare July 1, 2020 11:24
@xiangfu0 xiangfu0 requested a review from agrawaldevesh July 2, 2020 12:15
Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments on beefing up the test coverage to cover the salient differences b/w the SQL and the PQL end points.

Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some questions on your responses please :-)

Also, I think we should update the PR description to add more motivation on why we are interested in moving to the SQL end point. Perhaps you can say that the PQL end point is going to be deprecated or something to that effect ?

Thanks !

@xiangfu0 xiangfu0 requested a review from agrawaldevesh July 6, 2020 11:25
@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch 4 times, most recently from c0c4aa5 to 58fccab Compare July 7, 2020 10:34
Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much much better testing !

A few nits on comments on stuff.

But my biggest remaining question is around my lack of understanding of the new outputs field in the PinotQueryGeneratorContext: Why it can't be used for PQL and the special handling needed for it in visitProject. Please see the comments inline.

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch from 58fccab to 433db34 Compare July 7, 2020 22:52
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like line 319 above, should this also be Map<> newSelections = new HashMap<> ? (instead of new LinkedHashMap)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the reason is that in our PQL generation test, we create AggregationPlanNode by setting the Aggregations directly, if we use HashMap then generatedPql cannot guarantee the ordering of aggregation functions. Otherwise we need to set a ProjectNode always outside an AggregationPlanNode

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets please fix the tests instead of complicating the production code like this. Even if it means more changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that in visitAggregation method, I didn't explicitly generate outputs, and infer it when calling context.withAggregation by using outputs.addAll(newSelection.keySet()). This is fine when selection is an ordered map.
So the change I made is to generate outputs in visitAggregation method.

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch from 433db34 to c2c5eda Compare July 10, 2020 00:42
Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting to look really good. Can you also please mark the resolved comments as such. Its getting hard to review the incremental changes esp because Github doesn't deal with force pushes that nicely :-)

I think my only remaining comment here is about not having to use a LinkedHashMap for the selections just to make the test code pass. I would rather we change the test code.

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch from c2c5eda to 4698ab7 Compare July 10, 2020 09:32
@xiangfu0
Copy link
Contributor Author

Starting to look really good. Can you also please mark the resolved comments as such. Its getting hard to review the incremental changes esp because Github doesn't deal with force pushes that nicely :-)

I think my only remaining comment here is about not having to use a LinkedHashMap for the selections just to make the test code pass. I would rather we change the test code.

The issue is that in visitAggregation method, I didn't explicitly generate outputs, and infer it when calling context.withAggregation by using outputs.addAll(newSelection.keySet()). This is fine when selection is an ordered map.
So the change I made is to generate outputs in visitAggregation method.

The best of it is NO test code changes!

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch 2 times, most recently from 932461e to 374a419 Compare July 10, 2020 10:49
Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just last few nits. LGTM otherwise !

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch 2 times, most recently from e5ebdb2 to b752cd9 Compare July 10, 2020 23:03
@xiangfu0
Copy link
Contributor Author

Just last few nits. LGTM otherwise !

Many many thanks!!!

Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@highker, @sachdevs please help review this PR.

@fx19880617 and I have done several passes over this and I am pretty satisfied with the end result: The necessary refactoring needed to support both PQL and SQL is clean, in that it tries to share maximal code and the new abstractions introduced also make more sense. There is great test coverage to boot.

Thanks.

@highker highker requested a review from zhenxiao July 12, 2020 06:51
@highker
Copy link

highker commented Jul 12, 2020

@zhenxiao, could you help to take a look at the PR also?

@highker
Copy link

highker commented Jul 15, 2020

@fx19880617, can we squash all commits into one?

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@highker highker self-assigned this Jul 15, 2020
Copy link
Collaborator

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work @fx19880617
looks good to me
a few small things

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible we make counter a private data member, and return counter here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem is that the Broker page source is one shot: It does not have a notion of partial or a cursor: Either the number of completed positions is zero or it is the entire number of rows being returned.

The current usage of completed positions (if you search for the callers of this function), is to basically show a progress indicator while the query is running. That does not fit the broker page source semantics: Because the counter will only change basically when the query is finished (ie when the counter is numRows, then the page source is already done and the query is being torn down)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think counter is completed positions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replied above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could outputs be Set? LinkedHashSet is an implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would very much like to keep it as a LinkedHashSet ... I can't tell you the number of times this has bit me: If order is needed then it should be statically checked.

@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch from b752cd9 to e9b1cee Compare July 16, 2020 01:22
@highker
Copy link

highker commented Jul 17, 2020

@fx19880617, could you squash the commits and push? I will merge afterwards.

- Enable this feature by setting `pinot.use-pinot-sql-for-broker-queries=true`
- Support aggregation/group by/oder by pushdown to query Pinot broker
@xiangfu0 xiangfu0 force-pushed the pinot-connector-new-sql branch from e9b1cee to 5948ba3 Compare July 17, 2020 07:05
@xiangfu0
Copy link
Contributor Author

@fx19880617, could you squash the commits and push? I will merge afterwards.

Done! Many Thanks!!!

@highker highker merged commit 589325b into prestodb:master Jul 17, 2020
@xiangfu0 xiangfu0 deleted the pinot-connector-new-sql branch July 17, 2020 20:08
@caithagoras caithagoras mentioned this pull request Jul 28, 2020
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants