Pushdown DistinctLimitNode in Pinot Connector#14863

Merged
highker merged 1 commit into prestodb:master from xiangfu0:pinot_distinct_limit_node_pushdown
Jul 24, 2020

@xiangfu0 xiangfu0 commented Jul 21, 2020

Push down DistinctLimitNode to the Pinot query.

For the Presto query SELECT DISTINCT flightnum FROM airlinestats LIMIT 10, the following query is pushed down to Pinot:

  • SQL format: SELECT FlightNum FROM airlineStats GROUP BY FlightNum LIMIT 10
  • PQL format: SELECT count(*) FROM airlineStats GROUP BY FlightNum TOP 10
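
As a rough illustration of the two query shapes above, here is a minimal Java sketch that builds both strings from a table name, the distinct columns, and a limit. The class `DistinctLimitQuerySketch` and the method `toPinotQuery` are hypothetical names made up for this example; this is not the connector's actual generator code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Pinot connector.
final class DistinctLimitQuerySketch
{
    // Builds the Pinot query text for a DISTINCT ... LIMIT over the given columns.
    static String toPinotQuery(String table, List<String> distinctColumns, long limit, boolean useSqlSyntax)
    {
        String columns = String.join(", ", distinctColumns);
        if (useSqlSyntax) {
            // SQL mode: select the columns and group by them to get distinct values.
            return String.format("SELECT %s FROM %s GROUP BY %s LIMIT %d", columns, table, columns, limit);
        }
        // PQL mode: a group-by needs an aggregation, so count(*) is selected and TOP bounds the groups.
        return String.format("SELECT count(*) FROM %s GROUP BY %s TOP %d", table, columns, limit);
    }

    public static void main(String[] args)
    {
        List<String> columns = Arrays.asList("FlightNum");
        System.out.println(toPinotQuery("airlineStats", columns, 10, true));
        System.out.println(toPinotQuery("airlineStats", columns, 10, false));
    }
}
```

The only difference between the two modes in this sketch is whether the distinct columns appear in the SELECT list or a count(*) takes their place.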

Below is the generated query plan for SQL mode.

presto:default> explain select distinct flightnum from airlinestats limit 10;
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 - Output[flightnum] => [flightnum:integer]
         Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
     - RemoteStreamingExchange[GATHER] => [flightnum:integer]
             Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
         - TableScan[TableHandle {connectorId='pinot', connectorHandle='PinotTableHandle{connectorId=pinot, schemaName=default, tableName=airlineStats, isQueryShort=Optional[true], expectedColumnHandles=Optional[[PinotColumnHandle{columnName=FlightNum, dataType=integer, type=REGULAR}]], pinotQuery=Optional[GeneratedPinotQuery{query=SELECT FlightNum FROM airlineStats GROUP BY FlightNum LIMIT 10, format=SQL, table=airlineStats, expectedColumnIndices=[], groupByClauses=1, haveFilter=false, isQueryShort=true}]}', layout='Optional[PinotTableHandle{connectorId=pinot, schemaName=default, tableName=airlineStats, isQueryShort=Optional[true], expectedColumnHandles=Optional[[PinotColumnHandle{columnName=FlightNum, dataType=integer, type=REGULAR}]], pinotQuery=Optional[GeneratedPinotQuery{query=SELECT FlightNum FROM airlineStats GROUP BY FlightNum LIMIT 10, format=SQL, table=airlineStats, expectedColumnIndices=[], groupByClauses=1, haveFilter=false, isQueryShort=true}]}]'}] => [flightnum:integer]
                 Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                 flightnum := PinotColumnHandle{columnName=FlightNum, dataType=integer, type=REGULAR}

Below is the generated query plan for PQL mode.

presto:default> explain select distinct flightnum from airlinestats limit 10;

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 - Output[flightnum] => [flightnum:integer]
         Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
     - RemoteStreamingExchange[GATHER] => [flightnum:integer]
             Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
         - TableScan[TableHandle {connectorId='pinot', connectorHandle='PinotTableHandle{connectorId=pinot, schemaName=default, tableName=airlineStats, isQueryShort=Optional[true], expectedColumnHandles=Optional[[PinotColumnHandle{columnName=FlightNum, dataType=integer, type=REGULAR}]], pinotQuery=Optional[GeneratedPinotQuery{query=SELECT count(*) FROM airlineStats GROUP BY FlightNum TOP 10, format=PQL, table=airlineStats, expectedColumnIndices=[0, -1], groupByClauses=1, haveFilter=false, isQueryShort=true}]}', layout='Optional[PinotTableHandle{connectorId=pinot, schemaName=default, tableName=airlineStats, isQueryShort=Optional[true], expectedColumnHandles=Optional[[PinotColumnHandle{columnName=FlightNum, dataType=integer, type=REGULAR}]], pinotQuery=Optional[GeneratedPinotQuery{query=SELECT count(*) FROM airlineStats GROUP BY FlightNum TOP 10, format=PQL, table=airlineStats, expectedColumnIndices=[0, -1], groupByClauses=1, haveFilter=false, isQueryShort=true}]}]'}] => [flightnum:integer]
                 Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                 flightnum := PinotColumnHandle{columnName=FlightNum, dataType=integer, type=REGULAR}

(1 row)
== RELEASE NOTES ==

Pinot Changes
* Pushdown DistinctLimitNode to Pinot Query in SQL mode.

@xiangfu0 xiangfu0 changed the title from "Adding DistinctLimit pushdown support in Pinot Connector" to "Pushdown DistinctLimitNode in Pinot Connector" Jul 21, 2020
@xiangfu0 (Contributor Author)

@agrawaldevesh Could you help review this PR? Thanks!

@xiangfu0 xiangfu0 force-pushed the pinot_distinct_limit_node_pushdown branch from 381af86 to 9576843 Compare July 21, 2020 06:02
@agrawaldevesh agrawaldevesh left a comment

Still trying to understand the implementation better -- some very high level questions inline.

Contributor

I think I have forgotten your old Pinot-SQL PR :-) -- but didn't we have a special derived class for the SQL generation, instead of using if useSqlSyntax?

Contributor Author

This is for pushing down a new plan node, DistinctLimitNode.
The query SELECT DISTINCT colA is planned as an AggregationNode.
The query SELECT DISTINCT colA LIMIT X is planned as a DistinctLimitNode.
On the PQL side, we still need the hidden column count(*), so I use withAggregation(...) to set it.
On the SQL side, I use withDistinctLimit to set the group-by columns and the limit.
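
The split described here can be pictured with a small Java sketch. Everything in it is an assumption made for illustration: `ContextSketch` is a heavily simplified stand-in for a query generator context, and its `withAggregation` and `withDistinctLimit` methods only borrow the names mentioned in this comment. Their real signatures in PinotQueryGeneratorContext are not shown in this thread and are likely different.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for a query generator context (illustration only).
final class ContextSketch
{
    final List<String> selections;     // expressions in the Pinot SELECT list
    final List<String> groupByColumns; // columns in the Pinot GROUP BY clause
    final List<String> hiddenColumns;  // selections that are not returned to Presto
    final long limit;                  // -1 means no limit has been set

    private ContextSketch(List<String> selections, List<String> groupByColumns, List<String> hiddenColumns, long limit)
    {
        this.selections = selections;
        this.groupByColumns = groupByColumns;
        this.hiddenColumns = hiddenColumns;
        this.limit = limit;
    }

    static ContextSketch tableScan()
    {
        return new ContextSketch(Collections.emptyList(), Collections.emptyList(), Collections.emptyList(), -1);
    }

    // PQL path (assumed shape): the aggregation is selected but kept hidden from Presto.
    ContextSketch withAggregation(List<String> groupBy, String aggregation, long newLimit)
    {
        List<String> aggregationOnly = Collections.singletonList(aggregation);
        return new ContextSketch(aggregationOnly, groupBy, aggregationOnly, newLimit);
    }

    // SQL path (assumed shape): the distinct columns are both selected and grouped on.
    ContextSketch withDistinctLimit(List<String> groupBy, long newLimit)
    {
        return new ContextSketch(groupBy, groupBy, Collections.emptyList(), newLimit);
    }

    // Routes a DISTINCT ... LIMIT plan node to one of the two paths above.
    static ContextSketch visitDistinctLimit(ContextSketch source, List<String> distinctColumns, long limit, boolean useSqlSyntax)
    {
        return useSqlSyntax
                ? source.withDistinctLimit(distinctColumns, limit)
                : source.withAggregation(distinctColumns, "count(*)", limit);
    }

    public static void main(String[] args)
    {
        List<String> columns = Arrays.asList("FlightNum");
        ContextSketch sql = visitDistinctLimit(tableScan(), columns, 10, true);
        ContextSketch pql = visitDistinctLimit(tableScan(), columns, 10, false);
        System.out.println("SQL selections: " + sql.selections + ", hidden: " + sql.hiddenColumns);
        System.out.println("PQL selections: " + pql.selections + ", hidden: " + pql.hiddenColumns);
    }
}
```

In this sketch both paths group on the same columns and carry the same limit. They differ only in what is selected and whether a hidden column is tracked.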

Contributor

PinotQueryGeneratorContext.withAggregation has some special handling for the case when distinctCount is already pushed down (search for PINOT_DISTINCT_COUNT_FUNCTION_NAME). Can you make sure that this doesn't conflict with that? Thanks.

@xiangfu0 xiangfu0 force-pushed the pinot_distinct_limit_node_pushdown branch 3 times, most recently from 2f904ba to f26ce01 Compare July 21, 2020 17:56
@agrawaldevesh agrawaldevesh left a comment

Getting very close

@xiangfu0 xiangfu0 force-pushed the pinot_distinct_limit_node_pushdown branch from f26ce01 to 81e5af0 Compare July 21, 2020 20:51
@xiangfu0 (Contributor Author)

The distinctCount pushdown won't be triggered by this change. That pushdown applies on top of a count over a column with the distinct mark.

@xiangfu0 xiangfu0 force-pushed the pinot_distinct_limit_node_pushdown branch from 81e5af0 to 02431aa Compare July 21, 2020 23:38
@agrawaldevesh agrawaldevesh left a comment

LGTM! Thanks for iterating on this!

cc: @zhenxiao @highker @sachdevs Please have a look, adding you guys as reviewers.

@xiangfu0 (Contributor Author)

Thanks @agrawaldevesh

@xiangfu0 xiangfu0 force-pushed the pinot_distinct_limit_node_pushdown branch from 02431aa to bc25ce1 Compare July 22, 2020 04:56
@highker highker left a comment

lgtm

@highker highker self-assigned this Jul 23, 2020
@xiangfu0 xiangfu0 force-pushed the pinot_distinct_limit_node_pushdown branch 3 times, most recently from 29047db to cd30cbc Compare July 23, 2020 20:49
@xiangfu0 xiangfu0 force-pushed the pinot_distinct_limit_node_pushdown branch from cd30cbc to 4d05ea7 Compare July 24, 2020 00:14
@highker highker merged commit 6a55eb4 into prestodb:master Jul 24, 2020
@xiangfu0 xiangfu0 deleted the pinot_distinct_limit_node_pushdown branch July 24, 2020 05:39
@caithagoras caithagoras mentioned this pull request Jul 28, 2020