Skip to content

Merge partition preference when adding exchange nodes#11262

Merged
jessesleeping merged 2 commits intoprestodb:masterfrom
jessesleeping:merge_partition_preference
Dec 19, 2019
Merged

Merge partition preference when adding exchange nodes#11262
jessesleeping merged 2 commits intoprestodb:masterfrom
jessesleeping:merge_partition_preference

Conversation

@jessesleeping
Copy link
Contributor

@jessesleeping jessesleeping commented Aug 14, 2018


Previously we only used merged partition preference when adding exchange for union. This commit allow us to merge partitioning preference when adding exchange for aggregation, window function and index join.

By merging partition preference we have chance to reduce data repartitions but may also result in increased skewness. We add a session property to control this behavior.


Current AddExchange behavior

When planning Aggregation, MarkDistinct, RowNumber and TopNRowNumber nodes, the current logic do the following:

  1. Calculate a PreferredProperties which merge the partition preference of the current node and its parent. (e.g. parent requires partitioning on (col_A), current node requires partitioning on (col_A, col_B) --> merged result is partitioning on (col_A)).
  2. Recursively call rewriter on child node, with the merged PreferredProperties (partition on (col_A)).
  3. Check if the returned partitioning property from child plan. If it's not what the current node requires ((col_A, col_B)), add a remote exchange on the current requirement ((col_A, col_B)).

This can result in adding extra remote exchange for certain query shape. For example:

Take the following aggregation query as an example:

SELECT COUNT(orderdate), custkey 
FROM (
    SELECT orderdate, custkey 
    FROM orders 
    GROUP BY orderdate, custkey
)
GROUP BY custkey

The previous plan has 2 repartition exchanges, partitioning on group by columns of the inner query and the outer query respectively:

OutputNode
    Aggregation (final, group by custkey)
        Exchange (repartition on custkey)        // Second exchange on custkey
            Aggregation (partial, group by custkey)
                Aggregation (final, group by custkey, orderdate)               
                   Exchange (repartition on custkey, orderdate)        // First exchange on custkey and orderdate
                        Aggregation (partial, group by custkey, orderdate)
                            TableScan (orders)

What this PR changes

The major confusion is between (1) passing the merged preference when planning child and (2) add exchange to the child plan based on unmerged preference. This PR added a session property to control this behavior only for Aggregation nodes.

  • With aggregation_partitioning_merging_strategy set to LEGACY, we preserve the previous behavior where we (1) pass the merged preference when planning child and (2) add exchange based on local preference.
  • With aggregation_partitioning_merging_strategy set to TOP_DOWN, we (1) pass the merged preference when planning child and (2) add exchange based on the merged preference.
  • With aggregation_partitioning_merging_strategy set to BOTTOM_UP, we (1) don't merge preference and (2) add exchange based on local preference.

For example, with partition_merging_strategy set to TOP_DOWN, the plan of the above example only has 1 remote exchange:

OutputNode
    Aggregation (single, group by custkey)
        Aggregation (final, group by custkey, orderdate)               
            Exchange (repartition on custkey)        // Here is the merged exchange
                Aggregation (partial, group by custkey, orderdate)
                    TableScan (orders)

Note that less remote exchange is not necessary always the optimal, it depends on the cardinality of partition columns.

Open discussions

  • With the current approach, we don't have an option to return to our previous behavior. Maybe we should add 2 flag, one is for using merged partition preference when planning child, the other one is for using merged partition preference when adding exchange for current node.

    • update: Added an enum property to control the partition preference merging strategy. There are currently 3 options: LEGACY, TOP_DOWN and BOTTOM_UP.
  • Open discussion before about controlling merging other properties and also top-down vs bottom-up partition preference (Merge partition preference when adding exchange nodes #11262 (comment)).

  • IndexJoin and Union is handled differently in the current logic. They handle parent partition preference on their own.

Todo

  • Add detail commit message before merging.
== RELEASE NOTES ==

General Changes
* Add new session property "aggregation_partitioning_merging_strategy" to control merging partitioning preference when adding repartition remote exchange around aggregation node. Default option is `LEGACY` and can be overwritten to `TOP_DOWN` or `BOTTOM_UP`.

@jessesleeping
Copy link
Contributor Author

Any suggestions on how to test the index join plan?

@sopel39
Copy link
Contributor

sopel39 commented Aug 14, 2018

Could you give an example of how plans change?

@jessesleeping jessesleeping force-pushed the merge_partition_preference branch from 38b2bce to b212137 Compare August 14, 2018 17:33
@jessesleeping
Copy link
Contributor Author

Take the following aggregation query as an example:

SELECT COUNT(orderdate), custkey 
FROM (
    SELECT orderdate, custkey 
    FROM orders 
    GROUP BY orderdate, custkey
)
GROUP BY custkey

The previous plan has 2 repartition exchanges, partitioning on group by columns of the inner query and the outer query respectively:

OutputNode
    Aggregation (final, group by custkey)
        Exchange (repartition on custkey)        // Second exchange on custkey
            Aggregation (partial, group by custkey)
                Aggregation (final, group by custkey, orderdate)               
                   Exchange (repartition on custkey, orderdate)        // First exchange on custkey and orderdate
                        Aggregation (partial, group by custkey, orderdate)
                            TableScan (orders)

After this two commits, if you set session property merge_partiton_preference to true, the plan will have only one repartition exchange. The inner one is merged with the outer one:

OutputNode
    Aggregation (single, group by custkey)
        Aggregation (final, group by custkey, orderdate)               
            Exchange (repartition on custkey)        // Here is the merged exchange
                Aggregation (partial, group by custkey, orderdate)
                    TableScan (orders)

The optimizer in the second case merged the partition preference of the inner query with the outer query. This behavior is not always optimal. It's possible that reducing the exchange increase the skewness. That's why we want to control this behavior by a session property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use abbreviations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also address this.

@sopel39
Copy link
Contributor

sopel39 commented Aug 16, 2018

Currently partitioning "merging" happens bottom up, e.g: when data is already partitioned on cust and we require partitioning on cust, orders for aggregation it won't be reshuffled again.

This PR essentially introduces partitioning preference merging in top-down approach. This is really valuable.

However, I would change merge_partiton_preference meaning so that it disables/enables both bottom-up and top-down partitioning preference merging. We had a customer case when aggregation on sym1, sym2 was failing because data was already partitioned on sym1 which had low cardinality and was highly skewed against sym2.

What do you think @martint ?

@martint
Copy link
Contributor

martint commented Aug 16, 2018

This PR essentially introduces partitioning preference merging in top-down approach. This is really valuable.

I haven't looked at the code, yet, but the goal of this change (which @jessesleeping and I discussed offline a few days ago) was to make the currently unavoidable greedy choice of merging of parent preferences optional via session property. It shouldn't introduce any structural capability that didn't previously exist.

@jessesleeping
Copy link
Contributor Author

We already had this behavior (i.e. mering partition preference top-down) when adding exchange for UNION. This PR extended that to other operations.

@sopel39
Copy link
Contributor

sopel39 commented Aug 17, 2018

was to make the currently unavoidable greedy choice of merging of parent preferences optional via session property. It shouldn't introduce any structural capability that didn't previously exist.

I think this PR only adds "merging of parent partitioning preferences" plus a switch for it.

There seem to be few related topics here:

  1. switch for merging all preferences top-down
  2. merging partitioning preferences top-down
  3. switch for merging partitioning preferences top-down
  4. switch for reusing partitioning bottom-up

This PR seems to add 2) and 3), but not 1)

I was suggesting to add 2) and combine 3) and 4) into a single switch: "reuse/merge partitioning preferences"

@jessesleeping
Copy link
Contributor Author

switch for reusing partitioning bottom-up
@sopel39 Is 4) equivalent to setting the current merge_partition_preferrence to false? In that way, the preferred partitioning will not consider parent partitioning preference. We will add exchange node based on the current node's partitioning preference.

@sopel39
Copy link
Contributor

sopel39 commented Aug 20, 2018

@jessesleeping there are two things:

  1. reuse parent preferred partitioning
  2. when child nodes naturally produce compatible preferences, should we use it or add explicit exchange?

@stale
Copy link

stale bot commented Apr 3, 2019

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

@stale stale bot unassigned martint Apr 3, 2019
@stale stale bot added the stale label Apr 3, 2019
@stale stale bot closed this Apr 10, 2019
@highker highker reopened this Nov 18, 2019
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Nov 18, 2019

CLA Check
The committers are authorized under a signed CLA.

  • ✅ Jiexi Lin (d0c3228095c4d6132a118a5f36f60b5954c75ec9, b2121375133e4998d8110a46431874df8da0bd11)

@stale stale bot removed the stale label Nov 18, 2019
@highker highker removed the request for review from martint November 18, 2019 01:59
Copy link
Contributor

@shixuan-fan shixuan-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % minor nits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also address this.

@shixuan-fan
Copy link
Contributor

After we introduced a boolean property, do we still keep the old behavior? I might be reading it wrong, but It seems that we introduced two different new behaviors?

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will leave it for @shixuan-fan and @arhimondr to review given I'm not really familiar with this part of logic.

@jessesleeping jessesleeping force-pushed the merge_partition_preference branch from 75dab7e to eaaa0a2 Compare December 4, 2019 20:07
@jessesleeping
Copy link
Contributor Author

Update:

  • Address comments
  • Change the new session property from a boolean to a enum. The session property provides partition merging strategies of LEGACY, TOP_DOWN and BOTTOM_UP. I am open to better naming and behavior control methods.

@jessesleeping jessesleeping requested review from arhimondr, highker, shixuan-fan and wenleix and removed request for arhimondr and wenleix December 4, 2019 20:09
@wenleix wenleix requested a review from aweisberg December 4, 2019 20:15
@jessesleeping jessesleeping removed the request for review from highker December 4, 2019 21:09
@jessesleeping jessesleeping force-pushed the merge_partition_preference branch from eaaa0a2 to 886fdc3 Compare December 9, 2019 21:57
Copy link
Contributor

@shixuan-fan shixuan-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jessesleeping jessesleeping force-pushed the merge_partition_preference branch from 886fdc3 to c2445aa Compare December 9, 2019 23:04
@jessesleeping jessesleeping force-pushed the merge_partition_preference branch 2 times, most recently from fa5bb58 to b572b00 Compare December 18, 2019 04:01
@jessesleeping
Copy link
Contributor Author

  • Revive changes to nodes other than AggregationNode
  • Rename the session property
  • Refactor the test

Copy link
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I would go with the earlier change that only touches aggregation. I'll add the other operators for exact partitioning if I don't end up doing what I am doing with exact partitioning which is bailing out of mergeWithParent immediately.

@jessesleeping jessesleeping force-pushed the merge_partition_preference branch from b572b00 to afeeed0 Compare December 19, 2019 00:40
@jessesleeping
Copy link
Contributor Author

  • Revert changes to non-aggregation nodes.

@jessesleeping jessesleeping merged commit 82e90bf into prestodb:master Dec 19, 2019
@aweisberg aweisberg mentioned this pull request Jan 17, 2020
7 tasks
@caithagoras caithagoras mentioned this pull request Jan 22, 2020
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants