[SPARK-33935][SQL] Fix CBO cost function #30965
Conversation
A bit more information: the instability of the CBO has been noted before (#29638) and I think this is the main reason for it. There is also #29871, which tackles another reason for the instability. That one should only have an impact when the costs are equal; this one can affect more plans (see the example in the description). An important thing to note is that this could change the behavior of the

@LuciferYang and @maropu

Test build #133502 has finished for PR 30965 at commit

Kubernetes integration test starting

Kubernetes integration test status success

cc: @wzhfy

I've checked the update and it seems fine. Could you add some tests based on the example in the PR description?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
Ahh, good point. Adding this I also realised that I had the inequality check in the wrong order - fixing it greatly reduced the number of plan changes.

Kubernetes integration test starting

Test build #133539 has finished for PR 30965 at commit

Kubernetes integration test status success

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #133534 has finished for PR 30965 at commit

Test build #133536 has finished for PR 30965 at commit
```scala
assert(plan1.betterThan(plan2, conf))
assert(!plan2.betterThan(plan1, conf))
```
Thanks for adding this. This fix looks fine to me. cc: @cloud-fan @HyukjinKwon
retest this please

@tanelk Can you help run the UTs in

Kubernetes integration test starting

Kubernetes integration test status success

Test build #133621 has finished for PR 30965 at commit
### What changes were proposed in this pull request?
Changed the cost function in CBO to match the documentation.
### Why are the changes needed?
The parameter `spark.sql.cbo.joinReorder.card.weight` is documented as:
```
The weight of cardinality (number of rows) for plan cost comparison in join reorder: rows * weight + size * (1 - weight).
```
The implementation in `JoinReorderDP.betterThan` does not match this documentation:
```
def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
if (other.planCost.card == 0 || other.planCost.size == 0) {
false
} else {
val relativeRows = BigDecimal(this.planCost.card) / BigDecimal(other.planCost.card)
val relativeSize = BigDecimal(this.planCost.size) / BigDecimal(other.planCost.size)
relativeRows * conf.joinReorderCardWeight +
relativeSize * (1 - conf.joinReorderCardWeight) < 1
}
}
```
This different implementation has an unfortunate consequence:
given two plans A and B, both `A.betterThan(B)` and `B.betterThan(A)` might return the same result. This happens when one plan has many rows with a small size and the other has few rows with a large size.
Example values that exhibit this phenomenon with the default weight value (0.7):
A.card = 500, B.card = 300
A.size = 30, B.size = 80
Both `A.betterThan(B)` and `B.betterThan(A)` would have a score above 1 and would return false.
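Working through the numbers with the default weight:
```
A betterThan B: 0.7 * (500/300) + 0.3 * (30/80) = 1.167 + 0.113 = 1.280 > 1  => false
B betterThan A: 0.7 * (300/500) + 0.3 * (80/30) = 0.420 + 0.800 = 1.220 > 1  => false
```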
This happens with several of the TPCDS queries.
The new implementation does not have this behavior.
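For reference, a minimal sketch of the direction of this fix (comparing weighted absolute costs directly, as the follow-up SPARK-34922 later summarizes it). The names mirror the snippet above, but this is not necessarily the exact merged diff:
```scala
def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
  val weight = conf.joinReorderCardWeight
  // Weighted absolute cost per the documented formula:
  // rows * weight + size * (1 - weight)
  val cost = BigDecimal(this.planCost.card) * weight +
    BigDecimal(this.planCost.size) * (1 - weight)
  val otherCost = BigDecimal(other.planCost.card) * weight +
    BigDecimal(other.planCost.size) * (1 - weight)
  // Comparing two absolute costs cannot make both plans "better" than each other.
  cost < otherCost
}
```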
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
New and existing UTs
Closes #30965 from tanelk/SPARK-33935_cbo_cost_function.
Authored-by: [email protected] <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit f252a93)
Signed-off-by: Takeshi Yamamuro <[email protected]>
Thanks! Merged to master/3.1.

Could you check this? Has this fix resolved the previous issue, too?
@tanelk Could you open a PR to fix it for branch-3.0/2.4?

I assume you want the results of

Awesome!

@tanelk Hi, sorry to see this so late. IIRC the reason to use relative values for rowCount and size is to normalize them to a similar scale while comparing costs. Otherwise, one (size) may overwhelm the other (rowCount), and then the weight and the cost function become meaningless. To resolve the stability issue, can we have a

What do you think?
@wzhfy and @cloud-fan I'm not a fan of adding up the relative costs. A simple example, where the weight is 0.5:

Perhaps this would be the best of both worlds:
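(For context from the follow-up SPARK-34922: the form eventually proposed was the multiplicative one, `relativeRows^w * relativeSize^(1-w) < 1`. A self-contained sketch under that assumption; `Cost` and the method shape here are illustrative, not Spark's actual internals:)
```scala
// Sketch of the multiplicative comparison proposed in SPARK-34922.
case class Cost(card: BigInt, size: BigInt)

def betterThan(cost: Cost, other: Cost, weight: Double): Boolean = {
  if (other.card == 0 || other.size == 0) {
    false
  } else {
    val relativeRows = cost.card.toDouble / other.card.toDouble
    val relativeSize = cost.size.toDouble / other.size.toDouble
    // Multiplying the ratios keeps the comparison symmetric:
    // score(A, B) == 1 / score(B, A), so at most one side scores below 1.
    math.pow(relativeRows, weight) * math.pow(relativeSize, 1 - weight) < 1
  }
}
```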
```
ColumnarToRow
InputAdapter
Scan parquet default.store_sales [ss_sold_date_sk,ss_item_sk,ss_customer_sk,ss_store_sk,ss_ext_sales_price]
BroadcastHashJoin [ss_item_sk,i_item_sk]
```
I think q19 exposes a problem. Previously this BroadcastHashJoin was run before the SortMergeJoin, which reduced the shuffle input data, because this BroadcastHashJoin has a filter on the right side and likely makes the join very selective.
@tanelk, if the idea from @wzhfy doesn't look good to you, can you try some other ideas and see if we can fix this issue?
I'll experiment with it a bit, but it might take a while.
Thanks for looking into it!
This actually caused a significant regression on q19 in the TPC-DS benchmark (run internally). Can we just revert it for now, and redo it with actual performance numbers? I think that's safer and easier for everybody here. It seems that the plan change here was overlooked, and the performance impact should have been clarified.
Does branch-2.4 have the same regression? This PR was backported to branch-2.4, too. cc: @viirya
+1 for @HyukjinKwon's suggestion. Especially for branch-2.4, I think it is better to avoid unexpected performance changes.
I discussed with @cloud-fan offline. Shall we merge #32014 into master, branch-3.1 and branch-3.0 to fix the regression, and revert this one from branch-2.4? The Spark 2.4 release is very soon, and it might be best to stay safe; technically this was more of an improvement than a bug fix.
Agreed. I prefer to keep branch-2.4 stable.
yea let's revert it from 2.4 to be safe
Revert at #32020
This reverts commit 3e6a6b7 per the discussion at #30965 (comment). Closes #32020 from viirya/revert-SPARK-33935. Authored-by: Liang-Chi Hsieh <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
### What changes were proposed in this pull request?
Changed the cost comparison function of the CBO to use the ratios of row counts and sizes in bytes.

### Why are the changes needed?
In #30965 we changed the CBO cost comparison function so it would be "symmetric": `A.betterThan(B)` now implies `!B.betterThan(A)`. With that we caused a performance regression in some queries - TPCDS q19, for example.

The original cost comparison function used the ratios `relativeRows = A.rowCount / B.rowCount` and `relativeSize = A.size / B.size`. The changed function compared "absolute" cost values `costA = w*A.rowCount + (1-w)*A.size` and `costB = w*B.rowCount + (1-w)*B.size`.

Given the input from wzhfy, we decided to go back to the relative values, because otherwise one (size) may overwhelm the other (rowCount). But this time we avoid adding up the ratios. Originally `A.betterThan(B) => w*relativeRows + (1-w)*relativeSize < 1` was used. Besides being "non-symmetric", this also lets one ratio overwhelm the other. For `w=0.5`: if plan `A`'s size (bytes) is at least 2x larger than `B`'s, then no matter how many times more rows the `B` plan has, `B` will always be considered better - `0.5*2 + 0.5*0.00000000000001 > 1`.

When working with ratios, it is better to multiply them. The proposed cost comparison function is: `A.betterThan(B) => relativeRows^w * relativeSize^(1-w) < 1`.

### Does this PR introduce _any_ user-facing change?
Comparison of the changed TPCDS v1.4 query execution times at sf=10:

| query | absolute | multiplicative | change | additive | change |
| -- | -- | -- | -- | -- | -- |
| q12 | 145 | 137 | -5.52% | 141 | -2.76% |
| q13 | 264 | 271 | 2.65% | 271 | 2.65% |
| q17 | 4521 | 4243 | -6.15% | 4348 | -3.83% |
| q18 | 758 | 466 | -38.52% | 480 | -36.68% |
| q19 | 38503 | 2167 | -94.37% | 2176 | -94.35% |
| q20 | 119 | 120 | 0.84% | 126 | 5.88% |
| q24a | 16429 | 16838 | 2.49% | 17103 | 4.10% |
| q24b | 16592 | 16999 | 2.45% | 17268 | 4.07% |
| q25 | 3558 | 3556 | -0.06% | 3675 | 3.29% |
| q33 | 362 | 361 | -0.28% | 380 | 4.97% |
| q52 | 1020 | 1032 | 1.18% | 1052 | 3.14% |
| q55 | 927 | 938 | 1.19% | 961 | 3.67% |
| q72 | 24169 | 13377 | -44.65% | 24306 | 0.57% |
| q81 | 1285 | 1185 | -7.78% | 1168 | -9.11% |
| q91 | 324 | 336 | 3.70% | 337 | 4.01% |
| q98 | 126 | 129 | 2.38% | 131 | 3.97% |

All times are in ms; the changes are relative to the situation in the master branch (absolute). The proposed cost function (multiplicative) significantly improves the performance on q18, q19 and q72. The original cost function (additive) has similar improvements on q18 and q19. All other changes are within the error bars and I would ignore them - perhaps q81 has also improved.

### How was this patch tested?
PlanStabilitySuite

Closes #32014 from tanelk/SPARK-34922_cbo_better_cost_function.
Lead-authored-by: Tanel Kiis <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
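As a sanity check, applying the multiplicative function to the example from this PR's description (A.card = 500, A.size = 30; B.card = 300, B.size = 80; w = 0.7):
```
A betterThan B: (500/300)^0.7 * (30/80)^0.3 = 1.430 * 0.745 = 1.065 > 1  => false
B betterThan A: (300/500)^0.7 * (80/30)^0.3 = 0.699 * 1.342 = 0.939 < 1  => true
```
Exactly one direction now holds, resolving the asymmetry the original additive formula exhibited.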
The same change was backported with identical commit messages: #32075 (branch-3.1) and #32076 (branch-3.0).