@tanelk
Contributor

@tanelk tanelk commented Apr 7, 2021

What changes were proposed in this pull request?

Changed the cost comparison function of the CBO to use the ratios of row counts and sizes in bytes.

Why are the changes needed?

In #30965 we changed the CBO cost comparison function so that it would be "symmetric": A.betterThan(B) now implies that !B.betterThan(A).
That change caused performance regressions in some queries, TPCDS q19 for example.

The original cost comparison function used the ratios relativeRows = A.rowCount / B.rowCount and relativeSize = A.size / B.size. The changed function compared "absolute" cost values costA = w*A.rowCount + (1-w)*A.size and costB = w*B.rowCount + (1-w)*B.size.

Given the input from @wzhfy we decided to go back to the relative values, because otherwise one (size) may overwhelm the other (rowCount). But this time we avoid adding up the ratios.

Originally A.betterThan(B) => w*relativeRows + (1-w)*relativeSize < 1 was used. Besides being "non-symmetric", this can also let one ratio overwhelm the other.
For w=0.5, if the size (bytes) of A is at least 2x larger than that of B, then no matter how many times more rows the B plan has, B will always be considered better: 0.5*2 + 0.5*0.00000000000001 > 1.
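
The arithmetic above can be reproduced with a minimal sketch. `PlanCost` and `additive_better_than` are hypothetical names used only for illustration; they are not Spark's classes, and the values are the ones from the example (A is 2x larger in bytes, B has 1e14 times more rows).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    """Hypothetical stand-in for a plan's estimated cost (not Spark's class)."""
    row_count: float
    size: float  # size in bytes


def additive_better_than(a: PlanCost, b: PlanCost, w: float = 0.5) -> bool:
    """Original comparison: weighted sum of the two ratios, compared to 1."""
    relative_rows = a.row_count / b.row_count
    relative_size = a.size / b.size
    return w * relative_rows + (1 - w) * relative_size < 1


# A is 2x larger in bytes, while B has 1e14 times more rows.
plan_a = PlanCost(row_count=1.0, size=2.0)
plan_b = PlanCost(row_count=1e14, size=1.0)

# 0.5 * 1e-14 + 0.5 * 2 is just above 1, so A is not considered better.
a_beats_b = additive_better_than(plan_a, plan_b)
# The reverse comparison is dominated by the huge row ratio, so it fails too.
b_beats_a = additive_better_than(plan_b, plan_a)
print(a_beats_b, b_beats_a)
```

In this sketch neither comparison returns True, so the size ratio alone decides that A cannot win, regardless of how large the row-count gap is.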

When working with ratios, it is better to multiply them.
The proposed cost comparison function is: A.betterThan(B) => relativeRows^w * relativeSize^(1-w) < 1.
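
A minimal sketch of the proposed comparison, assuming the same hypothetical `PlanCost` holder as a stand-in for the real plan cost (the names here are illustrative and not taken from the Spark source). Because swapping A and B inverts both ratios, the product for the reversed comparison is the reciprocal of the original, so at most one side can be below 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    """Hypothetical stand-in for a plan's estimated cost (not Spark's class)."""
    row_count: float
    size: float  # size in bytes


def multiplicative_better_than(a: PlanCost, b: PlanCost, w: float = 0.5) -> bool:
    """Proposed comparison: weighted geometric combination of the ratios."""
    relative_rows = a.row_count / b.row_count
    relative_size = a.size / b.size
    return relative_rows ** w * relative_size ** (1 - w) < 1


# Same plans as in the additive example: A is 2x larger in bytes,
# B has 1e14 times more rows.
plan_a = PlanCost(row_count=1.0, size=2.0)
plan_b = PlanCost(row_count=1e14, size=1.0)

# (1e-14 ** 0.5) * (2 ** 0.5) is far below 1, so A is considered better.
a_beats_b = multiplicative_better_than(plan_a, plan_b)
# The reversed product is the reciprocal, far above 1, so B is not better.
b_beats_a = multiplicative_better_than(plan_b, plan_a)
print(a_beats_b, b_beats_a)
```

With these inputs the huge row-count advantage of A is no longer masked by its 2x size disadvantage.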

Does this PR introduce any user-facing change?

Comparison of the changed TPCDS v1.4 query execution times at sf=10:

query | absolute (ms) | multiplicative (ms) | change | additive (ms) | change
-- | -- | -- | -- | -- | --
q12 | 145 | 137 | -5.52% | 141 | -2.76%
q13 | 264 | 271 | 2.65% | 271 | 2.65%
q17 | 4521 | 4243 | -6.15% | 4348 | -3.83%
q18 | 758 | 466 | -38.52% | 480 | -36.68%
q19 | 38503 | 2167 | -94.37% | 2176 | -94.35%
q20 | 119 | 120 | 0.84% | 126 | 5.88%
q24a | 16429 | 16838 | 2.49% | 17103 | 4.10%
q24b | 16592 | 16999 | 2.45% | 17268 | 4.07%
q25 | 3558 | 3556 | -0.06% | 3675 | 3.29%
q33 | 362 | 361 | -0.28% | 380 | 4.97%
q52 | 1020 | 1032 | 1.18% | 1052 | 3.14%
q55 | 927 | 938 | 1.19% | 961 | 3.67%
q72 | 24169 | 13377 | -44.65% | 24306 | 0.57%
q81 | 1285 | 1185 | -7.78% | 1168 | -9.11%
q91 | 324 | 336 | 3.70% | 337 | 4.01%
q98 | 126 | 129 | 2.38% | 131 | 3.97%

All times are in ms; each change is relative to the master branch (absolute).
The proposed cost function (multiplicative) significantly improves performance on q18, q19 and q72. The original cost function (additive) shows similar improvements on q18 and q19. All other changes are within the error bars and I would ignore them, although q81 may also have improved.
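
For reference, the percentage columns appear to be the relative change against the master-branch (absolute) time. A small sketch, with `relative_change_pct` as a hypothetical helper name, reproduces a few of the reported values from the times in the table:

```python
def relative_change_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Relative change of candidate vs. baseline, in percent, rounded to 2 places."""
    return round((candidate_ms - baseline_ms) / baseline_ms * 100, 2)


# Times taken from the table: (absolute, multiplicative, additive), in ms.
q18 = relative_change_pct(758, 466)            # multiplicative vs. absolute
q19 = relative_change_pct(38503, 2167)         # multiplicative vs. absolute
q72_additive = relative_change_pct(24169, 24306)  # additive vs. absolute
print(q18, q19, q72_additive)
```

Negative values mean the query ran faster than on master.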

How was this patch tested?

PlanStabilitySuite

@SparkQA

SparkQA commented Apr 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41585/

@SparkQA

SparkQA commented Apr 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41585/

@AmplabJenkins

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41585/

@SparkQA

SparkQA commented Apr 7, 2021

Test build #137007 has finished for PR 32076 at commit a5b33bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137007/

@tanelk
Contributor Author

tanelk commented Apr 7, 2021

@maropu

maropu pushed a commit that referenced this pull request Apr 8, 2021
…e CBO

Closes #32076 from tanelk/SPARK-34922_cbo_better_cost_function_3.0.

Lead-authored-by: Tanel Kiis <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
@maropu
Member

maropu commented Apr 8, 2021

Thanks! Merged to branch-3.0.

@maropu maropu closed this Apr 8, 2021