[SPARK-34922][SQL][3.1] Use a relative cost comparison function in the CBO #32075

tanelk · 2021-04-07T06:20:48Z

What changes were proposed in this pull request?

Changed the cost comparison function of the CBO to use the ratios of row counts and sizes in bytes.

Why are the changes needed?

In #30965 we changed the CBO cost comparison function so that it would be "symmetric": A.betterThan(B) now implies that !B.betterThan(A).
That change caused performance regressions in some queries, TPCDS q19 for example.

The original cost comparison function used the ratios relativeRows = A.rowCount / B.rowCount and relativeSize = A.size / B.size. The changed function compared "absolute" cost values costA = w*A.rowCount + (1-w)*A.size and costB = w*B.rowCount + (1-w)*B.size.

Given the input from @wzhfy we decided to go back to the relative values, because otherwise one (size) may overwhelm the other (rowCount). But this time we avoid adding up the ratios.
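As an illustration of how size can overwhelm rowCount in the "absolute" comparison, here is a minimal Python sketch. It is not Spark's Scala implementation; the `PlanCost` class, the function names and the sample numbers are all hypothetical and only mirror the formulas quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    """Hypothetical stand-in for a plan's estimated cost."""
    row_count: float  # estimated number of rows
    size: float       # estimated size in bytes


def absolute_cost(plan: PlanCost, w: float) -> float:
    # cost = w*rowCount + (1-w)*size, as described for the #30965 version
    return w * plan.row_count + (1 - w) * plan.size


def better_than_absolute(a: PlanCost, b: PlanCost, w: float) -> bool:
    return absolute_cost(a, w) < absolute_cost(b, w)


# A has 1000x fewer rows than B, but is about 11% larger in bytes.
a = PlanCost(row_count=1_000, size=1_000_000_000)
b = PlanCost(row_count=1_000_000, size=900_000_000)

# Byte counts are much larger numbers than row counts, so they dominate the sum.
print(better_than_absolute(a, b, w=0.5))  # False
print(better_than_absolute(b, a, w=0.5))  # True
```

With these made-up numbers the row-count term barely moves the result, which is the effect the paragraph above describes.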

Originally A.betterThan(B) => w*relativeRows + (1-w)*relativeSize < 1 was used. Besides being "non-symmetric", this can also let one term overwhelm the other.
For w=0.5, if the size (bytes) of A is at least 2x larger than that of B, then no matter how many times more rows the B plan has, B will always be considered better: 0.5*2 + 0.5*0.00000000000001 > 1.
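The w=0.5 case can be checked with a small Python sketch of the additive comparison. The function name and the sample values are hypothetical; only the formula comes from the description above.

```python
def better_than_additive(a_rows: float, a_size: float,
                         b_rows: float, b_size: float, w: float) -> bool:
    """Additive comparison: w*relativeRows + (1-w)*relativeSize < 1."""
    relative_rows = a_rows / b_rows
    relative_size = a_size / b_size
    return w * relative_rows + (1 - w) * relative_size < 1


# A is 2x larger in bytes, while B has a million times more rows.
# 0.5 * 2 already equals 1, so any positive row ratio pushes the sum above 1.
print(better_than_additive(a_rows=1, a_size=2,
                           b_rows=1_000_000, b_size=1, w=0.5))  # False
```

However small the row ratio gets, the size term alone reaches the threshold, so A can never win this comparison.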

When working with ratios, it is better to multiply them.
The proposed cost comparison function is: A.betterThan(B) => relativeRows^w * relativeSize^(1-w) < 1.
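A minimal Python sketch of the proposed multiplicative comparison follows. Again, the function name and inputs are hypothetical and this is not the Scala code from the patch; it only evaluates the formula stated above.

```python
def better_than_multiplicative(a_rows: float, a_size: float,
                               b_rows: float, b_size: float, w: float) -> bool:
    """Multiplicative comparison: relativeRows^w * relativeSize^(1-w) < 1."""
    relative_rows = a_rows / b_rows
    relative_size = a_size / b_size
    return relative_rows ** w * relative_size ** (1 - w) < 1


# Same hypothetical plans as in the additive example:
# A is 2x larger in bytes, B has a million times more rows.
print(better_than_multiplicative(1, 2, 1_000_000, 1, w=0.5))  # True
# Swapping the plans inverts both ratios, so the product moves above 1.
print(better_than_multiplicative(1_000_000, 1, 1, 2, w=0.5))  # False
```

Because swapping A and B replaces each ratio with its reciprocal, the product for B versus A is the reciprocal of the product for A versus B, so the two calls cannot both return True.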

Does this PR introduce any user-facing change?

Comparison of the changed TPCDS v1.4 query execution times at sf=10:

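| query | absolute (ms) | multiplicative (ms) | change vs. absolute | additive (ms) | change vs. absolute |
| -- | -- | -- | -- | -- | -- |
| q12 | 145 | 137 | -5.52% | 141 | -2.76% |
| q13 | 264 | 271 | 2.65% | 271 | 2.65% |
| q17 | 4521 | 4243 | -6.15% | 4348 | -3.83% |
| q18 | 758 | 466 | -38.52% | 480 | -36.68% |
| q19 | 38503 | 2167 | -94.37% | 2176 | -94.35% |
| q20 | 119 | 120 | 0.84% | 126 | 5.88% |
| q24a | 16429 | 16838 | 2.49% | 17103 | 4.10% |
| q24b | 16592 | 16999 | 2.45% | 17268 | 4.07% |
| q25 | 3558 | 3556 | -0.06% | 3675 | 3.29% |
| q33 | 362 | 361 | -0.28% | 380 | 4.97% |
| q52 | 1020 | 1032 | 1.18% | 1052 | 3.14% |
| q55 | 927 | 938 | 1.19% | 961 | 3.67% |
| q72 | 24169 | 13377 | -44.65% | 24306 | 0.57% |
| q81 | 1285 | 1185 | -7.78% | 1168 | -9.11% |
| q91 | 324 | 336 | 3.70% | 337 | 4.01% |
| q98 | 126 | 129 | 2.38% | 131 | 3.97% |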

All times are in ms; the change is relative to the current state of the master branch (absolute).
The proposed cost function (multiplicative) significantly improves the performance of q18, q19 and q72. The original cost function (additive) shows similar improvements for q18 and q19. All other changes are within the error bars and I would ignore them, although q81 may also have improved.
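For reference, the percentage columns can be reproduced from the timings with a short Python sketch. The helper name is hypothetical; the input values are taken from the table.

```python
def relative_change_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Change of candidate relative to baseline, in percent, rounded to 2 places."""
    return round((candidate_ms - baseline_ms) / baseline_ms * 100, 2)


# (absolute, multiplicative) timings in ms for the three queries called out above
timings = {"q18": (758, 466), "q19": (38503, 2167), "q72": (24169, 13377)}

for query, (absolute_ms, multiplicative_ms) in timings.items():
    print(query, relative_change_pct(absolute_ms, multiplicative_ms))
```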

How was this patch tested?

PlanStabilitySuite

SparkQA · 2021-04-07T09:33:11Z

Test build #137008 has finished for PR 32075 at commit 051e091.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-04-07T09:34:00Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41586/

SparkQA · 2021-04-07T09:39:52Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41586/

tanelk · 2021-04-07T11:46:40Z

@maropu

maropu · 2021-04-08T02:01:19Z

I've checked that the failed tests are not related to this PR, so I'll merge this.

…e CBO

Closes #32075 from tanelk/SPARK-34922_cbo_better_cost_function_3.1.

Lead-authored-by: Tanel Kiis <tanel.kiis@gmail.com>
Co-authored-by: tanel.kiis@gmail.com <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

maropu · 2021-04-08T02:02:18Z

Thanks! Merged to branch-3.1.

tanelk added 7 commits April 7, 2021 08:48

Relative cost function

0b17bc9

Fix test

d2795ff

Update doc

3dbb015

Comment

4101d66

Comment

d74b89e

Comment

2ddbee6

Rerun plan stability

051e091

github-actions bot added the SQL label Apr 7, 2021

maropu closed this Apr 8, 2021

tanelk deleted the SPARK-34922_cbo_better_cost_function_3.1 branch June 15, 2021 13:03
