[SPARK-34922][SQL][3.0] Use a relative cost comparison function in the CBO #32076

tanelk · 2021-04-07T06:21:04Z

What changes were proposed in this pull request?

Changed the cost comparison function of the CBO to use the ratios of row counts and sizes in bytes.

Why are the changes needed?

In #30965 we changed the CBO cost comparison function so that it would be "symmetric": A.betterThan(B) now implies !B.betterThan(A).
That change caused a performance regression in some queries, TPCDS q19 for example.

The original cost comparison function used the ratios relativeRows = A.rowCount / B.rowCount and relativeSize = A.size / B.size. The changed function compared "absolute" cost values costA = w*A.rowCount + (1-w)*A.size and costB = w*B.rowCount + (1-w)*B.size.
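To make the "absolute" comparison concrete, here is a minimal Python sketch of it. It is not the Spark implementation; the `PlanCost` class, the function names, the example numbers and the choice of w=0.5 are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCost:
    """Hypothetical container for the two statistics the comparison uses."""
    row_count: float  # estimated number of rows
    size: float       # estimated size in bytes


def absolute_cost(c: PlanCost, w: float) -> float:
    # cost = w*rowCount + (1-w)*size, as described for the changed function
    return w * c.row_count + (1 - w) * c.size


def better_than_absolute(a: PlanCost, b: PlanCost, w: float = 0.5) -> bool:
    return absolute_cost(a, w) < absolute_cost(b, w)


# Plan A has 1000x fewer rows but is about 5% larger in bytes.
plan_a = PlanCost(row_count=1_000, size=2_000_000_000)
plan_b = PlanCost(row_count=1_000_000, size=1_900_000_000)

# The byte sizes dominate the sums, so B wins despite its row count.
print(better_than_absolute(plan_a, plan_b))
print(better_than_absolute(plan_b, plan_a))
```

Because sizes in bytes are usually orders of magnitude larger than row counts, the size term decides the outcome in this made-up example.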

Given the input from @wzhfy we decided to go back to the relative values, because otherwise one (size) may overwhelm the other (rowCount). But this time we avoid adding up the ratios.

Originally A.betterThan(B) => w*relativeRows + (1-w)*relativeSize < 1 was used. Besides being "non-symmetric", this can also let one term overwhelm the other.
For w=0.5, if the size (bytes) of A is at least 2x larger than that of B, then no matter how many times more rows the B plan has, A will never be considered better than B: 0.5*2 + 0.5*0.00000000000001 > 1.
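The arithmetic above can be reproduced with a small Python sketch of the additive rule. The function name and the example statistics are assumptions made for illustration, and the sketch assumes all statistics are strictly positive; it is not the Spark code.

```python
def better_than_additive(a_rows, a_size, b_rows, b_size, w=0.5):
    """Additive rule: the weighted sum of the two ratios must be below 1."""
    relative_rows = a_rows / b_rows
    relative_size = a_size / b_size
    return w * relative_rows + (1 - w) * relative_size < 1


# Plan A is 2x larger in bytes, while plan B has 1e14 times more rows.
a_rows, a_size = 1.0, 2.0
b_rows, b_size = 1e14, 1.0

# 0.5*1e-14 + 0.5*2 is just above 1, so A is rejected.
a_beats_b = better_than_additive(a_rows, a_size, b_rows, b_size)
print(a_beats_b)
```

The size ratio alone contributes 1.0 to the sum, so the row-count ratio cannot pull the result below the threshold however small it gets.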

When working with ratios, it is better to multiply them.
The proposed cost comparison function is: A.betterThan(B) => relativeRows^w * relativeSize^(1-w) < 1.
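A minimal Python sketch of the proposed multiplicative rule follows. The function name and the example statistics are assumptions made for illustration; the sketch assumes strictly positive statistics and leaves out any special handling (for example of zero values) that a real implementation would need.

```python
def better_than_multiplicative(a_rows, a_size, b_rows, b_size, w=0.5):
    """Proposed rule: relativeRows^w * relativeSize^(1-w) < 1."""
    relative_rows = a_rows / b_rows
    relative_size = a_size / b_size
    return relative_rows ** w * relative_size ** (1 - w) < 1


# Plan A is 2x larger in bytes, while plan B has 1e14 times more rows.
a_rows, a_size = 1.0, 2.0
b_rows, b_size = 1e14, 1.0

a_beats_b = better_than_multiplicative(a_rows, a_size, b_rows, b_size)
b_beats_a = better_than_multiplicative(b_rows, b_size, a_rows, a_size)
print(a_beats_b, b_beats_a)
```

Swapping A and B inverts both ratios, so the product becomes its reciprocal. At most one of the two comparisons can therefore be below 1, which gives the "symmetric" property that #30965 was after.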

Does this PR introduce any user-facing change?

Comparison of the changed TPCDS v1.4 query execution times at sf=10:

query	absolute (ms)	multiplicative (ms)	change	additive (ms)	change
q12	145	137	-5.52%	141	-2.76%
q13	264	271	2.65%	271	2.65%
q17	4521	4243	-6.15%	4348	-3.83%
q18	758	466	-38.52%	480	-36.68%
q19	38503	2167	-94.37%	2176	-94.35%
q20	119	120	0.84%	126	5.88%
q24a	16429	16838	2.49%	17103	4.10%
q24b	16592	16999	2.45%	17268	4.07%
q25	3558	3556	-0.06%	3675	3.29%
q33	362	361	-0.28%	380	4.97%
q52	1020	1032	1.18%	1052	3.14%
q55	927	938	1.19%	961	3.67%
q72	24169	13377	-44.65%	24306	0.57%
q81	1285	1185	-7.78%	1168	-9.11%
q91	324	336	3.70%	337	4.01%
q98	126	129	2.38%	131	3.97%

All times are in ms; the change is relative to the current master branch (absolute).
The proposed cost function (multiplicative) significantly improves the performance on q18, q19 and q72. The original cost function (additive) gives similar improvements on q18 and q19. All other changes are within the error bars and I would ignore them, although q81 may also have improved.
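For reference, the change columns can be recomputed from the timings with a short Python sketch. The helper name is an assumption; the numbers are copied from the table above.

```python
def percent_change(baseline_ms, new_ms):
    """Relative change versus the baseline, in percent."""
    return (new_ms - baseline_ms) / baseline_ms * 100


# (absolute, multiplicative, additive) timings in ms, taken from the table
timings = {
    "q18": (758, 466, 480),
    "q19": (38503, 2167, 2176),
    "q72": (24169, 13377, 24306),
}

for query, (absolute, multiplicative, additive) in timings.items():
    mult = percent_change(absolute, multiplicative)
    add = percent_change(absolute, additive)
    print(f"{query}: multiplicative {mult:+.2f}%, additive {add:+.2f}%")
```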

How was this patch tested?

PlanStabilitySuite

SparkQA · 2021-04-07T09:18:06Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41585/

SparkQA · 2021-04-07T09:18:07Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41585/

AmplabJenkins · 2021-04-07T09:18:11Z

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41585/

SparkQA · 2021-04-07T11:13:37Z

Test build #137007 has finished for PR 32076 at commit a5b33bd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2021-04-07T11:16:01Z

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137007/

tanelk · 2021-04-07T11:46:44Z

@maropu

…e CBO

Closes #32076 from tanelk/SPARK-34922_cbo_better_cost_function_3.0.

Lead-authored-by: Tanel Kiis <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>

maropu · 2021-04-08T02:04:17Z

Thanks! Merged to branch-3.0.

tanelk added 6 commits April 7, 2021 09:17

Relative cost function

8f2e5f6

Fix test

b2f141a

Update doc

906c17c

Comment

d40200c

Comment

a7b5eb3

Comment

a5b33bd

maropu closed this Apr 8, 2021
