-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-34922][SQL] Use a relative cost comparison function in the CBO #32014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } else { | ||
| val relativeRows = BigDecimal(this.planCost.card) / BigDecimal(other.planCost.card) | ||
| val relativeSize = BigDecimal(this.planCost.size) / BigDecimal(other.planCost.size) | ||
| Math.pow(relativeRows.doubleValue(), conf.joinReorderCardWeight) * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we update the config doc?
| val relativeRows = BigDecimal(this.planCost.card) / BigDecimal(other.planCost.card) | ||
| val relativeSize = BigDecimal(this.planCost.size) / BigDecimal(other.planCost.size) | ||
| Math.pow(relativeRows.doubleValue(), conf.joinReorderCardWeight) * | ||
| Math.pow(relativeSize.doubleValue(), 1 - conf.joinReorderCardWeight) < 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this symmetric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, it's kind of normalize the row count and bytes size with Math.pow(_, m) and Math.pow(_, 1- m), then calculate the relative ratios and compare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken, then, when the left side of the comparison is x for A.betterThan(B), then for B.betterThan(A) it will be 1/x. One of them will be greater than 1 and other smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, after normalization the formula is still row_count1 * size1 / (row_count2 * size2), so if one is x the other must be 1/x.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #136768 has finished for PR 32014 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #136780 has finished for PR 32014 at commit
|
|
The change LGTM. Can we re-generate the golden files to fix conflicts? |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #136799 has finished for PR 32014 at commit
|
| thisCost < otherCost | ||
| if (other.planCost.card == 0 || other.planCost.size == 0) { | ||
| false | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about leaving some comments about why we need to use relative values here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments to this method.
|
cc: @wzhfy |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #136851 has finished for PR 32014 at commit
|
|
Unfortunately there are conflicts again... |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #136954 has finished for PR 32014 at commit
|
|
Test build #136958 has finished for PR 32014 at commit
|
|
The GA test failures is not related to this PR, so merged to master. |
|
Could you open PRs to backport this for branch-3.1/3.0, too? |
What changes were proposed in this pull request?
Changed the cost comparison function of the CBO to use the ratios of row counts and sizes in bytes.
Why are the changes needed?
In #30965 we changed to CBO cost comparison function so it would be "symetric":
A.betterThan(B)now implies, that!B.betterThan(A).With that we caused a performance regressions in some queries - TPCDS q19 for example.
The original cost comparison function used the ratios
relativeRows = A.rowCount / B.rowCountandrelativeSize = A.size / B.size. The changed function compared "absolute" cost valuescostA = w*A.rowCount + (1-w)*A.sizeandcostB = w*B.rowCount + (1-w)*B.size.Given the input from @wzhfy we decided to go back to the relative values, because otherwise one (size) may overwhelm the other (rowCount). But this time we avoid adding up the ratios.
Originally
A.betterThan(B) => w*relativeRows + (1-w)*relativeSize < 1was used. Besides being "non-symteric", this also can exhibit one overwhelming other.For
w=0.5IfAsize (bytes) is at least 2x larger thanB, then no matter how many times more rows does theBplan have,Bwill allways be considered to be better -0.5*2 + 0.5*0.00000000000001 > 1.When working with ratios, then it would be better to multiply them.
The proposed cost comparison function is:
A.betterThan(B) => relativeRows^w * relativeSize^(1-w) < 1.Does this PR introduce any user-facing change?
Comparison of the changed TPCDS v1.4 query execution times at sf=10:
All times are in ms, the change is compared to the situation in the master branch (absolute).
The proposed cost function (multiplicative) significantlly improves the performance on q18, q19 and q72. The original cost function (additive) has similar improvements at q18 and q19. All other chagnes are within the error bars and I would ignore them - perhaps q81 has also improved.
How was this patch tested?
PlanStabilitySuite