-
Notifications
You must be signed in to change notification settings - Fork 180
Enhance the cost computing mechanism and push down context #4353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
# Conflicts: # integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_sum_enhancement.json
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
| YAMLGenerator.Feature.ALWAYS_QUOTE_NUMBERS_AS_STRINGS); // Quote numeric strings | ||
| yamlFactory.enable(YAMLGenerator.Feature.INDENT_ARRAYS_WITH_INDICATOR); | ||
| YAML_MAPPER = new ObjectMapper(yamlFactory); | ||
| YAML_MAPPER.enable(SerializationFeature.ORDER_MAP_ENTRIES_BY_KEYS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this config, otherwise for the plan of v2 which contains multi-children for one level, it will put the children at beginning and the operator name at the end
# Conflicts: # integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.json # integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_patterns_simple_pattern_agg_push.json
Signed-off-by: Heng Qian <[email protected]>
| return this.stream().anyMatch(action -> action.digest().equals(digest)); | ||
| } | ||
|
|
||
| public synchronized OpenSearchRequestBuilder createOrGetRequestBuilder() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add java doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync offline, let' move this method out
| * than non-pushdown operators. | ||
| */ | ||
| @Override | ||
| public @Nullable RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a UT for these two method to ensure the behaviour would be stable.
| case SORT -> dCpu += dRows; | ||
| // Refer the org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(Aggregate rel,...) | ||
| case COLLAPSE -> { | ||
| dRows = dRows / 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Try to reduce the rows count by 1 to make the cost cheaper slightly than non-push down. | ||
| // Because we'd like to push down LIMIT even when the fetch in LIMIT is greater than | ||
| // dRows. | ||
| case LIMIT -> dRows = Math.min(dRows, ((LimitDigest) operation.digest()).limit()) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be negative with -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be but VolcanoPlanner in Clacite will replace it with TINY cost it's less than 0.
| // Calculate the cost of script filter by multiplying the selectivity of the filter and | ||
| // the factor amplified by script count. | ||
| dCpu += | ||
| NumberUtil.multiply( | ||
| NumberUtil.multiply(dRows, RelMdUtil.guessSelectivity(filterDigest.condition())), | ||
| Math.pow(1.1, filterDigest.scriptCount())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will Math.pow(1.1, filterDigest.scriptCount() - 1) be better? It ensures that when there is only one script, the cost of pushed down script filter is smaller than that the same filter in memory ( dRow * guessedFactor * estimateRowCountFactor < dRow * guessedFactor, otherwise it's something like dRow * guessedFactor * 1.1 * estimateRowCountFactor, which is not necessarily smaller than dRow * guessedFactor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value of estimateRowCountFactor is 0.9, so the cost of pushed-down filter is smaller.
On the other hand, the biggest reduction of cost from pushed-down filter is not from the cost reduction on filter iteself, but the final output rows count of the scan with filter push down.
Signed-off-by: Heng Qian <[email protected]>
# Conflicts: # integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_timestamp_push.json # integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown2.json # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java # opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
# Conflicts: # integ-test/src/test/resources/expectedOutput/ppl/explain_patterns_simple_pattern_agg_push.json
Signed-off-by: Heng Qian <[email protected]>
|
@LantaoJin There are some enhancement on plan with this PR: And #4377 is no longer needed after this PR. I've removed the code and keep the yaml test only. |
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4353-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 585a04c5de2dedcb3e7c2abf130846caac4dee50
# Push it to GitHub
git push --set-upstream origin backport/backport-4353-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
…h-project#4353) * Make PushDownContext support caching the built request builder Signed-off-by: Heng Qian <[email protected]> * Move PushDownContext outside of the AbstractCalciteIndexScan Signed-off-by: Heng Qian <[email protected]> * Refactor cost computing Signed-off-by: Heng Qian <[email protected]> * migrate explain IT from json format to yaml format Signed-off-by: Heng Qian <[email protected]> * Fix opensearch request reuse by equivalent operator Signed-off-by: Heng Qian <[email protected]> * fix IT after merging main Signed-off-by: Heng Qian <[email protected]> * remove plans assert in execution IT Signed-off-by: Heng Qian <[email protected]> * Merging main Signed-off-by: Heng Qian <[email protected]> * Add UT for cost evaluation of AbstractCalciteIndexScan Signed-off-by: Heng Qian <[email protected]> * merging main Signed-off-by: Heng Qian <[email protected]> * merging main Signed-off-by: Heng Qian <[email protected]> * merging main Signed-off-by: Heng Qian <[email protected]> --------- Signed-off-by: Heng Qian <[email protected]> (cherry picked from commit 585a04c) Signed-off-by: Heng Qian <[email protected]>
…n context #4353 (#4399) * Enhance the cost computing mechanism and push down context (#4353) * Make PushDownContext support caching the built request builder Signed-off-by: Heng Qian <[email protected]> * Move PushDownContext outside of the AbstractCalciteIndexScan Signed-off-by: Heng Qian <[email protected]> * Refactor cost computing Signed-off-by: Heng Qian <[email protected]> * migrate explain IT from json format to yaml format Signed-off-by: Heng Qian <[email protected]> * Fix opensearch request reuse by equivalent operator Signed-off-by: Heng Qian <[email protected]> * fix IT after merging main Signed-off-by: Heng Qian <[email protected]> * remove plans assert in execution IT Signed-off-by: Heng Qian <[email protected]> * Merging main Signed-off-by: Heng Qian <[email protected]> * Add UT for cost evaluation of AbstractCalciteIndexScan Signed-off-by: Heng Qian <[email protected]> * merging main Signed-off-by: Heng Qian <[email protected]> * merging main Signed-off-by: Heng Qian <[email protected]> * merging main Signed-off-by: Heng Qian <[email protected]> --------- Signed-off-by: Heng Qian <[email protected]> (cherry picked from commit 585a04c) Signed-off-by: Heng Qian <[email protected]> * fix compiling Signed-off-by: Heng Qian <[email protected]> * fix IT Signed-off-by: Heng Qian <[email protected]> --------- Signed-off-by: Heng Qian <[email protected]>
Description
This PR includes changes:
computeSelfCostfromestimateRowCount. IncomputeSelfCost, we've implemented more proper cost computing mechanism rather than relying on rows count only.The new cost computing mechanism has advantages in several cases:
Scan(PushDownContext[LIMIT, PROJECT])will get smaller cost thanScan(PushDownContext[PROJECT, LIMIT])although they produce the same request. With this PR, they two will get the equivilent cost. And then the optimizer will finally choose the plan with push down sequence more close to the original logical plan.Scan(PushDownContext[AGG, PROJECT]) - Aggregatealthough the 2 agg could be merged. With this PR, the optimizer will choose the scan with a merged agg pushed down.https://github.com/opensearch-project/sql/pull/4279#discussion_r2354020132with several scripts in our agg builder, the optimizer will still choose it since scirpt count won't affect the cost of the plan. This PR will address it.Related Issues
Resolves #4312
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.