-
Notifications
You must be signed in to change notification settings - Fork 181
Pushdown sort by complex expressions to scan #4750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pushdown sort by complex expressions to scan #4750
Conversation
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteSortCommandIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteSortCommandIT.java
Outdated
Show resolved
Hide resolved
| project.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE); | ||
|
|
||
| // Branch 1: Check if complex expressions are already sorted by scan | ||
| if (handleComplexExpressionsSortedByScan(call, project, toTraits, toCollation)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some IT and ExplainIT for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of sort expression pushdown ITs are for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: which one for example?
and can this predicate move to definition of ExpandCollationOnProjectExprRule.Config (can reduce node copy)
...search/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Outdated
Show resolved
Hide resolved
...rch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/SortExpressionInfo.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/util/OpenSearchRelOptUtil.java
Show resolved
Hide resolved
Signed-off-by: Songkan Tang <[email protected]>
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java
Outdated
Show resolved
Hide resolved
|
|
||
| // Check if the scan has sort expressions pushed down | ||
| if (scan.getPushDownContext().stream() | ||
| .noneMatch(operation -> operation.type() == PushDownType.SORT_EXPR)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the pushed SORT(i.e. SORT_EXPR with all digests are simple expression) can satisfy sort collation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible. The cost computation has coarse results by multiplying field count with 1.1. So VolcanoPlanner will prefer regular field sort pushdown.
Another option is to calculate the accurate complex expression count and multiply this count with 1.1. Put a check in the rule to not pushdown if all of expressions are simple expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, in this query ... | sort a | eval b = a + 1 | sort b, when pushing the sort b, this method will always return false. But I expect it return true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it returns false in this case but doesn't matter. Now, if all of expressions are simple, it will go through regular pushDownSort. Field sort exists in regular traitset and is propagated well. It will be handled by handleSimpleExpressionFieldSorts method in ExpandCollationOnProjectExprRule.
| Integer offsetValue = LimitIndexScanRule.extractOffsetValue(sort.offset); | ||
| newScan = (CalciteLogicalIndexScan) scan.pushDownLimit(sort, limitValue, offsetValue); | ||
| } else { | ||
| // Attempt to push down sort expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still push down sort although scanProvidesRequiredCollation is true? I think we could just remove the sort directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking if first sort by [a, b] expressions is already in scan. The second sort by [a] expression could narrow down the sort effort. We could still allow the pushdown.
For topK pushdown, it could be also applied. If sort by [a, b], limit 10 is pushed, say the second sort by [a], limit 2 comes in, we should allow it pushdown as well. This may need replace existing SORT_EXPR and LIMIT pushdown context.
But right now, it has the problem that multiple different sort complex expressions doesn't work well yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second sort by [a] expression could narrow down the sort effort. We could still allow the pushdown.
Shall we do the same thing for the above branch then? It only push down limit when scanProvidesRequiredCollation is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revisit the logic, for already pushed down topK and if the scan collation satisfies the output collation, we can still leave the old sort in the scan(remove the new sort in the meantime). It's kind of keeping semantics. For regular sort pushdown, we can override by the new sort.
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
Outdated
Show resolved
Hide resolved
| OSRequestBuilderAction action = | ||
| requestBuilder -> { | ||
| for (SortExprDigest info : sortExprDigests) { | ||
| requestBuilder.pushDownSortExpression( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Will this method throw exception by any chance? We plan to make the request builder transformation process lazy. We should put any logic which may throw exception outside of the lambda function, so it can avoid failure on the final plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, move most of logic that could throw exception outside. And use a lazy pushdownSortSuppliers method instead.
| sortExprDigest.getDirection().name())); | ||
|
|
||
| Script script = scriptExpr.getScript(); | ||
| if (script != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if script is null? Seems nothing happens. We better add assertion or throw exception if that's never possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should never happen. I neglect this check.
| case SORT_EXPR -> { | ||
| @SuppressWarnings("unchecked") | ||
| List<SortExprDigest> sortKeys = (List<SortExprDigest>) operation.digest(); | ||
| dCpu += NumberUtil.multiply(dRows, 1.1 * sortKeys.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should filter the simple expr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, it will only count complex expr. The calculation is fine grained.
Signed-off-by: Songkan Tang <[email protected]>
| RexNode sortExpr = digest.getExpression(); | ||
| assert sortExpr instanceof RexCall : "sort expression should be RexCall"; | ||
| Map<String, Object> directionParams = new LinkedHashMap<>(); | ||
| directionParams.put(PlanUtils.NULL_DIRECTION, digest.getNullDirection().name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these 2 params be combined into 1 like MISSING_MIN or MISSING_MAX?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a blocking comment. We have chance to do this when resolving conflict with #4795. These 2 PR both pass parameter and should have conflict. Let's see which will be merged first.
| if (value == null) { | ||
| boolean isAscending = direction == Direction.ASCENDING; | ||
| boolean isNullFirst = nullDirection == NullDirection.FIRST; | ||
| return isAscending == isNullFirst ? Double.NEGATIVE_INFINITY : Double.NaN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be Double.POSITIVE_INFINITY for Double.NaN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Double's constant definition,
Double.POSITIVE_INFINITY is equal to the value of Double.longBitsToDouble(0x7ff0000000000000L).
Double.NaN is equal to the value of Double.longBitsToDouble(0x7ff8000000000000L).
compareTo() treats NaN is larger. However operator of comparisons like '>', '=', '<' always return false if comparing with NaN because NaN is not a number.
I searched the sorting order online. It indicates most of SQL engine like PostgreSQL will sort NaN as the largest value for Double in ascending order.
LantaoJin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking for this PR:
can you test the sort pushdown on join condition such on a.id + 1 = b.id +2?
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4750-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e83f7413f5924dc5b9f91ea79bc72aed337a3a33
# Push it to GitHub
git push --set-upstream origin backport/backport-4750-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
|
@LantaoJin It doesn't support sort expr pushdown for SortMergeJoin yet. SortMergeJoin is optimized by EnumerableMergeJoinRule, which is a physical plan level rule. It will inserts collation to left and right children of join and then sort is transformed to EnumerableSort. Our rules mostly optimize logical plans. Considering it's a valid case, I think one more enhancement is to add physical plan optimization support(Enumerable convention optimization) as well. I will create an issue to track this enhancement. |
* Pushdown sort expression to scan Signed-off-by: Songkan Tang <[email protected]> * Refactor a bit of SortExprIndexScanRule Signed-off-by: Songkan Tang <[email protected]> * Simplify some code and add more explain tests Signed-off-by: Songkan Tang <[email protected]> * Support null ordering for sort expression pushdown Signed-off-by: Songkan Tang <[email protected]> * Attempt to fix security tests and add more test cases Signed-off-by: Songkan Tang <[email protected]> * Address comments Signed-off-by: Songkan Tang <[email protected]> * Address more comments Signed-off-by: Songkan Tang <[email protected]> --------- Signed-off-by: Songkan Tang <[email protected]>
…) (#4818) * Pushdown sort by complex expressions to scan (#4750) * Pushdown sort expression to scan Signed-off-by: Songkan Tang <[email protected]> * Refactor a bit of SortExprIndexScanRule Signed-off-by: Songkan Tang <[email protected]> * Simplify some code and add more explain tests Signed-off-by: Songkan Tang <[email protected]> * Support null ordering for sort expression pushdown Signed-off-by: Songkan Tang <[email protected]> * Attempt to fix security tests and add more test cases Signed-off-by: Songkan Tang <[email protected]> * Address comments Signed-off-by: Songkan Tang <[email protected]> * Address more comments Signed-off-by: Songkan Tang <[email protected]> --------- Signed-off-by: Songkan Tang <[email protected]> * Fix compilation after merge conflicts Signed-off-by: Songkan Tang <[email protected]> * Fix tests Signed-off-by: Songkan Tang <[email protected]> --------- Signed-off-by: Songkan Tang <[email protected]>
* Pushdown sort expression to scan Signed-off-by: Songkan Tang <[email protected]> * Refactor a bit of SortExprIndexScanRule Signed-off-by: Songkan Tang <[email protected]> * Simplify some code and add more explain tests Signed-off-by: Songkan Tang <[email protected]> * Support null ordering for sort expression pushdown Signed-off-by: Songkan Tang <[email protected]> * Attempt to fix security tests and add more test cases Signed-off-by: Songkan Tang <[email protected]> * Address comments Signed-off-by: Songkan Tang <[email protected]> * Address more comments Signed-off-by: Songkan Tang <[email protected]> --------- Signed-off-by: Songkan Tang <[email protected]>
Description
Add sort performance enhancement of pushing down sort by complex expressions to scan.
Related Issues
Resolves #3912
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.