Skip to content

Conversation

@songkant-aws
Copy link
Contributor

@songkant-aws songkant-aws commented Nov 6, 2025

Description

Add sort performance enhancement of pushing down sort by complex expressions to scan.

Related Issues

Resolves #3912

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuancu yuancu added enhancement New feature or request backport 2.19-dev labels Nov 7, 2025
project.getInput().getTraitSet().getTrait(RelCollationTraitDef.INSTANCE);

// Branch 1: Check if complex expressions are already sorted by scan
if (handleComplexExpressionsSortedByScan(call, project, toTraits, toCollation)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some IT and ExplainIT for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of sort expression pushdown ITs are for this case.

Copy link
Member

@LantaoJin LantaoJin Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: which one for example?
and can this predicate move to definition of ExpandCollationOnProjectExprRule.Config (can reduce node copy)

Signed-off-by: Songkan Tang <[email protected]>
LantaoJin
LantaoJin previously approved these changes Nov 12, 2025

// Check if the scan has sort expressions pushed down
if (scan.getPushDownContext().stream()
.noneMatch(operation -> operation.type() == PushDownType.SORT_EXPR)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the pushed SORT(i.e. SORT_EXPR with all digests are simple expression) can satisfy sort collation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible. The cost computation has coarse results by multiplying field count with 1.1. So VolcanoPlanner will prefer regular field sort pushdown.

Another option is to calculate the accurate complex expression count and multiply this count with 1.1. Put a check in the rule to not pushdown if all of expressions are simple expressions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, in this query ... | sort a | eval b = a + 1 | sort b, when pushing the sort b, this method will always return false. But I expect it return true.

Copy link
Contributor Author

@songkant-aws songkant-aws Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it returns false in this case but doesn't matter. Now, if all of expressions are simple, it will go through regular pushDownSort. Field sort exists in regular traitset and is propagated well. It will be handled by handleSimpleExpressionFieldSorts method in ExpandCollationOnProjectExprRule.

Integer offsetValue = LimitIndexScanRule.extractOffsetValue(sort.offset);
newScan = (CalciteLogicalIndexScan) scan.pushDownLimit(sort, limitValue, offsetValue);
} else {
// Attempt to push down sort expressions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still push down sort although scanProvidesRequiredCollation is true? I think we could just remove the sort directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking if first sort by [a, b] expressions is already in scan. The second sort by [a] expression could narrow down the sort effort. We could still allow the pushdown.

For topK pushdown, it could be also applied. If sort by [a, b], limit 10 is pushed, say the second sort by [a], limit 2 comes in, we should allow it pushdown as well. This may need replace existing SORT_EXPR and LIMIT pushdown context.

But right now, it has the problem that multiple different sort complex expressions doesn't work well yet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second sort by [a] expression could narrow down the sort effort. We could still allow the pushdown.

Shall we do the same thing for the above branch then? It only push down limit when scanProvidesRequiredCollation is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revisit the logic, for already pushed down topK and if the scan collation satisfies the output collation, we can still leave the old sort in the scan(remove the new sort in the meantime). It's kind of keeping semantics. For regular sort pushdown, we can override by the new sort.

OSRequestBuilderAction action =
requestBuilder -> {
for (SortExprDigest info : sortExprDigests) {
requestBuilder.pushDownSortExpression(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] Will this method throw exception by any chance? We plan to make the request builder transformation process lazy. We should put any logic which may throw exception outside of the lambda function, so it can avoid failure on the final plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, move most of logic that could throw exception outside. And use a lazy pushdownSortSuppliers method instead.

sortExprDigest.getDirection().name()));

Script script = scriptExpr.getScript();
if (script != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if script is null? Seems nothing happens. We better add assertion or throw exception if that's never possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should never happen. I neglect this check.

case SORT_EXPR -> {
@SuppressWarnings("unchecked")
List<SortExprDigest> sortKeys = (List<SortExprDigest>) operation.digest();
dCpu += NumberUtil.multiply(dRows, 1.1 * sortKeys.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should filter the simple expr.

Copy link
Contributor Author

@songkant-aws songkant-aws Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, it will only count complex expr. The calculation is fine grained.

Signed-off-by: Songkan Tang <[email protected]>
RexNode sortExpr = digest.getExpression();
assert sortExpr instanceof RexCall : "sort expression should be RexCall";
Map<String, Object> directionParams = new LinkedHashMap<>();
directionParams.put(PlanUtils.NULL_DIRECTION, digest.getNullDirection().name());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these 2 params be combined into 1 like MISSING_MIN or MISSING_MAX?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a blocking comment. We have chance to do this when resolving conflict with #4795. These 2 PR both pass parameter and should have conflict. Let's see which will be merged first.

if (value == null) {
boolean isAscending = direction == Direction.ASCENDING;
boolean isNullFirst = nullDirection == NullDirection.FIRST;
return isAscending == isNullFirst ? Double.NEGATIVE_INFINITY : Double.NaN;
Copy link
Collaborator

@qianheng-aws qianheng-aws Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be Double.POSITIVE_INFINITY for Double.NaN?

Copy link
Contributor Author

@songkant-aws songkant-aws Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Double's constant definition,
Double.POSITIVE_INFINITY is equal to the value of Double.longBitsToDouble(0x7ff0000000000000L).
Double.NaN is equal to the value of Double.longBitsToDouble(0x7ff8000000000000L).

compareTo() treats NaN is larger. However operator of comparisons like '>', '=', '<' always return false if comparing with NaN because NaN is not a number.

I searched the sorting order online. It indicates most of SQL engine like PostgreSQL will sort NaN as the largest value for Double in ascending order.

Copy link
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking for this PR:
can you test the sort pushdown on join condition such on a.id + 1 = b.id +2?

@LantaoJin LantaoJin merged commit e83f741 into opensearch-project:main Nov 18, 2025
66 of 67 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4750-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e83f7413f5924dc5b9f91ea79bc72aed337a3a33
# Push it to GitHub
git push --set-upstream origin backport/backport-4750-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4750-to-2.19-dev.

@songkant-aws
Copy link
Contributor Author

@LantaoJin It doesn't support sort expr pushdown for SortMergeJoin yet.

SortMergeJoin is optimized by EnumerableMergeJoinRule, which is a physical plan level rule. It will inserts collation to left and right children of join and then sort is transformed to EnumerableSort.

Our rules mostly optimize logical plans. Considering it's a valid case, I think one more enhancement is to add physical plan optimization support(Enumerable convention optimization) as well. I will create an issue to track this enhancement.

songkant-aws added a commit to songkant-aws/sql that referenced this pull request Nov 18, 2025
* Pushdown sort expression to scan

Signed-off-by: Songkan Tang <[email protected]>

* Refactor a bit of SortExprIndexScanRule

Signed-off-by: Songkan Tang <[email protected]>

* Simplify some code and add more explain tests

Signed-off-by: Songkan Tang <[email protected]>

* Support null ordering for sort expression pushdown

Signed-off-by: Songkan Tang <[email protected]>

* Attempt to fix security tests and add more test cases

Signed-off-by: Songkan Tang <[email protected]>

* Address comments

Signed-off-by: Songkan Tang <[email protected]>

* Address more comments

Signed-off-by: Songkan Tang <[email protected]>

---------

Signed-off-by: Songkan Tang <[email protected]>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Nov 19, 2025
LantaoJin pushed a commit that referenced this pull request Nov 19, 2025
…) (#4818)

* Pushdown sort by complex expressions to scan (#4750)

* Pushdown sort expression to scan

Signed-off-by: Songkan Tang <[email protected]>

* Refactor a bit of SortExprIndexScanRule

Signed-off-by: Songkan Tang <[email protected]>

* Simplify some code and add more explain tests

Signed-off-by: Songkan Tang <[email protected]>

* Support null ordering for sort expression pushdown

Signed-off-by: Songkan Tang <[email protected]>

* Attempt to fix security tests and add more test cases

Signed-off-by: Songkan Tang <[email protected]>

* Address comments

Signed-off-by: Songkan Tang <[email protected]>

* Address more comments

Signed-off-by: Songkan Tang <[email protected]>

---------

Signed-off-by: Songkan Tang <[email protected]>

* Fix compilation after merge conflicts

Signed-off-by: Songkan Tang <[email protected]>

* Fix tests

Signed-off-by: Songkan Tang <[email protected]>

---------

Signed-off-by: Songkan Tang <[email protected]>
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
* Pushdown sort expression to scan

Signed-off-by: Songkan Tang <[email protected]>

* Refactor a bit of SortExprIndexScanRule

Signed-off-by: Songkan Tang <[email protected]>

* Simplify some code and add more explain tests

Signed-off-by: Songkan Tang <[email protected]>

* Support null ordering for sort expression pushdown

Signed-off-by: Songkan Tang <[email protected]>

* Attempt to fix security tests and add more test cases

Signed-off-by: Songkan Tang <[email protected]>

* Address comments

Signed-off-by: Songkan Tang <[email protected]>

* Address more comments

Signed-off-by: Songkan Tang <[email protected]>

---------

Signed-off-by: Songkan Tang <[email protected]>
@songkant-aws songkant-aws deleted the sort-expression-pushdown branch December 11, 2025 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Calcite Engine Framework: pushdown sort with non-reference expression

4 participants