
Conversation

@qianheng-aws
Collaborator

Description

In our current implementation, we always push down a redundant ISNOTNULL filter for group fields when BUCKET_NULLABLE=false or the group field is a time span. However, since we already set MISSING_BUCKET to false in the aggregation builder, this filter is unnecessary, and it introduces extra overhead, especially when it is pushed down as a script.

This PR improves the above case by adding a new rule that detects the filters derived from a bucket-non-null aggregation and ignores them during the push-down process.

Related Issues

Resolves #4811

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

CalciteLogicalIndexScan(table=[[OpenSearch, hits]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 2},c=COUNT(),sum(IsRefresh)=SUM($1),avg(ResolutionWidth)=AVG($3)), SORT_AGG_METRICS->[2 DESC LAST], PROJECT->[c, sum(IsRefresh), avg(ResolutionWidth), WatchID, ClientIP], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"WatchID|ClientIP":{"multi_terms":{"terms":[{"field":"WatchID"},{"field":"ClientIP"}],"size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"c":"desc"},{"_key":"asc"}]},"aggregations":{"sum(IsRefresh)":{"sum":{"field":"IsRefresh"}},"avg(ResolutionWidth)":{"avg":{"field":"ResolutionWidth"}},"c":{"value_count":{"field":"_index"}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, hits]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0, 1},c=COUNT(),sum(IsRefresh)=SUM($2),avg(ResolutionWidth)=AVG($3)), PROJECT->[c, sum(IsRefresh), avg(ResolutionWidth), WatchID, ClientIP], SORT_AGG_METRICS->[0 DESC LAST], LIMIT->10, LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"WatchID|ClientIP":{"multi_terms":{"terms":[{"field":"WatchID"},{"field":"ClientIP"}],"size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"c":"desc"},{"_key":"asc"}]},"aggregations":{"sum(IsRefresh)":{"sum":{"field":"IsRefresh"}},"avg(ResolutionWidth)":{"avg":{"field":"ResolutionWidth"}},"c":{"value_count":{"field":"_index"}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
Member


Why did the groupset and SORT_AGG_METRICS digests change?

Collaborator Author

@qianheng-aws qianheng-aws Nov 21, 2025


Before this PR, the plan could only match the rule agg-filter-project-scan after being transformed by some of Calcite's transpose and project-merge rules, so the indexes in its group set differ from those in the original (logical) plan.

After adding the new rule agg-project-filter-project-scan, the original plan can match it after trimming (which adds another project before the scan), so the new digest has exactly the same field indexes in its group set as the logical plan.

Overall, although the aggregation digests differ, the generated request builders are exactly the same.

return rex instanceof RexCall rexCall
    && rexCall.getOperator().equals(SqlStdOperatorTable.IS_NOT_NULL)
    && rexCall.getOperands().get(0) instanceof RexInputRef;
Member

@LantaoJin LantaoJin Nov 21, 2025


Could be simplified to rex.isA(SqlKind.IS_NOT_NULL).
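The suggested simplification folds the explicit operator-equality check into Calcite's isA convenience method. Below is a minimal, runnable sketch of the predicate's shape; the nested SqlKind, RexNode, RexCall, and RexInputRef types are hypothetical stand-ins for Calcite's real classes (in org.apache.calcite.rex and org.apache.calcite.sql), included only so the example is self-contained:

```java
import java.util.List;

public class IsNotNullPredicateSketch {
  // Stand-in for org.apache.calcite.sql.SqlKind
  enum SqlKind { IS_NOT_NULL, OTHER }

  // Stand-in for org.apache.calcite.rex.RexNode; isA(kind) is the
  // convenience method the reviewer suggests using.
  interface RexNode {
    default boolean isA(SqlKind kind) { return false; }
  }

  // Stand-in for org.apache.calcite.rex.RexInputRef (a field reference)
  record RexInputRef(int index) implements RexNode {}

  // Stand-in for org.apache.calcite.rex.RexCall (an operator application)
  record RexCall(SqlKind kind, List<RexNode> operands) implements RexNode {
    @Override public boolean isA(SqlKind k) { return kind == k; }
    List<RexNode> getOperands() { return operands; }
  }

  // The simplified predicate: isA replaces the explicit
  // getOperator().equals(SqlStdOperatorTable.IS_NOT_NULL) comparison,
  // while the operand check still requires an input reference.
  static boolean isNotNullOnInputRef(RexNode rex) {
    return rex.isA(SqlKind.IS_NOT_NULL)
        && rex instanceof RexCall call
        && call.getOperands().get(0) instanceof RexInputRef;
  }

  public static void main(String[] args) {
    RexNode matching = new RexCall(SqlKind.IS_NOT_NULL, List.of(new RexInputRef(0)));
    RexNode other = new RexCall(SqlKind.OTHER, List.of(new RexInputRef(0)));
    System.out.println(isNotNullOnInputRef(matching)); // true
    System.out.println(isNotNullOnInputRef(other));    // false
  }
}
```

Against the real Calcite API the body would read the same; only the stub types above are invented for illustration.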

LantaoJin previously approved these changes Nov 21, 2025
Signed-off-by: Heng Qian <[email protected]>
LantaoJin previously approved these changes Nov 21, 2025
yuancu previously approved these changes Nov 24, 2025
final LogicalFilter filter = call.rel(2);
final LogicalProject bottomProject = call.rel(3);
final CalciteLogicalIndexScan scan = call.rel(4);
boolean isBucketNullable = Config.isBucketNullableAgg.test(aggregate);
Collaborator

@yuancu yuancu Nov 24, 2025


nit: I feel it's a little confusing because its semantics seem to be "bucket not nullable".

Collaborator Author


You're right. That's a typo. Will correct it.

Signed-off-by: Heng Qian <[email protected]>
@qianheng-aws qianheng-aws dismissed stale reviews from yuancu and LantaoJin via 18895e7 November 24, 2025 06:02
@LantaoJin LantaoJin merged commit ede63cb into opensearch-project:main Nov 24, 2025
35 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4843-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ede63cb7b0c4976c818d3dada2447ee1985d43b5
# Push it to GitHub
git push --set-upstream origin backport/backport-4843-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4843-to-2.19-dev.

qianheng-aws added a commit to qianheng-aws/sql that referenced this pull request Nov 24, 2025
…pensearch-project#4843)

* Remove redundant push-down-filters derived for bucket-non-null aggregation

Signed-off-by: Heng Qian <[email protected]>

* Address comment

Signed-off-by: Heng Qian <[email protected]>

* Fix name typo

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>

(cherry picked from commit ede63cb)
Signed-off-by: Heng Qian <[email protected]>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Nov 24, 2025
yuancu pushed a commit that referenced this pull request Nov 25, 2025
…ket-non-null agg (#4843) (#4851)

* Remove redundant push-down-filters derived for bucket-non-null agg (#4843)

* Remove redundant push-down-filters derived for bucket-non-null aggregation

Signed-off-by: Heng Qian <[email protected]>

* Address comment

Signed-off-by: Heng Qian <[email protected]>

* Fix name typo

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>

(cherry picked from commit ede63cb)
Signed-off-by: Heng Qian <[email protected]>

* Fix compiling

Signed-off-by: Heng Qian <[email protected]>

* Fix IT

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
…pensearch-project#4843)

* Remove redundant push-down-filters derived for bucket-non-null aggregation

Signed-off-by: Heng Qian <[email protected]>

* Address comment

Signed-off-by: Heng Qian <[email protected]>

* Fix name typo

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>

Labels

backport 2.19-dev, backport-failed, backport-manually (Filed a PR to backport manually.), bugFix, enhancement (New feature or request)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Push down redundant filter for time span

3 participants