Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Oct 20, 2025

Description

Pushdown sort aggregate metrics:

  • Add SORT_AGG_METRICS after AGGREGATION in pushdown context
  • Convert composite bucket agg to terms/histo/date-histo agg if sort one metric on single bucket
  • Convert composite bucket agg (all terms source) to multi-terms agg if sort one metric on multiple buckets

Related Issues

Resolves #4282

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@LantaoJin LantaoJin added the enhancement New feature or request label Oct 21, 2025
Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: Lantao Jin <[email protected]>
return newTraitSet;
}

public CalciteLogicalIndexScan pushDownSortAggregateMetrics(Sort sort) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only do this operation when bucket_nullable=false, right? Any place to check this parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Will it be better to put these checks in one place if possible? Now the prerequisites are split both here and in aggAction.pushDownSortAggMetrics, although they are both aggregationBuilders and some checking seems overlapped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot get the bucket_nullable information here. Aggregation has already pushed down. So AggPushDownAction.L97 and L126 would be better IMO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I see you've already get the pushed aggregation builder here.

 List<AggregationBuilder> aggregationBuilders =
          pushDownContext.getAggPushDownAction().getAggregationBuilder().getLeft();

Or otherwise L290-310 could all be moved into aggAction.pushDownSortAggMetrics.

It's also fine to me if not making change since the logic is correct anyway.

Copy link
Member Author

@LantaoJin LantaoJin Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I don't want always to check CompositeValuesSourceBuilder.missingBucket() here since not all ValuesSourceBuilders are impacted by bucket_nullable, for example date histogram. So moving to here has to check types, sounds not worth to check types twice.

Signed-off-by: Lantao Jin <[email protected]>
@LantaoJin LantaoJin changed the title Pushdown sort aggregate metrics Pushdown sort aggregate metrics for single bucket Oct 24, 2025
return newTraitSet;
}

public CalciteLogicalIndexScan pushDownSortAggregateMetrics(Sort sort) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Will it be better to put these checks in one place if possible? Now the prerequisites are split both here and in aggAction.pushDownSortAggMetrics, although they are both aggregationBuilders and some checking seems overlapped.

@LantaoJin LantaoJin changed the title Pushdown sort aggregate metrics for single bucket Pushdown sort aggregate metrics Oct 24, 2025
// count agg optimized, get the path name from field names
path = fieldNames.get(collations.get(0).getFieldIndex());
} else if (metric instanceof ValuesSourceAggregationBuilder.LeafOnly) {
path = metric.getName();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit, non-blocking] Could this be combined with above branch? i.e. Does path = fieldNames.get(collations.get(0).getFieldIndex()); also works for metric instanceof ValuesSourceAggregationBuilder.LeafOnly?

List<String> fieldNames,
CompositeAggregationBuilder composite) {
String path;
AggregationBuilder metric = composite.getSubAggregations().stream().findFirst().orElse(null);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we only support only 1 agg metric for this enhancement? And anywhere to check this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catching. added restricted checking.

@LantaoJin LantaoJin enabled auto-merge (squash) October 24, 2025 08:47
@LantaoJin LantaoJin merged commit 5ebed84 into opensearch-project:main Oct 24, 2025
35 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4603-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5ebed8465dc9d498602a25b802207d63ee5030c8
# Push it to GitHub
git push --set-upstream origin backport/backport-4603-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4603-to-2.19-dev.

LantaoJin added a commit to LantaoJin/search-plugins-sql that referenced this pull request Oct 24, 2025
* convert sort aggregate metrics to term sort

Signed-off-by: Lantao Jin <[email protected]>

* refactor

Signed-off-by: Lantao Jin <[email protected]>

* fix conflicts

Signed-off-by: Lantao Jin <[email protected]>

* fix conflicts2

Signed-off-by: Lantao Jin <[email protected]>

* Add more javadoc

Signed-off-by: Lantao Jin <[email protected]>

* address comments

Signed-off-by: Lantao Jin <[email protected]>

* delete incorrect comments

Signed-off-by: Lantao Jin <[email protected]>

* convert composite agg to multi-terms agg for sort metrics on multiple buckets

Signed-off-by: Lantao Jin <[email protected]>

* avoid the case of 'stat count, sum ... | sort count'

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
(cherry picked from commit 5ebed84)
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Oct 24, 2025
yuancu pushed a commit that referenced this pull request Oct 27, 2025
* Pushdown sort aggregate metrics (#4603)

* convert sort aggregate metrics to term sort

Signed-off-by: Lantao Jin <[email protected]>

* refactor

Signed-off-by: Lantao Jin <[email protected]>

* fix conflicts

Signed-off-by: Lantao Jin <[email protected]>

* fix conflicts2

Signed-off-by: Lantao Jin <[email protected]>

* Add more javadoc

Signed-off-by: Lantao Jin <[email protected]>

* address comments

Signed-off-by: Lantao Jin <[email protected]>

* delete incorrect comments

Signed-off-by: Lantao Jin <[email protected]>

* convert composite agg to multi-terms agg for sort metrics on multiple buckets

Signed-off-by: Lantao Jin <[email protected]>

* avoid the case of 'stat count, sum ... | sort count'

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
(cherry picked from commit 5ebed84)

* fix conflicts

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 28, 2025
* default-main: (34 commits)
  Enhance dynamic source clause to support only metadata filters (opensearch-project#4554)
  Make nested alias type support referring to outer context (opensearch-project#4673)
  Update big5 ppl queries and check plans (opensearch-project#4668)
  Support push down sort after limit (opensearch-project#4657)
  Use table scan rowType in filter pushdown could fix rename issue (opensearch-project#4670)
  Fix: Support Alias Fields in MIN, MAX, FIRST, LAST, and TAKE Aggregations (opensearch-project#4621)
  Fix bin nested fields issue (opensearch-project#4606)
  Add `per_minute`, `per_hour`, `per_day` function support (opensearch-project#4531)
  Pushdown sort aggregate metrics (opensearch-project#4603)
  Followup: Change ComparableLinkedHashMap to compare Key than Value (opensearch-project#4648)
  Mitigate the CI failure caused by 500 Internal Server Error (opensearch-project#4646)
  Allow renaming group-by fields to existing field names (opensearch-project#4586)
  Publish internal modules separately for downstream reuse (opensearch-project#4484)
  Revert "Update grammar files and developer guide (opensearch-project#4301)" (opensearch-project#4643)
  Support Automatic Type Conversion for REX/SPATH/PARSE Command Extractions (opensearch-project#4599)
  Replace all dots in fields of table scan's PhysType (opensearch-project#4633)
  Return comparable LinkedHashMap in `valueForCalcite()` of ExprTupleValue (opensearch-project#4629)
  Refactor JsonExtractAllFunctionIT and MapConcatFunctionIT (opensearch-project#4623)
  Pushdown case function in aggregations as range queries (opensearch-project#4400)
  Update GEOIP function to support IP types as input (opensearch-project#4613)
  ...

# Conflicts:
#	docs/user/ppl/functions/conversion.rst
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 28, 2025
* default-main: (34 commits)
  Enhance dynamic source clause to support only metadata filters (opensearch-project#4554)
  Make nested alias type support referring to outer context (opensearch-project#4673)
  Update big5 ppl queries and check plans (opensearch-project#4668)
  Support push down sort after limit (opensearch-project#4657)
  Use table scan rowType in filter pushdown could fix rename issue (opensearch-project#4670)
  Fix: Support Alias Fields in MIN, MAX, FIRST, LAST, and TAKE Aggregations (opensearch-project#4621)
  Fix bin nested fields issue (opensearch-project#4606)
  Add `per_minute`, `per_hour`, `per_day` function support (opensearch-project#4531)
  Pushdown sort aggregate metrics (opensearch-project#4603)
  Followup: Change ComparableLinkedHashMap to compare Key than Value (opensearch-project#4648)
  Mitigate the CI failure caused by 500 Internal Server Error (opensearch-project#4646)
  Allow renaming group-by fields to existing field names (opensearch-project#4586)
  Publish internal modules separately for downstream reuse (opensearch-project#4484)
  Revert "Update grammar files and developer guide (opensearch-project#4301)" (opensearch-project#4643)
  Support Automatic Type Conversion for REX/SPATH/PARSE Command Extractions (opensearch-project#4599)
  Replace all dots in fields of table scan's PhysType (opensearch-project#4633)
  Return comparable LinkedHashMap in `valueForCalcite()` of ExprTupleValue (opensearch-project#4629)
  Refactor JsonExtractAllFunctionIT and MapConcatFunctionIT (opensearch-project#4623)
  Pushdown case function in aggregations as range queries (opensearch-project#4400)
  Update GEOIP function to support IP types as input (opensearch-project#4613)
  ...

Signed-off-by: Asif Bashar <[email protected]>
expani pushed a commit to vinaykpud/sql that referenced this pull request Nov 4, 2025
* convert sort aggregate metrics to term sort

Signed-off-by: Lantao Jin <[email protected]>

* refactor

Signed-off-by: Lantao Jin <[email protected]>

* fix conflicts

Signed-off-by: Lantao Jin <[email protected]>

* fix conflicts2

Signed-off-by: Lantao Jin <[email protected]>

* Add more javadoc

Signed-off-by: Lantao Jin <[email protected]>

* address comments

Signed-off-by: Lantao Jin <[email protected]>

* delete incorrect comments

Signed-off-by: Lantao Jin <[email protected]>

* convert composite agg to multi-terms agg for sort metrics on multiple buckets

Signed-off-by: Lantao Jin <[email protected]>

* avoid the case of 'stat count, sum ... | sort count'

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Push down bucket aggregation with sorting by metrics to multi-terms aggregation

3 participants