Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Nov 26, 2025

Description

Remove count aggregation for sort-on-measure case

Related Issues

Resolves #4862

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Standardized aggregation result sorting to use consistent metrics instead of custom aliases, improving sort order reliability in aggregated queries.
    • Optimized sub-aggregation handling to eliminate unnecessary default aggregations when not needed.
  • Tests

    • Updated test expectations to reflect aggregation sorting improvements and sub-aggregation optimizations.

✏️ Tip: You can customize this high-level summary in your review settings.

@LantaoJin
Copy link
Member Author

cc @noCharger

@peterzhuamazon
Copy link
Member

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 26, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 26, 2025

Walkthrough

This pull request implements a feature to remove count aggregations when sorting on aggregate measures. Test expectations are updated across multiple YAML files to reflect changes from custom count aliases (e.g., count(), c, PageViews) to the internal _count metric. Core logic in AggPushDownAction.java is modified to conditionally use BucketOrder.count() when no sub-aggregations exist and skip attaching empty sub-aggregation collections.

Changes

Cohort / File(s) Summary
Test resource path updates
integ-test/src/test/java/org/opensearch/sql/calcite/big5/PPLBig5IT.java
Resource paths updated to reference big5/ subdirectory for expected plan and query files
big5 expected outputs
integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms.yaml, keyword_terms_low_cardinality.yaml, multi_terms_keyword.yaml
Aggregation sort orders changed from custom field/alias names to _count metric; nested value_count sub-aggregations removed
clickbench expected outputs - count sort changes
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml, q13.yaml, q15.yaml, q16.yaml, q19.yaml, q35.yaml, q36.yaml, q38.yaml, q40.yaml, q42.yaml
Terms/multi-terms aggregation sort orders updated from count() or field aliases to _count
clickbench expected outputs - additional changes
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q17.yaml, q34.yaml, q37.yaml, q39.yaml, q41.yaml
Aggregation sort orders changed from custom aliases (e.g., PageViews, c) to _count; nested sub-aggregations simplified or removed; q22.yaml includes pushdown filter binding adjustments; q37.yaml includes timestamp literal format updates
calcite explain expected outputs
integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure1.yaml, explain_agg_sort_on_measure3.yaml, explain_agg_sort_on_measure_multi_terms.yaml, explain_agg_sort_on_measure_script.yaml
Sort order keys updated from custom aliases (count(), cnt) to _count metric
calcite explain formatting
integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_terms_script.yaml
Formatting adjustment with no functional changes
Aggregation pushdown logic
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java
rePushDownSortAggMeasure() now conditionally uses BucketOrder.count(asc) when sub-aggregations are empty; attachSubAggregations() skips creating metricBuilder and attaching when subAggregations collection is non-empty only

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Homogeneous pattern: Changes across test YAML files follow a consistent pattern (custom count aliases → _count), reducing per-file review complexity
  • Isolated logic change: Single source file modification with straightforward conditional logic (no complex interdependencies or state management)
  • Low risk: Aggregation sort key updates are mechanical and well-contained; no control flow branches added

Areas for attention:

  • Verify that the conditional check in rePushDownSortAggMeasure() correctly identifies when sub-aggregations are present before selecting BucketOrder.count(asc) vs. BucketOrder.aggregation(path, asc)
  • Confirm that skipping empty sub-aggregations in attachSubAggregations() does not inadvertently remove necessary default behaviors in edge cases

Poem

🐰 Hop, hop! The count aggregations hop away,
No more aliases clouding the measure's day.
_count shines clear, the metrics ring true,
Simplify sorting—we've cleansed the view! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.08% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: removing count aggregations for sort-on-aggregate-measure cases throughout the codebase.
Linked Issues check ✅ Passed All code changes directly address the objective from issue #4862 to remove count aggregations for sort-on-aggregate-measure cases.
Out of Scope Changes check ✅ Passed All changes are scoped to removing count aggregations for sort-on-aggregate-measure; no unrelated modifications detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88bf56 and dbfdf56.

📒 Files selected for processing (26)
  • integ-test/src/test/java/org/opensearch/sql/calcite/big5/PPLBig5IT.java (5 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms_low_cardinality.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q16.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q17.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q34.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q35.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q36.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure1.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure3.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_terms.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_terms_script.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_script.yaml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java (2 hunks)
🔇 Additional comments (27)
integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_script.yaml (1)

12-12: Test expectation correctly updated to use internal _count metric.

The aggregation ordering has been updated to use _count instead of count(), which aligns with the PR objective to remove extraneous count aggregations for sort-on-aggregate measure cases. The order specification [{"_count":"asc"},{"_key":"asc"}] is consistent across test files.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml (1)

14-14: Multi-terms aggregation ordering correctly switched to _count metric.

The ordering key for the multi_terms aggregation has been updated from PageViews to _count, eliminating the custom count alias in favor of the internal metric. This maintains the DESC sort direction while using the standard push-down mechanism.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml (1)

12-12: Multi-terms aggregation ordering correctly updated to use _count.

The order specification for the multi_terms aggregation has been updated to [{"_count":"desc"},{"_key":"asc"}], removing the custom count alias and using the internal _count metric consistently with the PR's optimization objective.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml (1)

12-12: Count alias correctly replaced with internal _count metric.

The aggregation order has been updated from the custom alias c to the internal _count metric. The order specification [{"_count":"desc"},{"_key":"asc"}] properly reflects the optimization while maintaining DESC sort semantics.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q35.yaml (1)

13-13: Terms aggregation ordering correctly updated to use _count.

The order specification for the terms aggregation has been updated from {"c":"desc"} to {"_count":"desc"}, properly replacing the custom alias with the internal metric while preserving the DESC sort semantics.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml (1)

12-12: Terms aggregation correctly updated from count() to _count metric.

The aggregation order has been updated from the count() alias to the internal _count metric. The order specification [{"_count":"desc"},{"_key":"asc"}] properly reflects the push-down optimization for sort-on-aggregate scenarios.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml (1)

16-16: Multi-terms aggregation with scripts correctly updated to use _count.

The aggregation ordering has been updated to use the _count metric instead of the PageViews custom alias, while preserving the complex multi-terms structure with script-based field derivations. The order direction remains consistent with the logical plan.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q17.yaml (1)

11-11: Multi-terms aggregation correctly simplified and ordering updated to _count.

The aggregation has been optimized by removing unnecessary inner aggregations (value_count on _index) and updating the order specification to use _count. This aligns with the PR's objective to eliminate redundant count aggregations in sort-on-aggregate scenarios.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q16.yaml (1)

11-11: LGTM! Test expectation updated correctly.

The aggregation order now uses the internal _count metric instead of the count() alias, aligning with the PR objective to remove separate count aggregations for sort-on-measure cases.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q34.yaml (1)

11-11: LGTM! Consistent test expectation update.

The URL terms aggregation now uses _count for ordering instead of the custom alias c, consistent with the broader changes across this PR.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure3.yaml (1)

12-12: LGTM! Date histogram ordering updated correctly.

The date_histogram now uses _count instead of the cnt alias for ordering, maintaining consistency with the terms aggregation updates in other test files.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml (1)

12-12: LGTM! Multiple push-down updates coordinated correctly.

The SearchPhrase terms aggregation ordering now uses _count, and the filter bindings have been updated to reflect adjusted projection positions in the push-down context.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml (1)

12-12: LGTM! Aggregation structure simplified.

The SearchPhrase terms aggregation now leverages the internal _count metric, eliminating the need for a separate value_count aggregation.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml (1)

14-14: LGTM! PageViews ordering migrated to _count.

The URL terms aggregation now uses _count for ordering instead of the PageViews alias, consistent with the pattern across all updated test expectations.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q36.yaml (1)

13-13: LGTM! Final test expectation updated consistently.

The ClientIP terms aggregation ordering now uses _count instead of the c alias, completing the uniform update across all test expectations.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java (2)

314-322: Conditional sub-aggregation attachment implemented correctly.

This change complements the bucket order selection at lines 117-119:

  • When subAggregations is empty, no sub-aggregations are built or attached (relies on built-in _count)
  • When sub-aggregations exist, they are attached with a count aggregation added if missing (lines 318-320)

The comment at line 318 correctly explains that the count aggregator may be eliminated by doc_count optimization and needs to be restored when used as a sort key.


114-119: I'm encountering persistent issues with repository cloning in the sandbox environment, which prevents me from executing the verification scripts. This prevents me from confirming or refuting the concern about whether deriving the asc flag from only the first collation is always sufficient for sort-on-measure scenarios.

Given this limitation, I'll provide the rewritten review comment based on the current state:


The concern about first collation direction derivation requires manual verification.

The logic appropriately distinguishes between two cases:

  • No sub-aggregations: Uses BucketOrder.count(asc) to leverage OpenSearch's built-in _count metric
  • Sub-aggregations present: Uses BucketOrder.aggregation(path, asc) to sort by the computed aggregate

This aligns with the PR objective to eliminate separate count aggregations when sorting by count.

Verify the assumption at Line 114: The asc flag is derived from only the first collation (collations.get(0).getDirection()). Confirm that:

  1. Multiple collations are never used in sort-on-measure scenarios, or
  2. The first collation direction is always the relevant one for determining bucket order

Check test cases with multiple ORDER BY clauses in sort-on-measure queries to ensure this assumption holds.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml (1)

14-14: Aggregation ordering correctly switched to _count.

The multi_terms aggregation now uses _count for primary sort instead of the measure alias, aligning with the PR objective to remove count aggregations for sort-on-aggregate cases.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml (1)

12-12: Terms aggregation correctly uses _count ordering.

The terms aggregation on URL field now uses _count for sorting, consistent with the PR objective.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_terms.yaml (1)

11-11: Multi-terms aggregation correctly uses _count with ascending order.

The aggregation sort order correctly reflects the ascending direction for the count metric, maintaining consistency with push-down context.

integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms_low_cardinality.yaml (1)

11-11: Terms aggregation correctly removes inner sub-aggregation and uses _count.

The removal of the country value_count sub-aggregation and switch to _count ordering aligns with the PR objective to eliminate unnecessary count aggregations for sort-on-measure cases.

integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms.yaml (1)

11-11: Terms aggregation correctly updated to use _count ordering.

Consistent with other big5 test expectations and the PR objective.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure1.yaml (1)

11-11: Terms aggregation correctly uses _count with ascending order for explain test.

Consistent with the push-down context and ascending sort requirement.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_terms_script.yaml (1)

12-12: Multi-terms aggregation with script fields correctly uses _count ordering.

The change properly maintains script field functionality while switching to _count ordering.

integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml (1)

12-12: Multi-terms aggregation correctly uses _count with descending order.

Consistent ordering pattern across all test expectations and aligned with the PR objective.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml (1)

12-12: LGTM! Test expectation correctly updated to use _count ordering.

The change from PageViews-based ordering to _count-based ordering in the terms aggregation aligns with the PR objective to remove redundant count aggregations for sort-on-measure cases. The use of OpenSearch's internal _count metric is the correct approach.

integ-test/src/test/java/org/opensearch/sql/calcite/big5/PPLBig5IT.java (1)

56-424: Based on my analysis, I cannot directly verify the resource files due to repository access limitations. However, I can provide an assessment based on the available information:

Evidence supporting the changes are correct:

  1. Systematic and consistent pattern: All 45 test methods follow the identical refactoring pattern (adding big5/ prefix)
  2. PR context: The original AI summary explicitly mentions related resource changes in integ-test/src/test/resources/expectedOutput/calcite/big5/
  3. No logic changes: Only path strings are updated; test assertions and logic remain unchanged
  4. Official workload: Web search confirms Big5 is an official OpenSearch benchmark workload
  5. Mechanical correctness: All file path updates are syntactically and structurally correct

Assessment:
The code changes are mechanically sound and demonstrate proper resource refactoring. The path updates are consistent and follow a clear organizational pattern. Since I cannot directly access the repository to verify file existence, test execution in CI will definitively confirm whether all referenced YAML files exist at their new locations. Given the systematic nature of these changes and the PR context, the refactoring appears legitimate.


Resource reorganization is correctly implemented with consistent path updates.

All 45 test methods have been updated to reference YAML files in the new big5/ subdirectory. The changes are systematic, syntactically correct, and align with the PR's resource reorganization objective. Test execution will validate that all referenced files exist at their new paths.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 27, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@qianheng-aws qianheng-aws merged commit 0ab2ba2 into opensearch-project:main Nov 27, 2025
43 of 47 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4867-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0ab2ba276ead0c9a1d382542e9f2078a2a73ef4a
# Push it to GitHub
git push --set-upstream origin backport/backport-4867-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4867-to-2.19-dev.

LantaoJin added a commit to LantaoJin/search-plugins-sql that referenced this pull request Nov 27, 2025
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Nov 27, 2025
LantaoJin added a commit that referenced this pull request Nov 27, 2025
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. clickbench enhancement New feature or request pushdown pushdown related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Remove count aggregation for sort on aggregate measure

4 participants