Skip to content

Conversation

@ishaoxy
Copy link
Contributor

@ishaoxy ishaoxy commented Dec 10, 2025

Description

Just use aggregateWithTrimming() to avoid duplicating existing functionality.

Related Issues

Resolves #4925

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Xinyu Hao <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 10, 2025

📝 Walkthrough

Summary by CodeRabbit

  • Refactor
    • Optimized window-function aggregation handling in streaming statistics queries by consolidating aggregation logic and simplifying the query execution plan structure.
    • Refined query optimization to improve performance for streamstats operations, reducing unnecessary intermediate computation steps while maintaining consistent results.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

This change consolidates window-function aggregation logic in the Streamstats command by replacing a duplicate aggregation implementation with a call to the existing aggregateWithTrimming() method. The removal of buildAggCallsForWindowFunctions eliminates redundant code and unifies aggregation handling. All related test expectations are updated to reflect the restructured logical plan.

Changes

Cohort / File(s) Change Summary
Core aggregation logic refactor
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Replaces direct aggregation handling for window functions with a call to aggregateWithTrimming(). Removes duplicate buildAggCallsForWindowFunctions helper method. Eliminates manual AggCall construction and validation logic, consolidating aggregation responsibility into the existing utility.
Streamstats YAML test expectations (Calcite)
integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global.yaml, explain_streamstats_global_null_bucket.yaml, explain_streamstats_reset.yaml, explain_streamstats_reset_null_bucket.yaml
Updates aggregation column references (e.g., AVG($8) → AVG($0)) and restructures projection chains. Introduces intermediate LogicalProject(age=[$8]) layers between aggregates and filters, adjusting the order of field propagation through the logical plan.
Streamstats YAML test expectations (Calcite no pushdown)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_global.yaml, explain_streamstats_global_null_bucket.yaml, explain_streamstats_reset.yaml, explain_streamstats_reset_null_bucket.yaml
Mirrors changes from Calcite variant: updates aggregation inputs (AVG($8) → AVG($0)), introduces intermediate projection layers for age extraction, and restructures field propagation paths prior to index scans.
Streamstats Java test expectations
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
Updates test assertions for logical plan structure and Spark SQL translation. Changes column references in aggregations (MAX($0), AVG($0)), adjusts projection ordering, updates intermediate table aliases (e.g., t2 → t3), and refines boundary conditions around stream sequence and nullability checks to align with restructured projections.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Core logic consolidation: The main change is straightforward—removing duplicate aggregation code in favor of existing utility. However, understanding why column references and projection layering shift requires careful tracing of the aggregateWithTrimming() behavior.
  • Test expectation updates: Multiple files (8 YAML files + 1 Java test) require verification that changes are consistent across all variants (Calcite, Calcite no-pushdown, PPL).
  • Projection restructuring: The introduction of intermediate LogicalProject layers and shift in column indices (e.g., $8 → $0) across numerous test assertions should be validated to confirm alignment with the new aggregation path.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: replacing duplicated aggregation logic with the aggregateWithTrimming() function.
Description check ✅ Passed The description adequately explains the change and references the related issue #4925, providing clear context about using aggregateWithTrimming() to avoid duplication.
Linked Issues check ✅ Passed The code changes align with issue #4925's objective: aggregation logic for window functions in buildStreamWindowJoinPlan is refactored to use aggregateWithTrimming() instead of the duplicate buildAggCallsForWindowFunctions implementation.
Out of Scope Changes check ✅ Passed All changes are within scope: the Java file modification directly addresses the issue, and the YAML test output files are expected updates reflecting the refactored aggregation behavior.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7fc5f5 and 822e04c.

📒 Files selected for processing (10)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_global.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_global_null_bucket.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_reset.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_reset_null_bucket.yaml (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (4 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
**/ppl/**/*.java

⚙️ CodeRabbit configuration file

**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases

  • Check AST generation for new syntax
  • Ensure corresponding AST builder classes are updated
  • Validate edge cases and boundary conditions

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
🧠 Learnings (3)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_reset.yaml
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_global.yaml
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_reset_null_bucket.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (11)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

1831-1832: Reuse of aggregateWithTrimming for stream window aggregation looks correct

Delegating the right-side stream window aggregation to aggregateWithTrimming with an empty group list and hintBucketNonNull=false matches how stats/rare/chart aggregations are handled and avoids duplicated logic. This should preserve (or improve) behavior while ensuring consistent trimming and decorrelator workarounds for the streamstats path as well. Based on learnings, this keeps CalciteRelNodeVisitor aligned with existing Calcite integration patterns.

Please double-check streamstats cases that use count(field) windows, since the shared doc_count-style null-filter optimization in aggregateWithTrimming will now also apply there; existing and/or new tests should confirm the intended semantics.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml (1)

9-13: Expected logical plan correctly reflects trimmed aggregation

The switch to avg_age=[AVG($0)] with an intervening LogicalProject(age=[$8]) before the filter is consistent with aggregateWithTrimming introducing a trimming projection on the right side of the correlate. Indexing and predicates still reference the same fields, so the updated expected plan looks correct.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_reset.yaml (1)

10-15: Reset streamstats no-pushdown plan aligned with shared aggregation logic

The added LogicalProject(age=[$8]) feeding LogicalAggregate(group=[{}], avg_age=[AVG($0)]) matches the new trimming-based aggregation path. The reset frame and segment filters are still applied on the full projection below, so the updated expected plan is structurally consistent with the refactor.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_global.yaml (1)

9-13: Global streamstats (no-pushdown) expected plan matches trimmed AVG input

The logical plan now projects age=[$8] and aggregates avg_age=[AVG($0)], which is consistent with the trimming projection introduced by aggregateWithTrimming. Correlate keys and filter predicates remain unchanged, so this expectation looks correct.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global.yaml (1)

9-13: Global streamstats Calcite logical plan correctly updated for trimming

Using AVG($0) over a preceding LogicalProject(age=[$8]) is the expected shape when aggregateWithTrimming is applied on the correlate’s right side. The additional projection cleanly feeds the aggregate without altering the filter semantics, so the new expected output is consistent.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_global_null_bucket.yaml (1)

9-13: Null-bucket global streamstats (no-pushdown) plan aligned with new aggregation path

The aggregate now consuming $0 after a dedicated LogicalProject(age=[$8]) matches the shared trimming logic. Bucket-nullability handling is still driven by the filter conditions, so the updated expected logical plan is coherent with the refactor.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset.yaml (1)

10-15: Reset streamstats Calcite logical plan updated consistently with trimming

Introducing LogicalProject(age=[$8]) beneath LogicalAggregate(group=[{}], avg_age=[AVG($0)]) is consistent with the new shared aggregation implementation. Reset flags, segment ID, and group-equality predicates are unaffected, so this expectation looks correct.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_streamstats_reset_null_bucket.yaml (1)

10-15: Reset + null-bucket no-pushdown plan matches shared aggregation semantics

The change to avg_age=[AVG($0)] with a preceding LogicalProject(age=[$8]) fits the new trimming-based aggregation approach. The reset and segment filters continue to operate on the full projection beneath, so this updated expected output is consistent with the refactored planning logic.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml (1)

10-15: Logical plan update correctly trims inputs before AVG(age)

The added LogicalProject(age=[$8]) plus avg_age=[AVG($0)] matches the aggregate-with-trimming pattern while keeping the filter’s use of $17 (stream seq), $20 (seg id), and $4 (gender) consistent with the underlying projection. The updated expected plan looks coherent with the new aggregation path.
Based on learnings, this keeps the Calcite explain output aligned with the optimized SQL generation path.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (2)

91-129: Windowed max(SAL) test now matches trimmed-aggregate logical plan

The new right-side correlate plan (LogicalAggregate(group=[{}], max(SAL)=[MAX($0)]) over LogicalProject(SAL=[$5])) correctly trims inputs so the aggregate always sees SAL at index 0, while the filter still uses $8 for __stream_seq__ and $7 for DEPTNO from the deeper project.

In the Spark SQL expectation, the lateral subquery alias t3 is referenced consistently in the outer select and at the subquery closing. Overall, the test accurately reflects the updated window-aggregation planning.
Based on learnings, this keeps the PPL-to-Calcite and Spark SQL generation paths in sync.


158-224: Reset-aware avg(SAL) test correctly adopts projection-before-aggregate shape

The reset-aware branch now uses LogicalAggregate(group=[{}], avg(SAL)=[AVG($0)]) atop LogicalProject(SAL=[$5]), with the filter still reading $8 (__stream_seq__), $11 (__seg_id__), and $7 (DEPTNO) from the underlying projection—indices line up with the projected schema.

The Spark SQL lateral subquery alias update to t5 is consistent between the select list and the subquery definition. The expected plan and SQL thus align with the new aggregateWithTrimming-driven implementation while preserving the reset and bucket semantics.
Based on learnings, this robustly tests the updated Calcite integration for streamstats with reset flags.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.0)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

[]


Comment @coderabbitai help to get the list of available commands and usage tips.

@ishaoxy ishaoxy marked this pull request as draft December 10, 2025 06:36
@yuancu yuancu added the bugFix label Dec 10, 2025
@ishaoxy ishaoxy marked this pull request as ready for review December 11, 2025 05:11
Comment on lines -1832 to -1833
List<AggCall> aggCalls = buildAggCallsForWindowFunctions(node.getWindowFunctionList(), context);
context.relBuilder.aggregate(context.relBuilder.groupKey(), aggCalls);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuancu @songkant-aws we'd better not to call context.relBuilder.aggregate directly, instead, use aggregateWithTrimming to build aggregation in stack. I see some codes in visitChart, rankByColumnSplit and visitPatterns. Can you create a issue to replace?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or co-auther with @ishaoxy to address them in this PR.

Copy link
Collaborator

@yuancu yuancu Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did so because I needed to build multiple aggregations for the chart command. aggregateWithTrimming works on PPL's AST (composed of UnresolvedPlan) and creates RexNode AST, but I have already gone through this in the first aggregation. Therefore, I had to call relBuilder.aggregate the second time I created an aggregation to aggregate on RexNode.

@LantaoJin LantaoJin merged commit 35ca89e into opensearch-project:main Dec 11, 2025
38 of 40 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4926-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 35ca89e6ea626600393c6c21cbdb7fcea6a07d43
# Push it to GitHub
git push --set-upstream origin backport/backport-4926-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4926-to-2.19-dev.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Replace duplicated aggregation logic with aggregateWithTrimming()

3 participants