Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Dec 23, 2025

Description

Support enumerable TopK, check #4982 for issue details.

  • EnumerableTopKConverterRule
    • Convert LogicalSort with fetch to CalciteEnumerableTopK
  • EnumerableTopKMergeRule
    • Merge EnumerableLimit and EnumerableSort to CalciteEnumerableTopK

The CalciteEnumerableTopK is derived from EnumerableLimitSort which has the corrected cost computation.

Related Issues

Resolves #4982

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 23, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added optimized query execution for limit and sort operations, which are now merged into a single efficient operation for faster query results.
  • Bug Fixes

    • Fixed potential stack overflow issues in limit operation pushdown handling.
  • Refactor

    • Improved physical query plan optimization for better execution performance.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

This PR introduces an optimizer rule to fuse EnumerableLimit and EnumerableSort operators into a single EnumerableLimitSort operator, preventing full in-memory sorting of large datasets. It adds new rule infrastructure, a custom top-K operator, and protective guards against stack overflow in pushdown logic.

Changes

Cohort / File(s) Summary
New TopK Optimization Classes
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java, opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java
Introduces CalciteEnumerableTopK extending EnumerableLimitSort with custom cost reduction for TopK operations, and EnumerableTopKMergeRule that matches and transforms EnumerableLimit followed by EnumerableSort patterns into the new operator.
Planner Rule Registration
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
Registers ENUMERABLE_TOP_K_MERGE_RULE into the planner's rule set after RARE_TOP_PUSH_DOWN.
Pushdown Logic Updates
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Adds digest duplication guard in pushDownLimit() to prevent stack overflow when identical limit digests are encountered.
PushDown Context Enhancement
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
Adds public method containsDigestOnTop(Object digest) to check if the most recently added operation has a matching digest.
Expected Physical Plan Updates
integ-test/src/test/resources/expectedOutput/calcite/*.yaml, integ-test/src/test/resources/expectedOutput/calcite/*.json
~40 test output files updated to reflect new physical plan structure where EnumerableLimit followed by EnumerableSort sequences are replaced by EnumerableLimitSort operators across various query patterns (aggregations, joins, dedup, streamstats, etc.).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • sql#4830: Modifies OpenSearch planner's pushdown and scan machinery with changes to CalciteLogicalIndexScan and planner rules, sharing code-level overlap in query optimization infrastructure.

Suggested labels

bugFix, clickbench

Suggested reviewers

  • ps48
  • penghuo
  • kavithacm
  • derek-ho
  • joshuali925
  • GumpacG
  • Swiddis

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Support enumerable TopK' is clear and directly describes the main feature being added to support TopK enumerable operations.
Description check ✅ Passed The description clearly relates to the changeset by explaining the TopK support feature, referencing issue #4982, and detailing the two new rules being introduced.
Linked Issues check ✅ Passed The PR implements the core objective from issue #4982 by merging EnumerableLimit and EnumerableSort into CalciteEnumerableTopK to prevent full in-memory sorting and reduce memory usage.
Out of Scope Changes check ✅ Passed All changes directly support TopK enumerable operations: new CalciteEnumerableTopK class, new EnumerableTopKMergeRule, integration into rule lists, and multiple test expectation file updates reflecting the optimized physical plans.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba8d9df and 940fc4d.

📒 Files selected for processing (1)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)

42-43: LGTM! Rule declaration and implementation are correct.

The EnumerableTopKMergeRule class is properly implemented with correct operand matching logic. The rule successfully merges EnumerableLimit + EnumerableSort into CalciteEnumerableTopK, which applies cost optimization (0.99 multiplier) to favor the TopK approach and avoid full sort materialization. Integration test coverage is present in CalciteExplainIT.testTopKThenSortExplain() validating the TopK+Sort transformation behavior.


63-63: No issues with the current placement of ENUMERABLE_TOP_K_MERGE_RULE.

The rules operate at different abstraction levels: RareTopPushdownRule targets logical nodes (LogicalFilterLogicalProjectCalciteLogicalIndexScan), while EnumerableTopKMergeRule targets enumerable nodes (EnumerableLimitEnumerableSort). These rules have no pattern matching overlap and execute at different stages in Calcite's optimization pipeline. Rule ordering between them is natural and determined by the pipeline, not a concern for optimization effectiveness.

Likely an incorrect or invalid review comment.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java (1)

142-145: Address the TODO comment and add JavaDoc.

The method logic is correct, but:

  1. The TODO comment "check on adding" is vague and unclear about what needs to be checked and when.
  2. Per coding guidelines, public methods should have proper JavaDoc documentation.
🔎 Proposed fix
-  // TODO check on adding
-  public boolean containsDigestOnTop(Object digest) {
+  /**
+   * Check if the most recently added operation has a digest equal to the provided digest.
+   *
+   * @param digest The digest object to compare against the top operation
+   * @return true if the top operation's digest equals the provided digest, false otherwise
+   */
+  public boolean containsDigestOnTop(Object digest) {
     return this.queue.peekLast() != null && this.queue.peekLast().digest().equals(digest);
   }
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)

20-21: Add class-level JavaDoc.

Per coding guidelines, public classes should have proper JavaDoc. Consider expanding the single-line comment into a full JavaDoc explaining the TopK optimization purpose and when this node is used.

Suggested JavaDoc
-/** The different between this and {@link EnumerableLimitSort} is the cost. */
+/**
+ * A physical relational node representing a TopK operation with optimized cost computation.
+ *
+ * <p>This node extends {@link EnumerableLimitSort} but computes a lower cost to encourage
+ * the planner to prefer fusing separate Limit and Sort operators into this single TopK node,
+ * thereby reducing memory usage during execution.
+ */
 public class CalciteEnumerableTopK extends EnumerableLimitSort {
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (1)

28-42: Add a copy() override to CalciteEnumerableTopK to preserve the custom type.

When Calcite's planning engine clones relational nodes, the parent EnumerableLimitSort.copy() method will be invoked. Without an override, this returns an EnumerableLimitSort instance instead of CalciteEnumerableTopK, losing the custom computeSelfCost() implementation. Following the pattern used in LogicalSystemLimit and other custom Calcite nodes in this codebase, override copy() to return a new CalciteEnumerableTopK instance with the updated parameters.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbcdbd6 and 2a88ae6.

📒 Files selected for processing (16)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (2)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Sort.java (1)
  • Sort (16-48)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
  • CalciteEnumerableTopK (21-54)
🔇 Additional comments (16)
integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml (1)

11-14: LGTM!

The test expectation correctly reflects the Limit + Sort fusion into EnumerableLimitSort. The sort key (sort0=[$5]), direction (dir0=[ASC]), and fetch limit (fetch=[10000]) are properly combined into a single operator, which aligns with the PR objective of reducing memory overhead by avoiding full in-memory sorts.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

420-429: LGTM!

The guard logic correctly prevents stack overflow by checking if an identical LimitDigest already exists at the top of the pushdown context. This is a sound fix for the recursive pushdown issue, and the precomputed digest variable is properly reused when adding to the context.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml (1)

18-40: LGTM!

The test expectation correctly reflects the plan restructuring:

  • EnumerableLimitSort fuses the limit and sort operations
  • EnumerableMergeJoin with 3-key condition requires the corresponding 3-key sorts on both inputs
  • The inner branch projections ([gender, age]) are appropriately minimized for the join/aggregation logic

The plan structure is consistent with the TopK optimization objective.

integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml (1)

16-27: LGTM!

The test expectation correctly reflects the multisearch physical plan with EnumerableMergeUnion. The LIMIT->5 appears twice in each PushDownContext (before and after PROJECT), which is consistent with limit pushdown occurring at multiple stages. The search filter terms (A, B for the first source; E, F for the second) are correctly represented in the query builders.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml (1)

16-31: LGTM!

The test expectation correctly reflects the TopK optimization with EnumerableLimitSort fusing the limit and sort. The EnumerableMergeJoin with its required sorted inputs and the inner branch's optimized projections ([gender] and [gender, age]) are appropriately structured.

integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json (1)

4-4: LGTM!

The test expectation correctly shows EnumerableLimitSort(sort0=[$0], dir0=[ASC-nulls-first], fetch=[100]) replacing the previous EnumerableSort(fetch=[100]). This fusion is consistent with the PR objective of combining limit and sort into a single TopK operator for improved memory efficiency.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)

13-15: LGTM!

This is the key test expectation for the PR's main objective (issue #4982). The EnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25]) directly addresses the OOM issue by maintaining only the top 25 results in memory during sorting, rather than sorting all data first and then limiting. This pattern is correct and will prevent the memory exhaustion observed in Clickbench Q28.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)

14-17: LGTM! Physical plan correctly reflects TopK fusion.

The expected output properly shows the optimization: the inner EnumerableLimit(fetch=[25]) + EnumerableSort pattern is fused into a single EnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25]), while preserving the outer system limit (EnumerableLimit(fetch=[10000])). This aligns with the PR objective to reduce memory usage by avoiding full in-memory sorts.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)

44-46: LGTM! Rule registration follows existing patterns.

The new ENUMERABLE_TOP_K_RULE follows the established pattern for rule instantiation and naming conventions (UPPER_SNAKE_CASE for constants).


66-68: Verify rule ordering is intentional.

The ENUMERABLE_TOP_K_RULE is placed before EXPAND_COLLATION_ON_PROJECT_EXPR. Please confirm this ordering is intentional, as the TopK transformation should occur before collation expansion to ensure proper optimization.

integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml (1)

15-17: Verify semantic equivalence of operator reordering.

The physical plan now has EnumerableLimit(fetch=[10000]) above EnumerableCalc, whereas the previous structure had them in reverse order. While this may be a valid optimization, please verify that limiting before calculating (projection) is semantically equivalent in this correlated subquery context—especially since the projection changes the column ordering (id, name, salary from name, id, salary).

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml (1)

11-14: LGTM! Correctly demonstrates Limit+Sort fusion.

The expected output properly reflects the TopK optimization: EnumerableLimit(fetch=[10000]) + EnumerableSort(sort0=[$5], dir0=[ASC]) are fused into EnumerableLimitSort(sort0=[$5], dir0=[ASC], fetch=[10000]). The downstream window operators and index scan remain unchanged.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml (1)

11-14: LGTM! Consistent with TopK fusion pattern.

The expected output correctly shows the Limit+Sort operators fused into a single EnumerableLimitSort while preserving downstream window operators and the index scan context.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (2)

17-26: LGTM! Well-documented rule implementation.

The class-level JavaDoc clearly explains the transformation purpose. The rule correctly extends InterruptibleRelRule and follows the established pattern in this codebase.


51-58: Rule predicate correctly identifies TopK candidates.

The predicate ensures the rule only matches LogicalSort nodes that have:

  1. A fetch or offset (making it a "top" operation)
  2. Non-empty field collations (actual ordering is specified)

This correctly filters for genuine TopK scenarios.

integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml (1)

11-15: LGTM! Consistent TopK fusion pattern.

The expected output correctly demonstrates the Limit+Sort fusion into EnumerableLimitSort while preserving the windowing and index scan operators downstream.

yuancu
yuancu previously approved these changes Dec 24, 2025
Copy link
Collaborator

@yuancu yuancu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LantaoJin
Copy link
Member Author

LantaoJin commented Dec 25, 2025

Let's suspend on merging until #4992 merged

* CalciteEnumerableTopK}.
*/
@Value.Enclosing
public class EnumerableTopKRule extends InterruptibleRelRule<EnumerableTopKRule.Config> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need copy the license from Calcite?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think. it's not 100% copy

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java (1)

192-195: Clarify or resolve the TODO comment.

The TODO comment "check on adding" is vague and doesn't explain what verification is needed. Consider either:

  1. Expanding the comment to describe what should be checked (e.g., "TODO: Consider checking digest equality during add() to prevent duplicates earlier")
  2. Removing it if no action is required

Additionally, consider adding JavaDoc for this public method to document its purpose and when it should be used versus containsDigest(Object).

🔎 Suggested improvement
-  // TODO check on adding
+  /**
+   * Check if the most recently pushed operation has the specified digest.
+   * This is useful for preventing duplicate consecutive operations.
+   *
+   * @param digest The digest to compare against the top operation
+   * @return true if the last operation's digest equals the provided digest
+   */
   public boolean containsDigestOnTop(Object digest) {
     return this.queue.peekLast() != null && this.queue.peekLast().digest().equals(digest);
   }
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

439-443: LGTM! Consider clarifying the comment.

The defensive guard against duplicate consecutive limit digests is a sound fix. Creating the digest once and reusing it is efficient.

The comment "avoid stack overflow" could be more descriptive about the root cause. Consider explaining the scenario that triggers recursive pushdown attempts:

🔎 Suggested comment improvement
         LimitDigest digest = new LimitDigest(limit, offset);
         if (pushDownContext.containsDigestOnTop(digest)) {
-          // avoid stack overflow
+          // Avoid infinite recursion: if the same limit was just pushed, the planner
+          // may re-invoke this rule on the same node. Return null to break the cycle.
           return null;
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a88ae6 and 58012e9.

📒 Files selected for processing (9)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (58)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (9)
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)

13-15: LGTM! TopK optimization correctly reflected in expected output.

The physical plan now uses EnumerableLimitSort to fuse the limit and sort operations into a single TopK operator, which aligns perfectly with the PR objective to avoid OOM by preventing full in-memory sorting of all rows.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)

15-17: LGTM! Consistent TopK optimization across test cases.

The expected output correctly shows the EnumerableLimitSort fusion, demonstrating that the TopK optimization applies consistently across different query patterns.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)

42-43: LGTM! New TopK rule properly declared.

The ENUMERABLE_TOP_K_RULE follows the established pattern for rule declarations in this class and uses the standard configuration approach.


63-63: LGTM! Rule positioning appears intentional.

The rule is placed after RARE_TOP_PUSH_DOWN and before EXPAND_COLLATION_ON_PROJECT_EXPR, which allows rare/top pushdown operations to complete before applying the TopK optimization. This ordering supports the goal of preventing OOM by ensuring TopK is applied at the appropriate stage of query optimization.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

444-449: Verify test coverage for the EnumerableTopKRule integration path with this limit pushdown logic.

The code change correctly reuses the precomputed digest object, and pushDownLimit itself has comprehensive test coverage across multiple test classes. However, tests for the new EnumerableTopKRule integration are missing. Consider adding tests that specifically verify:

  1. The EnumerableTopKRule optimization path exercising this pushdown logic
  2. The scenario where duplicate limits would have caused issues (as mentioned in the PR objective)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml (1)

12-12: The composite source ordering correctly reflects the logical plan.

The state ordering has been updated to "desc" which correctly matches the logical plan's dir1=[DESC-nulls-last] for the state field ($3), and gender remains "asc" matching dir0=[ASC-nulls-first].

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (1)

12-12: Ordering difference between primary and alternative expected outputs is intentional and represents non-deterministic physical plan generation.

The alternative expected output file is correctly set up to handle non-deterministic ordering of script_fields (new_state/new_gender vs new_gender/new_state) during physical plan generation. The test code explicitly loads and compares both primary and alternative files using assertYamlEqualsIgnoreId, confirming that both orderings are valid and acceptable. This pattern is consistently applied across all 8 alternative expected output files in the calcite directory, indicating that HashMap-based collections during plan generation can produce either ordering non-deterministically. The test structure is correct as-is.

integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml (2)

17-17: Cosmetic formatting change to OpenSearchRequestBuilder.

The startFrom=0 parameter placement has been reformatted. This is a cosmetic change with no functional impact.


9-10: The concern about operator ordering is based on a misinterpretation of the query plan tree structure. In Calcite plans, indentation indicates parent-child relationships. EnumerableCalc is the parent (root) operator, and EnumerableLimit is a child operator beneath it. Data flows from the leaf (IndexScan) upward through Sort and Limit before reaching Calc at the root. This means Limit actually executes before the projection, filtering to 10,000 rows first and then projecting the age column—which is the optimal order. This operator arrangement is consistent across eight similar test files in the same directory and matches the expected behavior of the TopK optimization that pushes the balance sort (fetch=5) to the index scan.

Likely an incorrect or invalid review comment.

CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false},"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"desc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false},"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change {"field":"state.keyword","missing_bucket":false,"order":"asc"} to{"field":"state.keyword","missing_bucket":false,"order":"desc"}

CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false},"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"desc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false},"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

yuancu
yuancu previously approved these changes Dec 26, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml (1)

12-14: Physical plan transformation is correct; add edge case tests for fetch boundary values.

The transformation from separate EnumerableLimit+EnumerableSort to EnumerableLimitSort correctly preserves:

  • Sort column reference ($1 = cnt field)
  • Fetch limit (10000)
  • Downstream operators (EnumerableCalc, CalciteEnumerableIndexScan)
  • PushDownContext for aggregation pushdown

This aligns with the PR objective to reduce memory usage via TopK optimization.

However, test coverage lacks boundary conditions for fetch limits. While the test suite comprehensively covers various aggregation scenarios and sort directions (ASC-nulls-first, DESC-nulls-last), it consistently uses fetch=[10000]. Add tests with:

  • Small fetch values (e.g., fetch=[1], fetch=[10])
  • Edge cases (fetch=[0], fetch exceeding result size)
  • Missing sort direction variant (ASC-nulls-last)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58012e9 and ba8d9df.

📒 Files selected for processing (24)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
🧰 Additional context used
📓 Path-based instructions (2)
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*: - Verify test data is realistic and representative

  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
🧬 Code graph analysis (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (3)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Sort.java (1)
  • Sort (16-48)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (1)
  • Value (19-56)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
  • CalciteEnumerableTopK (21-54)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (1)
  • Value (22-66)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
  • CalciteEnumerableTopK (21-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (30)
integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml (1)

22-31: LGTM! The expected output correctly reflects the TopK optimization.

The physical plan now starts with EnumerableLimitSort (line 22), which properly fuses the limit and sort operations instead of using separate EnumerableLimit + EnumerableSort operators. This aligns with the PR's objective to reduce memory usage by using a TopK operator.

The plan structure is consistent with the logical plan:

  • fetch=[10000] matches LogicalSystemLimit
  • Sort directions and columns match LogicalSort
  • Downstream operators (aggregate, calc, merge join, index scans) are preserved correctly

Based on learnings, this change correctly tests the SQL optimization path for Calcite integration.

integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml (1)

21-35: LGTM! Physical plan correctly demonstrates the TopK optimization.

The expected output properly reflects the fusion of EnumerableLimit + EnumerableSort into a single EnumerableLimitSort at the top of the physical plan. The sort keys, directions, and fetch limit (10000) are consistent with the logical plan. The inner EnumerableSort operators (lines 27, 30) are appropriately preserved for the EnumerableMergeJoin which requires sorted inputs.

This test case is representative—covering multiple group keys, left joins, window functions, and aggregation pushdowns—and aligns with the PR objective of reducing memory usage for limit+sort patterns. Based on learnings, this validates the SQL generation and optimization paths for Calcite integration changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml (1)

11-12: LGTM! Correct TopK optimization transformation.

The physical plan correctly consolidates the separate limit and sort operations from the logical plan into a single EnumerableLimitSort operator. The sort columns ($0, $1), directions (ASC-nulls-first), and fetch limit (10000) accurately match the logical plan specification.

integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml (1)

11-12: LGTM! Correct TopK optimization with different sort columns.

The physical plan correctly transforms the nested limit-over-sort pattern into a unified EnumerableLimitSort operator. The sort parameters (sort0=[$0], sort1=[$2]) properly reflect the logical plan, providing complementary test coverage for different column index combinations.

integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml (1)

12-13: LGTM! Correct single-column TopK optimization.

The physical plan correctly applies the EnumerableLimitSort optimization for a single-column sort scenario. The parameters (sort0=[$1], dir0=[ASC-nulls-first], fetch=[10000]) accurately match the logical plan, and this test case provides valuable coverage for temporal aggregations with auto_date_histogram alongside the TopK transformation.

integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml (1)

14-21: LGTM! Physical plan correctly reflects the TopK optimization.

The consolidation of limit and sort into EnumerableLimitSort with fetch=[10000] is correct. The plan maintains:

  • Proper correlation handling via EnumerableCorrelate with $cor0
  • Correct pushdown of filter and limit to the inner scan (10000 subsearch limit)
  • Accurate projection of [name, id, salary] for the outer worker scan

The test expectation is realistic and representative of the correlated EXISTS subquery pattern. Based on learnings, this correctly tests the SQL optimization path for Calcite integration changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml (1)

14-22: LGTM! Correctly represents IN correlated subquery with TopK optimization.

The plan structure is well-formed:

  • EnumerableLimitSort at line 16 consolidates limit and sort as intended
  • The additional EnumerableCalc at line 17 correctly implements the IN condition ($t0=$t3) between outer and inner values
  • Inner scan properly pushes FILTER and LIMIT->10000 to OpenSearch

Test data covers the IN subquery edge case distinct from EXISTS, ensuring comprehensive test coverage for the optimization.

integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml (1)

14-20: LGTM! Correctly differentiates uncorrelated EXISTS from correlated pattern.

The use of EnumerableNestedLoopJoin(condition=[true]) instead of EnumerableCorrelate is semantically correct for uncorrelated subqueries. Key observations:

  • The join type inner with condition=[true] implements EXISTS correctly - if inner produces rows, the cross join succeeds
  • EnumerableAggregate(group=[{0}]) ensures single-row output for the EXISTS check
  • EnumerableLimitSort consolidation at line 15 applies the TopK optimization as intended

Test coverage is comprehensive with this file covering the uncorrelated EXISTS edge case alongside the correlated variants in other files.

integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml (1)

12-13: LGTM! Physical plan correctly reflects the TopK optimization.

The physical plan now uses the fused EnumerableLimitSort operator instead of the two-step EnumerableLimitEnumerableSort sequence. This aligns with the PR objective to reduce memory usage by applying TopK directly rather than sorting all results before limiting. The sort key ($7 = @timestamp), direction (DESC-nulls-last), and fetch limit (10000) are all preserved correctly.

Based on learnings, ensure that the corresponding test execution validates both the correctness of the results and the optimization path.

integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml (1)

13-17: LGTM! Test expectation correctly reflects the TopK optimization.

The updated physical plan accurately demonstrates the PR's optimization goal:

  • Line 14 introduces EnumerableLimitSort, which combines the previously separate limit and sort operations into a single TopK operator
  • The sort field ($2 = salary), direction (DESC-nulls-last), and fetch limit (10000) are correctly preserved from the logical plan
  • The operator placement is correct: positioned between the EnumerableCalc projection and the EnumerableHashJoin to sort the join results before the final projection
  • This optimization reduces memory usage by avoiding a full in-memory sort before applying the limit, directly addressing the OOM issues described in issue #4982

Based on learnings, verify that the SQL generation and optimization paths work correctly end-to-end with this new operator.

integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml (1)

21-30: Transformation is correct; edge case coverage confirmed.

The conversion from EnumerableLimit + EnumerableSort to EnumerableLimitSort correctly preserves the sort keys, directions, and fetch limit while consolidating the operations. The test data is realistic (OTEL logs) and the schema is consistent.

The DESC-nulls-last edge case with small fetch values from issue #4982 is already covered: q28.yaml and q29.yaml both contain EnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25]). This test file appropriately covers the ASC multi-key sorting variant.

integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml (1)

12-13: LGTM: Correct TopK optimization applied.

The transformation from separate EnumerableLimit and EnumerableSort operators to a single EnumerableLimitSort correctly implements the TopK optimization described in the PR objectives. The fused operator preserves all parameters (sort on column $1, ASC-nulls-first direction, fetch=10000) and maintains the CalciteEnumerableIndexScan child with its aggregation pushdown context intact. This should reduce memory consumption by avoiding a full in-memory sort before applying the limit.

Based on learnings, this test validates the Calcite optimization path for the TopK transformation.

integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml (2)

22-22: LGTM! EnumerableLimitSort correctly fuses limit and sort operations.

The physical plan correctly applies the TopK optimization by replacing the separate LogicalSystemLimit + LogicalSort pattern (lines 3-4) with a single EnumerableLimitSort operator. This aligns with the PR objective to prevent OOM by avoiding full in-memory sorts when fetch limits are present.


23-39: Physical plan structure is consistent and test coverage is comprehensive.

The downstream operators correctly preserve the logical plan semantics:

  • Line 36 explicitly handles the IS NOT NULL filter condition matching the logical plan (line 15)
  • Join, aggregation, and window function operators are properly structured
  • Index scan pushdown contexts appropriately filter and project required fields

The test provides good coverage with null handling, multi-key sorting, joins, aggregations, and window functions.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml (1)

11-12: LGTM!

The expected output correctly reflects the TopK optimization: the separate EnumerableLimit and EnumerableSort operators are fused into a single EnumerableLimitSort with the appropriate sort key, direction, and fetch limit preserved. The underlying CalciteEnumerableIndexScan and its PushDownContext remain unchanged, which is expected behavior.

integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml (1)

22-39: LGTM!

The physical plan correctly demonstrates the TopK fusion at the top level with EnumerableLimitSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[10000]). The restructured plan maintains semantic equivalence while enabling the memory-efficient TopK operation. Both CalciteEnumerableIndexScan nodes retain their respective PushDownContext configurations.

integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml (1)

22-35: LGTM!

The expected output follows the same TopK fusion pattern as other timechart tests, correctly replacing the limit+sort sequence with EnumerableLimitSort. The plan structure maintains proper semantics for the count-based timechart query.

integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json (1)

4-4: LGTM!

The JSON expected output correctly reflects the TopK fusion: EnumerableLimitSort now directly wraps CalciteEnumerableIndexScan with the aggregation pushdown preserved in the PushDownContext. This aligns with the optimization objective.

integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml (1)

9-16: LGTM!

This test correctly demonstrates the TopK-then-sort optimization pattern. The inner TopK (fetch=5, sorted by balance) is pushed down to OpenSearch via PushDownContext, while the outer sort (by age) with system limit (fetch=10000) uses EnumerableLimitSort. This is the expected behavior for queries requiring a secondary sort after initial TopK retrieval.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)

42-45: LGTM!

The new TopK rules follow the established pattern used by other rules in this class. The Config.DEFAULT.toRule() instantiation is consistent with existing rule declarations.


65-66: Verify the rule ordering is intentional.

The TopK rules are placed after RARE_TOP_PUSH_DOWN and before EXPAND_COLLATION_ON_PROJECT_EXPR. Since Calcite rule ordering can affect optimization outcomes, please confirm this placement is intentional and that the converter rule should fire before the merge rule in the optimization sequence.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (3)

15-23: LGTM!

The class follows the established InterruptibleRelRule pattern used by other OpenSearch planning rules. The protected constructor and @Value.Enclosing annotation are correctly applied for Immutables integration.


25-33: Implementation correctly merges Limit + Sort into TopK.

The rule properly extracts the collation from EnumerableSort and the offset/fetch from EnumerableLimit, then creates a CalciteEnumerableTopK using the sort's input. This correctly bypasses both original operators.


35-55: Config correctly specifies the operand pattern.

The predicate sort.fetch == null ensures this rule only matches EnumerableSort nodes without their own fetch limit, which is the correct pattern for merging a separate EnumerableLimit with an underlying full sort. The oneInput constraint properly enforces the parent-child relationship.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (3)

17-27: LGTM!

The class structure and JavaDoc clearly describe the rule's purpose: converting LogicalSort nodes with fetch/offset and non-empty collation directly into CalciteEnumerableTopK. This complements EnumerableTopKMergeRule by handling cases where the limit is already part of the LogicalSort rather than a separate EnumerableLimit.


29-43: Convention conversion is handled correctly.

The convert() call properly transforms the input to EnumerableConvention before creating the CalciteEnumerableTopK. This ensures the physical operator receives properly converted children.


45-65: Config predicate correctly identifies TopK candidates.

The predicate (sort.fetch != null || sort.offset != null) && !sort.collation.getFieldCollations().isEmpty() ensures:

  1. The sort has limit semantics (fetch or offset present)
  2. The sort has actual ordering (non-empty collation)

This correctly filters out pure sorts without limits and pure limits without ordering.

integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml (1)

12-13: Transformation correctly implements TopK optimization.

The physical plan correctly merges LogicalSystemLimit and LogicalSort into a single EnumerableLimitSort operator. Parameters match precisely:

  • Sort field: $1 (cnt field)
  • Sort direction: ASC-nulls-first
  • Fetch limit: 10000

This aligns with the PR objective to reduce memory consumption through TopK semantics. Edge cases are well-covered across the test suite: EnumerableLimitSort is used in 30 test files where both limit and sort are present; separate EnumerableSort appears in queries without limits; and standalone EnumerableLimit appears in queries without sort requirements.

integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml (2)

17-22: Test data is valid and properly formatted for string comparison.

The YAML file uses a literal block scalar (pipe | syntax) where the SORT->[{...}] configuration is treated as plain text, not parsed as YAML structure. The test framework's assertYamlEqualsIgnoreId() method performs string comparison via assertEquals() after normalizing line breaks and IDs—it does not parse the embedded JSON-like syntax.

The multi-line SORT configuration is correct:

  • SORT->[{"firstname" : {"order" : "asc", "missing" : "_last"}}] matches the OpenSearchRequestBuilder sort specification
  • Indentation and format are consistent with similar working test files (e.g., asc_sort_timestamp.yaml, desc_sort_timestamp.yaml)
  • The test method testExplainSortPassThroughJoinThenPushdown() successfully compares this expected output against actual query results

No changes required—the test data is realistic, matches the expected schema, and correctly represents the join sort passthrough scenario.


13-14: Merge join optimization correctly implements sort pushdown; plan shows expected architecture.

The transformation from EnumerableLimit + EnumerableSort to EnumerableLimitSort aligns with the PR's TopK optimization. The switch to EnumerableMergeJoin is valid: both inputs are correctly sorted by their join keys—left by $13 (initial) and right by $15 (firstname)—as shown by the PushDownContext entries.

The right side PushDownContext explicitly includes SORT->[{"firstname":{"order":"asc","missing":"_last"}}] and the OpenSearch request builder reflects this sort directive. Similar merge join sort pushdown scenarios are validated in CalcitePPLJoinIT.testSimpleSortPushDownForSMJ and related tests with actual result verification, confirming the pattern works as intended.

yuancu
yuancu previously approved these changes Dec 26, 2025
Comment on lines +44 to +45
private static final EnumerableTopKMergeRule ENUMERABLE_TOP_K_MERGE_RULE =
EnumerableTopKMergeRule.Config.DEFAULT.toRule();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can ENUMERABLE_TOP_K_RULE be eliminated with ENUMERABLE_TOP_K_MERGE_RULE?

It seems the former converts limit+sort from logical to enumerable top-k, while the latter converts limit+sort from enumerable ones to enumerable top-k. But the former will be converted to limit+sort enumerable plan if without the converter rule. The merge rule seem to cover this case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No before #4992 , let me try it now, will remove it if no explain case fails

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 4992, removing EnumerableTopKConverterRule works now.

Signed-off-by: Lantao Jin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Clickbench PPL q28 is easily OOM due to sort all data in memory

3 participants