-
Notifications
You must be signed in to change notification settings - Fork 180
Support enumerable TopK #4993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Support enumerable TopK #4993
Conversation
Signed-off-by: Lantao Jin <[email protected]>
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughThis PR introduces an optimizer rule to fuse Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.java📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Files:
⚙️ CodeRabbit configuration file
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
🔇 Additional comments (2)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java (1)
142-145: Address the TODO comment and add JavaDoc.The method logic is correct, but:
- The TODO comment "check on adding" is vague and unclear about what needs to be checked and when.
- Per coding guidelines, public methods should have proper JavaDoc documentation.
🔎 Proposed fix
- // TODO check on adding - public boolean containsDigestOnTop(Object digest) { + /** + * Check if the most recently added operation has a digest equal to the provided digest. + * + * @param digest The digest object to compare against the top operation + * @return true if the top operation's digest equals the provided digest, false otherwise + */ + public boolean containsDigestOnTop(Object digest) { return this.queue.peekLast() != null && this.queue.peekLast().digest().equals(digest); }opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
20-21: Add class-level JavaDoc.Per coding guidelines, public classes should have proper JavaDoc. Consider expanding the single-line comment into a full JavaDoc explaining the TopK optimization purpose and when this node is used.
Suggested JavaDoc
-/** The different between this and {@link EnumerableLimitSort} is the cost. */ +/** + * A physical relational node representing a TopK operation with optimized cost computation. + * + * <p>This node extends {@link EnumerableLimitSort} but computes a lower cost to encourage + * the planner to prefer fusing separate Limit and Sort operators into this single TopK node, + * thereby reducing memory usage during execution. + */ public class CalciteEnumerableTopK extends EnumerableLimitSort {opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (1)
28-42: Add acopy()override toCalciteEnumerableTopKto preserve the custom type.When Calcite's planning engine clones relational nodes, the parent
EnumerableLimitSort.copy()method will be invoked. Without an override, this returns anEnumerableLimitSortinstance instead ofCalciteEnumerableTopK, losing the customcomputeSelfCost()implementation. Following the pattern used inLogicalSystemLimitand other custom Calcite nodes in this codebase, overridecopy()to return a newCalciteEnumerableTopKinstance with the updated parameters.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.jsoninteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java
integ-test/src/test/resources/**/*
⚙️ CodeRabbit configuration file
integ-test/src/test/resources/**/*: - Verify test data is realistic and representative
- Check data format matches expected schema
- Ensure test data covers edge cases and boundary conditions
Files:
integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.jsoninteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.javainteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.jsoninteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.jsoninteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (2)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Sort.java (1)
Sort(16-48)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
CalciteEnumerableTopK(21-54)
🔇 Additional comments (16)
integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest_custom_time.yaml (1)
11-14: LGTM!The test expectation correctly reflects the Limit + Sort fusion into
EnumerableLimitSort. The sort key (sort0=[$5]), direction (dir0=[ASC]), and fetch limit (fetch=[10000]) are properly combined into a single operator, which aligns with the PR objective of reducing memory overhead by avoiding full in-memory sorts.opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)
420-429: LGTM!The guard logic correctly prevents stack overflow by checking if an identical
LimitDigestalready exists at the top of the pushdown context. This is a sound fix for the recursive pushdown issue, and the precomputeddigestvariable is properly reused when adding to the context.integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_reset_null_bucket.yaml (1)
18-40: LGTM!The test expectation correctly reflects the plan restructuring:
EnumerableLimitSortfuses the limit and sort operationsEnumerableMergeJoinwith 3-key condition requires the corresponding 3-key sorts on both inputs- The inner branch projections (
[gender, age]) are appropriately minimized for the join/aggregation logicThe plan structure is consistent with the TopK optimization objective.
integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml (1)
16-27: LGTM!The test expectation correctly reflects the multisearch physical plan with
EnumerableMergeUnion. TheLIMIT->5appears twice in eachPushDownContext(before and afterPROJECT), which is consistent with limit pushdown occurring at multiple stages. The search filter terms (A,Bfor the first source;E,Ffor the second) are correctly represented in the query builders.integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_global_null_bucket.yaml (1)
16-31: LGTM!The test expectation correctly reflects the TopK optimization with
EnumerableLimitSortfusing the limit and sort. TheEnumerableMergeJoinwith its required sorted inputs and the inner branch's optimized projections ([gender]and[gender, age]) are appropriately structured.integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown5.json (1)
4-4: LGTM!The test expectation correctly shows
EnumerableLimitSort(sort0=[$0], dir0=[ASC-nulls-first], fetch=[100])replacing the previousEnumerableSort(fetch=[100]). This fusion is consistent with the PR objective of combining limit and sort into a single TopK operator for improved memory efficiency.integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)
13-15: LGTM!This is the key test expectation for the PR's main objective (issue #4982). The
EnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25])directly addresses the OOM issue by maintaining only the top 25 results in memory during sorting, rather than sorting all data first and then limiting. This pattern is correct and will prevent the memory exhaustion observed in Clickbench Q28.integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)
14-17: LGTM! Physical plan correctly reflects TopK fusion.The expected output properly shows the optimization: the inner
EnumerableLimit(fetch=[25])+EnumerableSortpattern is fused into a singleEnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25]), while preserving the outer system limit (EnumerableLimit(fetch=[10000])). This aligns with the PR objective to reduce memory usage by avoiding full in-memory sorts.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)
44-46: LGTM! Rule registration follows existing patterns.The new
ENUMERABLE_TOP_K_RULEfollows the established pattern for rule instantiation and naming conventions (UPPER_SNAKE_CASE for constants).
66-68: Verify rule ordering is intentional.The
ENUMERABLE_TOP_K_RULEis placed beforeEXPAND_COLLATION_ON_PROJECT_EXPR. Please confirm this ordering is intentional, as the TopK transformation should occur before collation expansion to ensure proper optimization.integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml (1)
15-17: Verify semantic equivalence of operator reordering.The physical plan now has
EnumerableLimit(fetch=[10000])aboveEnumerableCalc, whereas the previous structure had them in reverse order. While this may be a valid optimization, please verify that limiting before calculating (projection) is semantically equivalent in this correlated subquery context—especially since the projection changes the column ordering (id, name, salaryfromname, id, salary).integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_earliest_latest.yaml (1)
11-14: LGTM! Correctly demonstrates Limit+Sort fusion.The expected output properly reflects the TopK optimization:
EnumerableLimit(fetch=[10000])+EnumerableSort(sort0=[$5], dir0=[ASC])are fused intoEnumerableLimitSort(sort0=[$5], dir0=[ASC], fetch=[10000]). The downstream window operators and index scan remain unchanged.integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_distinct_count.yaml (1)
11-14: LGTM! Consistent with TopK fusion pattern.The expected output correctly shows the Limit+Sort operators fused into a single
EnumerableLimitSortwhile preserving downstream window operators and the index scan context.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKRule.java (2)
17-26: LGTM! Well-documented rule implementation.The class-level JavaDoc clearly explains the transformation purpose. The rule correctly extends
InterruptibleRelRuleand follows the established pattern in this codebase.
51-58: Rule predicate correctly identifies TopK candidates.The predicate ensures the rule only matches
LogicalSortnodes that have:
- A fetch or offset (making it a "top" operation)
- Non-empty field collations (actual ordering is specified)
This correctly filters for genuine TopK scenarios.
integ-test/src/test/resources/expectedOutput/calcite/explain_streamstats_null_bucket.yaml (1)
11-15: LGTM! Consistent TopK fusion pattern.The expected output correctly demonstrates the Limit+Sort fusion into
EnumerableLimitSortwhile preserving the windowing and index scan operators downstream.
...arch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java
Show resolved
Hide resolved
yuancu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Let's suspend on merging until #4992 merged |
| * CalciteEnumerableTopK}. | ||
| */ | ||
| @Value.Enclosing | ||
| public class EnumerableTopKRule extends InterruptibleRelRule<EnumerableTopKRule.Config> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need copy the license from Calcite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think. it's not 100% copy
Signed-off-by: Lantao Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java (1)
192-195: Clarify or resolve the TODO comment.The TODO comment "check on adding" is vague and doesn't explain what verification is needed. Consider either:
- Expanding the comment to describe what should be checked (e.g., "TODO: Consider checking digest equality during
add()to prevent duplicates earlier")- Removing it if no action is required
Additionally, consider adding JavaDoc for this public method to document its purpose and when it should be used versus
containsDigest(Object).🔎 Suggested improvement
- // TODO check on adding + /** + * Check if the most recently pushed operation has the specified digest. + * This is useful for preventing duplicate consecutive operations. + * + * @param digest The digest to compare against the top operation + * @return true if the last operation's digest equals the provided digest + */ public boolean containsDigestOnTop(Object digest) { return this.queue.peekLast() != null && this.queue.peekLast().digest().equals(digest); }opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)
439-443: LGTM! Consider clarifying the comment.The defensive guard against duplicate consecutive limit digests is a sound fix. Creating the digest once and reusing it is efficient.
The comment "avoid stack overflow" could be more descriptive about the root cause. Consider explaining the scenario that triggers recursive pushdown attempts:
🔎 Suggested comment improvement
LimitDigest digest = new LimitDigest(limit, offset); if (pushDownContext.containsDigestOnTop(digest)) { - // avoid stack overflow + // Avoid infinite recursion: if the same limit was just pushed, the planner + // may re-invoke this rule on the same node. Return null to break the cycle. return null; }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
🚧 Files skipped from review as they are similar to previous changes (1)
- integ-test/src/test/resources/expectedOutput/calcite/explain_multisearch_timestamp.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
integ-test/src/test/resources/**/*
⚙️ CodeRabbit configuration file
integ-test/src/test/resources/**/*: - Verify test data is realistic and representative
- Check data format matches expected schema
- Ensure test data covers edge cases and boundary conditions
Files:
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.javainteg-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yamlinteg-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (58)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: WhiteSource Security Check
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (9)
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml (1)
13-15: LGTM! TopK optimization correctly reflected in expected output.The physical plan now uses
EnumerableLimitSortto fuse the limit and sort operations into a single TopK operator, which aligns perfectly with the PR objective to avoid OOM by preventing full in-memory sorting of all rows.integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)
15-17: LGTM! Consistent TopK optimization across test cases.The expected output correctly shows the
EnumerableLimitSortfusion, demonstrating that the TopK optimization applies consistently across different query patterns.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)
42-43: LGTM! New TopK rule properly declared.The
ENUMERABLE_TOP_K_RULEfollows the established pattern for rule declarations in this class and uses the standard configuration approach.
63-63: LGTM! Rule positioning appears intentional.The rule is placed after
RARE_TOP_PUSH_DOWNand beforeEXPAND_COLLATION_ON_PROJECT_EXPR, which allows rare/top pushdown operations to complete before applying the TopK optimization. This ordering supports the goal of preventing OOM by ensuring TopK is applied at the appropriate stage of query optimization.opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)
444-449: Verify test coverage for the EnumerableTopKRule integration path with this limit pushdown logic.The code change correctly reuses the precomputed
digestobject, andpushDownLimititself has comprehensive test coverage across multiple test classes. However, tests for the newEnumerableTopKRuleintegration are missing. Consider adding tests that specifically verify:
- The
EnumerableTopKRuleoptimization path exercising this pushdown logic- The scenario where duplicate limits would have caused issues (as mentioned in the PR objective)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml (1)
12-12: The composite source ordering correctly reflects the logical plan.The state ordering has been updated to "desc" which correctly matches the logical plan's
dir1=[DESC-nulls-last]for the state field ($3), and gender remains "asc" matchingdir0=[ASC-nulls-first].integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (1)
12-12: Ordering difference between primary and alternative expected outputs is intentional and represents non-deterministic physical plan generation.The alternative expected output file is correctly set up to handle non-deterministic ordering of script_fields (new_state/new_gender vs new_gender/new_state) during physical plan generation. The test code explicitly loads and compares both primary and alternative files using
assertYamlEqualsIgnoreId, confirming that both orderings are valid and acceptable. This pattern is consistently applied across all 8 alternative expected output files in the calcite directory, indicating that HashMap-based collections during plan generation can produce either ordering non-deterministically. The test structure is correct as-is.integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml (2)
17-17: Cosmetic formatting change to OpenSearchRequestBuilder.The
startFrom=0parameter placement has been reformatted. This is a cosmetic change with no functional impact.
9-10: The concern about operator ordering is based on a misinterpretation of the query plan tree structure. In Calcite plans, indentation indicates parent-child relationships.EnumerableCalcis the parent (root) operator, andEnumerableLimitis a child operator beneath it. Data flows from the leaf (IndexScan) upward throughSortandLimitbefore reachingCalcat the root. This meansLimitactually executes before the projection, filtering to 10,000 rows first and then projecting theagecolumn—which is the optimal order. This operator arrangement is consistent across eight similar test files in the same directory and matches the expected behavior of the TopK optimization that pushes the balance sort (fetch=5) to the index scan.Likely an incorrect or invalid review comment.
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false},"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)]) | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"desc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false},"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change {"field":"state.keyword","missing_bucket":false,"order":"asc"} to{"field":"state.keyword","missing_bucket":false,"order":"desc"}
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"asc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false},"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)]) | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=LogicalProject#,group={0, 1},agg#0=LITERAL_AGG(2)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"gender":{"terms":{"field":"gender.keyword","missing_bucket":false,"order":"asc"}}},{"state":{"terms":{"field":"state.keyword","missing_bucket":false,"order":"desc"}}}]},"aggregations":{"$f2":{"top_hits":{"from":0,"size":2,"version":false,"seq_no_primary_term":false,"explain":false,"_source":{"includes":["gender","state","account_number","age"],"excludes":[]},"script_fields":{"new_state":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["state.keyword"]}},"ignore_failure":false},"new_gender":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQA/HsKICAib3AiOiB7CiAgICAibmFtZSI6ICJMT1dFUiIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0],"DIGESTS":["gender.keyword"]}},"ignore_failure":false}}}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Signed-off-by: Lantao Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
integ-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yaml (1)
12-14: Physical plan transformation is correct; add edge case tests for fetch boundary values.The transformation from separate EnumerableLimit+EnumerableSort to EnumerableLimitSort correctly preserves:
- Sort column reference ($1 = cnt field)
- Fetch limit (10000)
- Downstream operators (EnumerableCalc, CalciteEnumerableIndexScan)
- PushDownContext for aggregation pushdown
This aligns with the PR objective to reduce memory usage via TopK optimization.
However, test coverage lacks boundary conditions for fetch limits. While the test suite comprehensively covers various aggregation scenarios and sort directions (ASC-nulls-first, DESC-nulls-last), it consistently uses fetch=[10000]. Add tests with:
- Small fetch values (e.g., fetch=[1], fetch=[10])
- Edge cases (fetch=[0], fetch exceeding result size)
- Missing sort direction variant (ASC-nulls-last)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.jsoninteg-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_timechart.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
🧰 Additional context used
📓 Path-based instructions (2)
integ-test/src/test/resources/**/*
⚙️ CodeRabbit configuration file
integ-test/src/test/resources/**/*: - Verify test data is realistic and representative
- Check data format matches expected schema
- Ensure test data covers edge cases and boundary conditions
Files:
integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.jsoninteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_timechart.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_multi_terms_autodate_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.jsoninteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_timechart.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.jsoninteg-test/src/test/resources/expectedOutput/calcite/chart_null_str.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yamlinteg-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_use_other.yamlinteg-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
🧬 Code graph analysis (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (3)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Sort.java (1)
Sort(16-48)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (1)
Value(19-56)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
CalciteEnumerableTopK(21-54)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (1)
Value(22-66)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java (1)
CalciteEnumerableTopK(21-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (30)
integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml (1)
22-31: LGTM! The expected output correctly reflects the TopK optimization.The physical plan now starts with
EnumerableLimitSort(line 22), which properly fuses the limit and sort operations instead of using separateEnumerableLimit+EnumerableSortoperators. This aligns with the PR's objective to reduce memory usage by using a TopK operator.The plan structure is consistent with the logical plan:
fetch=[10000]matchesLogicalSystemLimit- Sort directions and columns match
LogicalSort- Downstream operators (aggregate, calc, merge join, index scans) are preserved correctly
Based on learnings, this change correctly tests the SQL optimization path for Calcite integration.
integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml (1)
21-35: LGTM! Physical plan correctly demonstrates the TopK optimization.The expected output properly reflects the fusion of
EnumerableLimit+EnumerableSortinto a singleEnumerableLimitSortat the top of the physical plan. The sort keys, directions, and fetch limit (10000) are consistent with the logical plan. The innerEnumerableSortoperators (lines 27, 30) are appropriately preserved for theEnumerableMergeJoinwhich requires sorted inputs.This test case is representative—covering multiple group keys, left joins, window functions, and aggregation pushdowns—and aligns with the PR objective of reducing memory usage for limit+sort patterns. Based on learnings, this validates the SQL generation and optimization paths for Calcite integration changes.
integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml (1)
11-12: LGTM! Correct TopK optimization transformation.The physical plan correctly consolidates the separate limit and sort operations from the logical plan into a single
EnumerableLimitSortoperator. The sort columns ($0,$1), directions (ASC-nulls-first), and fetch limit (10000) accurately match the logical plan specification.integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml (1)
11-12: LGTM! Correct TopK optimization with different sort columns.The physical plan correctly transforms the nested limit-over-sort pattern into a unified
EnumerableLimitSortoperator. The sort parameters (sort0=[$0], sort1=[$2]) properly reflect the logical plan, providing complementary test coverage for different column index combinations.integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_sort_agg_measure_not_push.yaml (1)
12-13: LGTM! Correct single-column TopK optimization.The physical plan correctly applies the
EnumerableLimitSortoptimization for a single-column sort scenario. The parameters (sort0=[$1], dir0=[ASC-nulls-first], fetch=[10000]) accurately match the logical plan, and this test case provides valuable coverage for temporal aggregations with auto_date_histogram alongside the TopK transformation.integ-test/src/test/resources/expectedOutput/calcite/explain_exists_correlated_subquery.yaml (1)
14-21: LGTM! Physical plan correctly reflects the TopK optimization.The consolidation of limit and sort into
EnumerableLimitSortwithfetch=[10000]is correct. The plan maintains:
- Proper correlation handling via
EnumerableCorrelatewith$cor0- Correct pushdown of filter and limit to the inner scan (10000 subsearch limit)
- Accurate projection of
[name, id, salary]for the outer worker scanThe test expectation is realistic and representative of the correlated EXISTS subquery pattern. Based on learnings, this correctly tests the SQL optimization path for Calcite integration changes.
integ-test/src/test/resources/expectedOutput/calcite/explain_in_correlated_subquery.yaml (1)
14-22: LGTM! Correctly represents IN correlated subquery with TopK optimization.The plan structure is well-formed:
EnumerableLimitSortat line 16 consolidates limit and sort as intended- The additional
EnumerableCalcat line 17 correctly implements the IN condition ($t0=$t3) between outer and inner values- Inner scan properly pushes
FILTERandLIMIT->10000to OpenSearchTest data covers the IN subquery edge case distinct from EXISTS, ensuring comprehensive test coverage for the optimization.
integ-test/src/test/resources/expectedOutput/calcite/explain_exists_uncorrelated_subquery.yaml (1)
14-20: LGTM! Correctly differentiates uncorrelated EXISTS from correlated pattern.The use of
EnumerableNestedLoopJoin(condition=[true])instead ofEnumerableCorrelateis semantically correct for uncorrelated subqueries. Key observations:
- The join type
innerwithcondition=[true]implements EXISTS correctly - if inner produces rows, the cross join succeedsEnumerableAggregate(group=[{0}])ensures single-row output for the EXISTS checkEnumerableLimitSortconsolidation at line 15 applies the TopK optimization as intendedTest coverage is comprehensive with this file covering the uncorrelated EXISTS edge case alongside the correlated variants in other files.
integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml (1)
12-13: LGTM! Physical plan correctly reflects the TopK optimization.The physical plan now uses the fused
EnumerableLimitSortoperator instead of the two-stepEnumerableLimit→EnumerableSortsequence. This aligns with the PR objective to reduce memory usage by applying TopK directly rather than sorting all results before limiting. The sort key ($7=@timestamp), direction (DESC-nulls-last), and fetch limit (10000) are all preserved correctly.Based on learnings, ensure that the corresponding test execution validates both the correctness of the results and the optimization path.
integ-test/src/test/resources/expectedOutput/calcite/explain_in_uncorrelated_subquery.yaml (1)
13-17: LGTM! Test expectation correctly reflects the TopK optimization.The updated physical plan accurately demonstrates the PR's optimization goal:
- Line 14 introduces
EnumerableLimitSort, which combines the previously separate limit and sort operations into a single TopK operator- The sort field (
$2= salary), direction (DESC-nulls-last), and fetch limit (10000) are correctly preserved from the logical plan- The operator placement is correct: positioned between the
EnumerableCalcprojection and theEnumerableHashJointo sort the join results before the final projection- This optimization reduces memory usage by avoiding a full in-memory sort before applying the limit, directly addressing the OOM issues described in issue #4982
Based on learnings, verify that the SQL generation and optimization paths work correctly end-to-end with this new operator.
integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml (1)
21-30: Transformation is correct; edge case coverage confirmed.The conversion from EnumerableLimit + EnumerableSort to EnumerableLimitSort correctly preserves the sort keys, directions, and fetch limit while consolidating the operations. The test data is realistic (OTEL logs) and the schema is consistent.
The DESC-nulls-last edge case with small fetch values from issue #4982 is already covered: q28.yaml and q29.yaml both contain
EnumerableLimitSort(sort0=[$0], dir0=[DESC-nulls-last], fetch=[25]). This test file appropriately covers the ASC multi-key sorting variant.integ-test/src/test/resources/expectedOutput/calcite/agg_composite_range_sort_agg_measure_not_push.yaml (1)
12-13: LGTM: Correct TopK optimization applied.The transformation from separate EnumerableLimit and EnumerableSort operators to a single EnumerableLimitSort correctly implements the TopK optimization described in the PR objectives. The fused operator preserves all parameters (sort on column $1, ASC-nulls-first direction, fetch=10000) and maintains the CalciteEnumerableIndexScan child with its aggregation pushdown context intact. This should reduce memory consumption by avoiding a full in-memory sort before applying the limit.
Based on learnings, this test validates the Calcite optimization path for the TopK transformation.
integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml (2)
22-22: LGTM! EnumerableLimitSort correctly fuses limit and sort operations.The physical plan correctly applies the TopK optimization by replacing the separate
LogicalSystemLimit+LogicalSortpattern (lines 3-4) with a singleEnumerableLimitSortoperator. This aligns with the PR objective to prevent OOM by avoiding full in-memory sorts when fetch limits are present.
23-39: Physical plan structure is consistent and test coverage is comprehensive.The downstream operators correctly preserve the logical plan semantics:
- Line 36 explicitly handles the IS NOT NULL filter condition matching the logical plan (line 15)
- Join, aggregation, and window function operators are properly structured
- Index scan pushdown contexts appropriately filter and project required fields
The test provides good coverage with null handling, multi-key sorting, joins, aggregations, and window functions.
integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_multi_buckets_not_pushed.yaml (1)
11-12: LGTM!The expected output correctly reflects the TopK optimization: the separate
EnumerableLimitandEnumerableSortoperators are fused into a singleEnumerableLimitSortwith the appropriate sort key, direction, and fetch limit preserved. The underlyingCalciteEnumerableIndexScanand itsPushDownContextremain unchanged, which is expected behavior.integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml (1)
22-39: LGTM!The physical plan correctly demonstrates the TopK fusion at the top level with
EnumerableLimitSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[10000]). The restructured plan maintains semantic equivalence while enabling the memory-efficient TopK operation. BothCalciteEnumerableIndexScannodes retain their respectivePushDownContextconfigurations.integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml (1)
22-35: LGTM!The expected output follows the same TopK fusion pattern as other timechart tests, correctly replacing the limit+sort sequence with
EnumerableLimitSort. The plan structure maintains proper semantics for the count-based timechart query.integ-test/src/test/resources/expectedOutput/calcite/explain_sort_agg_push.json (1)
4-4: LGTM!The JSON expected output correctly reflects the TopK fusion:
EnumerableLimitSortnow directly wrapsCalciteEnumerableIndexScanwith the aggregation pushdown preserved in thePushDownContext. This aligns with the optimization objective.integ-test/src/test/resources/expectedOutput/calcite/explain_top_k_then_sort_push.yaml (1)
9-16: LGTM!This test correctly demonstrates the TopK-then-sort optimization pattern. The inner TopK (fetch=5, sorted by balance) is pushed down to OpenSearch via
PushDownContext, while the outer sort (by age) with system limit (fetch=10000) usesEnumerableLimitSort. This is the expected behavior for queries requiring a secondary sort after initial TopK retrieval.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (2)
42-45: LGTM!The new TopK rules follow the established pattern used by other rules in this class. The
Config.DEFAULT.toRule()instantiation is consistent with existing rule declarations.
65-66: Verify the rule ordering is intentional.The TopK rules are placed after
RARE_TOP_PUSH_DOWNand beforeEXPAND_COLLATION_ON_PROJECT_EXPR. Since Calcite rule ordering can affect optimization outcomes, please confirm this placement is intentional and that the converter rule should fire before the merge rule in the optimization sequence.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKMergeRule.java (3)
15-23: LGTM!The class follows the established
InterruptibleRelRulepattern used by other OpenSearch planning rules. The protected constructor and@Value.Enclosingannotation are correctly applied for Immutables integration.
25-33: Implementation correctly merges Limit + Sort into TopK.The rule properly extracts the collation from
EnumerableSortand the offset/fetch fromEnumerableLimit, then creates aCalciteEnumerableTopKusing the sort's input. This correctly bypasses both original operators.
35-55: Config correctly specifies the operand pattern.The predicate
sort.fetch == nullensures this rule only matchesEnumerableSortnodes without their own fetch limit, which is the correct pattern for merging a separateEnumerableLimitwith an underlying full sort. TheoneInputconstraint properly enforces the parent-child relationship.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableTopKConverterRule.java (3)
17-27: LGTM!The class structure and JavaDoc clearly describe the rule's purpose: converting
LogicalSortnodes with fetch/offset and non-empty collation directly intoCalciteEnumerableTopK. This complementsEnumerableTopKMergeRuleby handling cases where the limit is already part of theLogicalSortrather than a separateEnumerableLimit.
29-43: Convention conversion is handled correctly.The
convert()call properly transforms the input toEnumerableConventionbefore creating theCalciteEnumerableTopK. This ensures the physical operator receives properly converted children.
45-65: Config predicate correctly identifies TopK candidates.The predicate
(sort.fetch != null || sort.offset != null) && !sort.collation.getFieldCollations().isEmpty()ensures:
- The sort has limit semantics (fetch or offset present)
- The sort has actual ordering (non-empty collation)
This correctly filters out pure sorts without limits and pure limits without ordering.
integ-test/src/test/resources/expectedOutput/calcite/agg_composite_autodate_range_metric_sort_agg_measure_not_push.yaml (1)
12-13: Transformation correctly implements TopK optimization.The physical plan correctly merges LogicalSystemLimit and LogicalSort into a single EnumerableLimitSort operator. Parameters match precisely:
- Sort field: $1 (cnt field)
- Sort direction: ASC-nulls-first
- Fetch limit: 10000
This aligns with the PR objective to reduce memory consumption through TopK semantics. Edge cases are well-covered across the test suite: EnumerableLimitSort is used in 30 test files where both limit and sort are present; separate EnumerableSort appears in queries without limits; and standalone EnumerableLimit appears in queries without sort requirements.
integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml (2)
17-22: Test data is valid and properly formatted for string comparison.The YAML file uses a literal block scalar (pipe
|syntax) where theSORT->[{...}]configuration is treated as plain text, not parsed as YAML structure. The test framework'sassertYamlEqualsIgnoreId()method performs string comparison viaassertEquals()after normalizing line breaks and IDs—it does not parse the embedded JSON-like syntax.The multi-line SORT configuration is correct:
SORT->[{"firstname" : {"order" : "asc", "missing" : "_last"}}]matches theOpenSearchRequestBuildersort specification- Indentation and format are consistent with similar working test files (e.g.,
asc_sort_timestamp.yaml,desc_sort_timestamp.yaml)- The test method
testExplainSortPassThroughJoinThenPushdown()successfully compares this expected output against actual query resultsNo changes required—the test data is realistic, matches the expected schema, and correctly represents the join sort passthrough scenario.
13-14: Merge join optimization correctly implements sort pushdown; plan shows expected architecture.The transformation from
EnumerableLimit+EnumerableSorttoEnumerableLimitSortaligns with the PR's TopK optimization. The switch toEnumerableMergeJoinis valid: both inputs are correctly sorted by their join keys—left by$13(initial) and right by$15(firstname)—as shown by the PushDownContext entries.The right side PushDownContext explicitly includes
SORT->[{"firstname":{"order":"asc","missing":"_last"}}]and the OpenSearch request builder reflects this sort directive. Similar merge join sort pushdown scenarios are validated inCalcitePPLJoinIT.testSimpleSortPushDownForSMJand related tests with actual result verification, confirming the pattern works as intended.
| private static final EnumerableTopKMergeRule ENUMERABLE_TOP_K_MERGE_RULE = | ||
| EnumerableTopKMergeRule.Config.DEFAULT.toRule(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can ENUMERABLE_TOP_K_RULE be eliminated with ENUMERABLE_TOP_K_MERGE_RULE?
It seems the former converts limit+sort from logical to enumerable top-k, while the latter converts limit+sort from enumerable ones to enumerable top-k. But the former will be converted to limit+sort enumerable plan if without the converter rule. The merge rule seem to cover this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No before #4992 , let me try it now, will remove it if no explain case fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 4992, removing EnumerableTopKConverterRule works now.
Signed-off-by: Lantao Jin <[email protected]>
Description
Support enumerable TopK, check #4982 for issue details.
LogicalSortwithfetchtoCalciteEnumerableTopKEnumerableLimitandEnumerableSorttoCalciteEnumerableTopKThe
CalciteEnumerableTopKis derived fromEnumerableLimitSortwhich has the corrected cost computation.Related Issues
Resolves #4982
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.