-
Notifications
You must be signed in to change notification settings - Fork 180
Pushdown join with max=n option to TopHits aggregation
#4929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pushdown join with max=n option to TopHits aggregation
#4929
Conversation
Signed-off-by: Lantao Jin <[email protected]>
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds a helper to apply a join-subsearch system limit in CalciteRelNodeVisitor, invokes it after dedup steps in join paths, removes ORDER BY clauses from ROW_NUMBER() dedup windows, updates PlanUtils row-number detection, and adjusts integration/PPL tests and expected-output YAMLs accordingly. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
💤 Files with no reviewable changes (1)
🧰 Additional context used🧠 Learnings (3)📓 Common learnings📚 Learning: 2025-12-02T17:27:55.938ZApplied to files:
📚 Learning: 2025-12-11T05:27:39.831ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
🔇 Additional comments (1)
Comment |
Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: Lantao Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
1323-1374: Address the gap between documentation and SEMI/ANTI join implementation.The documentation allows
maxoption withsemiandantijoin types, but the code returns early for SEMI/ANTI joins (lines 1328-1332) before processing themaxoption and applying dedup/limit optimizations. This creates an inconsistency: users can specifymaxwith SEMI/ANTI joins per the documented syntax, but it will be silently ignored.Either remove the early return to enable
maxsupport for SEMI/ANTI joins, or add a validation error whenmaxis specified with SEMI/ANTI, and update documentation to clarify the limitation.
♻️ Duplicate comments (5)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.java (1)
1034-1050: Consistent test expectation updates.Same pattern as lines 1005-1020, correctly updating expectations for the simplified window function in the criteria-based join test.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.java (4)
61-87: Same determinism concern applies to testDedup2.This test method exhibits the same issue as testDedup1: deterministic
expectedResult(lines 65-77) with non-deterministicROW_NUMBER() OVER (PARTITION BY $7)(line 61).
100-179: Same determinism concern applies to keepempty test methods.Both
testDedupKeepEmpty1andtestDedupKeepEmpty2useROW_NUMBER() OVER (PARTITION BY $7, $2)without ORDER BY (lines 100, 143), yet expect deterministic results. The keepempty=true logic adds OR conditions for NULL values, but doesn't address the non-deterministic ordering issue.
192-230: Same determinism concern applies to expression-based dedup tests.The
testDedupExprmethod uses computed expressions (NEW_DEPTNO) in dedup operations with unordered ROW_NUMBER() (lines 192, 211, 224), maintaining the same determinism concerns.
242-274: Same determinism concern applies to rename dedup tests.The
testRenameDedupmethod follows the same pattern with renamed columns (lines 242, 255, 268).
🧹 Nitpick comments (2)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_keepempty_true_not_pushed.yaml (1)
6-11: Verify test determinism with unordered ROW_NUMBER().The
ROW_NUMBER() OVER (PARTITION BY $4)window function lacks anORDER BYclause, making row numbering within each partition non-deterministic. While this may be intentional for performance (allowing OpenSearch to use its default ordering), it could affect test result predictability.For dedup operations where the specific rows kept don't matter, this is semantically valid, but the test assertions should account for potential result variation across runs or OpenSearch versions.
Confirm that:
- The test expectations in the corresponding test file are flexible enough to handle non-deterministic ordering
- This behavioral change is documented in the PR description or test comments
- The removal of ORDER BY is intentional and aligns with the TopHits aggregation pushdown strategy
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
1383-1393: Consider adding JavaDoc for clarity on system limit vs max option.The helper method is well-implemented and centralizes the limit application logic. However, since this interacts with the
maxoption in the join paths above, consider adding JavaDoc to clarify:
- This applies a system-level
join.subsearch_maxoutlimit, separate from the user-specifiedmaxoption- The
maxoption controls deduplication, while this limit controls the overall subsearch output size- Why it's called unconditionally (not just when
maxis set)This would help future maintainers understand the relationship between these two related but distinct concepts.
Based on learnings: Document Calcite-specific workarounds and optimization patterns in code.
Example JavaDoc:
+ /** + * Applies system-level join subsearch limit to the current RelBuilder top. + * <p> + * This limit is independent of the user-specified {@code max} option: + * <ul> + * <li>{@code max} option: controls deduplication of the right side</li> + * <li>join.subsearch_maxout: system limit on total subsearch output</li> + * </ul> + * Both optimizations work together to enable TopHits aggregation pushdown. + * + * @param context CalcitePlanContext containing the RelBuilder and system limits + */ private static void addSysLimitForJoinSubsearch(CalcitePlanContext context) {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java(3 hunks)core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java(1 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java(1 hunks)integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.java(1 hunks)integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java(4 hunks)integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.java(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml(2 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml(2 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml(2 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml(2 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_keepempty_false_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_keepempty_true_not_pushed.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_join_with_criteria_max_option.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_join_with_fields_max_option.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_output.yaml(2 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.java(14 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.java(5 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify test coverage for new business logic
- Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Verify SQL generation and optimization paths
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
**/ppl/**/*.java
⚙️ CodeRabbit configuration file
**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases
- Check AST generation for new syntax
- Ensure corresponding AST builder classes are updated
- Validate edge cases and boundary conditions
Files:
integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java
**/*Test.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*Test.java: All new business logic requires unit tests
Name unit tests with*Test.javasuffix in OpenSearch SQL
Files:
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.java
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_push.yamlinteg-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_keepempty_false_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlinteg-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_join_with_fields_max_option.yamlinteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yamlinteg-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_output.yamlinteg-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_keepempty_true_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.javainteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yamlinteg-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_output.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.javainteg-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧬 Code graph analysis (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java (1)
GlobalPushdownConfig(307-310)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
core/src/main/java/org/opensearch/sql/calcite/plan/LogicalSystemLimit.java (1)
LogicalSystemLimit(25-90)
🔇 Additional comments (30)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.java (2)
1005-1020: Test expectations correctly updated for simplified window function.The removal of ORDER BY from the ROW_NUMBER window function aligns with the PR objective to optimize join dedup operations. The window function now uses
ROW_NUMBER() OVER (PARTITION BY ...)without explicit ordering.Please verify that removing ORDER BY from the dedup window function doesn't affect the correctness of the max=n join semantics. When ORDER BY is absent, the row numbering within each partition becomes non-deterministic. Confirm this is acceptable for the dedup use case or that ordering is handled elsewhere in the execution path.
Based on learnings, verify SQL generation and optimization paths for Calcite integration changes.
1080-1080: Good catch - test logic correction.This change fixes a test logic bug where
root1was being verified twice. The corrected code now properly testsroot2(SQL-like join syntax) to ensure both join syntaxes behave correctly before the maxout setting is applied.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_join_with_fields_max_option.yaml (3)
8-8: New JOIN_SUBSEARCH_MAXOUT limit type is appropriately introduced.The logical plan correctly applies
LogicalSystemLimitwith type[JOIN_SUBSEARCH_MAXOUT]on the right-side subquery, aligning with the PR objective of pushing down themax=nlimit to the join subsearch. This replaces in-memory window function processing with DSL-level limit enforcement.
10-11: ROW_NUMBER window function dedup structure is correct.The window function at line 11 uses
ROW_NUMBER() OVER (PARTITION BY $0)without an ORDER BY clause, which aligns with the PR's removal of ORDER BY from dedup window functions. This simplification supports pushdown compatibility.
15-27: The expected output file is correct and requires no changes.The no-pushdown variant properly uses
EnumerableMergeJoinat line 18 because both inputs are preceded byEnumerableSortoperations (lines 19 and 22), enabling sort-merge join optimization. This differs from the pushdown variant which usesEnumerableHashJoinwith pushed-down limits. The physical plan structure correctly reflects the no-pushdown optimization strategy where sorting happens client-side before the merge join.integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/WafPplDashboardIT.java (1)
147-148: LGTM! Secondary sort key ensures deterministic top-k results.Adding the secondary sort by
terminatingRuleIdensures that when multiple terminating rules have the same count, the results are consistently ordered. This prevents test flakiness and aligns with the PR's pattern of stabilizing top-k queries across dashboard tests.integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/NfwPplDashboardIT.java (1)
439-440: LGTM! Secondary sort key improves test determinism.The addition of
event.tcp.tcp_flagsas a secondary sort key ensures consistent ordering when TCP flag counts are tied, preventing non-deterministic test results.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_keepempty_false_push.yaml (1)
6-6: LGTM! Simplified dedup window semantics align with PR objectives.Removing the ORDER BY clause from the ROW_NUMBER() window function in dedup scenarios simplifies the plan and supports the pushdown optimization for join with
max=noption.Also applies to: 13-13
integ-test/src/test/java/org/opensearch/sql/ppl/dashboard/VpcFlowLogsPplDashboardIT.java (1)
169-171: LGTM! Secondary sort keys ensure deterministic test results.Multiple test methods now include secondary sort keys (srcaddr/dstaddr) to break ties when aggregate counts are equal. This ensures reproducible, stable test results and aligns with the broader pattern of deterministic top-k queries across dashboard integration tests.
Also applies to: 192-194, 215-215, 237-237
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml (1)
6-6: LGTM! Dedup window simplification supports pushdown optimization.integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml (1)
6-6: LGTM! Dedup window simplification in both logical and physical plans.Also applies to: 19-19
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml (1)
6-6: LGTM! Multi-column dedup window simplified consistently.core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (1)
463-467: No action required—the change correctly supports join-max dedup scenarios.The expanded detection is necessary and correct. CalciteRelNodeVisitor conditionally generates either
ROW_NUMBER_COLUMN_FOR_DEDUPorROW_NUMBER_COLUMN_FOR_JOIN_MAX_DEDUPdepending on thefromJoinMaxOptionflag (line 1599). All callers in DedupPushdownRule usecontainsRowNumberDedupas a simple boolean check and don't depend on which specific dedup column type is present, making them compatible with the expanded detection logic.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml (1)
6-6: Test expectation updated correctly for dedup window simplification.The removal of
ORDER BY $1from theROW_NUMBER() OVER (PARTITION BY $1)window aligns with the PR objective to simplify dedup operations and prepare for TopHits aggregation pushdown. This change is consistent across all dedup-related explain outputs in the PR.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml (1)
6-6: Test expectation updated correctly for multi-column dedup window.The removal of
ORDER BY $4, $7from the multi-column partition dedup window is consistent with the broader dedup simplification in this PR.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml (1)
6-6: Test expectation updated correctly.Consistent dedup window simplification removing
ORDER BY $4.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml (1)
6-6: Test expectation updated correctly.Consistent dedup window simplification for multi-column partition.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2)
89-96: Join with max option tests can now run without global pushdown enabled.Removing the
enabledOnlyWhenPushdownIsEnabled()preconditions from these two join tests aligns with the PR objective: the new TopHits aggregation pushdown for join withmax=nallows these queries to be optimized regardless of the global pushdown configuration.Also applies to: 100-107
33-33: No changes needed. The implementation is correct. SettingGlobalPushdownConfig.enabled = falseininit()is the intended configuration for CalciteExplainIT. The 93+ tests in this class are designed to work without global pushdown enabled; inherited tests that specifically require pushdown useenabledOnlyWhenPushdownIsEnabled()and will skip appropriately when pushdown is disabled.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_true_not_pushed.yaml (1)
6-6: Test expectation updated correctly for both logical and physical plans.The dedup window simplification is consistently applied in both the logical plan (line 6) and physical plan (line 11), removing explicit ordering from the
ROW_NUMBER()window.Also applies to: 11-11
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_output.yaml (1)
6-6: Test expectation updated correctly for no-pushdown scenario.The dedup window simplification is applied consistently in both pushdown and no-pushdown scenarios, as evidenced by this file in the
calcite_no_pushdowndirectory.Also applies to: 19-19
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_dedup_push.yaml (1)
6-6: Test expectation updated correctly.Final verification: dedup window simplification is consistently applied across all test expected output files in this PR, affecting both logical and physical plans in both pushdown and no-pushdown scenarios.
Also applies to: 13-13
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1)
8-27: LGTM with verification needed.The plan correctly implements the JOIN_SUBSEARCH_MAXOUT optimization:
- Line 8: System limit positioned after dedup to cap results at 50,000
- Lines 11-12: Dedup window (
_row_number_join_max_dedup_) with IS NOT NULL filter correctly ordered- Lines 23-26: Physical plan mirrors logical plan structure
However, the
ROW_NUMBER() OVER (PARTITION BY $6)lacks ORDER BY, producing non-deterministic row numbering within each lastname partition.Verify that the corresponding integration test handles potential result variation due to unordered partitioning, or document why deterministic ordering is not required for this join-with-max scenario.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_text_type_no_push.yaml (1)
6-12: Consistent with dedup pattern changes.The plan shows the same ORDER BY removal from
ROW_NUMBER() OVER (PARTITION BY $11)as other dedup files. The metadata fields (_id, _index, _score, etc.) are appropriately retained in the LogicalProject since this is a no-pushdown scenario.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_join_with_criteria_max_option.yaml (1)
8-26: New test expectation correctly implements JOIN_SUBSEARCH_MAXOUT optimization.This new test file correctly documents the expected plan for inner joins with the
max=noption:
- Line 8:
JOIN_SUBSEARCH_MAXOUTlimit caps the right side at 50,000 rows- Lines 11-12: Row number dedup window with IS NOT NULL filter on the partition key
- Lines 17-26: Physical plan uses
EnumerableMergeJoinwith appropriate sortingThe right-side subsearch dedup pattern (
_row_number_join_max_dedup_) is correctly structured to support the TopHits aggregation pushdown mentioned in the PR objectives.Note: The
ROW_NUMBER() OVER (PARTITION BY $0)lacks ORDER BY (line 11), consistent with other files in this PR, but verify test determinism as noted in other files.integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml (1)
8-24: Aggregation pushdown correctly implements join max=n optimization.The physical plan successfully pushes down the join subsearch with
max=nto a TopHits aggregation:
- Line 8-12: Logical plan uses
JOIN_SUBSEARCH_MAXOUTwith row-number dedup (consistent with no-pushdown variant)- Line 24: Physical plan converts this to
composite_bucketsaggregation withtop_hitsofsize=1The aggregation-based approach (
"size":1in top_hits) is semantically equivalent to the window function approach (ROW_NUMBER() <= 1), and allows OpenSearch to execute the limiting operation natively rather than in-memory.This aligns with the PR objective to "push down join with max=n option to TopHits aggregation."
integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml (1)
8-20: Join optimization successfully converts SortMergeJoin to HashJoin.This file demonstrates the join side reordering optimization mentioned in the PR objectives:
- Line 18: Uses
EnumerableHashJoininstead ofEnumerableMergeJoin- Line 19: The side with
max=n(originally right) is now left, with aggregation pushdown- Line 20: The other side (originally left) is now right, with a simple scan
This reordering allows the use of a HashJoin, which can be more efficient than SortMergeJoin when one side is small (limited to 50,000 rows by the
JOIN_SUBSEARCH_MAXOUT).The transformation aligns with the PR objective: "For inner joins, a SortMergeJoin may be converted to a HashJoin by reordering join sides."
integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1)
8-21: Left join correctly preserves join type while optimizing right side.This file demonstrates proper handling of non-inner joins with the
max=noptimization:
- Line 5:
joinType=[left]- join type is preserved (cannot reorder sides for left join)- Line 18: Uses
EnumerableMergeJoinrather than HashJoin (appropriate for left joins)- Line 20: Left side pushes down complex
REX_EXTRACTexpression as OpenSearch script- Line 21: Right side with
max=npushed down ascomposite_bucketsaggregation withtop_hitsThis aligns with the PR objective: "For non-inner joins, the right side will be fully pushed down to DSL instead of executing the WindowFunction in memory."
The plan correctly optimizes the right side while maintaining left join semantics.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLDedupTest.java (1)
28-48: ROW_NUMBER() without ORDER BY in dedup is intentional for this operation's semantics.The test correctly uses
ROW_NUMBER() OVER (PARTITION BY ...)without ORDER BY, which is the intended behavior for the dedup operation—selecting any valid row from each partition rather than a specific ordered one.Unit tests with Calcite's SCOTT_WITH_TEMPORAL schema execute deterministically against fixed, immutable data. The verifyResult() calls validate that execution produces consistent results with this schema. The Spark SQL generation also correctly mirrors this design choice across all dedup test methods.
No changes needed.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
1315-1316: Good placement of subsearch limit after dedup.The
addSysLimitForJoinSubsearchcall is correctly placed after dedup processing and before the join operation, ensuring the right-side subsearch is both deduplicated and limited before joining. This aligns with the PR's objective to enable pushdown of join withmax=noption.
|
|
Tip For best results, initiate chat on the files or code changes. @LantaoJin, thank you for the clarification! You're absolutely right. Since SEMI and ANTI joins are existence-based operations that only check whether matching rows exist (rather than joining all matching rows), the The behavior is working as intended—even if users specify ✏️ Learnings added
🧠 Learnings used |
Signed-off-by: Lantao Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
31-48: Avoid leakingGlobalPushdownConfig.enabledacross the IT JVM (restore previous value).
Setting a static global ininit()(Line 33) without resetting can make other integration tests order-dependent/flaky.Suggested direction: capture the previous value and restore it in the appropriate teardown hook for this test framework (e.g.,
@After/tearDown/ base cleanup override).Proposed patch (adjust method names/annotations to match the actual test lifecycle in
ExplainIT):public class CalciteExplainIT extends ExplainIT { + private boolean previousGlobalPushdownEnabled; + @Override public void init() throws Exception { - GlobalPushdownConfig.enabled = false; + previousGlobalPushdownEnabled = GlobalPushdownConfig.enabled; + GlobalPushdownConfig.enabled = false; super.init(); enableCalcite(); setQueryBucketSize(1000); loadIndex(Index.STRINGS); loadIndex(Index.BANK_WITH_NULL_VALUES); loadIndex(Index.NESTED_SIMPLE); loadIndex(Index.TIME_TEST_DATA); loadIndex(Index.TIME_TEST_DATA2); loadIndex(Index.EVENTS); loadIndex(Index.LOGS); loadIndex(Index.WORKER); loadIndex(Index.WORK_INFORMATION); loadIndex(Index.WEBLOG); loadIndex(Index.DATA_TYPE_ALIAS); } + + @Override + public void cleanup() throws Exception { + try { + super.cleanup(); + } finally { + GlobalPushdownConfig.enabled = previousGlobalPushdownEnabled; + } + } }(If
cleanup()isn’t the right hook, wire this into the correct one.) As per coding guidelines, consider documenting why this Calcite suite must force-disable global pushdown.
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)
1315-1316: Consider adding an explanatory comment.The placement of the system limit after dedup is strategically important for enabling pushdown of the right-side subsearch to DSL (avoiding in-memory WindowFunction execution). Consider adding a brief inline comment explaining this sequencing, similar to:
// Apply system limit after dedup to enable DSL pushdown for the right-side subsearch addSysLimitForJoinSubsearch(context);This would improve code maintainability for future developers.
1383-1393: Add JavaDoc for clarity and maintainability.The method implementation is correct, but it lacks documentation. While not strictly required for private methods, adding JavaDoc would improve maintainability for a method implementing critical join pushdown logic:
/** * Applies JOIN_SUBSEARCH_MAXOUT system limit to the right side of a join when configured. * This enables pushing the right-side subsearch with max=n down to DSL as a TopHits aggregation, * avoiding in-memory WindowFunction execution. * * @param context CalcitePlanContext containing the RelBuilder and system limits configuration */ private static void addSysLimitForJoinSubsearch(CalcitePlanContext context) {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java(3 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify test coverage for new business logic
- Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Verify SQL generation and optimization paths
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.831Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-11T05:27:39.831Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.831Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javacore/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧬 Code graph analysis (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java (1)
GlobalPushdownConfig(307-310)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)
core/src/main/java/org/opensearch/sql/calcite/plan/LogicalSystemLimit.java (1)
LogicalSystemLimit(25-90)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
87-107: RestoreGlobalPushdownConfig.enabledafter test initialization to prevent test isolation issues.
GlobalPushdownConfig.enabled = falseis set ininit()(line 33) but never restored. UnlikeCalciteNoPushdownIT, which properly saves and restores this state using@BeforeClass/@AfterClass,CalciteExplainITleaves the global configuration disabled for all subsequent tests. Add an@AfterClassmethod to restore the original state, similar toCalciteNoPushdownIT(lines 111–120).core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2)
1328-1332: Early return for SEMI/ANTI joins is correct.The early return for SEMI and ANTI joins before processing the
maxoption and applying the system limit is intentional and correct. These join types only use the left side to filter records based on the existence of matches in the right side, so varyingmaxvalues or applying subsearch limits on the right side would not change the results.Based on learnings, this behavior is documented and verified as correct.
1373-1374: Consistent system limit application across join paths.The placement of
addSysLimitForJoinSubsearchafter dedup and before the join operation is correct and consistent with the join-with-field-list path at line 1316. This ensures uniform behavior for both join grammar variants.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Lantao Jin <[email protected]>
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4929-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 4bf5c9c776e7f8cb11714d68fbc2c9163475ef23
# Push it to GitHub
git push --set-upstream origin backport/backport-4929-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
| EnumerableCalc(expr#0..18=[{inputs}], expr#19=[IS NOT NULL($t6)], proj#0..18=[{exprs}], $condition=[$t19]) | ||
| EnumerableLimit(fetch=[50000]) | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]]) | ||
| EnumerableLimit(fetch=[50000]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of system limit has changed from limitation of source to limitation of the results after top hits.
So if we cannot push down the window, it will scan all rows from the source. @LantaoJin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I corrected the behavior since the define of plugins.ppl.join.subsearch_maxout is
The size configures the maximum of rows from subsearch to join against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we ensure the window will always be pushed down? Otherwise it will get regression than before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No if (1) the join keys contain text (not keyword); (2) the join keys contain expression (I am working on support #4789 which could resolve it)
…project#4929) * Pushdown join with max=n option to TopHits aggregation Signed-off-by: Lantao Jin <[email protected]> * Fix UT Signed-off-by: Lantao Jin <[email protected]> * Stabilize the dashboard IT Signed-off-by: Lantao Jin <[email protected]> * address comment Signed-off-by: Lantao Jin <[email protected]> --------- Signed-off-by: Lantao Jin <[email protected]> (cherry picked from commit 4bf5c9c)
* Pushdown join with max=n option to TopHits aggregation * Fix UT * Stabilize the dashboard IT * address comment --------- (cherry picked from commit 4bf5c9c) Signed-off-by: Lantao Jin <[email protected]>
Description
Pushdown join with
max=noption to TopHits aggregation:max=nwill be converted to TopHits aggregation.SortMergeJoinmay be converted toHashJoinby reordering the sides of joinWindowFunctionin memory.Related Issues
Resolves #4927
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.