-
Notifications
You must be signed in to change notification settings - Fork 180
Support pushdown dedup with expression #4957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughRefactors dedup pushdown to support expressions and renames, changes aggregation argument model to Pair<RexNode,String>, adds field-script infrastructure and TopHits field merging, enables the dedup-expr rule, introduces EnumerableNestedAggregate and rule, and expands/activates many integration tests and expected-plan fixtures. Changes
Sequence Diagram(s)sequenceDiagram
participant Planner as Calcite Planner
participant Rule as DedupPushdownRule
participant Analyzer as AggregateAnalyzer
participant Engine as CalciteScriptEngine
participant FieldFactory as CalciteFieldScriptFactory
participant LeafFactory as CalciteFieldScriptLeafFactory
participant OpenSearch as OpenSearch
Planner->>Rule: Match dedup pattern (may include expr-project)
alt expr-project present
Rule->>Rule: extract & reorder dedup columns (advanceDedupColumns)
Rule->>Analyzer: build aggregation args as Pair<RexNode,String>
else no expr-project
Rule->>Analyzer: build aggregation args from projectless path
end
Analyzer->>Engine: request aggregation/script builders using paired args
Engine->>FieldFactory: create FieldScript factory (FieldScript context)
FieldFactory->>LeafFactory: newFactory(params, lookup) per shard
LeafFactory->>OpenSearch: field scripts invoked during aggregation/top_hits execution
OpenSearch-->>Planner: return aggregated/top_hits results (with computed fields)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Areas needing focused review:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: Lantao Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Nitpick comments (5)
opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java (1)
93-98: Use parameterized type to avoid raw type warning.Line 95 uses a raw
Maptype which causes an unchecked assignment warning. Since the return type isMap<String, Object>, use proper generics for type safety.return Arrays.stream(hits) - .<Map<String, Object>>map( - hit -> { - Map map = new LinkedHashMap<>(hit.getSourceAsMap()); - hit.getFields().values().forEach(f -> map.put(f.getName(), f.getValue())); - return map; - }) + .map( + hit -> { + Map<String, Object> map = new LinkedHashMap<>(hit.getSourceAsMap()); + hit.getFields().values().forEach(f -> map.put(f.getName(), f.getValue())); + return map; + }) .toList();opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java (2)
16-36: Missing JavaDoc on public methods.Per coding guidelines, public methods should have proper JavaDoc documentation. The
isResultDeterministic()andnewFactory()methods lack documentation explaining their behavior.+ /** + * Indicates whether the script results are deterministic and can be cached. + * + * @return true, as Calcite script results are cacheable + */ @Override public boolean isResultDeterministic() { // This implies the results are cacheable return true; } + /** + * Creates a new leaf factory for generating field scripts. + * + * @param params Script parameters + * @param lookup Search lookup for accessing document fields + * @return A new CalciteFieldScriptLeafFactory instance + */ @Override public FieldScript.LeafFactory newFactory(Map<String, Object> params, SearchLookup lookup) { return new CalciteFieldScriptLeafFactory(function, params, lookup); }
23-25: Remove unused constructor parametertype.The
RelDataType typeparameter is accepted but never used or stored. While this parameter is part of theBiFunctioninterface contract inCalciteScriptEngine, it creates dead code in this factory. Note thatCalciteAggregationScriptFactorydoes store and use thetypeparameter (passing it to its leaf factory), whereasCalciteFieldScriptFactory,CalciteFilterScriptFactory, and both sort script factories do not.Remove the parameter to eliminate dead code:
- public CalciteFieldScriptFactory(Function1<DataContext, Object[]> function, RelDataType type) { + public CalciteFieldScriptFactory(Function1<DataContext, Object[]> function) { this.function = function; }This requires coordinating changes to the
BiFunctionsignature inCalciteScriptEngineand updating all related factory instantiations.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)
173-197: O(n²) complexity inadvanceDedupColumnsdue to repeatedcontains()calls.The sorting comparator and mapping both call
dedupColumnIndices.contains(i)for each element, resulting in O(n*m) complexity where n is the column count and m is the dedup column count. For large schemas this could be inefficient.Convert
dedupColumnIndicesto aSetfor O(1) lookups:private static List<Pair<RexNode, String>> advanceDedupColumns( List<Pair<RexNode, String>> originalColumnList, List<Integer> dedupColumnIndices, List<String> dedupColumnNames) { + Set<Integer> dedupIndexSet = new HashSet<>(dedupColumnIndices); List<Pair<RexNode, String>> reordered = IntStream.range(0, originalColumnList.size()) .boxed() .sorted( (i1, i2) -> { - boolean in1 = dedupColumnIndices.contains(i1); - boolean in2 = dedupColumnIndices.contains(i2); + boolean in1 = dedupIndexSet.contains(i1); + boolean in2 = dedupIndexSet.contains(i2); return in1 == in2 ? i1 - i2 : in2 ? 1 : -1; }) .map( i -> { Pair<RexNode, String> original = originalColumnList.get(i); - if (dedupColumnIndices.contains(i)) { + if (dedupIndexSet.contains(i)) { int dedupIndex = dedupColumnIndices.indexOf(i); return Pair.of(original.getKey(), dedupColumnNames.get(dedupIndex)); } return original; }) .toList(); return reordered; }Note: The
indexOfcall at line 190 would still need the original list to get the correct index for name lookup.
49-61: Consider extracting magic numbers 5 and 6 to named constants.The
rels.lengthchecks against 5 and 6 are not self-documenting. Named constants would improve readability.+ private static final int RELS_LENGTH_WITHOUT_EXPR = 5; + private static final int RELS_LENGTH_WITH_EXPR = 6; + @Override protected void onMatchImpl(RelOptRuleCall call) { final LogicalProject finalProject = call.rel(0); // TODO Used when number of duplication is more than 1 final LogicalFilter numOfDedupFilter = call.rel(1); final LogicalProject projectWithWindow = call.rel(2); - if (call.rels.length == 5) { + if (call.rels.length == RELS_LENGTH_WITHOUT_EXPR) { final CalciteLogicalIndexScan scan = call.rel(4); apply(call, finalProject, numOfDedupFilter, projectWithWindow, null, scan); - } else if (call.rels.length == 6) { + } else if (call.rels.length == RELS_LENGTH_WITH_EXPR) { final LogicalProject projectWithExpr = call.rel(4); final CalciteLogicalIndexScan scan = call.rel(5); apply(call, finalProject, numOfDedupFilter, projectWithWindow, projectWithExpr, scan); } else { throw new AssertionError( String.format( "The length of rels should be %s but got %s", this.operands.size(), call.rels.length)); } }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (37)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java(3 hunks)integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1_alternative.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2_alternative.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2_alternative.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java(8 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java(7 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java(3 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java(1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.javaopensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.javainteg-test/src/test/java/org/opensearch/sql/util/MatcherUtils.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.javaopensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.javaopensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.javainteg-test/src/test/java/org/opensearch/sql/util/MatcherUtils.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.javaopensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify test coverage for new business logic
- Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Verify SQL generation and optimization paths
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (6)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.javainteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.javainteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlinteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
📚 Learning: 2025-12-11T05:27:39.831Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.831Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yamlinteg-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yamlopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
🧬 Code graph analysis (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (1)
EqualsAndHashCode(19-45)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1)
Value(44-214)core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggGroupMergeRule.java (1)
Value(43-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
🔇 Additional comments (36)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml (1)
1-11: LGTM!This test fixture correctly captures the expected Calcite explain plan for dedup with expression pushdown. The logical plan properly shows the
ROW_NUMBER() OVER (PARTITION BY $17)for dedup on the computednew_gendercolumn, and the physical plan demonstrates successful pushdown with OpenSearch composite aggregation using scripted terms for theLOWERfunction.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml (1)
11-11: Verify the expanded _source field list is intentional.The
_source.includesnow requests system metadata fields (_id,_index,_score,_maxscore,_sort,_routing) alongside regular document fields. These metadata fields are typically not stored in_sourceand are accessed via separate hit methods. Please verify this expanded field list is intentional and doesn't cause issues when these fields aren't found in the source document.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml (1)
1-12: LGTM!This test fixture correctly validates the complex dedup expression scenario with multiple computed columns (
new_gender,new_state). The logical plan appropriately shows the sort before dedup pattern withROW_NUMBER() OVER (PARTITION BY $4, $5), and the physical plan demonstrates successful pushdown with composite aggregation using two scripted terms sources for bothLOWERexpressions.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml (1)
1-11: LGTM!This test fixture correctly validates the scenario where dedup operates on regular fields (
age,account_number) while the query also contains computed expressions (new_gender,new_state). The physical plan appropriately pushes down the aggregation with raw field grouping while usingscript_fieldsintop_hitsto compute the expression-based columns. This aligns well with the PR objective of supporting "dedup when other columns contain expressions."integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml (1)
1-11: LGTM! Test fixture expands plan validation coverage.This alternative expected output YAML provides a valid execution plan variant for dedup-with-expression scenarios, enabling more robust test assertions via the new fallback YAML comparison helper.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml (1)
1-11: LGTM! Test fixture adds dedup-with-expression coverage.This expected output YAML correctly represents the Calcite explain plan for dedup operations with expressions, supporting the expanded pushdown capability.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml (1)
10-10: LGTM! Test fixture updated to reflect pushdown refactor.The PushDownContext now correctly reflects the aggregation-driven pushdown strategy with expanded source field projections, consistent with the dedup expression enhancement.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml (1)
1-14: LGTM! Test fixture adds sort+dedup-expression coverage.This expected output correctly represents the Calcite plan for dedup operations with expressions and sorting, extending test coverage for the enhanced pushdown capability.
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (1)
60-60: LGTM! Enables expression-aware dedup pushdown.Adding DEDUP_EXPR_PUSH_DOWN to the rule set correctly activates the dedup-with-expression pushdown capability, directly fulfilling the PR's core objective.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java (1)
73-73: LGTM! Extends script contexts to support field scripts.The addition of FieldScript context registration follows the existing pattern for other script types and enables field-level scripting required for expression-aware dedup pushdown.
Also applies to: 82-82, 115-115
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1_alternative.yaml (1)
1-12: LGTM! Alternative fixture supports non-deterministic plan validation.This alternative expected output enables robust testing when multiple valid plan variants can be generated for the same dedup-with-expression query.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml (1)
1-11: LGTM!The expected plan correctly represents a
dedup 2 new_gender, new_statescenario with expression-based columns. The logical plan shows appropriate ROW_NUMBER windowing with partition by the computed columns, and the physical plan demonstrates successful pushdown with composite aggregation using scripted terms andtop_hitswith size 2.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml (1)
1-11: LGTM!The expected plan correctly represents a
dedup 1 new_gender, new_statescenario with a reduced column set. Thetop_hitssize of 1 and filter condition<=($6, 1)properly reflect dedup 1 semantics.integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1)
21-21: LGTM!The updated pushdown context correctly reflects the shift to an AGGREGATION-based strategy for the right side of the merge join with
max=1option. The composite aggregation onlastnamewithtop_hitssize 1 properly implements the max option semantics for join deduplication.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml (1)
1-11: LGTM!The expected plan correctly represents a dedup scenario where the dedup key (
age) is a simple field, but the output includes a computed expression (new_gender = LOWER(gender)). The aggregation pushdown correctly groups byage, and thescript_fieldsconfiguration properly computesnew_genderwithin thetop_hitsaggregation.integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml (1)
19-20: LGTM!The updated plan correctly demonstrates the aggregation-based pushdown for the join with
max=1option. The first scan handles the dedup/max semantics via composite aggregation onaccount_numberwithtop_hitssize 1, while the second scan performs a simple projection.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml (1)
11-11: Review the selective _source fields in the AGGREGATION pushdown strategy.Best practices recommend using _source filtering to retrieve only necessary fields and limiting the size of top_hits to reduce memory usage. The current
_source.includeslist contains all account fields plus internal metadata fields (_id,_index,_score,_maxscore,_sort,_routing), though the logical plan only projectsaccount_number,gender,age,state. Using large _source includes can impact performance. Consider narrowing the includes list to only required fields for the aggregation-based pushdown.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml (1)
1-11: LGTM - AGGREGATION-centric pushdown for dedup is correctly structured.The shift from PROJECT-centric to AGGREGATION-centric pushdown aligns with the PR's objective to support dedup with expressions. The expanded
_sourcefield set including internal metadata fields (_id,_index,_score, etc.) is appropriate for supporting expression evaluation in the pushdown context.integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml (1)
16-24: LGTM - Asymmetric pushdown strategies for join with max option are correctly applied.The physical plan appropriately uses different pushdown strategies for each side:
- Left side: PROJECT + SORT for merge join ordering
- Right side: AGGREGATION pushdown with
top_hitsfor themaxoption dedup behaviorThis aligns with the PR's implicit support for join with max option when expressions are present.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml (1)
1-13: LGTM - New test fixture for dedup with expressions is well-structured.The plan correctly demonstrates:
- Dedup by original fields (
gender,state) withtop_hitssize=2- Expression computation (
LOWER) viascript_fieldswith Calcite-encoded scripts- Sort order alignment between logical (
ASC-nulls-first,DESC-nulls-last) and physical plansinteg-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml (1)
1-11: LGTM - Aggregation-first pushdown for complex dedup scenario.The updated physical plan correctly uses AGGREGATION pushdown with an expanded
_sourcefield set, consistent with the PR's pushdown strategy changes.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (1)
1-13: LGTM - Alternative plan variant for optimizer non-determinism.This alternative fixture accounts for non-deterministic ordering of
script_fields(new_state,new_gendervsnew_gender,new_state). Both plans are semantically equivalent, and having alternatives ensures test stability.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml (1)
1-11: LGTM - Alternative fixture for expression-based dedup pushdown.This correctly demonstrates deduplication where the grouping keys themselves are expressions (
LOWER(gender),LOWER(state)). The composite aggregation uses script-based terms for the grouping, which is the correct approach for expression-based dedup pushdown.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml (1)
1-11: LGTM - Alternative fixture for expression-based dedup with single row retention.The plan correctly implements dedup by expression columns (
new_gender,new_state) withtop_hitssize=1, demonstrating proper pushdown ofdedup 1with expression-based grouping keys.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml (1)
1-10: LGTM!The updated expected output correctly reflects the AGGREGATION pushdown with expanded _source fields, aligning with the PR's dedup expression support enhancements.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2_alternative.yaml (1)
1-11: LGTM!The alternative expected output correctly captures the dedup-with-expression scenario using Calcite field scripts, supporting broader test coverage for the pushdown functionality.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2_alternative.yaml (1)
1-12: LGTM!The alternative expected output correctly represents the complex dedup expression scenario with sort and aggregation pushdown, expanding test coverage.
opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java (3)
340-371: LGTM! Signature change enables richer argument metadata.The conversion from
List<RexNode>toList<Pair<RexNode, String>>is correctly implemented, allowing aggregation arguments to carry both the expression and its field name through the analysis pipeline.
373-402: Verify consistent accessor pattern throughout.All argument accessors now use
args.getFirst().getKey()orargs.get(i).getKey()to retrieve the RexNode, which is consistent with the new Pair structure.
566-597: LITERAL_AGG handling correctly uses both Pair components.The implementation properly extracts RexNode via
rex.getKey()for type checks and String viarex.getValue()for field names, demonstrating correct usage of the Pair structure in the LITERAL_AGG/dedup pushdown path.integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml (1)
1-12: LGTM!The alternative expected output correctly captures another valid execution path for the dedup expression scenario, enhancing test robustness.
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml (1)
1-12: LGTM!The primary expected output correctly represents the complex dedup expression scenario. The script_fields order difference from the alternative variant justifies having multiple valid expected plans.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)
2023-2058: LGTM! Comprehensive dedup expression pushdown tests.The test method properly validates dedup pushdown with expressions across multiple scenarios. The use of alternative expected plans via
assertYamlEqualsIgnoreId(expected, alternative, actual)is appropriate for handling optimizer plan variability.
2101-2124: LGTM! Good coverage of complex dedup-then-sort scenarios.The two scenarios effectively test the ordering of operations: dedup followed by sort vs. sort embedded in dedup pushdown. Both primary and alternative expected plans are validated.
2126-2161: LGTM! New test validates dedup pushdown when expressions exist but aren't dedup columns.This test correctly covers the case where expressions are computed (e.g.,
new_gender = lower(gender)) but the dedup operates on different columns (e.g.,age,account_number). The alternative plan coverage ensures robustness against optimizer variations.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1)
282-291: Good refactoring:tableScanCheckerimproves readability.Extracting the table scan validation logic into a named method with a clear Javadoc comment improves code maintainability and explains the constraint about project not being pushed.
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
Show resolved
Hide resolved
...rch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
Show resolved
Hide resolved
...rch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
Show resolved
Hide resolved
...rch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
Show resolved
Hide resolved
...n/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
Outdated
Show resolved
Hide resolved
...n/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
Show resolved
Hide resolved
...n/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
Show resolved
Hide resolved
Signed-off-by: Lantao Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1)
84-95: UnsafeOptional.get()still present.Line 94 calls
.get()on an Optional without verification, which will throwNoSuchElementExceptionif no matching INPUT_REF is found for a dedup column index. This can occur if the project doesn't contain an INPUT_REF at the expected index.Apply this diff to provide a clear error message:
.map( i -> projectWithWindow.getNamedProjects().stream() .filter(pair -> pair.getKey().isA(SqlKind.INPUT_REF)) .filter(pair -> ((RexInputRef) pair.getKey()).getIndex() == i) .map(Pair::getValue) .findFirst() - .get()) + .orElseThrow( + () -> + new IllegalStateException( + "Dedup column at index " + i + " not found in project"))) .toList();opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (1)
18-18: Fix grammatical error in class JavaDoc.The verb should agree with the singular subject: "produce" → "produces".
Apply this diff:
-/** Calcite script executor that produce dynamic values. */ +/** Calcite script executor that produces dynamic values. */
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java(8 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.javainteg-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.javainteg-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify test coverage for new business logic
- Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.javaopensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : All public classes and methods must have proper JavaDoc
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Flag missing JavaDoc on public APIs during code review
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
📚 Learning: 2025-12-11T05:27:39.831Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.831Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java (1)
EqualsAndHashCode(17-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (11)
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)
450-467: LGTM! JavaDoc concern from previous review properly addressed.The comprehensive JavaDoc now clearly documents the method's purpose, parameters, fallback behavior, and exception handling. The implementation correctly tries the primary expected YAML first and falls back to the secondary on failure, which aligns well with the test scenarios where multiple valid plan variants can occur.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.java (1)
35-35: LGTM!Documentation punctuation fix improves Javadoc formatting.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java (1)
33-33: LGTM!Documentation punctuation fix improves Javadoc formatting.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (2)
29-48: LGTM!Constructor JavaDoc is clear and properly documents all parameters.
50-58: LGTM!The bounds check for array access properly handles null and empty results, addressing the previously flagged ArrayIndexOutOfBoundsException risk.
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (6)
8-8: LGTM!The standard Guava import is now correctly used, addressing the previous inconsistency with the rest of the codebase.
49-62: LGTM!The two-path handling correctly distinguishes between dedup with and without expression projects based on the operand count.
110-136: LGTM!The two-path logic correctly handles both scenarios:
- When no expression project exists, it creates a new reordered project from the scan
- When an expression project exists, it reorders that project
The use of
advanceDedupColumnsensures consistent column ordering in both paths.
160-165: LGTM!Clean extraction of INPUT_REF indices from RexNode list.
167-199: LGTM!The
advanceDedupColumnsmethod correctly reorders columns to place dedup columns first while:
- Maintaining relative order within dedup and non-dedup groups
- Renaming dedup columns to their target names
- Preserving non-dedup column names
The sorting comparator logic is correct: returns -1 when only i1 is a dedup column (moves it earlier), 1 when only i2 is a dedup column, and preserves original order otherwise.
284-293: LGTM!The
tableScanCheckerpredicate correctly validates that no limit, aggregate, or project has been pushed, ensuring the scan is in a clean state before applying dedup pushdown.As per coding guidelines, the method clearly documents the rationale: preventing loss of expression names after project pushdown.
...java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java
Show resolved
Hide resolved
Signed-off-by: Lantao Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java (1)
43-52: Type safety issue from previous review remains unaddressed.The unchecked cast of
params.get(SOURCES)toList<Integer>on line 49 still lacks validation and could throwClassCastExceptionorNullPointerExceptionat runtime if the parameter is missing or has the wrong type.Apply this diff to add safe type checking:
public CalciteFieldScriptLeafFactory( Function1<DataContext, Object[]> function, Map<String, Object> params, SearchLookup lookup) { this.function = function; this.params = params; this.lookup = lookup; this.parametersToIndex = - IntStream.range(0, ((List<Integer>) params.get(SOURCES)).size()) + IntStream.range(0, getSources(params).size()) .boxed() .collect(Collectors.toMap(i -> "?" + i, i -> i)); } + + @SuppressWarnings("unchecked") + private static List<Integer> getSources(Map<String, Object> params) { + Object sources = params.get(SOURCES); + if (sources == null) { + throw new IllegalArgumentException("SOURCES parameter is required but was null"); + } + if (!(sources instanceof List)) { + throw new IllegalArgumentException( + "SOURCES parameter must be a List, but was: " + sources.getClass().getName()); + } + return (List<Integer>) sources; + }
🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
2084-2123: Clarify the comment about rename behavior.The comment on line 2086 states "rename changes nothing, reuse the same yaml files," which could be misinterpreted. Consider revising to:
- // rename changes nothing, reuse the same yaml files + // Rename doesn't affect the final optimized plan; expected outputs match testDedupExprThis clarifies that the rename operation is tested and works correctly, producing the same optimized plan as the eval-only version.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java(3 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify test coverage for new business logic
- Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Verify SQL generation and optimization paths
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)
2125-2148: LGTM! Good coverage of complex dedup scenarios.This new test method effectively validates the interaction between rename, dedup, and sort operations, covering both sort-before-dedup and dedup-before-sort patterns. The test scenarios align well with the PR objectives to support expression-based dedup pushdown.
2150-2185: LGTM! Comprehensive coverage of dedup with expressions in non-dedup columns.This new test method addresses an important use case from the PR objectives: supporting pushdown when expressions exist in non-dedup columns. The four scenarios provide thorough validation of this functionality.
2047-2082: No actionable concerns identified.The use of alternative expected plan files is an intentionally-designed testing pattern, not a maintenance concern. The
assertYamlEqualsIgnoreIdmethod explicitly supports multiple expected outputs with documented fallback semantics for cases where "the DSL implementation can produce multiple valid plan variants." This pattern correctly accommodates Calcite's legitimate optimization variations.Likely an incorrect or invalid review comment.
...n/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
Show resolved
Hide resolved
| .<Map<String, Object>>map(hit -> new LinkedHashMap<>(hit.getSourceAsMap())) | ||
| .<Map<String, Object>>map( | ||
| hit -> { | ||
| Map map = new LinkedHashMap<>(hit.getSourceAsMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use var or Map<String, Object>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition is in L93 .<Map<String, Object>>map(. But yes, I can add Map<String, Object> here and remove L93. The reason use L93 instead of Map<String, Object> is we used unnamed variable, but now it has to define a name.
opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java
Show resolved
Hide resolved
| List<Pair<RexNode, String>> namedPairFromScan = | ||
| Streams.zip(columnsFromScan.stream(), colNamesFromScan.stream(), Pair::new).toList(); | ||
| List<Pair<RexNode, String>> reordered = | ||
| advanceDedupColumns(namedPairFromScan, dedupColumnIndices, dedupColumnNames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the intuition / reason for reordering the columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why I suggested you to use aggregateWithTrimming() instead of calling the relBuilder.aggregate() directly. In SQL, the Project added for Aggregate always keeps the group columns in the front of others, it means the group keys set always contain {0}, or there could be trigger a Calcite bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it as well. Since the aggregate is going to be pushed into scan and the final generated RelNode is a single Scan operator, will it really trigger the bug in Calcite?
Do we do similar thing in the previous PR for supporting dedup push down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it as well. Since the
aggregateis going to be pushed into scan and the final generated RelNode is a singleScanoperator, will it really trigger the bug in Calcite?Do we do similar thing in the previous PR for supporting dedup push down?
Yes. I see bugs when the group key columns are not in the front of child Project in Calcite in developing support Aggregate with Calcite. Not sure is it fixed or not. Check the comment here
| trimmedRefs.addAll(PlanUtils.getInputRefs(resolvedGroupByList)); // group-by keys first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That happens in CalciteRelNodeVisitor, but is similar change needed in the push down process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic has no diff with CalciteRelNodeVisitor. It is building an aggregate via relbuilder. I didn't test if I remove the reorder logic. It may not easy to trigger the bug in dedup, but for safe, I added the reorder logic
| .filter(pair -> ((RexInputRef) pair.getKey()).getIndex() == i) | ||
| .map(Pair::getValue) | ||
| .findFirst() | ||
| .get()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this be simplified to
dedupColumnIndices.stream()
.map(projectWithWindow.getInput().getRowType().getFieldNames()::get)
.toList()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. above code doesn't cover the rename case:
source=test | rename status as http_status | dedup http_status | fields http_status | sort http_status
| if (LOG.isDebugEnabled()) { | ||
| LOG.debug("Cannot pushdown the dedup since the dedup columns contain RexCall"); | ||
| } | ||
| if (dedupColumnIndices.size() != dedupColumnNames.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this ever happened since dedupColumnNames is derived from dedupColumnIndices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am 100% sure they have same size. Better to keep this check.
| List<Pair<RexNode, String>> namedPairFromScan = | ||
| Streams.zip(columnsFromScan.stream(), colNamesFromScan.stream(), Pair::new).toList(); | ||
| List<Pair<RexNode, String>> reordered = | ||
| advanceDedupColumns(namedPairFromScan, dedupColumnIndices, dedupColumnNames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it as well. Since the aggregate is going to be pushed into scan and the final generated RelNode is a single Scan operator, will it really trigger the bug in Calcite?
Do we do similar thing in the previous PR for supporting dedup push down?
| * @return the converted Pair<RexNode, String> list | ||
| */ | ||
| private static List<RexNode> convertAggArgThroughProject(AggregateCall aggCall, Project project) { | ||
| private static List<org.apache.calcite.util.Pair<RexNode, String>> convertAggArgThroughProject( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already import org.apache.commons.lang3.tuple.Pair imported. Could it be switched to that or at least unified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. will convert to common.lang3's Pair
Signed-off-by: Lantao Jin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/utils/Utils.java (1)
15-21: PreferArrayListand indexed for-loop for better performance.The current implementation uses
LinkedListand anIterator, which has higher memory overhead and is less idiomatic. Consider usingArrayListwith an indexed for-loop for better performance and clarity.🔎 Simplified implementation
- LinkedList<Pair<I, Integer>> result = new LinkedList<>(); - Iterator<I> iter = input.iterator(); - int index = 0; - while (iter.hasNext()) { - result.add(Pair.of(iter.next(), index++)); + List<Pair<I, Integer>> result = new ArrayList<>(input.size()); + for (int i = 0; i < input.size(); i++) { + result.add(Pair.of(input.get(i), i)); }opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java (1)
60-64: Preserve exception stack trace in debug logging.Using
e.toString()loses the stack trace, which is valuable for debugging. Pass the exception directly to include the full stack trace.Suggested fix
} catch (InvalidRelException e) { if (LOG.isDebugEnabled()) { - LOG.debug(e.toString()); + LOG.debug("Cannot convert aggregate to EnumerableNestedAggregate", e); } return null; }opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java (1)
27-30: Consider clarifying the JavaDoc.The class documentation links to OpenSearch's "nested aggregation" for nested documents, but based on the PR context (dedup pushdown with expressions), this class appears to handle aggregations that are structurally nested (sub-aggregations) rather than aggregations on nested document types. The naming and documentation could cause confusion.
Consider updating the JavaDoc to clarify whether this is for:
- Aggregations on nested document fields, or
- Aggregations that will be pushed down as sub-aggregations in OpenSearch queries
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
core/src/main/java/org/opensearch/sql/utils/Utils.java(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_push.yaml(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java(7 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.javacore/src/main/java/org/opensearch/sql/utils/Utils.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.javaopensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.javaopensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.javacore/src/main/java/org/opensearch/sql/utils/Utils.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_push.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.
Applied to files:
integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_push.yaml
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java (1)
EnumerableNestedAggregate(31-85)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java (2)
340-373: LGTM - Consistent Pair-based argument handling.The refactoring from
List<RexNode>toList<Pair<RexNode, String>>is implemented correctly. The method now returns pairs containing both the expression node and its associated field name, which enables proper handling of script fields in aggregations (particularly visible in the LITERAL_AGG case at lines 581-595 whererex.getValue()provides the script field name).
376-605: Mechanical updates correctly applied across aggregation builders.The aggregation builder methods (
createDistinctAggregation,createRegularAggregation) correctly use.getKey()to extractRexNodefrom the pairs. The pattern is consistently applied across all aggregation types including AVG, SUM, COUNT, MIN, MAX, variance/stddev functions, ARG_MAX/ARG_MIN, TAKE, FIRST, LAST, and PERCENTILE_APPROX.opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java (1)
21-48: LGTM - Rule implementation follows Calcite patterns correctly.The rule correctly:
- Configures conversion from
LogicalAggregatewithConvention.NONEtoEnumerableConvention- Uses hint-based filtering to selectively apply the conversion only when "stats_args" hint is present
- Gracefully returns
nullwhen conversion cannot be performedopensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java (1)
33-57: LGTM - Constructor validation is appropriate.The constructor correctly validates that:
- No distinct aggregations are used (not supported)
- No within-distinct aggregations are used (not supported)
- Each aggregation has a registered implementor in
RexImpTableThis follows the pattern established in other Calcite enumerable aggregates and provides clear error messages.
integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_push.yaml
Outdated
Show resolved
Hide resolved
.../src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Lantao Jin <[email protected]>
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4957-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 cbcdbd6fc918e4a356480300c208aa76f468fbf1
# Push it to GitHub
git push --set-upstream origin backport/backport-4957-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-devThen, create a pull request where the |
* Support pushdown dedup with expression Signed-off-by: Lantao Jin <[email protected]> * Fix java spotless check Signed-off-by: Lantao Jin <[email protected]> * Support dedup with expression Signed-off-by: Lantao Jin <[email protected]> * fix yaml test Signed-off-by: Lantao Jin <[email protected]> * simplify Signed-off-by: Lantao Jin <[email protected]> * address comments Signed-off-by: Lantao Jin <[email protected]> * fix javadoc Signed-off-by: Lantao Jin <[email protected]> * address comments Signed-off-by: Lantao Jin <[email protected]> * delete unexpected files Signed-off-by: Lantao Jin <[email protected]> --------- Signed-off-by: Lantao Jin <[email protected]> (cherry picked from commit cbcdbd6)
* Support pushdown dedup with expression * Fix java spotless check * Support dedup with expression * fix yaml test * simplify * address comments * fix javadoc * address comments * delete unexpected files --------- (cherry picked from commit cbcdbd6) Signed-off-by: Lantao Jin <[email protected]>
Description
Support pushdown
dedupwith expression:| eval new_gender = lower(gender), new_name = lower(name) | dedup 2 new_gender, new_name| eval new_gender = lower(gender), new_name = lower(name) | dedup 2 gender, nameThis PR also implicitly support pushdown
join with max optionwith expressionssource = t1 | eval new_gender = lower(gender) | join new_gender [source = t2 | eval new_gender = lower(gender) ]source = t1 | eval new_gender = lower(gender) | join gender [source = t2 | eval new_gender = lower(gender) ]Related Issues
Resolves #4789
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.