Skip to content

Conversation

@LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Dec 15, 2025

Description

Support pushdown dedup with expression:

  • the dedup columns contain expressions
    • | eval new_gender = lower(gender), new_name = lower(name) | dedup 2 new_gender, new_name
  • the other columns contain expressions
    • | eval new_gender = lower(gender), new_name = lower(name) | dedup 2 gender, name

This PR also implicitly support pushdown join with max option with expressions

  • the join keys contain expressions
    • source = t1 | eval new_gender = lower(gender) | join new_gender [source = t2 | eval new_gender = lower(gender) ]
  • the other columns contain expresssions
    • source = t1 | eval new_gender = lower(gender) | join gender [source = t2 | eval new_gender = lower(gender) ]

Related Issues

Resolves #4789

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@LantaoJin LantaoJin added the enhancement New feature or request label Dec 15, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 15, 2025

📝 Walkthrough

Summary by CodeRabbit

  • Tests
    • Reactivated and expanded dedup integration tests with many new expected-plan variants; matcher now accepts alternate expected YAMLs.
  • Query Optimization
    • Enabled dedup-with-expression pushdown and refined aggregation/dedup planning; explain outputs updated to include expanded field lists and per-hit field values.
  • New Capabilities
    • Added field-level scripting support and new nested-aggregation physical/rule support for richer query execution.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Refactors dedup pushdown to support expressions and renames, changes aggregation argument model to Pair<RexNode,String>, adds field-script infrastructure and TopHits field merging, enables the dedup-expr rule, introduces EnumerableNestedAggregate and rule, and expands/activates many integration tests and expected-plan fixtures.

Changes

Cohort / File(s) Summary
Tests & matcher
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java, integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
Activates previously ignored dedup tests, adds new dedup-with-expression tests, and adds an overloaded assertYamlEqualsIgnoreId(String, String, String) to accept an alternative expected-plan fixture.
Expected-plan fixtures
integ-test/src/test/resources/expectedOutput/calcite/*
Adds many new expected YAMLs for dedup-with-expression scenarios and alternative variants; updates multiple existing fixtures to reflect AGGREGATION-first pushdown and expanded _source/top_hits fields.
Dedup pushdown rule
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
Refactors apply() to accept an optional expression LogicalProject, implements dedup-column index extraction and reordering (advanceDedupColumns), replaces prior RexCall checks with nullable-project flow, and updates predicates/imports.
Rule enablement
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
Enables the DEDUP_EXPR push-down rule in the index-scan rule set.
Aggregation arg model
opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
Changes aggregation argument representation from List<RexNode> to List<Pair<RexNode,String>> and updates all downstream aggregation construction and inference sites.
Top-hits response handling
opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java
Merges per-hit field values into result maps in non-single-value paths, potentially overriding _source keys on collision.
Calcite script engine
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
Registers FieldScript.CONTEXT -> CalciteFieldScriptFactory::new to support field-script context.
Field-script infra (new classes)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java, .../CalciteFieldScriptFactory.java, .../CalciteFieldScriptLeafFactory.java
Adds field-script implementation, factory, and per-leaf factory to execute Calcite-derived field scripts with parameter-index mapping.
Enumerable nested aggregate
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java, opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java
Adds new physical node and a ConverterRule to convert LogicalAggregate to EnumerableNestedAggregate when hinted; includes validation and cost override.
Utilities
core/src/main/java/org/opensearch/sql/utils/Utils.java
Adds Utils.zipWithIndex(List<I>) helper returning paired elements with indices.
Docs/javadoc tweaks
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.java, .../filter/CalciteFilterScriptLeafFactory.java
Minor Javadoc/comment adjustments.

Sequence Diagram(s)

sequenceDiagram
    participant Planner as Calcite Planner
    participant Rule as DedupPushdownRule
    participant Analyzer as AggregateAnalyzer
    participant Engine as CalciteScriptEngine
    participant FieldFactory as CalciteFieldScriptFactory
    participant LeafFactory as CalciteFieldScriptLeafFactory
    participant OpenSearch as OpenSearch

    Planner->>Rule: Match dedup pattern (may include expr-project)
    alt expr-project present
        Rule->>Rule: extract & reorder dedup columns (advanceDedupColumns)
        Rule->>Analyzer: build aggregation args as Pair<RexNode,String>
    else no expr-project
        Rule->>Analyzer: build aggregation args from projectless path
    end
    Analyzer->>Engine: request aggregation/script builders using paired args
    Engine->>FieldFactory: create FieldScript factory (FieldScript context)
    FieldFactory->>LeafFactory: newFactory(params, lookup) per shard
    LeafFactory->>OpenSearch: field scripts invoked during aggregation/top_hits execution
    OpenSearch-->>Planner: return aggregated/top_hits results (with computed fields)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas needing focused review:

  • DedupPushdownRule.java: nullable projectWithExpr handling, dedup index extraction, advanceDedupColumns correctness, group-key construction.
  • AggregateAnalyzer.java: pervasive change to Pair<RexNode,String> across aggregation branches.
  • TopHitsParser.java: merging per-hit fields into source map and collision semantics.
  • New field-script classes and CalciteScriptEngine registration: parameter-index mapping and per-leaf lifecycle.
  • Tests/fixtures: many new/alternative YAMLs — verify plan shapes and encoded script contents.

Possibly related PRs

Suggested labels

PPL, pushdown

Suggested reviewers

  • penghuo
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • GumpacG
  • Swiddis
  • noCharger
  • dai-chen
  • YANG-DB
  • mengweieric
  • anirudha
  • RyanL1997
  • qianheng-aws

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.68% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Support pushdown dedup with expression' accurately and concisely summarizes the primary change—enabling dedup pushdown when expressions are involved.
Description check ✅ Passed The description clearly explains the feature: supporting dedup pushdown with expressions, provides concrete examples, and mentions related issues and sign-offs.
Linked Issues check ✅ Passed The PR implements the core requirement from #4789: extending dedup pushdown to handle expressions. Code changes support dedup with expression columns and other evaluated expressions, addressing both scenarios outlined in the issue.
Out of Scope Changes check ✅ Passed All changes are scoped to dedup pushdown expression support. Test files validate new behavior; utility updates support the feature; and planner rules implement the core functionality. No unrelated changes detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: Lantao Jin <[email protected]>
@LantaoJin LantaoJin requested a review from RyanL1997 as a code owner December 16, 2025 14:18
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (5)
opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java (1)

93-98: Use parameterized type to avoid raw type warning.

Line 95 uses a raw Map type which causes an unchecked assignment warning. Since the return type is Map<String, Object>, use proper generics for type safety.

       return Arrays.stream(hits)
-          .<Map<String, Object>>map(
-              hit -> {
-                Map map = new LinkedHashMap<>(hit.getSourceAsMap());
-                hit.getFields().values().forEach(f -> map.put(f.getName(), f.getValue()));
-                return map;
-              })
+          .map(
+              hit -> {
+                Map<String, Object> map = new LinkedHashMap<>(hit.getSourceAsMap());
+                hit.getFields().values().forEach(f -> map.put(f.getName(), f.getValue()));
+                return map;
+              })
           .toList();
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java (2)

16-36: Missing JavaDoc on public methods.

Per coding guidelines, public methods should have proper JavaDoc documentation. The isResultDeterministic() and newFactory() methods lack documentation explaining their behavior.

+  /**
+   * Indicates whether the script results are deterministic and can be cached.
+   *
+   * @return true, as Calcite script results are cacheable
+   */
   @Override
   public boolean isResultDeterministic() {
     // This implies the results are cacheable
     return true;
   }

+  /**
+   * Creates a new leaf factory for generating field scripts.
+   *
+   * @param params Script parameters
+   * @param lookup Search lookup for accessing document fields
+   * @return A new CalciteFieldScriptLeafFactory instance
+   */
   @Override
   public FieldScript.LeafFactory newFactory(Map<String, Object> params, SearchLookup lookup) {
     return new CalciteFieldScriptLeafFactory(function, params, lookup);
   }

23-25: Remove unused constructor parameter type.

The RelDataType type parameter is accepted but never used or stored. While this parameter is part of the BiFunction interface contract in CalciteScriptEngine, it creates dead code in this factory. Note that CalciteAggregationScriptFactory does store and use the type parameter (passing it to its leaf factory), whereas CalciteFieldScriptFactory, CalciteFilterScriptFactory, and both sort script factories do not.

Remove the parameter to eliminate dead code:

-  public CalciteFieldScriptFactory(Function1<DataContext, Object[]> function, RelDataType type) {
+  public CalciteFieldScriptFactory(Function1<DataContext, Object[]> function) {
     this.function = function;
   }

This requires coordinating changes to the BiFunction signature in CalciteScriptEngine and updating all related factory instantiations.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)

173-197: O(n²) complexity in advanceDedupColumns due to repeated contains() calls.

The sorting comparator and mapping both call dedupColumnIndices.contains(i) for each element, resulting in O(n*m) complexity where n is the column count and m is the dedup column count. For large schemas this could be inefficient.

Convert dedupColumnIndices to a Set for O(1) lookups:

   private static List<Pair<RexNode, String>> advanceDedupColumns(
       List<Pair<RexNode, String>> originalColumnList,
       List<Integer> dedupColumnIndices,
       List<String> dedupColumnNames) {
+    Set<Integer> dedupIndexSet = new HashSet<>(dedupColumnIndices);
     List<Pair<RexNode, String>> reordered =
         IntStream.range(0, originalColumnList.size())
             .boxed()
             .sorted(
                 (i1, i2) -> {
-                  boolean in1 = dedupColumnIndices.contains(i1);
-                  boolean in2 = dedupColumnIndices.contains(i2);
+                  boolean in1 = dedupIndexSet.contains(i1);
+                  boolean in2 = dedupIndexSet.contains(i2);
                   return in1 == in2 ? i1 - i2 : in2 ? 1 : -1;
                 })
             .map(
                 i -> {
                   Pair<RexNode, String> original = originalColumnList.get(i);
-                  if (dedupColumnIndices.contains(i)) {
+                  if (dedupIndexSet.contains(i)) {
                     int dedupIndex = dedupColumnIndices.indexOf(i);
                     return Pair.of(original.getKey(), dedupColumnNames.get(dedupIndex));
                   }
                   return original;
                 })
             .toList();
     return reordered;
   }

Note: The indexOf call at line 190 would still need the original list to get the correct index for name lookup.


49-61: Consider extracting magic numbers 5 and 6 to named constants.

The rels.length checks against 5 and 6 are not self-documenting. Named constants would improve readability.

+  private static final int RELS_LENGTH_WITHOUT_EXPR = 5;
+  private static final int RELS_LENGTH_WITH_EXPR = 6;
+
   @Override
   protected void onMatchImpl(RelOptRuleCall call) {
     final LogicalProject finalProject = call.rel(0);
     // TODO Used when number of duplication is more than 1
     final LogicalFilter numOfDedupFilter = call.rel(1);
     final LogicalProject projectWithWindow = call.rel(2);
-    if (call.rels.length == 5) {
+    if (call.rels.length == RELS_LENGTH_WITHOUT_EXPR) {
       final CalciteLogicalIndexScan scan = call.rel(4);
       apply(call, finalProject, numOfDedupFilter, projectWithWindow, null, scan);
-    } else if (call.rels.length == 6) {
+    } else if (call.rels.length == RELS_LENGTH_WITH_EXPR) {
       final LogicalProject projectWithExpr = call.rel(4);
       final CalciteLogicalIndexScan scan = call.rel(5);
       apply(call, finalProject, numOfDedupFilter, projectWithWindow, projectWithExpr, scan);
     } else {
       throw new AssertionError(
           String.format(
               "The length of rels should be %s but got %s",
               this.operands.size(), call.rels.length));
     }
   }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccdb0f7 and 8942e10.

📒 Files selected for processing (37)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3 hunks)
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1_alternative.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2_alternative.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2_alternative.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (8 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java (7 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java (3 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)

  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (6)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
📚 Learning: 2025-12-11T05:27:39.831Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.831Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
🧬 Code graph analysis (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (1)
  • EqualsAndHashCode (19-45)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java (1)
  • Value (44-214)
core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggGroupMergeRule.java (1)
  • Value (43-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
🔇 Additional comments (36)
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr1.yaml (1)

1-11: LGTM!

This test fixture correctly captures the expected Calcite explain plan for dedup with expression pushdown. The logical plan properly shows the ROW_NUMBER() OVER (PARTITION BY $17) for dedup on the computed new_gender column, and the physical plan demonstrates successful pushdown with OpenSearch composite aggregation using scripted terms for the LOWER function.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml (1)

11-11: Verify the expanded _source field list is intentional.

The _source.includes now requests system metadata fields (_id, _index, _score, _maxscore, _sort, _routing) alongside regular document fields. These metadata fields are typically not stored in _source and are accessed via separate hit methods. Please verify this expanded field list is intentional and doesn't cause issues when these fields aren't found in the source document.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1.yaml (1)

1-12: LGTM!

This test fixture correctly validates the complex dedup expression scenario with multiple computed columns (new_gender, new_state). The logical plan appropriately shows the sort before dedup pattern with ROW_NUMBER() OVER (PARTITION BY $4, $5), and the physical plan demonstrates successful pushdown with composite aggregation using two scripted terms sources for both LOWER expressions.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml (1)

1-11: LGTM!

This test fixture correctly validates the scenario where dedup operates on regular fields (age, account_number) while the query also contains computed expressions (new_gender, new_state). The physical plan appropriately pushes down the aggregation with raw field grouping while using script_fields in top_hits to compute the expression-based columns. This aligns well with the PR objective of supporting "dedup when other columns contain expressions."

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml (1)

1-11: LGTM! Test fixture expands plan validation coverage.

This alternative expected output YAML provides a valid execution plan variant for dedup-with-expression scenarios, enabling more robust test assertions via the new fallback YAML comparison helper.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2.yaml (1)

1-11: LGTM! Test fixture adds dedup-with-expression coverage.

This expected output YAML correctly represents the Calcite explain plan for dedup operations with expressions, supporting the expanded pushdown capability.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml (1)

10-10: LGTM! Test fixture updated to reflect pushdown refactor.

The PushDownContext now correctly reflects the aggregation-driven pushdown strategy with expanded source field projections, consistent with the dedup expression enhancement.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml (1)

1-14: LGTM! Test fixture adds sort+dedup-expression coverage.

This expected output correctly represents the Calcite plan for dedup operations with expressions and sorting, extending test coverage for the enhanced pushdown capability.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java (1)

60-60: LGTM! Enables expression-aware dedup pushdown.

Adding DEDUP_EXPR_PUSH_DOWN to the rule set correctly activates the dedup-with-expression pushdown capability, directly fulfilling the PR's core objective.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java (1)

73-73: LGTM! Extends script contexts to support field scripts.

The addition of FieldScript context registration follows the existing pattern for other script types and enables field-level scripting required for expression-aware dedup pushdown.

Also applies to: 82-82, 115-115

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex1_alternative.yaml (1)

1-12: LGTM! Alternative fixture supports non-deterministic plan validation.

This alternative expected output enables robust testing when multiple valid plan variants can be generated for the same dedup-with-expression query.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3.yaml (1)

1-11: LGTM!

The expected plan correctly represents a dedup 2 new_gender, new_state scenario with expression-based columns. The logical plan shows appropriate ROW_NUMBER windowing with partition by the computed columns, and the physical plan demonstrates successful pushdown with composite aggregation using scripted terms and top_hits with size 2.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2.yaml (1)

1-11: LGTM!

The expected plan correctly represents a dedup 1 new_gender, new_state scenario with a reduced column set. The top_hits size of 1 and filter condition <=($6, 1) properly reflect dedup 1 semantics.

integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1)

21-21: LGTM!

The updated pushdown context correctly reflects the shift to an AGGREGATION-based strategy for the right side of the merge join with max=1 option. The composite aggregation on lastname with top_hits size 1 properly implements the max option semantics for join deduplication.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr1.yaml (1)

1-11: LGTM!

The expected plan correctly represents a dedup scenario where the dedup key (age) is a simple field, but the output includes a computed expression (new_gender = LOWER(gender)). The aggregation pushdown correctly groups by age, and the script_fields configuration properly computes new_gender within the top_hits aggregation.

integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml (1)

19-20: LGTM!

The updated plan correctly demonstrates the aggregation-based pushdown for the join with max=1 option. The first scan handles the dedup/max semantics via composite aggregation on account_number with top_hits size 1, while the second scan performs a simple projection.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml (1)

11-11: Review the selective _source fields in the AGGREGATION pushdown strategy.

Best practices recommend using _source filtering to retrieve only necessary fields and limiting the size of top_hits to reduce memory usage. The current _source.includes list contains all account fields plus internal metadata fields (_id, _index, _score, _maxscore, _sort, _routing), though the logical plan only projects account_number, gender, age, state. Using large _source includes can impact performance. Consider narrowing the includes list to only required fields for the aggregation-based pushdown.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml (1)

1-11: LGTM - AGGREGATION-centric pushdown for dedup is correctly structured.

The shift from PROJECT-centric to AGGREGATION-centric pushdown aligns with the PR's objective to support dedup with expressions. The expanded _source field set including internal metadata fields (_id, _index, _score, etc.) is appropriate for supporting expression evaluation in the pushdown context.

integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml (1)

16-24: LGTM - Asymmetric pushdown strategies for join with max option are correctly applied.

The physical plan appropriately uses different pushdown strategies for each side:

  • Left side: PROJECT + SORT for merge join ordering
  • Right side: AGGREGATION pushdown with top_hits for the max option dedup behavior

This aligns with the PR's implicit support for join with max option when expressions are present.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml (1)

1-13: LGTM - New test fixture for dedup with expressions is well-structured.

The plan correctly demonstrates:

  • Dedup by original fields (gender, state) with top_hits size=2
  • Expression computation (LOWER) via script_fields with Calcite-encoded scripts
  • Sort order alignment between logical (ASC-nulls-first, DESC-nulls-last) and physical plans
integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml (1)

1-11: LGTM - Aggregation-first pushdown for complex dedup scenario.

The updated physical plan correctly uses AGGREGATION pushdown with an expanded _source field set, consistent with the PR's pushdown strategy changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml (1)

1-13: LGTM - Alternative plan variant for optimizer non-determinism.

This alternative fixture accounts for non-deterministic ordering of script_fields (new_state, new_gender vs new_gender, new_state). Both plans are semantically equivalent, and having alternatives ensures test stability.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr3_alternative.yaml (1)

1-11: LGTM - Alternative fixture for expression-based dedup pushdown.

This correctly demonstrates deduplication where the grouping keys themselves are expressions (LOWER(gender), LOWER(state)). The composite aggregation uses script-based terms for the grouping, which is the correct approach for expression-based dedup pushdown.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr2_alternative.yaml (1)

1-11: LGTM - Alternative fixture for expression-based dedup with single row retention.

The plan correctly implements dedup by expression columns (new_gender, new_state) with top_hits size=1, demonstrating proper pushdown of dedup 1 with expression-based grouping keys.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml (1)

1-10: LGTM!

The updated expected output correctly reflects the AGGREGATION pushdown with expanded _source fields, aligning with the PR's dedup expression support enhancements.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr2_alternative.yaml (1)

1-11: LGTM!

The alternative expected output correctly captures the dedup-with-expression scenario using Calcite field scripts, supporting broader test coverage for the pushdown functionality.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2_alternative.yaml (1)

1-12: LGTM!

The alternative expected output correctly represents the complex dedup expression scenario with sort and aggregation pushdown, expanding test coverage.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java (3)

340-371: LGTM! Signature change enables richer argument metadata.

The conversion from List<RexNode> to List<Pair<RexNode, String>> is correctly implemented, allowing aggregation arguments to carry both the expression and its field name through the analysis pipeline.


373-402: Verify consistent accessor pattern throughout.

All argument accessors now use args.getFirst().getKey() or args.get(i).getKey() to retrieve the RexNode, which is consistent with the new Pair structure.


566-597: LITERAL_AGG handling correctly uses both Pair components.

The implementation properly extracts RexNode via rex.getKey() for type checks and String via rex.getValue() for field names, demonstrating correct usage of the Pair structure in the LITERAL_AGG/dedup pushdown path.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml (1)

1-12: LGTM!

The alternative expected output correctly captures another valid execution path for the dedup expression scenario, enhancing test robustness.

integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr_complex2.yaml (1)

1-12: LGTM!

The primary expected output correctly represents the complex dedup expression scenario. The script_fields order difference from the alternative variant justifies having multiple valid expected plans.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)

2023-2058: LGTM! Comprehensive dedup expression pushdown tests.

The test method properly validates dedup pushdown with expressions across multiple scenarios. The use of alternative expected plans via assertYamlEqualsIgnoreId(expected, alternative, actual) is appropriate for handling optimizer plan variability.


2101-2124: LGTM! Good coverage of complex dedup-then-sort scenarios.

The two scenarios effectively test the ordering of operations: dedup followed by sort vs. sort embedded in dedup pushdown. Both primary and alternative expected plans are validated.


2126-2161: LGTM! New test validates dedup pushdown when expressions exist but aren't dedup columns.

This test correctly covers the case where expressions are computed (e.g., new_gender = lower(gender)) but the dedup operates on different columns (e.g., age, account_number). The alternative plan coverage ensures robustness against optimizer variations.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1)

282-291: Good refactoring: tableScanChecker improves readability.

Extracting the table scan validation logic into a named method with a clear Javadoc comment improves code maintainability and explains the constraint about project not being pushed.

Signed-off-by: Lantao Jin <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1)

84-95: Unsafe Optional.get() still present.

Line 94 calls .get() on an Optional without verification, which will throw NoSuchElementException if no matching INPUT_REF is found for a dedup column index. This can occur if the project doesn't contain an INPUT_REF at the expected index.

Apply this diff to provide a clear error message:

             .map(
                 i ->
                     projectWithWindow.getNamedProjects().stream()
                         .filter(pair -> pair.getKey().isA(SqlKind.INPUT_REF))
                         .filter(pair -> ((RexInputRef) pair.getKey()).getIndex() == i)
                         .map(Pair::getValue)
                         .findFirst()
-                        .get())
+                        .orElseThrow(
+                            () ->
+                                new IllegalStateException(
+                                    "Dedup column at index " + i + " not found in project")))
             .toList();
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (1)

18-18: Fix grammatical error in class JavaDoc.

The verb should agree with the singular subject: "produce" → "produces".

Apply this diff:

-/** Calcite script executor that produce dynamic values. */
+/** Calcite script executor that produces dynamic values. */
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8942e10 and abd6f5a.

📒 Files selected for processing (6)
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (8 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : All public classes and methods must have proper JavaDoc

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Flag missing JavaDoc on public APIs during code review

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
📚 Learning: 2025-12-11T05:27:39.831Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.831Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptFactory.java (1)
  • EqualsAndHashCode (17-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (11)
integ-test/src/test/java/org/opensearch/sql/util/MatcherUtils.java (1)

450-467: LGTM! JavaDoc concern from previous review properly addressed.

The comprehensive JavaDoc now clearly documents the method's purpose, parameters, fallback behavior, and exception handling. The implementation correctly tries the primary expected YAML first and falls back to the secondary on failure, which aligns well with the test scenarios where multiple valid plan variants can occur.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/CalciteAggregationScriptLeafFactory.java (1)

35-35: LGTM!

Documentation punctuation fix improves Javadoc formatting.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java (1)

33-33: LGTM!

Documentation punctuation fix improves Javadoc formatting.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScript.java (2)

29-48: LGTM!

Constructor JavaDoc is clear and properly documents all parameters.


50-58: LGTM!

The bounds check for array access properly handles null and empty results, addressing the previously flagged ArrayIndexOutOfBoundsException risk.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (6)

8-8: LGTM!

The standard Guava import is now correctly used, addressing the previous inconsistency with the rest of the codebase.


49-62: LGTM!

The two-path handling correctly distinguishes between dedup with and without expression projects based on the operand count.


110-136: LGTM!

The two-path logic correctly handles both scenarios:

  • When no expression project exists, it creates a new reordered project from the scan
  • When an expression project exists, it reorders that project

The use of advanceDedupColumns ensures consistent column ordering in both paths.


160-165: LGTM!

Clean extraction of INPUT_REF indices from RexNode list.


167-199: LGTM!

The advanceDedupColumns method correctly reorders columns to place dedup columns first while:

  • Maintaining relative order within dedup and non-dedup groups
  • Renaming dedup columns to their target names
  • Preserving non-dedup column names

The sorting comparator logic is correct: returns -1 when only i1 is a dedup column (moves it earlier), 1 when only i2 is a dedup column, and preserves original order otherwise.


284-293: LGTM!

The tableScanChecker predicate correctly validates that no limit, aggregate, or project has been pushed, ensuring the scan is in a clean state before applying dedup pushdown.

As per coding guidelines, the method clearly documents the rationale: preventing loss of expression names after project pushdown.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java (1)

43-52: Type safety issue from previous review remains unaddressed.

The unchecked cast of params.get(SOURCES) to List<Integer> on line 49 still lacks validation and could throw ClassCastException or NullPointerException at runtime if the parameter is missing or has the wrong type.

Apply this diff to add safe type checking:

   public CalciteFieldScriptLeafFactory(
       Function1<DataContext, Object[]> function, Map<String, Object> params, SearchLookup lookup) {
     this.function = function;
     this.params = params;
     this.lookup = lookup;
     this.parametersToIndex =
-        IntStream.range(0, ((List<Integer>) params.get(SOURCES)).size())
+        IntStream.range(0, getSources(params).size())
             .boxed()
             .collect(Collectors.toMap(i -> "?" + i, i -> i));
   }
+
+  @SuppressWarnings("unchecked")
+  private static List<Integer> getSources(Map<String, Object> params) {
+    Object sources = params.get(SOURCES);
+    if (sources == null) {
+      throw new IllegalArgumentException("SOURCES parameter is required but was null");
+    }
+    if (!(sources instanceof List)) {
+      throw new IllegalArgumentException(
+          "SOURCES parameter must be a List, but was: " + sources.getClass().getName());
+    }
+    return (List<Integer>) sources;
+  }
🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

2084-2123: Clarify the comment about rename behavior.

The comment on line 2086 states "rename changes nothing, reuse the same yaml files," which could be misinterpreted. Consider revising to:

-    // rename changes nothing, reuse the same yaml files
+    // Rename doesn't affect the final optimized plan; expected outputs match testDedupExpr

This clarifies that the rename operation is tested and works correctly, producing the same optimized plan as the eval-only version.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abd6f5a and 8d31a38.

📒 Files selected for processing (3)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/CalciteFilterScriptLeafFactory.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)

  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/field/CalciteFieldScriptLeafFactory.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)

2125-2148: LGTM! Good coverage of complex dedup scenarios.

This new test method effectively validates the interaction between rename, dedup, and sort operations, covering both sort-before-dedup and dedup-before-sort patterns. The test scenarios align well with the PR objectives to support expression-based dedup pushdown.


2150-2185: LGTM! Comprehensive coverage of dedup with expressions in non-dedup columns.

This new test method addresses an important use case from the PR objectives: supporting pushdown when expressions exist in non-dedup columns. The four scenarios provide thorough validation of this functionality.


2047-2082: No actionable concerns identified.

The use of alternative expected plan files is an intentionally-designed testing pattern, not a maintenance concern. The assertYamlEqualsIgnoreId method explicitly supports multiple expected outputs with documented fallback semantics for cases where "the DSL implementation can produce multiple valid plan variants." This pattern correctly accommodates Calcite's legitimate optimization variations.

Likely an incorrect or invalid review comment.

yuancu
yuancu previously approved these changes Dec 17, 2025
.<Map<String, Object>>map(hit -> new LinkedHashMap<>(hit.getSourceAsMap()))
.<Map<String, Object>>map(
hit -> {
Map map = new LinkedHashMap<>(hit.getSourceAsMap());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use var or Map<String, Object>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition is in L93 .<Map<String, Object>>map(. But yes, I can add Map<String, Object> here and remove L93. The reason use L93 instead of Map<String, Object> is we used unnamed variable, but now it has to define a name.

List<Pair<RexNode, String>> namedPairFromScan =
Streams.zip(columnsFromScan.stream(), colNamesFromScan.stream(), Pair::new).toList();
List<Pair<RexNode, String>> reordered =
advanceDedupColumns(namedPairFromScan, dedupColumnIndices, dedupColumnNames);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the intuition / reason for reordering the columns?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I suggested you to use aggregateWithTrimming() instead of calling the relBuilder.aggregate() directly. In SQL, the Project added for Aggregate always keeps the group columns in the front of others, it means the group keys set always contain {0}, or there could be trigger a Calcite bug.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it as well. Since the aggregate is going to be pushed into scan and the final generated RelNode is a single Scan operator, will it really trigger the bug in Calcite?

Do we do similar thing in the previous PR for supporting dedup push down?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it as well. Since the aggregate is going to be pushed into scan and the final generated RelNode is a single Scan operator, will it really trigger the bug in Calcite?

Do we do similar thing in the previous PR for supporting dedup push down?

Yes. I see bugs when the group key columns are not in the front of child Project in Calcite in developing support Aggregate with Calcite. Not sure is it fixed or not. Check the comment here

trimmedRefs.addAll(PlanUtils.getInputRefs(resolvedGroupByList)); // group-by keys first

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That happens in CalciteRelNodeVisitor, but is similar change needed in the push down process?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has no diff with CalciteRelNodeVisitor. It is building an aggregate via relbuilder. I didn't test if I remove the reorder logic. It may not easy to trigger the bug in dedup, but for safe, I added the reorder logic

.filter(pair -> ((RexInputRef) pair.getKey()).getIndex() == i)
.map(Pair::getValue)
.findFirst()
.get())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this be simplified to

  dedupColumnIndices.stream()
     .map(projectWithWindow.getInput().getRowType().getFieldNames()::get)
     .toList()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. above code doesn't cover the rename case:

source=test | rename status as http_status | dedup http_status | fields http_status | sort http_status

if (LOG.isDebugEnabled()) {
LOG.debug("Cannot pushdown the dedup since the dedup columns contain RexCall");
}
if (dedupColumnIndices.size() != dedupColumnNames.size()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this ever happened since dedupColumnNames is derived from dedupColumnIndices?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am 100% sure they have same size. Better to keep this check.

List<Pair<RexNode, String>> namedPairFromScan =
Streams.zip(columnsFromScan.stream(), colNamesFromScan.stream(), Pair::new).toList();
List<Pair<RexNode, String>> reordered =
advanceDedupColumns(namedPairFromScan, dedupColumnIndices, dedupColumnNames);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it as well. Since the aggregate is going to be pushed into scan and the final generated RelNode is a single Scan operator, will it really trigger the bug in Calcite?

Do we do similar thing in the previous PR for supporting dedup push down?

* @return the converted Pair<RexNode, String> list
*/
private static List<RexNode> convertAggArgThroughProject(AggregateCall aggCall, Project project) {
private static List<org.apache.calcite.util.Pair<RexNode, String>> convertAggArgThroughProject(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already import org.apache.commons.lang3.tuple.Pair imported. Could it be switched to that or at least unified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. will convert to common.lang3's Pair

Signed-off-by: Lantao Jin <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/utils/Utils.java (1)

15-21: Prefer ArrayList and indexed for-loop for better performance.

The current implementation uses LinkedList and an Iterator, which has higher memory overhead and is less idiomatic. Consider using ArrayList with an indexed for-loop for better performance and clarity.

🔎 Simplified implementation
-    LinkedList<Pair<I, Integer>> result = new LinkedList<>();
-    Iterator<I> iter = input.iterator();
-    int index = 0;
-    while (iter.hasNext()) {
-      result.add(Pair.of(iter.next(), index++));
+    List<Pair<I, Integer>> result = new ArrayList<>(input.size());
+    for (int i = 0; i < input.size(); i++) {
+      result.add(Pair.of(input.get(i), i));
     }
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java (1)

60-64: Preserve exception stack trace in debug logging.

Using e.toString() loses the stack trace, which is valuable for debugging. Pass the exception directly to include the full stack trace.

Suggested fix
     } catch (InvalidRelException e) {
       if (LOG.isDebugEnabled()) {
-        LOG.debug(e.toString());
+        LOG.debug("Cannot convert aggregate to EnumerableNestedAggregate", e);
       }
       return null;
     }
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java (1)

27-30: Consider clarifying the JavaDoc.

The class documentation links to OpenSearch's "nested aggregation" for nested documents, but based on the PR context (dedup pushdown with expressions), this class appears to handle aggregations that are structurally nested (sub-aggregations) rather than aggregations on nested document types. The naming and documentation could cause confusion.

Consider updating the JavaDoc to clarify whether this is for:

  • Aggregations on nested document fields, or
  • Aggregations that will be pushed down as sub-aggregations in OpenSearch queries
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d31a38 and 3d2786d.

📒 Files selected for processing (6)
  • core/src/main/java/org/opensearch/sql/utils/Utils.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_push.yaml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java (7 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
  • core/src/main/java/org/opensearch/sql/utils/Utils.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
  • core/src/main/java/org/opensearch/sql/utils/Utils.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_push.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_nested_agg_push.yaml
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java (1)
  • EnumerableNestedAggregate (31-85)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java (2)

340-373: LGTM - Consistent Pair-based argument handling.

The refactoring from List<RexNode> to List<Pair<RexNode, String>> is implemented correctly. The method now returns pairs containing both the expression node and its associated field name, which enables proper handling of script fields in aggregations (particularly visible in the LITERAL_AGG case at lines 581-595 where rex.getValue() provides the script field name).


376-605: Mechanical updates correctly applied across aggregation builders.

The aggregation builder methods (createDistinctAggregation, createRegularAggregation) correctly use .getKey() to extract RexNode from the pairs. The pattern is consistently applied across all aggregation types including AVG, SUM, COUNT, MIN, MAX, variance/stddev functions, ARG_MAX/ARG_MIN, TAKE, FIRST, LAST, and PERCENTILE_APPROX.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/EnumerableNestedAggregateRule.java (1)

21-48: LGTM - Rule implementation follows Calcite patterns correctly.

The rule correctly:

  1. Configures conversion from LogicalAggregate with Convention.NONE to EnumerableConvention
  2. Uses hint-based filtering to selectively apply the conversion only when "stats_args" hint is present
  3. Gracefully returns null when conversion cannot be performed
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/EnumerableNestedAggregate.java (1)

33-57: LGTM - Constructor validation is appropriate.

The constructor correctly validates that:

  1. No distinct aggregations are used (not supported)
  2. No within-distinct aggregations are used (not supported)
  3. Each aggregation has a registered implementor in RexImpTable

This follows the pattern established in other Calcite enumerable aggregates and provides clear error messages.

Signed-off-by: Lantao Jin <[email protected]>
@yuancu yuancu merged commit cbcdbd6 into opensearch-project:main Dec 19, 2025
38 checks passed
@LantaoJin LantaoJin deleted the pr/issues/4789 branch December 19, 2025 06:42
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4957-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 cbcdbd6fc918e4a356480300c208aa76f468fbf1
# Push it to GitHub
git push --set-upstream origin backport/backport-4957-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4957-to-2.19-dev.

LantaoJin added a commit to LantaoJin/search-plugins-sql that referenced this pull request Dec 19, 2025
* Support pushdown dedup with expression

Signed-off-by: Lantao Jin <[email protected]>

* Fix java spotless check

Signed-off-by: Lantao Jin <[email protected]>

* Support dedup with expression

Signed-off-by: Lantao Jin <[email protected]>

* fix yaml test

Signed-off-by: Lantao Jin <[email protected]>

* simplify

Signed-off-by: Lantao Jin <[email protected]>

* address comments

Signed-off-by: Lantao Jin <[email protected]>

* fix javadoc

Signed-off-by: Lantao Jin <[email protected]>

* address comments

Signed-off-by: Lantao Jin <[email protected]>

* delete unexpected files

Signed-off-by: Lantao Jin <[email protected]>

---------

Signed-off-by: Lantao Jin <[email protected]>
(cherry picked from commit cbcdbd6)
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Dec 19, 2025
qianheng-aws pushed a commit that referenced this pull request Dec 22, 2025
* Support pushdown dedup with expression



* Fix java spotless check



* Support dedup with expression



* fix yaml test



* simplify



* address comments



* fix javadoc



* address comments



* delete unexpected files



---------


(cherry picked from commit cbcdbd6)

Signed-off-by: Lantao Jin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support pushdown dedup complex expression

3 participants