@qianheng-aws qianheng-aws commented Dec 23, 2025

Description

This PR primarily adds support for pruning the old operator once a push-down rule produces a better plan. This makes the planning process more efficient because the planner avoids exploring equivalent plans that are no longer useful.

The performance gain is shown below:

  1. Average query cost on big5 and clickbench:

                 previous    pruneOld
     big5        17 ms       14 ms
     clickbench  26 ms       19 ms

  2. Average optimization cost on big5 and clickbench (measured with Calcite's debug mode enabled and a timingTracer added, which itself adds overhead, so the absolute times are higher than in a normal run):

                 previous    pruneOld
     big5        36.3 ms     25.9 ms
     clickbench  265 ms      110.9 ms

  3. Average number of applied rules on big5 and clickbench:

                 previous    pruneOld
     big5        114         50
     clickbench  474         215
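
To make the size of the gain easier to compare across the three measurements, the relative reduction can be computed from the reported figures. The snippet below is only a convenience sketch. The class and method names are made up for illustration, and the inputs are copied from the numbers above.

```java
final class PlanningGain {
  /** Relative reduction from {@code before} to {@code after}, as a percentage. */
  static double reductionPercent(double before, double after) {
    return (before - after) / before * 100.0;
  }

  public static void main(String[] args) {
    // Inputs are the "previous" and "pruneOld" values reported in this description.
    System.out.printf("big5 optimization cost:       %.1f%%%n", reductionPercent(36.3, 25.9));
    System.out.printf("clickbench optimization cost: %.1f%%%n", reductionPercent(265, 110.9));
    System.out.printf("big5 applied rules:           %.1f%%%n", reductionPercent(114, 50));
    System.out.printf("clickbench applied rules:     %.1f%%%n", reductionPercent(474, 215));
  }
}
```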

Some positive cases on plan with this PR:

  • testDedupRename
  • testCasePushdownAsRangeQueryExplain
  • testExplainOnAggregationWithFunction
  • testDedupExpr

Implementation Details

As described in #4931 (comment), pruning the old nodes also exposed a number of issues and bugs. This PR therefore includes additional changes to fix them and keep the rules compatible with pruning:

  • Because Calcite reuses Subsets, pruning a Subset that is the only child of another Subset causes a failure during preparation. So we prune the old nodes from the top down and stop at the first node that cannot be pruned. A node cannot be pruned if it is a physical node (see point 5 in the comment above) or if it has multiple parents (except for the root of the call, since we are generating a new root to replace it).
  • Make PPLAggregateConvertRule, PPLAggGroupMergeRule, RareTopPushdownRule, and DedupPushdownRule implement SubstitutionRule so that they get higher priority during rule matching and we get the optimized aggregates in the RelSubset before pruning.
  • Support removing the project, the sort, and any filter derived from the aggregate when pushing down an aggregate.
  • Slightly refactor AggregateIndexScanRule so that it supports pushdown in more cases.
  • Refactor and simplify DedupPushdownRule so that it is compatible with the current pruning mechanism.
  • Continue pushing down a limit if it can reduce the estimated row count.
  • Fix several bugs in AggregateAnalyzer when the project is null. See the unit tests in AggregateAnalyzerTest, whose expected results were wrong before.
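
The top-down stopping rule from the first bullet can be sketched in plain Java. This is a simplified model and not the code in PlanUtils.tryPruneRelNodes: the Node record, its fields, and the prunable method are hypothetical names, and the matched nodes are assumed to arrive as a root-first list.

```java
import java.util.ArrayList;
import java.util.List;

final class PruneSketch {
  /** Hypothetical stand-in for a node matched by a rule call; not a Calcite type. */
  record Node(String name, boolean physical, int parentCount) {}

  /**
   * Walks the matched nodes from the root downwards and returns the names of the nodes
   * that may be pruned. The walk stops at the first node that must be kept.
   */
  static List<String> prunable(List<Node> rootFirst) {
    List<String> pruned = new ArrayList<>();
    for (int i = 0; i < rootFirst.size(); i++) {
      Node node = rootFirst.get(i);
      // The root is exempt from the multiple-parents check because a new root replaces it.
      boolean shared = i > 0 && node.parentCount() > 1;
      if (node.physical() || shared) {
        break; // this node and everything below it stays in the planner
      }
      pruned.add(node.name());
    }
    return pruned;
  }
}
```

With a root-first chain of Project, Filter, and a Scan that has two parents, only Project and Filter are reported as prunable, because the shared Scan may still be referenced by another plan.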

Related Issues

Resolves #4931

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

coderabbitai bot commented Dec 23, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Smarter aggregation-aware pushdowns that preserve valid filters and narrower source projections for more efficient queries.
  • Bug Fixes

    • Prunes unused intermediate plan nodes after pushdown/transform to reduce query overhead and prevent redundant work.
    • Corrected pushed filter/aggregation column mappings and ordering to improve result stability.
  • Refactor

    • Several planner rules unified to support substitution-style optimizations and simplified pushdown contexts.


Walkthrough

Adds PlanUtils helpers to detect aggregate-derived predicates and prune redundant RelNodes; updates many push-down rules to implement SubstitutionRule and call PlanUtils.tryPruneRelNodes after transforms; adapts pushdown APIs (nullable Project, ProjectDigest, PushDownContext.cloneForAggregate) and updates aggregate/scan analysis and tests to new pushdown ordering and bindings.

Changes

Cohort / File(s) | Summary

Plan utilities
  Files: core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  Summary: New helpers and predicates: isTimeSpan, isNotNullDerivedFromAgg, tryPruneRelNodes, aggIgnoreNullBucket, maybeTimeSpanAgg; pruning logic that interacts with VolcanoPlanner; expanded imports and RexSlot usage.

PPL rules
  Files: core/src/main/java/.../PPLAggGroupMergeRule.java, core/src/main/java/.../PPLAggregateConvertRule.java
  Summary: Both classes now implement SubstitutionRule; early-exit added in aggregate convert when no converted args; call PlanUtils.tryPruneRelNodes(call) after transforms.

General planner rules (prune on transform)
  Files: opensearch/src/main/java/.../{AggregateIndexScanRule.java, DedupPushdownRule.java, ExpandCollationOnProjectExprRule.java, FilterIndexScanRule.java, LimitIndexScanRule.java, ProjectIndexScanRule.java, RareTopPushdownRule.java, RelevanceFunctionPushdownRule.java, SortAggregateMeasureRule.java, SortExprIndexScanRule.java, SortIndexScanRule.java}
  Summary: Many rules invoke PlanUtils.tryPruneRelNodes(call) after successful transformTo(...); several rules (Dedup, RareTop, RelevanceFunction, some PPL rules) additionally implement SubstitutionRule. DedupPushdownRule rewritten to build deterministic projection/aggregate and prune.

PushDownContext / scan APIs
  Files: opensearch/src/main/java/.../PushDownContext.java, opensearch/src/main/java/.../CalciteLogicalIndexScan.java, opensearch/src/main/java/.../ProjectDigest.java
  Summary: Added PushDownContext.cloneForAggregate(Aggregate, @Nullable Project) to strip sorts/projects and drop filters derived from aggregate; pushDownAggregate accepts @Nullable Project; introduced ProjectDigest record and updated pushDownProject/pushDownLimit heuristics.

Aggregate analysis & request building
  Files: opensearch/src/main/java/.../AggregateAnalyzer.java, opensearch/src/test/java/.../AggregateAnalyzerTest.java
  Summary: Project parameter made @Nullable; group handling switched to (name,index) pairs; helper signatures updated; top_hits/group name resolution adjusted; tests updated for field projection differences.

Index rule registration
  Files: opensearch/src/main/java/.../OpenSearchIndexRules.java
  Summary: Renamed/reorganized rule constants (e.g., AGGREGATE_PROJECT_INDEX_SCAN, AGGREGATE_INDEX_SCAN) and updated rule registration list.

Integration test expectations
  Files: integ-test/src/test/resources/expectedOutput/calcite/*.yaml (many files)
  Summary: Updated expected PushDownContext ordering, $ binding indices, removed/added PROJECT entries, adjusted aggregation grouping/indices and removed some EXISTS/IS NOT NULL wrappers to reflect new pushdown/pruning behavior.

Sequence Diagram(s)

sequenceDiagram
  participant Planner as CalcitePlanner
  participant Rule as PushDownRule
  participant PU as PlanUtils
  participant VP as VolcanoPlanner
  participant Rel as RelSet

  Planner->>Rule: onMatch(call, rels...)
  activate Rule
  Rule->>Rule: compute transformedRel (push-down)
  alt transform produced newRel
    Rule->>Planner: transformTo(newRel)
    Note right of Rule: after successful transform call pruning helper
    Rule->>PU: tryPruneRelNodes(call)
    activate PU
    PU->>VP: check planner type & auto-prune
    alt eligible
      PU->>Rel: identify top-down redundant/equivalent RelNodes
      Rel-->>VP: remove/prune old RelNodes
      VP-->>PU: pruning applied
    else not eligible
      PU-->>Rule: no-op
    end
    deactivate PU
  end
  Rule-->>Planner: onMatch complete
  deactivate Rule

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

pushdown, clickbench

Suggested reviewers

  • penghuo
  • anirudha
  • LantaoJin
  • kavithacm
  • derek-ho
  • dai-chen
  • ps48
  • Swiddis
  • GumpacG
  • noCharger
  • MaxKsyunz
  • Yury-Fridlyand
  • yuancu

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'Prune old in operator push down rules' is clear and specific, directly referencing the main change of implementing pruning of old RelNode instances in push-down rules.
Description check | ✅ Passed | The description thoroughly documents the purpose (pruning old RelNodes to improve planner efficiency), provides performance metrics, implementation details, and related fixes.
Linked Issues check | ✅ Passed | The code changes fully address the objectives in issue #4931: implementing SubstitutionRule interface in multiple rules, adding tryPruneRelNodes calls, and supporting aggregate pushdown optimization as required.
Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing pruning of old RelNode instances and supporting optimizations. Test output updates reflect expected behavior changes from the pruning mechanism, which is within scope.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8f2a7f and e8336b5.

📒 Files selected for processing (1)
  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
🧰 Additional context used
📓 Path-based instructions (1)
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*:

  • Verify test data is realistic and representative
  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (1)
integ-test/src/test/resources/expectedOutput/calcite/big5/dedup_metrics_size_field.yaml (1)

14-14: The _source includes list correctly includes all fields from the project required for both the final projection and DEDUP computation. The inclusion of both "metrics" and "metrics.size" is not redundant—"metrics" is needed for the final output projection, while "metrics.size" is required for the DEDUP PARTITION BY clause and the IS NOT NULL filter. The test data is realistic and properly covers the edge case of DEDUP on a nested field.



@qianheng-aws qianheng-aws marked this pull request as draft December 23, 2025 08:47
@qianheng-aws qianheng-aws added the enhancement New feature or request label Dec 23, 2025
@qianheng-aws qianheng-aws changed the title Prune old Prune old in operator push down rules Dec 23, 2025
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
@qianheng-aws qianheng-aws marked this pull request as ready for review December 24, 2025 06:26
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java (1)

190-233: Add a comment explaining the predicate difference between BUCKET_NON_NULL_AGG configs.

BUCKET_NON_NULL_AGG (line 153) uses .predicate(aggIgnoreNullBucket), while BUCKET_NON_NULL_AGG_WITH_UDF (line 190) uses .predicate(aggIgnoreNullBucket.or(maybeTimeSpanAgg)). Both configs are actively registered and used in different contexts (different pattern structures), but the reason for this design difference is not documented. Add a brief comment explaining why the WITH_UDF variant needs to additionally match maybeTimeSpanAgg predicates.
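
For readers unfamiliar with the composition, the difference between the two configs comes down to java.util.function.Predicate.or. The sketch below is an illustration only: Agg is a hypothetical two-field stand-in rather than Calcite's Aggregate, and the predicate bodies are simplified guesses at the intent.

```java
import java.util.function.Predicate;

final class PredicateOrSketch {
  /** Hypothetical, simplified stand-in for an aggregate; not Calcite's Aggregate. */
  record Agg(boolean bucketNullable, boolean groupsAreTimeBased) {}

  // Matches only aggregates that ignore null buckets.
  static final Predicate<Agg> IGNORE_NULL_BUCKET = agg -> !agg.bucketNullable();

  // Matches aggregates whose group keys are all time-based.
  static final Predicate<Agg> MAYBE_TIME_SPAN = Agg::groupsAreTimeBased;

  // The WITH_UDF-style config accepts an aggregate if either predicate holds.
  static final Predicate<Agg> WITH_UDF = IGNORE_NULL_BUCKET.or(MAYBE_TIME_SPAN);
}
```

An aggregate with nullable buckets and time-based group keys is rejected by the first predicate but accepted by the combined one, which is the behavioral difference the requested comment should explain.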

🧹 Nitpick comments (5)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/ProjectDigest.java (1)

10-15: Add JavaDoc for public record.

Per coding guidelines, public classes require proper JavaDoc. This record would benefit from documentation explaining its purpose and the relationship between names and selectedColumns.

Suggested JavaDoc
+/**
+ * Represents a digest of a projection containing field names and their corresponding
+ * column indices in the underlying relation.
+ *
+ * @param names the list of projected field names
+ * @param selectedColumns the list of column indices corresponding to each projected field
+ */
 public record ProjectDigest(List<String> names, List<Integer> selectedColumns) {
   @Override
   public String toString() {
     return names.toString();
   }
 }
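
A short usage sketch may help when writing that JavaDoc. It copies the record shape from the suggestion above into a wrapper class so that it compiles on its own. The field names and indices are made-up sample values.

```java
import java.util.List;

final class ProjectDigestDemo {
  /** Same shape as the record in the suggestion above, nested here to keep the sketch self-contained. */
  record ProjectDigest(List<String> names, List<Integer> selectedColumns) {
    @Override
    public String toString() {
      return names.toString();
    }
  }

  /** Builds a digest with hypothetical field names and column indices. */
  static ProjectDigest sample() {
    return new ProjectDigest(List.of("age", "gender"), List.of(3, 0));
  }
}
```

Note that only toString is overridden: the generated equals and hashCode still take selectedColumns into account, so two digests that print the same can compare as different.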
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java (1)

68-70: Consider logging or narrowing the catch-all exception handler.

The catch block silently swallows all exceptions, which could mask bugs or unexpected conditions. Consider either:

  1. Logging the exception at DEBUG level for troubleshooting
  2. Narrowing to specific expected exceptions (e.g., ClassCastException, IndexOutOfBoundsException)
🔎 Proposed logging improvement
     } catch (Exception e) {
+      // Log at debug level for troubleshooting pushdown failures
+      // e.g., java.util.logging.Logger or slf4j
       return;
     }
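
Both options can be combined. The following is a minimal sketch that uses java.util.logging purely as an example, since the logging framework used by the project is not shown here. PushdownGuard and tryPushdown are hypothetical names, and the body only imitates a step that can fail with the two exception types mentioned above.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class PushdownGuard {
  private static final Logger LOG = Logger.getLogger(PushdownGuard.class.getName());

  /** Returns true if the hypothetical pushdown step succeeded, false if it was skipped. */
  static boolean tryPushdown(List<Object> operands) {
    try {
      // May throw IndexOutOfBoundsException (empty list) or ClassCastException (wrong type).
      String field = (String) operands.get(0);
      LOG.log(Level.FINE, "Pushing down on field {0}", field);
      return true;
    } catch (ClassCastException | IndexOutOfBoundsException e) {
      // Only the expected failures are caught, and they are logged instead of silently dropped.
      LOG.log(Level.FINE, "Skipping pushdown", e);
      return false;
    }
  }
}
```

Any other exception type now propagates to the caller, so an unexpected bug is no longer hidden behind a quiet return.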
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

402-424: Consider simplifying the limit reduction check.

The logic for canReduceEstimatedRowsCount is correct but could be simplified. The stream pipeline to find the previous limit digest is somewhat verbose.

🔎 Proposed simplification
-        boolean canReduceEstimatedRowsCount = true;
-        if (pushDownContext.isLimitPushed()) {
-          Optional<Integer> previousRowCount =
-              pushDownContext.getQueue().reversed().stream()
-                  .takeWhile(operation -> operation.type() != PushDownType.AGGREGATION)
-                  .filter(operation -> operation.type() == PushDownType.LIMIT)
-                  .findFirst()
-                  .map(operation -> (LimitDigest) operation.digest())
-                  .map(limitDigest -> limitDigest.offset() + limitDigest.limit());
-          if (previousRowCount.isPresent()) {
-            canReduceEstimatedRowsCount = totalSize < previousRowCount.get();
-          }
-        }
+        boolean canReduceEstimatedRowsCount =
+            !pushDownContext.isLimitPushed()
+                || pushDownContext.getQueue().reversed().stream()
+                    .takeWhile(op -> op.type() != PushDownType.AGGREGATION)
+                    .filter(op -> op.type() == PushDownType.LIMIT)
+                    .findFirst()
+                    .map(op -> (LimitDigest) op.digest())
+                    .map(d -> totalSize < d.offset() + d.limit())
+                    .orElse(true);
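
The intent of the check is easier to see in isolation. Below is a self-contained sketch under simplifying assumptions: Op and Type are hypothetical stand-ins for the pushed-down operation and its digest, and a backwards loop replaces the reversed stream so that the example does not depend on a particular Java version.

```java
import java.util.List;

final class LimitReductionSketch {
  enum Type { FILTER, PROJECT, AGGREGATION, LIMIT }

  /** Hypothetical, flattened stand-in for a pushed-down operation and its limit digest. */
  record Op(Type type, int offset, int limit) {}

  /**
   * Returns true if pushing a new limit of {@code totalSize} rows can lower the estimated
   * row count, given the operations pushed so far (oldest first).
   */
  static boolean canReduce(List<Op> queue, int totalSize) {
    for (int i = queue.size() - 1; i >= 0; i--) {
      Op op = queue.get(i);
      if (op.type() == Type.AGGREGATION) {
        break; // limits pushed before the latest aggregation no longer bound the output
      }
      if (op.type() == Type.LIMIT) {
        return totalSize < op.offset() + op.limit();
      }
    }
    return true; // no earlier limit applies, so the new one can always help
  }
}
```

For example, with a previously pushed limit of 100 rows, a new limit of 10 rows is worth pushing down, while a second limit of 100 rows is not.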
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (1)

658-672: Interface field predicates are idiomatic but consider static method alternatives.

The aggIgnoreNullBucket and maybeTimeSpanAgg predicates are defined as interface fields (implicitly public static final). This works but differs from the static method pattern used elsewhere in this interface. For consistency with other helpers like isTimeSpan, consider converting these to static methods.

🔎 Alternative as static methods
-  Predicate<Aggregate> aggIgnoreNullBucket =
-      agg ->
-          agg.getHints().stream()
-              .anyMatch(
-                  hint ->
-                      hint.hintName.equals("stats_args")
-                          && hint.kvOptions.get(Argument.BUCKET_NULLABLE).equals("false"));
-
-  Predicate<Aggregate> maybeTimeSpanAgg =
-      agg ->
-          agg.getGroupSet().stream()
-              .allMatch(
-                  group ->
-                      isTimeBasedType(
-                          agg.getInput().getRowType().getFieldList().get(group).getType()));
+  static boolean aggIgnoreNullBucket(Aggregate agg) {
+    return agg.getHints().stream()
+        .anyMatch(
+            hint ->
+                hint.hintName.equals("stats_args")
+                    && "false".equals(hint.kvOptions.get(Argument.BUCKET_NULLABLE)));
+  }
+
+  static boolean maybeTimeSpanAgg(Aggregate agg) {
+    return agg.getGroupSet().stream()
+        .allMatch(
+            group ->
+                isTimeBasedType(
+                    agg.getInput().getRowType().getFieldList().get(group).getType()));
+  }

Note: The current hint.kvOptions.get(Argument.BUCKET_NULLABLE).equals("false") could throw NPE if the key is missing. Consider using "false".equals(...) for null-safety.
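
The null-safety point can be demonstrated with a plain map. In this sketch the key string and the class name are hypothetical and the map merely stands in for hint.kvOptions.

```java
import java.util.Map;

final class NullSafeEquals {
  // Hypothetical key, standing in for Argument.BUCKET_NULLABLE.
  static final String KEY = "bucket_nullable";

  /** Throws NullPointerException when the key is missing, because get(...) returns null. */
  static boolean unsafe(Map<String, String> options) {
    return options.get(KEY).equals("false");
  }

  /** Returns false when the key is missing, because the literal is never null. */
  static boolean safe(Map<String, String> options) {
    return "false".equals(options.get(KEY));
  }
}
```

Calling the receiver-first form on an empty map fails, while the literal-first form simply reports that the option is not set to "false".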

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (1)

128-131: Commented-out assertion may indicate incomplete validation.

The commented assertion on lines 128-130 suggests there was intent to validate that the input field count matches the group-by list size. The remaining assertion on line 131 only checks that the group set equals newGroupByList, not the field count alignment.

Consider either removing the commented code or implementing the intended validation.

🔎 Either remove or uncomment the assertion
-    // assert aggregate.getInput().getRowType().getFieldCount() == groupByList.size() :
-    // String.format("The input's field size should be trimmed to equal to group list size %d, but
-    // got %d", groupByList.size(), aggregate.getInput().getRowType().getFieldCount());
     assert aggregate.getGroupSet().asList().equals(newGroupByList);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbcdbd6 and b161a3b.

📒 Files selected for processing (109)
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggGroupMergeRule.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggregateConvertRule.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_high.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_high_2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_low.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms_low_cardinality.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_single_group_key.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_limit.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q10.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q16.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q17.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q18.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q21.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q33.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q34.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q36.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q6.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q9.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_head_from.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_join4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_no_expr_output_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push6.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_scalar_correlated_subquery_in_select.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_scalar_correlated_subquery_in_where.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_then_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/FilterIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/LimitIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ProjectIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortAggregateMeasureRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/ProjectDigest.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java
🧰 Additional context used
📓 Path-based instructions (6)
integ-test/src/test/resources/**/*

⚙️ CodeRabbit configuration file

integ-test/src/test/resources/**/*:

  • Verify test data is realistic and representative
  • Check data format matches expected schema
  • Ensure test data covers edge cases and boundary conditions

Files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_high.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_head_from.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms_low_cardinality.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_no_expr_output_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_low.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_limit.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_scalar_correlated_subquery_in_select.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_then_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q34.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q10.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_join4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q33.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q36.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_single_group_key.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_scalar_correlated_subquery_in_where.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q17.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q16.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push6.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q9.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q21.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_high_2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q6.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q18.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggGroupMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ProjectIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortAggregateMeasureRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggregateConvertRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/FilterIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/ProjectDigest.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/LimitIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java

⚙️ CodeRabbit configuration file

**/*.java:

  • Flag methods >50 lines as potentially too complex - suggest refactoring
  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggGroupMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ProjectIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortAggregateMeasureRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggregateConvertRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/FilterIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/ProjectDigest.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/LimitIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java:

  • New functions MUST have unit tests in the same commit
  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggGroupMergeRule.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggregateConvertRule.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java:

  • Follow existing Calcite integration patterns
  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggGroupMergeRule.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggregateConvertRule.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java:

  • Verify NULL input tests for all new functions
  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q13.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q14.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_high.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_keepempty_false_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_composite_date_range_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q8.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_head_from.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms_low_cardinality.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push3.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart_count.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/date_histogram_minute_agg.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q11.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_sort_on_measure_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_no_expr_output_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_low.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_date_histogram_daily.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_limit.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/multi_terms_keyword.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms_keyword.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_scalar_correlated_subquery_in_select.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_then_agg_push.json
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q34.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q10.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_join4.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q33.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_with_integer_span.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q36.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_single_group_key.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown_bucket_nullable1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortAggregateMeasureRule.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_scalar_correlated_subquery_in_where.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q17.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q31.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr3_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q22.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q16.yaml
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push6.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q9.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q43.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q15.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_complex4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q32.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q21.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_output.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/composite_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_dedup_with_expr4_alternative.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_high_2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_timechart.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_multiple_group_keys.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/terms_significant_2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q6.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/AggregateIndexScanRule.java
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q23.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q18.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q38.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
  • opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q12.yaml
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/big5/cardinality_agg_high.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_head_from.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/big5/keyword_terms.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push5.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_fields_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q39.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push2.yaml
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push1.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_scalar_correlated_subquery_in_select.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_paginating_join4.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_scalar_correlated_subquery_in_where.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q41.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_multiple_agg_with_sort_on_one_measure_not_push2.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.json
  • integ-test/src/test/resources/expectedOutput/calcite/explain_join_with_criteria_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q42.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q28.yaml
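
The SEMI/ANTI join learning above can be illustrated with a small standalone sketch. It is not taken from CalciteRelNodeVisitor; the class name, the method name, and the modelling of `max` as a cap on the right-side matches considered per left row are all assumptions made for illustration. Because a semi join only asks whether at least one match exists, any cap of 1 or more yields the same rows.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the project's join implementation.
class SemiJoinMaxSketch {

    /**
     * Keeps each left value that has at least one equal value on the right.
     * Assumption: `max` (>= 1) caps how many right-side matches are examined per left row.
     */
    static List<Integer> semiJoin(List<Integer> left, List<Integer> right, long max) {
        return left.stream()
            .filter(l -> right.stream().filter(r -> r.equals(l)).limit(max).count() > 0)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 3);
        List<Integer> right = List.of(2, 2, 3, 3, 3);
        // The cap changes how many matches are counted, never whether a match exists.
        System.out.println(semiJoin(left, right, 1));
        System.out.println(semiJoin(left, right, 2));
        System.out.println(semiJoin(left, right, Long.MAX_VALUE));
    }
}
```

All three calls keep the same left rows, which is consistent with returning early for SEMI/ANTI joins before the `max` option is processed.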
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggGroupMergeRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/ExpandCollationOnProjectExprRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggregateConvertRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java
  • integ-test/src/test/resources/expectedOutput/calcite/chart_timestamp_span_and_category.yaml
  • core/src/main/java/org/opensearch/sql/calcite/plan/PPLAggregateConvertRule.java
  • integ-test/src/test/resources/expectedOutput/calcite/chart_use_other.yaml
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Prefer `Optional<T>` for nullable returns in Java

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
🧬 Code graph analysis (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java (1)
  • Value (26-107)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java (1)
  • Value (30-122)
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
  • OpenSearchTypeFactory (63-411)

Signed-off-by: Heng Qian <[email protected]>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)

43-52: Consider adding a defensive length check for consistency.

Other rules in the codebase (e.g., RelevanceFunctionPushdownRule, PPLAggGroupMergeRule) include an assertion to validate call.rels.length. While the operand pattern guarantees 6 operands, adding a check improves defensive coding and debugging clarity.

🔎 Proposed addition
   @Override
   protected void onMatchImpl(RelOptRuleCall call) {
+    if (call.rels.length != 6) {
+      throw new AssertionError(
+          String.format("The length of rels should be 6 but got %s", call.rels.length));
+    }
     final LogicalProject finalProject = call.rel(0);
     // TODO Used when number of duplication is more than 1
     final LogicalFilter numOfDedupFilter = call.rel(1);

128-137: Pruning integration follows established pattern.

The PlanUtils.tryPruneRelNodes(call) invocation after transformTo is consistent with other SubstitutionRule implementations in the codebase (e.g., RelevanceFunctionPushdownRule, RareTopPushdownRule).

Note: The assert statement on lines 128-129 may be disabled in production JVMs. If this invariant is critical for correctness, consider using an explicit check that throws an appropriate exception.
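
A minimal sketch of the difference, assuming hypothetical names (`OperandCheckSketch`, `checkWithAssert`, `checkExplicitly`) that do not exist in the codebase: a Java `assert` is evaluated only when the JVM runs with assertions enabled (for example `-ea`), while an explicit check always runs. The choice of `IllegalStateException` is also an assumption; the proposed diff above throws `AssertionError` instead.

```java
// Hypothetical illustration only; not code from DedupPushdownRule.
class OperandCheckSketch {

    /** Evaluated only when assertions are enabled (e.g. java -ea); otherwise skipped. */
    static void checkWithAssert(int actual, int expected) {
        assert actual == expected : "operand count mismatch";
    }

    /** Always evaluated, regardless of JVM flags. */
    static void checkExplicitly(int actual, int expected) {
        if (actual != expected) {
            throw new IllegalStateException(
                String.format("The length of rels should be %d but got %d", expected, actual));
        }
    }

    public static void main(String[] args) {
        checkExplicitly(6, 6); // matching count: no exception
        try {
            checkExplicitly(5, 6);
        } catch (IllegalStateException e) {
            // Reaches here on every JVM, with or without -ea.
            System.out.println(e.getMessage());
        }
    }
}
```

If the invariant only guards against programming errors during development, the `assert` form may be enough; the explicit form is the safer choice when a violated invariant could otherwise produce a wrong plan silently.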

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b161a3b and c8f2a7f.

📒 Files selected for processing (2)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RareTopPushdownRule.java (1)
  • Value (26-107)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/RelevanceFunctionPushdownRule.java (1)
  • Value (30-122)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (7)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java (3)

34-41: LGTM!

The implementation of SubstitutionRule is consistent with the PR objectives and follows the same pattern used by other rules in the codebase (e.g., RelevanceFunctionPushdownRule, RareTopPushdownRule, PPLAggGroupMergeRule). This enables the pruning behavior for removing redundant RelNodes after substitution.
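
The pruning behavior mentioned above follows the rule stated in the PR description: prune from the top down and stop at the first node that cannot be pruned. The sketch below is a toy model of that rule only. `TopDownPruneSketch`, `Node`, and `prunable` are hypothetical names, and the real `PlanUtils.tryPruneRelNodes` works on Calcite `RelNode`s and may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

final class TopDownPruneSketch {

  /** Toy stand-in for a plan node; the real code operates on Calcite RelNodes. */
  static final class Node {
    final String name;
    final boolean physical;
    final int parentCount;

    Node(String name, boolean physical, int parentCount) {
      this.name = name;
      this.physical = physical;
      this.parentCount = parentCount;
    }
  }

  /**
   * Walks the matched nodes from the root downwards and collects the names of nodes
   * that may be pruned, stopping at the first node that has to stay.
   */
  static List<String> prunable(List<Node> matchedTopDown) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < matchedTopDown.size(); i++) {
      Node node = matchedTopDown.get(i);
      boolean isRoot = i == 0;
      // The root is being replaced by the new plan, so extra parents do not block it.
      boolean shared = !isRoot && node.parentCount > 1;
      if (node.physical || shared) {
        break; // everything below this node is kept as well
      }
      result.add(node.name);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Node> matched = new ArrayList<>();
    matched.add(new Node("finalProject", false, 3)); // root: parent count is ignored
    matched.add(new Node("filter", false, 1));
    matched.add(new Node("bottomProject", false, 2)); // shared, so pruning stops here
    matched.add(new Node("scan", false, 1));
    System.out.println(prunable(matched));
  }
}
```

Stopping at the first blocked node means a subset that another parent still reuses is never left without its only child.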


88-113: LGTM!

The projection building logic correctly prioritizes dedup columns first (for subsequent aggregation grouping) and appends remaining non-ROW_NUMBER columns. The dual source resolution (from projectWithWindow or bottomProject via RexInputRef) handles both expression-based and reference-based dedup columns appropriately.
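
The column ordering described above can be sketched on plain column names. This is a simplified illustration, not the rule's code. The real rule resolves `RexNode` expressions from `projectWithWindow` or `bottomProject`, and the class name, method name, and `_row_number_` column name here are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DedupProjectionOrderSketch {

  /** Hypothetical name for the ROW_NUMBER window column. */
  static final String ROW_NUMBER_COLUMN = "_row_number_";

  private DedupProjectionOrderSketch() {}

  /** Puts the dedup columns first, then every other column except the ROW_NUMBER one. */
  static List<String> reorder(List<String> columns, List<String> dedupColumns) {
    // LinkedHashSet keeps insertion order and skips columns that were already added.
    Set<String> ordered = new LinkedHashSet<>(dedupColumns);
    for (String column : columns) {
      if (!column.equals(ROW_NUMBER_COLUMN)) {
        ordered.add(column);
      }
    }
    return new ArrayList<>(ordered);
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("name", "age", "state", ROW_NUMBER_COLUMN);
    System.out.println(reorder(columns, Arrays.asList("state"))); // [state, name, age]
  }
}
```

Leading with the dedup columns lets a subsequent aggregation group on a prefix of the projection.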


140-181: LGTM!

The operand structure correctly matches the 6-level pattern required for dedup pushdown:

  1. Final project (no ROW_NUMBER)
  2. Filter (dedup number condition)
  3. Project with window (ROW_NUMBER)
  4. Filter (bucket non-null)
  5. Bottom project
  6. Index scan

The config rename to DEFAULT and removal of isProjectPushed constraint from tableScanChecker align with the PR's refactoring goals.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (4)

13-13: LGTM!

The new imports are properly used: @Nullable for the pushDownAggregate parameter and ProjectDigest for the project push-down digest wrapper.

Also applies to: 62-62


269-272: LGTM!

The change to use ProjectDigest provides richer context by capturing both field names and selected column indices, which supports the new pruning logic by enabling more precise digest comparisons.


344-354: LGTM!

The updated signature with @Nullable Project project and the use of cloneForAggregate(aggregate, project) properly supports the aggregate-aware push-down path where project removal may occur during optimization.


401-419: No issues found. The code uses .reversed() which is a Java 21 method, and this is consistent with the project's build configuration that explicitly targets Java 21 (sourceCompatibility = JavaVersion.VERSION_21, targetCompatibility = JavaVersion.VERSION_21). The logic correctly determines whether a new limit can reduce the estimated row count by comparing against any previously pushed limit before the aggregation.

# Conflicts:
#	opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/DedupPushdownRule.java
Signed-off-by: Heng Qian <[email protected]>
Member

@LantaoJin LantaoJin left a comment

I'd like to verify the checksum of the clickbench results before this is merged.

Comment on lines +46 to +47
public class PPLAggGroupMergeRule extends RelRule<PPLAggGroupMergeRule.Config>
implements SubstitutionRule {
Member

Should this be wrapped in InterruptibleRelRule?

public class PPLAggGroupMergeRule extends InterruptibleRelRule<PPLAggGroupMergeRule.Config> implements SubstitutionRule

Collaborator Author

InterruptibleRelRule is in the opensearch package and depends on OpenSearchTimeoutException, while this rule is in the core package.

Therefore, this rule cannot extend InterruptibleRelRule unless we move that class from opensearch to core and add the opensearch library as a dependency in core's Gradle build.

On the other hand, if an interrupt is triggered during planning, it should still be detected by our push-down rules in the opensearch package.

@LantaoJin LantaoJin mentioned this pull request Dec 25, 2025
8 tasks
throw new IllegalStateException(String.format("Cannot infer value from RexNode %s", node));
}

RexNode inferRexNodeFromIndex(int index, Project project) {
Member

add private

Collaborator Author

We should give the methods in AggregateBuilderHelper package-level accessibility. As far as I can see, all methods in this class use the default (package-private) modifier.

return project == null ? RexInputRef.of(index, rowType) : project.getProjects().get(index);
}

String inferFieldNameFromIndex(int index, Project project) {
Member

ditto


Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

[FEATURE] Prune old RelNode in push down rules

3 participants