Skip to content

Conversation

@songkant-aws
Copy link
Contributor

@songkant-aws songkant-aws commented Nov 20, 2025

Description

We observed there are some common user queries following the pattern like source=tableA | rex '...<?rex_field>...' | join left=a right=b on a.rex_field=b.field tableB. Join keys reference to a expression output instead of original table field. This query will be transformed to SortMergeJoin where it sorts by join keys on each join child.

SortMergeJoin inserts EnumerableSort operators at physical plan optimization. Previous sort expression pushdown optimization #4750 only provides ability of logical plan optimization, aka pushdown LogicalSort into scan. This PR expands the ability to push down EnumerableSort by simple or complex sort expressions in case of SortMergeJoin.

Related Issues

Resolves #4823

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

yuancu
yuancu previously approved these changes Nov 24, 2025
Comment on lines +970 to +974
public void testSimpleSortExprPushdownForSMJ() throws Exception {
String query =
"source=opensearch-sql_test_index_bank | join left=a right=b on a.age + 1 = b.balance - 20"
+ " opensearch-sql_test_index_bank";
var result = explainQueryYaml(query);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that this query is not executable with the following error:

java.lang.RuntimeException: java.sql.SQLException: exception while executing query: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
        at org.opensearch.sql.opensearch.executor.OpenSearchExecutionEngine.lambda$execute$2(OpenSearchExecutionEngine.java:217) ~[opensearch-sql-3.4.0.0-SNAPSHOT.jar:3.4.0.0-SNAPSHOT]

It may result from the type difference between age and balance. I see quite a lot type cast to Integer in the generated code.

This should be another problem that's irrelevant to this optimization though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@songkant-aws does this issue be fixed? I don't see any IT for correctness. can you add more tests in CalcitePPLJoinIT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is caused by type mismatch at runtime due to different index mapping types. The plan explanation can still pass. I think it's not a scope of this PR. Dynamic type conversion at runtime or type validation checker could potentially address it.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 27, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Enabled sort expression pushdown for sort-merge join operations, improving query optimization and execution efficiency.
  • Tests

    • Added comprehensive test coverage for sort expression pushdown scenarios in joins, including simple and complex expressions with various configuration options.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Refactors sort-expression pushdown to operate on abstract scan types, adds pushdownSortExpr and copy() to AbstractCalciteIndexScan, implements copy() in CalciteEnumerableIndexScan, removes sort pushdown internals from CalciteLogicalIndexScan, updates planner rules to match Sort/Project/AbstractCalciteIndexScan with convention guards, and adds integration tests plus expected explain outputs for SMJ sort-pushdown scenarios.

Changes

Cohort / File(s) Summary
Tests & Expected Outputs
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java, integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java, integ-test/src/test/resources/expectedOutput/calcite/*.yaml, integ-test/src/test/resources/expectedOutput/calcite/*.json, integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/*.yaml
Added multiple integration tests exercising simple and complex sort-expression pushdown with Sort-Merge-Join (SMJ) and pass-through scenarios; added corresponding expected explain-plan YAML/JSON outputs (both pushdown and no-pushdown variants).
Planner Rules
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java, opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java
Switched operand matching from Logical* types to Sort/Project/AbstractCalciteIndexScan, added convention-matching guards, adjusted casting and limit-pushdown checks, and updated helper method signatures.
Index Scan Abstraction & Pushdown Logic
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Moved pushdownSortExpr implementation into AbstractCalciteIndexScan and added copy() abstract API; implemented copy() in CalciteEnumerableIndexScan to clone PushDownContext; removed script-sort/pushdown internals and related imports from CalciteLogicalIndexScan.

Sequence Diagram(s)

sequenceDiagram
    participant Rule as SortExprIndexScanRule
    participant Project as Project
    participant Scan as AbstractCalciteIndexScan
    participant PushCtx as PushDownContext

    Note over Rule,Project: Rule matches (Sort, Project, AbstractCalciteIndexScan)
    Rule->>Rule: verify same convention
    alt conventions match
        Rule->>Project: mapThroughProject(...) // collect expressions
        Rule->>Scan: extractSortExpressionInfos(...)
        Rule->>Scan: pushdownSortExpr(sortExprDigests)
        Scan->>Scan: build SortBuilders (field or script)
        Scan->>PushCtx: clone/create PushDownContext with SORT/PROJECT entries
        PushCtx-->>Scan: updated scan (pushdown registered)
        Scan-->>Rule: return copy() of scan with new pushdown
    else conventions differ
        Rule-->>Rule: no transformation (early return)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay special attention to:
    • Correctness of operand-type changes and convention checks in planner rules.
    • pushdownSortExpr behavior in AbstractCalciteIndexScan (field vs script sort, type mapping, error handling).
    • copy() implementation in CalciteEnumerableIndexScan to ensure PushDownContext is deep-cloned.
    • Exactness of new/updated expected explain outputs and ordering of PushDownContext entries.

Suggested labels

pushdown, PPL

Suggested reviewers

  • ps48
  • penghuo
  • kavithacm
  • derek-ho
  • joshuali925
  • noCharger
  • dai-chen
  • Yury-Fridlyand
  • MaxKsyunz
  • vamsimanohar
  • anirudha

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.52% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Support sort expression pushdown for SortMergeJoin' clearly summarizes the main change—extending sort expression pushdown capability to the physical plan level for SortMergeJoin scenarios.
Description check ✅ Passed The description is well-related to the changeset, explaining the context (SortMergeJoin queries with expression-based join keys), the problem (previous pushdown only at logical level), and the solution (extending to physical-level EnumerableSort pushdown).
Linked Issues check ✅ Passed The PR fulfills linked issue #4823's objectives by extending sort expression pushdown to support EnumerableSort (physical level) for SortMergeJoin, enabling both simple and complex expression-based sort pushdown, with corresponding test coverage.
Out of Scope Changes check ✅ Passed All changes are in-scope: test additions (CalciteExplainIT, CalcitePPLJoinIT), expected output resources, rule modifications (SortExprIndexScanRule, SortProjectExprTransposeRule), and abstract/concrete scan class updates to support sort expression pushdown for SortMergeJoin.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (1)

418-492: Sort expression pushdown implementation is well-structured.

The method correctly handles both simple field sorts and complex expression sorts through different code paths:

  • Simple field references use fieldSort (lines 446-456)
  • Complex expressions use scriptSort with PredicateAnalyzer.ScriptQueryExpression (lines 457-478)

The use of Supplier pattern for deferred SortBuilder creation is appropriate, and the MISSING_MAX parameter (lines 459-464) correctly controls null handling for different NullDirection values.

Minor observation: The assertion on line 458 (assert sortExpr instanceof RexCall) will be stripped in production builds. Consider whether an explicit runtime check with exception is more appropriate for production safety.

Consider replacing the assertion with an explicit runtime check:

-        assert sortExpr instanceof RexCall : "sort expression should be RexCall";
+        if (!(sortExpr instanceof RexCall)) {
+          throw new PushDownUnSupportedException(
+              "Complex sort expression must be RexCall, got: " + sortExpr.getClass());
+        }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88bf56 and b7d2441.

📒 Files selected for processing (14)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_push.json (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_pushdown_for_smj.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_single_expr_output_push.json (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_simple_sort_expr_pushdown_for_smj.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_sort_pass_through_join_then_pushdown.yaml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java (6 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (2 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (5 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (1)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Project.java (1)
  • Project (26-137)
opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java (1)
legacy/src/main/java/org/opensearch/sql/legacy/query/planner/logical/node/Project.java (1)
  • Project (26-137)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchTextType.java (1)
  • OpenSearchTextType (21-106)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilder.java (1)
  • PushDownUnSupportedException (78-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (21)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_simple_sort_expr_pushdown_for_smj.yaml (1)

1-21: Expected output for non-pushdown SMJ scenario looks correct.

The YAML correctly represents a SortMergeJoin plan where sort operations remain at the Enumerable level (EnumerableSort at lines 15 and 18) rather than being pushed down into the index scans. This aligns with the PR's goal of testing both pushdown and non-pushdown scenarios.

integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_single_expr_output_push.json (1)

4-4: PushDownContext ordering change looks intentional.

The reordering of PROJECT before SORT in the PushDownContext representation appears to be a consequence of the refactoring to abstract-level pushdown. The actual query behavior (projecting balance, sorting by balance ascending with nulls first, limit 10000) remains correct.

integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_push.json (1)

4-4: Consistent with other expected output updates.

The PushDownContext ordering change (PROJECT before SORT) is consistent with the pattern seen in other updated expected outputs. The underlying query semantics remain unchanged.

integ-test/src/test/resources/expectedOutput/calcite/explain_simple_sort_expr_pushdown_for_smj.yaml (1)

1-28: Expected output correctly demonstrates sort pushdown for SortMergeJoin.

The physical plan shows that sort operations are successfully pushed down into CalciteEnumerableIndexScan (visible in PushDownContext at lines 16-21 and 23-28), eliminating the EnumerableSort operators that appear in the no-pushdown variant. This validates the PR's core functionality.

Key observations:

  • First scan sorts by age (matching +($10, 1) join key component)
  • Second scan sorts by balance (matching -($7, 20) join key component)
  • requestedTotalSize=2147483647 on line 21 correctly represents an unbounded first input
  • requestedTotalSize=50000 on line 28 correctly reflects the JOIN_SUBSEARCH_MAXOUT limit
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

106-110: I'm unable to verify the review comment because the repository cloning is failing. However, examining the review comment itself reveals an important issue:

The review comment is incomplete—it ends with a verification request `` but no verification results are provided.

The original review comment:

  1. Approves the @Override annotation addition (lines 106-110)
  2. Requests verification that AbstractCalciteIndexScan contains the pushdownSortExpr() method
  3. Includes an ast-grep shell script for verification
  4. But does not include the results of that verification

Since I cannot access the repository to complete the verification myself, and the review comment is presented as incomplete with an outstanding verification request, here is my rewritten review comment:


@Override annotation addition requires verification of parent class implementation.

The @Override annotation on copy() (lines 106-110) is correct if AbstractCalciteIndexScan contains the pushdownSortExpr() implementation. This refactoring should move pushdown logic to the abstract base class, enabling both logical and physical index scans to leverage the same SortMergeJoin pushdown capability. Verification needed: Confirm that AbstractCalciteIndexScan contains the pushdownSortExpr() method before approving.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java (1)

73-77: LGTM! Proper deep-copy implementation.

The copy() method correctly creates a new scan instance with a cloned pushDownContext, which is essential to prevent shared mutable state across different plan alternatives during optimization.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj.yaml (1)

1-20: Test resource properly structured for baseline validation.

This YAML captures the expected plan when sort expression pushdown is disabled, showing EnumerableSort operators remaining above scans. This serves as an appropriate baseline for comparing against the pushdown-enabled variant.

integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml (1)

1-17: LGTM! Pushdown context properly demonstrates sort expression pushdown.

The YAML correctly shows sort expression pushdown into the scan operator with detailed script parameters. The PushDownContext on line 16 demonstrates the successful pushdown of the REX_EXTRACT sort expression with appropriate MISSING_MAX:false for NULLS_FIRST handling.

integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml (1)

1-21: Excellent test coverage demonstrating mixed sort pushdown scenarios.

This YAML effectively validates both complex (scripted REX_EXTRACT) and simple (field-based) sort expression pushdown within a Sort-Merge Join. The different null handling approaches (MISSING_MAX:true vs. missing: "_last") appropriately reflect the sort requirements on each join side.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortProjectExprTransposeRule.java (2)

49-51: Essential convention compatibility check.

The convention check correctly prevents the rule from transposing Sort and Project operators across convention boundaries, which is necessary since the rule now matches abstract operator types (both logical and physical).


133-140: Operand configuration correctly generalized to abstract types.

The changes from LogicalSort/LogicalProject to Sort/Project appropriately enable the rule to work with both logical and physical operators (e.g., EnumerableSort), which is essential for sort expression pushdown in Sort-Merge Join scenarios. The updated predicate correctly references Project::containsOver.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (3)

1013-1021: Test correctly validates complex sort expression pushdown for SMJ.

The test exercises the scenario where a REX_EXTRACT expression is used in a join key, forcing a Sort-Merge Join that requires sorting. The pushdown optimization should push this complex sort expression into the scan operator.

Note: A past review comment (lines 1024-1028) mentions a potential runtime type casting issue between Long and Integer, though it's noted as unrelated to this optimization.


1023-1031: Test appropriately covers simple arithmetic expression pushdown.

This test validates pushdown of simple arithmetic expressions (age + 1, balance - 20) used as join keys in a Sort-Merge Join, complementing the complex expression test case.


1033-1042: Test validates sort pass-through optimization with join.

This test correctly exercises the scenario where a sort operation after a join can be pushed down through the join and into the scan operator, validating the "pass-through" optimization path for left joins.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_sort_pass_through_join_then_pushdown.yaml (1)

1-20: Baseline test resource appropriately structured.

This YAML correctly captures the expected plan without sort pushdown optimization, showing EnumerableSort (line 15) remaining above the scan operator. This serves as the baseline for validating the pushdown optimization.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (2)

265-265: Abstract method appropriately requires deep-copy implementation.

The abstract copy() method correctly enforces that subclasses implement proper deep-copy behavior, which is essential for the planner to duplicate relational expressions with independent pushdown state during optimization.


523-540: Based on the web search results, I can now verify the original review comment's claims:

The review comment's suggestion is incorrect. OpenSearch's script-based sorting supports only "number" and "string" types. The web search confirms:

  1. OpenSearch script sort limitation: OpenSearch _script sort configuration accepts only type "number" (for numeric values) or "string" (for text values). These are the only two types supported by OpenSearch's scripting sort functionality.

  2. SQL type mapping: While the SQL plugin documents that DATE/date_nanos map to SQL TIMESTAMP types, this does not mean these types can be directly used in script-based sorts. They would require explicit conversion to string or numeric values first.

  3. Current implementation correctness: The getScriptSortType() method correctly:

    • Maps CHAR_TYPESScriptSortType.STRING
    • Maps INT_TYPES and APPROX_TYPESScriptSortType.NUMBER
    • Throws PushDownUnSupportedException for unsupported types (DATE, TIMESTAMP, BINARY, etc.)

The throw for unsupported types is the correct behavior since OpenSearch cannot handle date/timestamp values directly in script sorts without prior conversion.


The review comment does not reflect actual OpenSearch capabilities and should be corrected.

The original suggestion to extend type mapping for DATE and TIMESTAMP support is based on an incorrect assumption about OpenSearch's script sort functionality. The current implementation is appropriate and doesn't need modification.

opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/SortExprIndexScanRule.java (4)

49-56: Convention consistency guard looks correct

Using Sort, Project, and AbstractCalciteIndexScan in the operands and then explicitly checking that all three share the same convention is a clean way to avoid mismatched logical/physical patterns now that the rule is more generic. No changes needed here.


133-138: Helper signatures generalized appropriately to AbstractCalciteIndexScan

Updating extractSortExpressionInfos and mapThroughProject to take AbstractCalciteIndexScan aligns them with the broader rule and keeps usage (getRowType().getFieldNames(), etc.) within the base API surface. The generalization looks type‑safe and consistent with the rest of the changes.

Also applies to: 169-175


244-262: Operand pattern broadening matches the rule’s new scope

Switching the config to:

  • root operand Sort.class,
  • middle Project.class,
  • leaf AbstractCalciteIndexScan.class,

is consistent with the goal of supporting both logical and enumerable paths (e.g., SMJ‑inserted EnumerableSort) and with the onMatch convention check and noAggregatePushed predicate.

Functionally this looks sound; just ensure your tests cover both logical (LogicalSort) and physical (EnumerableSort) scenarios so we don’t miss regressions when new conventions are introduced.


96-96: I'm unable to access the repository directly due to cloning issues. To complete the verification of this review comment, I need to examine the actual code. Could you please provide:

  1. The code snippet from SortExprIndexScanRule.java (lines 90-115) showing the context around the cast and the extractLimitValue logic
  2. The implementation of the extractLimitValue method to understand what it returns for non-LogicalSort instances
  3. Any usages of EnumerableSort in the codebase that might set fetch or offset values

Alternatively, if you can share the file content or the relevant diff from the PR, I can complete the verification and provide a definitive rewritten review comment.

Comment on lines +970 to +974
public void testSimpleSortExprPushdownForSMJ() throws Exception {
String query =
"source=opensearch-sql_test_index_bank | join left=a right=b on a.age + 1 = b.balance - 20"
+ " opensearch-sql_test_index_bank";
var result = explainQueryYaml(query);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@songkant-aws does this issue be fixed? I don't see any IT for correctness. can you add more tests in CalcitePPLJoinIT?

Comment on lines +1018 to +1019
"source=opensearch-sql_test_index_bank | rex field=lastname \\\"(?<initial>^[A-Z])\\\" |"
+ " join left=a right=b on a.initial = b.firstname opensearch-sql_test_index_bank";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a case of

source=opensearch-sql_test_index_bank
| rex field=lastname \\\"(?<initial>^[A-Z])\\\"
| join type=left max=1 initial opensearch-sql_test_index_bank

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this explain test, and some other IT test cases to verify correctness.

Comment on lines +507 to +511
if (SqlTypeName.CHAR_TYPES.contains(relDataType.getSqlTypeName())) {
return ScriptSortType.STRING;
} else if (SqlTypeName.INT_TYPES.contains(relDataType.getSqlTypeName())
|| SqlTypeName.APPROX_TYPES.contains(relDataType.getSqlTypeName())) {
return ScriptSortType.NUMBER;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it check type family?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SqlTypeFamily check is almost the same with above raw types check. Also, SqlTypeFamily check allows ANY type, which is not expected here.

Signed-off-by: Songkan Tang <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

1015-1055: Consider adding correctness verification tests.

These four new test methods appropriately verify the explain plans and optimization paths for sort expression pushdown with SortMergeJoin. However, they only validate plan structure, not query execution correctness. Given past review comments noting runtime type-casting issues, consider adding integration tests in CalcitePPLJoinIT or similar that verify the actual query results to ensure the optimization produces correct output.

Based on learnings, Calcite integration changes should test SQL generation and optimization paths—which these tests do. However, end-to-end correctness verification would strengthen confidence in the pushdown logic.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48dd737 and e5f0bba.

📒 Files selected for processing (4)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)

  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
🔇 Additional comments (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLJoinIT.java (3)

960-975: LGTM! Validates simple expression-based join for SMJ.

The test correctly validates SortMergeJoin with arithmetic expression join keys (a.age + 3 = b.age - 2). The expected results are accurate (Jane's age 20+3 = John's age 25-2 = 23, and John's age 25+3 = Hello's age 30-2 = 28).


977-1003: LGTM! Validates complex expression-based join with derived fields.

The test correctly validates SortMergeJoin with complex substring-based join keys. The expected results accurately reflect the cartesian product behavior when multiple records share the same derived field value (name2 and state2).


1005-1027: LGTM! Validates SMJ pushdown with max option and field list.

The test correctly validates that sort expression pushdown for SortMergeJoin works in combination with the max=1 option and field list syntax. The expected results align with the max-limiting behavior (one match per left-side record) and the field list join semantics.

@yuancu yuancu merged commit e7fc5f5 into opensearch-project:main Dec 10, 2025
36 of 37 checks passed
songkant-aws added a commit to songkant-aws/sql that referenced this pull request Dec 10, 2025
…t#4830)

* Support sort expression pushdown for SortMergeJoin

Signed-off-by: Songkan Tang <[email protected]>

* Remove old duplicate methods in CalciteLogicalIndexScan

Signed-off-by: Songkan Tang <[email protected]>

* Add more tests

Signed-off-by: Songkan Tang <[email protected]>

---------

Signed-off-by: Songkan Tang <[email protected]>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Dec 11, 2025
LantaoJin pushed a commit that referenced this pull request Dec 11, 2025
…#4830) (#4928)

* Support sort expression pushdown for SortMergeJoin (#4830)

* Support sort expression pushdown for SortMergeJoin

Signed-off-by: Songkan Tang <[email protected]>

* Remove old duplicate methods in CalciteLogicalIndexScan

Signed-off-by: Songkan Tang <[email protected]>

* Add more tests

Signed-off-by: Songkan Tang <[email protected]>

---------

Signed-off-by: Songkan Tang <[email protected]>

* Fix known issue of IT failure in 2.19-dev

Signed-off-by: Songkan Tang <[email protected]>

* Fix wrong pipe character

Signed-off-by: Songkan Tang <[email protected]>

---------

Signed-off-by: Songkan Tang <[email protected]>
@songkant-aws songkant-aws deleted the smj-sort-pushdown branch December 11, 2025 07:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-manually Filed a PR to backport manually. enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support sort expressions pushdown for SortMergeJoin

3 participants