Skip to content

Conversation

@qianheng-aws
Copy link
Collaborator

@qianheng-aws qianheng-aws commented Dec 8, 2025

Description

This PR continues #4795 work, it includes change:

  1. Implement RexCall standardization by using RexNormalize.normalize
  2. Implement RelDataType standardization by widening the type, see details in RexStandardizer::widenType
  3. Support Sarg literal value by expanding SEARCH to conjunction expressions.
  4. Support decimal literal value by downgrading to double value.

Related Issues

Resolves #4757

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 8, 2025

📝 Walkthrough

Summary by CodeRabbit

  • Refactor

    • Improved script parameter handling and expression standardization, including enhanced numeric type widening and normalization for serialized scripts.
  • Tests

    • Updated and added integration/unit tests and expected outputs to align with revised serialization and plan standardization.
  • Behavioral

    • Explain-plan outputs now contain updated serialized script payloads; query semantics and high-level results remain unchanged.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds RexNode standardization and numeric-type widening into script serialization, extends ScriptParameterHelper to accept and expose a RexBuilder and widening stack, updates its call sites, adapts tests/benchmarks, and refreshes many integration expected outputs to reflect modified serialized script payloads.

Changes

Cohort / File(s) Summary
Rex standardization
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java
Implements RexNode standardization: expands SEARCH with fallback, applies RexNormalize before rebuilding calls, introduces numeric type widening and decimal-to-double translation, manages a widening flag stack during traversal.
Script parameter helper
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java
Adds RexBuilder field and Deque<Boolean> widening stack; extends constructors to accept a RexBuilder and exposes the builder via the helper state.
Call sites
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
Updates ScriptParameterHelper instantiation to pass cluster.getRexBuilder() (three-arg constructor).
Tests & benchmarks
opensearch/src/test/java/.../RelJsonSerializerTest.java
benchmarks/src/jmh/java/.../ExpressionScriptSerdeBenchmark.java
Adjusts tests/benchmark to use the new ScriptParameterHelper constructor and updates expectations to match standardization/type-widening behavior.
Integration tests (IT) resources
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/src/test/resources/expectedOutput/calcite/explain_extended_for_standardization.json
Adds testRexStandardizationForScript() and an expected explain JSON exercising the new standardization path.
Integration expected outputs (many)
integ-test/src/test/resources/expectedOutput/calcite/...
Examples: agg_case_cannot_push.yaml, clickbench/*.yaml, explain_*.yaml, explain_skip_script_encoding.json, udf_geoip_in_agg_pushed.yaml, explain_complex_sort_expr_pushdown_for_smj.yaml, etc.
Replaces embedded script payloads (base64 or script strings) in numerous Calcite CalciteEnumerableIndexScan PushDownContext outputs to reflect the new standardized/script-serialization results; surrounding plan structure unchanged.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client
participant Calcite as CalciteEngine
participant PredicateAnalyzer
participant ScriptHelper as ScriptParameterHelper
participant RexStd as RexStandardizer
participant OpenSearch
Client->>Calcite: Submit query
Calcite->>PredicateAnalyzer: Build predicate for pushdown
PredicateAnalyzer->>ScriptHelper: new ScriptParameterHelper(fieldList, types, params, RexBuilder)
ScriptHelper->>RexStd: Standardize RexNode (normalize, expand, widen types)
RexStd-->>ScriptHelper: Standardized RexNode + parameter mapping
ScriptHelper->>OpenSearch: Build script payload (with params)
OpenSearch-->>Client: Execute and return results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing focused review:
    • RexStandardizer: SEARCH expansion fallback, RexNormalize integration, widenType correctness and edge cases.
    • ScriptParameterHelper: constructor change impacts, stack usage, thread-safety and all updated call sites.
    • PredicateAnalyzer usage: correct RexBuilder propagation.
    • Integration expected outputs: confirm changes are limited to serialized script payloads and not unintended plan changes.
    • Updated unit/integration tests for coverage of standardization and type coercion.

Suggested reviewers

  • penghuo
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • GumpacG
  • Swiddis
  • yuancu
  • anirudha
  • Yury-Fridlyand
  • mengweieric

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.17% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'RexCall and RelDataType standardization for script push down' directly and clearly summarizes the main changes in the PR, accurately reflecting the implementation of RexCall and RelDataType standardization work.
Description check ✅ Passed The PR description is well-related to the changeset, explaining the continuation of prior work (#4795), detailing four key implementation changes (RexCall standardization, RelDataType standardization, Sarg literal support, and decimal literal handling), and linking to issue #4757.
Linked Issues check ✅ Passed The PR satisfies the core objectives from issue #4757: RexCall standardization via RexNormalize.normalize, RelDataType type-widening standardization, Sarg literal expansion support, and decimal literal downgrading. Code changes in RexStandardizer, ScriptParameterHelper, and tests validate these standardization implementations.
Out of Scope Changes check ✅ Passed All code changes are directly in-scope: core standardization logic updates (RexStandardizer, ScriptParameterHelper, PredicateAnalyzer), test coverage expansion, and integration test output updates reflecting the new standardization behavior. No unrelated refactoring or unscoped changes detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae9f6f2 and c84e0ba.

📒 Files selected for processing (5)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_extended_for_standardization.json (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)

  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (3)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_extended_for_standardization.json
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_extended_for_standardization.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (5)
integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj_w_max_option.yaml (1)

20-20: Script payload update preserves plan semantics

Line 20 keeps PROJECT, LIMIT, and SORT_EXPR unchanged while only updating the encoded script/params to the new standardized representation (VARCHAR with precision, DIGESTS, SOURCES, MISSING_MAX). This aligns with the pushdown and standardization objectives without altering logical behavior.

integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_pushdown_for_smj.yaml (1)

15-15: LIMIT and missing-last handling correctly pushed down

Line 15 now reflects LIMIT->50000 and explicit "missing" : "_last" in the sort within PushDownContext, matching the JOIN_SUBSEARCH_MAXOUT logical limit and nulls-last ordering. The rest of the scan and script payload remain consistent, so behavior is preserved with better pushdown fidelity.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

2125-2136: New Rex standardization explain test is well-shaped and consistent

Lines 2125–2136 add a focused test that:

  • Gates on enabledOnlyWhenPushdownIsEnabled(),
  • Uses explainQueryToString(..., true) with an extended plan,
  • Asserts via assertJsonEqualsIgnoreId against explain_extended_for_standardization.json.

This matches existing Calcite explain test patterns and correctly uses the JSON matcher for extended output, giving stable coverage for CASE/SARG standardization in script pushdown.

integ-test/src/test/resources/expectedOutput/calcite/explain_sort_pass_through_join_then_pushdown.yaml (1)

16-16: Updated sort script payload matches logical plan and join semantics

Line 16’s CalciteEnumerableIndexScan still projects the same fields and pushes down the same sort expression, now with an updated standardized script blob and MISSING_MAX=false for this non‑max join. The sort direction and NULLS_FIRST behavior match the logical plan, so semantics are preserved with improved script standardization.

integ-test/src/test/resources/expectedOutput/calcite/explain_extended_for_standardization.json (1)

1-7: Extended expected plan accurately captures standardized CASE/SARG script

Lines 1–7 introduce a logical/physical/extended plan that matches the new test:

  • Logical uses CASE(<($10, 30), ..., SEARCH($10, Sarg[[30..40]]), ...) over age, aligning with the CASE expression in the query.
  • Physical shows a single CalciteEnumerableIndexScan with composite aggregation, AGGREGATION + PROJECT + LIMIT->10000 in PushDownContext, and a fully standardized script tree (dynamic params, widened BIGINT/VARCHAR, SOURCES/DIGESTS).
  • Extended binder uses the stashed CalciteEnumerableIndexScan and returns Object[].class, consistent with other extended explain resources.

This gives good golden coverage for the new Rex standardization behavior.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml (1)

8-8: Consider adding inline documentation for complex embedded scripts.

The base64-encoded script payloads make this test resource difficult to review and maintain. For future maintainability:

  • Add a comment before the physical plan section explaining what transformations the scripts represent (e.g., "Script standardized to use RexDynamicParameter for field references and literals").
  • Consider documenting the expected structure of SOURCES and DIGESTS arrays to clarify how they map to the new parameter scheme.

This will help reviewers and maintainers understand the intent of the standardization without requiring them to manually decode the base64 content.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (3)

67-75: Silent exception swallowing may hide issues.

The empty catch block silently ignores all exceptions from RexUtil.expandSearch. While the fallback to use the original call is acceptable, logging the exception at DEBUG level would help with troubleshooting.

       try {
         return RexUtil.expandSearch(helper.rexBuilder, null, call).accept(this, helper);
-      } catch (Exception ignored) {
+      } catch (Exception e) {
         // Continue to use the original call if failing to expand.
         // We can downgrade to still use `Sarg` literal instead of replacing it with parameter.
+        // Log at debug level for troubleshooting
       }

107-107: Potential NPE when stack is empty during nested visitation.

helper.stack.peek() returns null when the stack is empty. This can occur when visitInputRef or visitLiteral is called outside of a visitCall context (e.g., top-level literal or input ref). The widenType method handles null correctly with allowNumericTypeWiden == null || allowNumericTypeWiden, but this implicit behavior should be documented.

Consider adding a brief comment explaining the null-safe behavior:

// stack.peek() returns null when outside RexCall context, which enables widening by default
return new RexDynamicParam(widenType(field.getType(), helper.stack.peek()), newIndex);

Also applies to: 129-129


248-248: Prefer throwing an exception over assertion for runtime invariant.

Assertions can be disabled at runtime with -da JVM flag. For a critical invariant that indicates a bug, throwing an explicit exception ensures the check is never bypassed.

-    assert helper.stack.isEmpty() : "[BUG]Stack should be empty before standardizing";
+    if (!helper.stack.isEmpty()) {
+      throw new IllegalStateException("[BUG] Stack should be empty before standardizing");
+    }
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java (1)

35-38: Consider encapsulating stack access.

The stack field is final but mutable, and Lombok's @Getter exposes it directly. RexStandardizer accesses helper.stack.push(), helper.stack.pop(), and helper.stack.peek() directly. While this works, exposing the raw Deque allows callers to manipulate it arbitrarily, which could lead to misuse.

Consider providing specific methods for stack operations and marking the field as package-private or excluding it from the getter:

@Getter(AccessLevel.NONE)
final Deque<Boolean> stack = new ArrayDeque<>();

public void pushNumericWidenFlag(boolean allow) { stack.push(allow); }
public void popNumericWidenFlag() { stack.pop(); }
public Boolean peekNumericWidenFlag() { return stack.peek(); }

This is optional since RexStandardizer is in the same package and the current approach is simpler.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8126367 and 384f776.

📒 Files selected for processing (37)
  • benchmarks/src/jmh/java/org/opensearch/sql/expression/operator/predicate/ExpressionScriptSerdeBenchmark.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_cannot_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_composite_cannot_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_num_res_cannot_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_counts_by6.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_no_expr_output_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_project_then_sort.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_single_expr_output_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_nested_expr.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push7.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_ip_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_isblank.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_isempty_or_others.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_min_max_agg_on_derived_field.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_regex.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_regex_negated.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_regexp_match_in_where.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_skip_script_encoding.json (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_complex_and_simple_expr.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_text_ilike_function.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_text_like_function.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_text_like_function_case_insensitive.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/udf_geoip_in_agg_pushed.yaml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (7 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java (3 hunks)
  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • benchmarks/src/jmh/java/org/opensearch/sql/expression/operator/predicate/ExpressionScriptSerdeBenchmark.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_num_res_cannot_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_regexp_match_in_where.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_sort_complex_and_simple_expr.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_isempty_or_others.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_regex_negated.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_counts_by6.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_skip_script_encoding.json
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_cannot_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_text_like_function_case_insensitive.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_isblank.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_project_then_sort.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_no_expr_output_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_nested_expr.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_single_expr_output_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_min_max_agg_on_derived_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_text_ilike_function.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/udf_geoip_in_agg_pushed.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push7.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_regex.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_ip_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_composite_cannot_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_text_like_function.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java
🧬 Code graph analysis (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/SortExprDigest.java (1)
  • Getter (23-111)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
  • OpenSearchTypeFactory (63-411)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (36)
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml (1)

8-8: Verify the embedded script payload transformations align with RexNode standardization logic.

The physical plan's embedded script sources (base64-encoded) have changed, which is expected given the RexNode standardization implementation. However, without decoding these scripts and comparing them to the implementation changes, I cannot verify the correctness of the transformations.

Key concerns:

  • The SOURCES array [0,2] and DIGESTS array structure need verification to ensure they correctly reflect the standardized dynamic parameters.
  • Confirm that the parameter remapping from field references and literals to RexDynamicParameter is correctly represented in the script payload.
  • Ensure this test output aligns with the changes in ScriptParameterHelper and the RexBuilder integration mentioned in the PR context.

Please confirm:

  1. The test passes after the RexNode standardization implementation changes.
  2. The base64-encoded scripts correctly represent the standardized RexNode expressions (e.g., RexInputRefRexDynamicParameter, type narrowing applied).
  3. The parameter arrays (SOURCES, DIGESTS) correctly map to the new standardized parameters and are consistent with ScriptParameterHelper's new behavior.

Alternatively, if you can provide details on how these expected outputs were generated or validated, that would help verify the standardization is working as intended.

integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_nested_expr.yaml (1)

10-10: Verify whether line 10 is a duplicate or a legitimate expected-output change.

The provided snippet shows only the final state of line 10 without context of removed/replaced code. This makes it unclear whether the CalciteEnumerableIndexScan represents:

  • A legitimate update to expected output due to RexNode standardization changes, or
  • An unintended duplication requiring removal.

Additionally, validate that the Base64-encoded script within the PushDownContext reflects the correct standardization transformations (e.g., dynamic parameters replacing literals, type widening to BIGINT).

integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.yaml (1)

10-10: Test expected output updated with new script serialization.

The base64-encoded script payload has changed, reflecting the RexNode standardization. The surrounding structure (PushDownContext, aggregation metadata, field references) is preserved correctly.

To validate the script content change is correct, please verify against the RexStandardizer implementation that the new base64 payload represents the expected standardized form of the expression. Since the actual source files (ScriptParameterHelper.java, RexStandardizer.java, PredicateAnalyzer.java) are not included in the review context, I cannot decode and validate the new script bytecode.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml (1)

16-16: Aggregation script payload updated for RexNode standardization.

The base64-encoded script in the multi-term aggregation for "Src" field has been updated. The PushDownContext (PROJECT, FILTER, AGGREGATION, SORT_AGG_METRICS, LIMIT) and overall request structure remain consistent.

Confirm that the new script payload represents the standardized form from RexStandardizer. Additionally, verify that the "SOURCES" and "DIGESTS" arrays in the script parameters match the expected field mappings and constant values after standardization.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml (1)

18-18: Complex aggregation plan updated with standardized scripts.

The plan includes multiple embedded scripts (REGEXP_REPLACE for "k" and CHAR_LENGTH for "l"). Both base64-encoded payloads have been updated, while the composite aggregation structure and nested aggregations remain intact.

Verify that both embedded scripts use standardized RexNode representations and that their parameter mappings (SOURCES/DIGESTS arrays) are correctly aligned with the composite aggregation structure. Confirm script parameter indices don't conflict across multiple scripts in the same request.

integ-test/src/test/resources/expectedOutput/calcite/agg_case_num_res_cannot_push.yaml (1)

9-9: CASE expression script updated with literal standardization.

The script payload reflects a CASE expression with standardized literal parameters. The DIGESTS array shows "age" at index 0 and numeric literals (30, 30, 100) assigned to dynamicParam indices, consistent with literal standardization objectives.

Confirm that identical literals (e.g., the two 30 values) are correctly mapped to the same dynamicParam index in SOURCES array [0,2,2,2], enabling script cache reuse across queries with different literal values.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q19.yaml (1)

12-12: UDT (User-Defined Type) script updated for EXTRACT function.

The script now includes a UDT field ("udt": "EXPR_TIMESTAMP") for timestamp handling, which may relate to the RFC objective of addressing "timestamp encoding that prevents cache hits." The EXTRACT('minute') function is marked as deterministic.

Confirm that the UDT standardization approach (including timestamp UDT metadata) enables script cache sharing across queries with different timestamp values. Verify that the deterministic flag is set correctly based on the function's idempotency.

integ-test/src/test/resources/expectedOutput/calcite/explain_sort_complex_and_simple_expr.yaml (1)

10-10: Sort expression script with potential parameter mapping concern.

The script for sort operation (+($10, $7)) shows SOURCES=[0,0] with DIGESTS=["age","balance"]. Verify that this mapping correctly represents the arithmetic operation; SOURCES array may need indices [0,1] if operands reference different fields.

Confirm the SOURCES array values [0,0] are intentional. If both operands should reference distinct fields (age and balance), SOURCES should likely be [0,1]. Additionally, verify the "MISSING_MAX": false parameter is set correctly for the sort behavior.

integ-test/src/test/resources/expectedOutput/calcite/explain_skip_script_encoding.json (1)

1-1: JSON structure reformatted with standardized scripts.

The file now uses a single top-level JSON object with nested "calcite" structure. Two embedded scripts within the AND condition both use standardized RexNode representations with SOURCES/DIGESTS parameter arrays.

Confirm that:

  1. The JSON structure change (single top-level object vs. separate blocks) is intentional and aligns with test infrastructure changes
  2. Both embedded scripts have non-conflicting parameter indices (Script 1: SOURCES=[1,2]; Script 2: SOURCES=[0,2,2])
  3. The escaped JSON content is complete and correctly represents the original script expressions
integ-test/src/test/resources/expectedOutput/calcite/agg_case_cannot_push.yaml (1)

9-9: Nested CASE expression script with comprehensive literal standardization.

The script encodes a complex CASE with nested conditions and AND operators. The SOURCES array [0,2,2,0,2,0,2,2,2] demonstrates literal standardization where identical values (e.g., 30 at indices 1&2, 40, 100) are reused via consistent dynamicParam references.

Verify that:

  1. The nested CASE expression is correctly flattened and standardized
  2. Literal reuse across multiple branches (DIGESTS: age, 30, 30, 40, 100) reduces script size as intended
  3. The SOURCES indices correctly map operands to parameters for all nested conditions
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (3)

46-57: Good documentation of standardization steps.

The updated JavaDoc clearly documents the standardization process including the type widening step. The TODO for constant folding is appropriately noted.


76-87: Good stack management with try-finally.

The stack push/pop pattern using try-finally ensures proper cleanup even if exceptions occur during operand processing. The determination of allowNumericTypeWiden based on arithmetic/comparison operations is appropriate.


228-244: Type widening logic is correct and well-documented.

The widening rules are appropriate:

  • Integer types → BIGINT
  • Floating-point/decimal types → DOUBLE
  • Character types → VARCHAR

The nullable=true for all widened types ensures consistency across standardized expressions.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ScriptParameterHelper.java (1)

61-79: Constructor changes are appropriate.

The addition of RexBuilder parameter to both constructors and proper delegation is correct. The field assignments are consistent.

integ-test/src/test/resources/expectedOutput/calcite/explain_regexp_match_in_where.yaml (1)

7-8: Test expectation updated for type standardization.

The updated base64-encoded script reflects the new type widening (VARCHAR with nullable: true and precision: -1). The SOURCES and DIGESTS params ([1,2] and ["name","hello"]) correctly identify the field reference (source 1) and literal value (source 2).

integ-test/src/test/resources/expectedOutput/calcite/explain_text_like_function_case_insensitive.yaml (1)

7-8: Test expectation correctly reflects ILIKE standardization.

The updated script payload shows proper standardization of the ILIKE expression with three operands. The SOURCES array [1,2,2] correctly identifies: address field (source 1=SOURCE), and two literals (source 2=LITERAL). All parameters are widened to VARCHAR as expected.

integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_single_expr_output_push.yaml (1)

9-10: Test expectation shows correct BIGINT widening for arithmetic expression.

The sort expression +($0, $1) correctly has both operands widened to BIGINT (nullable: true) since it's a binary arithmetic operation. The SOURCES array [0,0] correctly identifies both as DOC_VALUE sources, and DIGESTS contains the field names ["age","balance"].

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_script_push.yaml (1)

8-8: Test expected output requires verification against implementation code.

The embedded script payload (base64-encoded) has been updated, but comprehensive verification requires reviewing the corresponding implementation changes in ScriptParameterHelper, RexStandardizer, and PredicateAnalyzer to confirm the standardization logic correctly transformed the RexNode expressions.

integ-test/src/test/resources/expectedOutput/calcite/explain_regex_negated.yaml (1)

9-9: LGTM - Script payload updated for standardization.

The embedded script content reflects the new RexNode standardization. The overall plan structure and parameters remain consistent.

integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml (1)

11-11: LGTM - Aggregation script payload updated.

The script standardization is applied consistently within the aggregation push-down context.

integ-test/src/test/resources/expectedOutput/calcite/explain_regex.yaml (1)

9-9: LGTM - Regex script standardization applied.

The updated script payload is consistent with the standardization objectives.

integ-test/src/test/resources/expectedOutput/calcite/explain_isblank.yaml (1)

8-8: LGTM - IsBlank function script updated.

The script payload change maintains the same logical behavior with standardized representation.

integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_project_then_sort.yaml (1)

9-9: LGTM - Sort expression script with type standardization.

The script payload shows type widening (INTEGER → BIGINT) which aligns with the type standardization objectives. The sort expression push-down is correctly maintained.

benchmarks/src/jmh/java/org/opensearch/sql/expression/operator/predicate/ExpressionScriptSerdeBenchmark.java (1)

78-82: LGTM - Benchmark updated for new constructor signature.

The benchmark correctly passes the rexBuilder parameter to ScriptParameterHelper, aligning with the constructor signature update. This ensures the benchmark tests the new RexNode standardization serialization path.

integ-test/src/test/resources/expectedOutput/calcite/explain_isempty_or_others.yaml (1)

8-8: LGTM - Script standardization for composite filter.

The script payload for the OR condition with IS NULL, EQUALS, and IS EMPTY is correctly updated with standardized representation.

integ-test/src/test/resources/expectedOutput/calcite/explain_text_like_function.yaml (1)

8-8: Verify test expectations reflect correct standardization.

The base64-encoded script payload has been updated as part of RexNode standardization. Ensure this expected output was regenerated by running the actual integration tests, not manually edited.

integ-test/src/test/resources/expectedOutput/calcite/explain_complex_sort_expr_push.yaml (1)

10-10: Test expectation updated to reflect standardized script generation.

The updated base64-encoded script payload is the expected result of RexNode standardization changes implemented in this PR.

integ-test/src/test/resources/expectedOutput/calcite/explain_text_ilike_function.yaml (1)

8-8: Test expectation updated to reflect standardized script generation.

The updated base64-encoded script payload aligns with the RexNode standardization changes.

integ-test/src/test/resources/expectedOutput/calcite/explain_min_max_agg_on_derived_field.yaml (1)

8-8: Test expectation updated with standardized aggregation scripts.

The updated MIN and MAX aggregation script payloads reflect the standardization changes while maintaining the same aggregation semantics.

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (1)

1472-1474: LGTM! RexBuilder correctly propagated to ScriptParameterHelper.

The updated constructor call correctly passes cluster.getRexBuilder() to enable RexNode standardization in script parameter handling.

opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java (6)

64-65: LGTM! Test updated to use new ScriptParameterHelper constructor.

The constructor call correctly includes rexBuilder parameter to support RexNode standardization testing.


112-113: LGTM! Consistent constructor update.

The constructor call correctly includes rexBuilder parameter, consistent with other test methods.


128-129: LGTM! Exception test updated consistently.

Constructor call properly updated to include rexBuilder parameter even in the exception test path.


145-146: LGTM! Consistent constructor update.

The constructor call correctly includes rexBuilder parameter for out-of-scope function testing.


178-179: LGTM! Consistent constructor update.

The constructor call correctly includes rexBuilder parameter for index remapping test.


213-214: LGTM! All test methods consistently updated.

The final constructor call correctly includes rexBuilder parameter, completing the consistent update across all test methods in this file.

Comment on lines 8 to 9
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_weblogs]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0},count()=COUNT()), PROJECT->[count(), info.city], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"info.city":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQErXsKICAib3AiOiB7CiAgICAibmFtZSI6ICJJVEVNIiwKICAgICJraW5kIjogIklURU0iLAogICAgInN5bnRheCI6ICJTUEVDSUFMIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAib3AiOiB7CiAgICAgICAgIm5hbWUiOiAiR0VPSVAiLAogICAgICAgICJraW5kIjogIk9USEVSX0ZVTkNUSU9OIiwKICAgICAgICAic3ludGF4IjogIkZVTkNUSU9OIgogICAgICB9LAogICAgICAib3BlcmFuZHMiOiBbCiAgICAgICAgewogICAgICAgICAgImR5bmFtaWNQYXJhbSI6IDAsCiAgICAgICAgICAidHlwZSI6IHsKICAgICAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgICAgIH0KICAgICAgICB9LAogICAgICAgIHsKICAgICAgICAgICJkeW5hbWljUGFyYW0iOiAxLAogICAgICAgICAgInR5cGUiOiB7CiAgICAgICAgICAgICJ1ZHQiOiAiRVhQUl9JUCIsCiAgICAgICAgICAgICJ0eXBlIjogIk9USEVSIiwKICAgICAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICAgICAgfQogICAgICAgIH0KICAgICAgXSwKICAgICAgImNsYXNzIjogIm9yZy5vcGVuc2VhcmNoLnNxbC5leHByZXNzaW9uLmZ1bmN0aW9uLlVzZXJEZWZpbmVkRnVuY3Rpb25CdWlsZGVyJDEiLAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJNQVAiLAogICAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAgICJrZXkiOiB7CiAgICAgICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAgICAgInByZWNpc2lvbiI6IC0xCiAgICAgICAgfSwKICAgICAgICAidmFsdWUiOiB7CiAgICAgICAgICAidHlwZSI6ICJBTlkiLAogICAgICAgICAgIm51bGxhYmxlIjogZmFsc2UsCiAgICAgICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICAgICAic2NhbGUiOiAtMjE0NzQ4MzY0OAogICAgICAgIH0KICAgICAgfSwKICAgICAgImRldGVybWluaXN0aWMiOiB0cnVlLAogICAgICAiZHluYW1pYyI6IGZhbHNlCiAgICB9LAogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogMiwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogZmFsc2UsCiAgICAgICAgInByZWNpc2lvbiI6IDQKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[2,0,2],"DIGESTS":["my-datasource","host","city"]}},"missing_bucket":true,"missing_order":"first","order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_weblogs]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0},count()=COUNT()), PROJECT->[count(), info.city], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"info.city":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQEr3sKICAib3AiOiB7CiAgICAibmFtZSI6ICJJVEVNIiwKICAgICJraW5kIjogIklURU0iLAogICAgInN5bnRheCI6ICJTUEVDSUFMIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAib3AiOiB7CiAgICAgICAgIm5hbWUiOiAiR0VPSVAiLAogICAgICAgICJraW5kIjogIk9USEVSX0ZVTkNUSU9OIiwKICAgICAgICAic3ludGF4IjogIkZVTkNUSU9OIgogICAgICB9LAogICAgICAib3BlcmFuZHMiOiBbCiAgICAgICAgewogICAgICAgICAgImR5bmFtaWNQYXJhbSI6IDAsCiAgICAgICAgICAidHlwZSI6IHsKICAgICAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICAgICAgfQogICAgICAgIH0sCiAgICAgICAgewogICAgICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICAgICAidHlwZSI6IHsKICAgICAgICAgICAgInVkdCI6ICJFWFBSX0lQIiwKICAgICAgICAgICAgInR5cGUiOiAiT1RIRVIiLAogICAgICAgICAgICAibnVsbGFibGUiOiB0cnVlCiAgICAgICAgICB9CiAgICAgICAgfQogICAgICBdLAogICAgICAiY2xhc3MiOiAib3JnLm9wZW5zZWFyY2guc3FsLmV4cHJlc3Npb24uZnVuY3Rpb24uVXNlckRlZmluZWRGdW5jdGlvbkJ1aWxkZXIkMSIsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIk1BUCIsCiAgICAgICAgIm51bGxhYmxlIjogZmFsc2UsCiAgICAgICAgImtleSI6IHsKICAgICAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAgICAgIm51bGxhYmxlIjogZmFsc2UsCiAgICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgICB9LAogICAgICAgICJ2YWx1ZSI6IHsKICAgICAgICAgICJ0eXBlIjogIkFOWSIsCiAgICAgICAgICAibnVsbGFibGUiOiBmYWxzZSwKICAgICAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgICAgICJzY2FsZSI6IC0yMTQ3NDgzNjQ4CiAgICAgICAgfQogICAgICB9LAogICAgICAiZGV0ZXJtaW5pc3RpYyI6IHRydWUsCiAgICAgICJkeW5hbWljIjogZmFsc2UKICAgIH0sCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiAyLAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[2,0,2],"DIGESTS":["my-datasource","host","city"]}},"missing_bucket":true,"missing_order":"first","order":"asc"}}}]}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Explain the base64-encoded script changes and verify consistency across related test files.

The base64-encoded Calcite script payload on line 9 has been updated. Please clarify:

  1. What changed in the script content relative to the previous version and why
  2. Whether all related expected output YAML files in expectedOutput/calcite/ have been updated consistently for the standardization objectives

Comment on lines 203 to 207
// OpenSearch doesn't accept big decimal value in its script.
// So try to return its double value directly as we don't really support decimal literal.
if (SqlTypeUtil.isDecimal(literal.getType()) && literal.getValue() instanceof BigDecimal) {
return ((BigDecimal) literal.getValue()).doubleValue();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Precision loss converting BigDecimal to double.

Converting BigDecimal to double via doubleValue() can lose precision for values with more than ~15-17 significant digits. While the comment acknowledges OpenSearch's limitation and that decimal literals aren't truly supported, this could silently corrupt high-precision numeric literals.

Consider logging a warning when precision might be lost:

     if (SqlTypeUtil.isDecimal(literal.getType()) && literal.getValue() instanceof BigDecimal) {
-      return ((BigDecimal) literal.getValue()).doubleValue();
+      BigDecimal bd = (BigDecimal) literal.getValue();
+      // Note: Precision may be lost for values with > 15-17 significant digits
+      return bd.doubleValue();
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// OpenSearch doesn't accept big decimal value in its script.
// So try to return its double value directly as we don't really support decimal literal.
if (SqlTypeUtil.isDecimal(literal.getType()) && literal.getValue() instanceof BigDecimal) {
return ((BigDecimal) literal.getValue()).doubleValue();
}
// OpenSearch doesn't accept big decimal value in its script.
// So try to return its double value directly as we don't really support decimal literal.
if (SqlTypeUtil.isDecimal(literal.getType()) && literal.getValue() instanceof BigDecimal) {
BigDecimal bd = (BigDecimal) literal.getValue();
// Note: Precision may be lost for values with > 15-17 significant digits
return bd.doubleValue();
}

@qianheng-aws qianheng-aws marked this pull request as draft December 8, 2025 04:18
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
@qianheng-aws qianheng-aws added the enhancement New feature or request label Dec 8, 2025
@qianheng-aws qianheng-aws added the pushdown pushdown related issues label Dec 8, 2025
Signed-off-by: Heng Qian <[email protected]>
# Conflicts:
#	integ-test/src/test/resources/expectedOutput/calcite/agg_case_cannot_push.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/agg_case_composite_cannot_push.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/agg_case_num_res_cannot_push.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/clickbench/q29.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/explain_agg_counts_by6.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_sum_enhancement.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml
#	integ-test/src/test/resources/expectedOutput/calcite/udf_geoip_in_agg_pushed.yaml
Signed-off-by: Heng Qian <[email protected]>
@qianheng-aws qianheng-aws marked this pull request as ready for review December 8, 2025 09:22
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (1)

204-210: Precision loss acknowledged with comment.

The comment now documents the precision limitation when converting BigDecimal to double. This addresses the concern raised in previous reviews.

🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (1)

67-75: Overly broad exception handling may hide bugs.

Catching Exception is too broad and could mask programming errors (e.g., NullPointerException, IllegalArgumentException). Consider catching only the specific exception(s) that RexUtil.expandSearch can throw when expansion isn't possible.

     if (call.op.kind == SqlKind.SEARCH) {
       try {
         return RexUtil.expandSearch(helper.rexBuilder, null, call).accept(this, helper);
-      } catch (Exception ignored) {
+      } catch (UnsupportedOperationException ignored) {
         // Continue to use the original call if failing to expand.
         // We can downgrade to still use `Sarg` literal instead of replacing it with parameter.
       }
     }

Please verify which specific exceptions RexUtil.expandSearch throws when expansion fails and narrow the catch clause accordingly.

opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java (1)

312-350: Minor: Inconsistent variable naming.

The variable rexUpper doesn't reflect the test purpose (SEARCH expansion). Consider renaming to rexSearch for clarity.

-    RexNode rexUpper =
+    RexNode rexSearch =
         rexBuilder.makeCall(
             SqlStdOperatorTable.SEARCH,
             ...
-    String code = serializer.serialize(rexUpper, helper);
+    String code = serializer.serialize(rexSearch, helper);

The test logic itself correctly verifies SEARCH/Sarg expansion to OR expressions with widened BIGINT types.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 384f776 and 2f1495d.

📒 Files selected for processing (15)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_cannot_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_composite_cannot_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_num_res_cannot_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_counts_by6.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_sum_enhancement.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_min_max_agg_on_derived_field.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/udf_geoip_in_agg_pushed.yaml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (7 hunks)
  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java (9 hunks)
✅ Files skipped from review due to trivial changes (1)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_sum_enhancement.yaml
🚧 Files skipped from review as they are similar to previous changes (8)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_with_script.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_script_udt_arg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_cannot_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/clickbench/q40.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_min_max_agg_on_derived_field.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/explain_agg_counts_by6.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/udf_geoip_in_agg_pushed.yaml
🧰 Additional context used
📓 Path-based instructions (3)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
🧠 Learnings (3)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml
  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
  • integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_composite_cannot_push.yaml
  • integ-test/src/test/resources/expectedOutput/calcite/agg_case_num_res_cannot_push.yaml
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java
🧬 Code graph analysis (2)
opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
  • OpenSearchTypeFactory (63-411)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
  • OpenSearchTypeFactory (63-411)
🔇 Additional comments (10)
integ-test/src/test/resources/expectedOutput/calcite/explain_patterns_simple_pattern_agg_push.yaml (1)

10-10: Expected output updated to reflect standardized script payload.

The embedded script payload in the PushDownContext has been updated as part of RexNode standardization. The identical base64-encoded script strings in both patterns_field and pattern_count locations correctly represent the same aggregation expression. The YAML structure is valid and maintains proper indentation.

Please confirm that:

  1. Integration tests pass with these updated expectations
  2. The standardized scripts in the base64 payloads represent properly canonical/standardized RexNode expressions (with literals converted to dynamic parameters, types widened, etc. per the PR objectives)

If needed, I can generate a verification script to decode and inspect the base64 payload to validate the standardization.

integ-test/src/test/resources/expectedOutput/calcite/agg_case_composite_cannot_push.yaml (1)

9-9: Verify the base64-encoded script payload reflects RexNode standardization changes.

Line 9 updates an embedded base64-encoded script payload in the composite aggregation context as part of broader RexNode standardization, type widening, and literal downgrading across test expected-output YAMLs.

To confirm the payload update is correct:

  1. Decode the base64 script (source: "script":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQDIXsK...\") and verify it reflects the intended RexCall standardization (operand reordering, type widening for CASE expressions, literal downgrades).
  2. Confirm query semantics remain unchanged—the aggregation still produces correct avg_balance, age_range, and state output.
  3. Cross-reference this change with related code modifications (RexBuilder parameter additions, RexNode standardization logic) to ensure consistency.
  4. Verify this pattern is applied consistently across all similar expected-output updates in the test suite.
integ-test/src/test/resources/expectedOutput/calcite/explain_script_push_on_text.yaml (1)

10-10: Verify test expectations and validation for standardized Calcite script payloads.

This test expectations file updates the embedded OpenSearch scripts (base64-encoded) within the PushDownContext to reflect RexNode standardization. When decoded, the filter script should show the expected parametrization pattern with dynamic parameters replacing literals and field references, along with appropriate type widening.

To confirm the changes are correct:

  1. Run integration tests to validate the updated expectations pass
  2. Verify the standardization is correctly applied for both filter and aggregation scripts
  3. Confirm no regressions in script behavior
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java (2)

223-252: Type widening logic is well-documented and implemented.

The widenType method correctly implements the widening rules described in the JavaDoc:

  • Integer types → BIGINT
  • Floating-point/decimal → DOUBLE
  • Character types → VARCHAR
  • All results are made nullable

The conditional guard on allowNumericTypeWiden properly handles the null case.


254-258: Good defensive assertion.

The assertion ensures the helper stack is in a clean state before starting standardization, which helps catch programming errors early.

integ-test/src/test/resources/expectedOutput/calcite/agg_case_num_res_cannot_push.yaml (1)

1-9: Test expected output updated to reflect standardization changes.

The updated base64-encoded script payload correctly reflects the new type widening behavior (e.g., BIGINT for comparison operands). This aligns with the standardization implementation in RexStandardizer.java.

opensearch/src/test/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializerTest.java (4)

54-56: Test schema extended with numeric field.

The addition of the Number field to the row type enables testing numeric type widening scenarios. This is a well-structured test setup change.


227-252: New test validates RexCall normalization behavior.

The test correctly verifies that:

  1. GREATER(Number, 10) normalizes to LESS(10, Number) (operand swap)
  2. INTEGER types widen to BIGINT for comparison operations
  3. Parameter digests reflect the swapped operand order [10, "Number"]

254-279: New test validates character type widening.

The test correctly verifies that CHAR(2) is widened to VARCHAR during standardization, which improves script cache reuse across different character type literals.


281-310: Test validates selective type widening for non-arithmetic functions.

This test correctly verifies that SUBSTRING does not widen INTEGER arguments to BIGINT, as widening is only enabled for binary arithmetic and comparison operations. This demonstrates proper scoping of the allowNumericTypeWiden flag.

CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0},avg_age=AVG($1)), PROJECT->[avg_age, age_range], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"age_range":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQGFHsKICAib3AiOiB7CiAgICAibmFtZSI6ICJDQVNFIiwKICAgICJraW5kIjogIkNBU0UiLAogICAgInN5bnRheCI6ICJTUEVDSUFMIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAib3AiOiB7CiAgICAgICAgIm5hbWUiOiAiPCIsCiAgICAgICAgImtpbmQiOiAiTEVTU19USEFOIiwKICAgICAgICAic3ludGF4IjogIkJJTkFSWSIKICAgICAgfSwKICAgICAgIm9wZXJhbmRzIjogWwogICAgICAgIHsKICAgICAgICAgICJkeW5hbWljUGFyYW0iOiAwLAogICAgICAgICAgInR5cGUiOiB7CiAgICAgICAgICAgICJ0eXBlIjogIklOVEVHRVIiLAogICAgICAgICAgICAibnVsbGFibGUiOiB0cnVlCiAgICAgICAgICB9CiAgICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAiZHluYW1pY1BhcmFtIjogMSwKICAgICAgICAgICJ0eXBlIjogewogICAgICAgICAgICAidHlwZSI6ICJJTlRFR0VSIiwKICAgICAgICAgICAgIm51bGxhYmxlIjogZmFsc2UKICAgICAgICAgIH0KICAgICAgICB9CiAgICAgIF0KICAgIH0sCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiAyLAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgICAibnVsbGFibGUiOiBmYWxzZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgIm9wIjogewogICAgICAgICJuYW1lIjogIlNFQVJDSCIsCiAgICAgICAgImtpbmQiOiAiU0VBUkNIIiwKICAgICAgICAic3ludGF4IjogIklOVEVSTkFMIgogICAgICB9LAogICAgICAib3BlcmFuZHMiOiBbCiAgICAgICAgewogICAgICAgICAgImR5bmFtaWNQYXJhbSI6IDMsCiAgICAgICAgICAidHlwZSI6IHsKICAgICAgICAgICAgInR5cGUiOiAiSU5URUdFUiIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgIH0KICAgICAgICB9LAogICAgICAgIHsKICAgICAgICAgICJsaXRlcmFsIjogewogICAgICAgICAgICAicmFuZ2VTZXQiOiBbCiAgICAgICAgICAgICAgWwogICAgICAgICAgICAgICAgImNsb3NlZCIsCiAgICAgICAgICAgICAgICAiMzAiLAogICAgICAgICAgICAgICAgIjQwIgogICAgICAgICAgICAgIF0KICAgICAgICAgICAgXSwKICAgICAgICAgICAgIm51bGxBcyI6ICJVTktOT1dOIgogICAgICAgICAgfSwKICAgICAgICAgICJ0eXBlIjogewogICAgICAgICAgICAidHlwZSI6ICJJTlRFR0VSIiwKICAgICAgICAgICAgIm51bGxhYmxlIjogZmFsc2UKICAgICAgICAgIH0KICAgICAgICB9CiAgICAgIF0KICAgIH0sCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiA0LAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgICAibnVsbGFibGUiOiBmYWxzZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDUsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0,2,2,0,2,2],"DIGESTS":["age",30,"u30","age","u40","u100"]}},"missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"avg_age":{"avg":{"field":"age"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_bank]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={0},avg_age=AVG($1)), PROJECT->[avg_age, age_range], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"composite_buckets":{"composite":{"size":1000,"sources":[{"age_range":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQITHsKICAib3AiOiB7CiAgICAibmFtZSI6ICJDQVNFIiwKICAgICJraW5kIjogIkNBU0UiLAogICAgInN5bnRheCI6ICJTUEVDSUFMIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAib3AiOiB7CiAgICAgICAgIm5hbWUiOiAiPCIsCiAgICAgICAgImtpbmQiOiAiTEVTU19USEFOIiwKICAgICAgICAic3ludGF4IjogIkJJTkFSWSIKICAgICAgfSwKICAgICAgIm9wZXJhbmRzIjogWwogICAgICAgIHsKICAgICAgICAgICJkeW5hbWljUGFyYW0iOiAwLAogICAgICAgICAgInR5cGUiOiB7CiAgICAgICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgIH0KICAgICAgICB9LAogICAgICAgIHsKICAgICAgICAgICJkeW5hbWljUGFyYW0iOiAxLAogICAgICAgICAgInR5cGUiOiB7CiAgICAgICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgIH0KICAgICAgICB9CiAgICAgIF0KICAgIH0sCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiAyLAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICB9CiAgICB9LAogICAgewogICAgICAib3AiOiB7CiAgICAgICAgIm5hbWUiOiAiQU5EIiwKICAgICAgICAia2luZCI6ICJBTkQiLAogICAgICAgICJzeW50YXgiOiAiQklOQVJZIgogICAgICB9LAogICAgICAib3BlcmFuZHMiOiBbCiAgICAgICAgewogICAgICAgICAgIm9wIjogewogICAgICAgICAgICAibmFtZSI6ICI8PSIsCiAgICAgICAgICAgICJraW5kIjogIkxFU1NfVEhBTl9PUl9FUVVBTCIsCiAgICAgICAgICAgICJzeW50YXgiOiAiQklOQVJZIgogICAgICAgICAgfSwKICAgICAgICAgICJvcGVyYW5kcyI6IFsKICAgICAgICAgICAgewogICAgICAgICAgICAgICJkeW5hbWljUGFyYW0iOiAzLAogICAgICAgICAgICAgICJ0eXBlIjogewogICAgICAgICAgICAgICAgInR5cGUiOiAiQklHSU5UIiwKICAgICAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgICAgICB9CiAgICAgICAgICAgIH0sCiAgICAgICAgICAgIHsKICAgICAgICAgICAgICAiZHluYW1pY1BhcmFtIjogNCwKICAgICAgICAgICAgICAidHlwZSI6IHsKICAgICAgICAgICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgICAgICAgICAibnVsbGFibGUiOiB0cnVlCiAgICAgICAgICAgICAgfQogICAgICAgICAgICB9CiAgICAgICAgICBdCiAgICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAib3AiOiB7CiAgICAgICAgICAgICJuYW1lIjogIjw9IiwKICAgICAgICAgICAgImtpbmQiOiAiTEVTU19USEFOX09SX0VRVUFMIiwKICAgICAgICAgICAgInN5bnRheCI6ICJCSU5BUlkiCiAgICAgICAgICB9LAogICAgICAgICAgIm9wZXJhbmRzIjogWwogICAgICAgICAgICB7CiAgICAgICAgICAgICAgImR5bmFtaWNQYXJhbSI6IDUsCiAgICAgICAgICAgICAgInR5cGUiOiB7CiAgICAgICAgICAgICAgICAidHlwZSI6ICJCSUdJTlQiLAogICAgICAgICAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICAgICAgICAgIH0KICAgICAgICAgICAgfSwKICAgICAgICAgICAgewogICAgICAgICAgICAgICJkeW5hbWljUGFyYW0iOiA2LAogICAgICAgICAgICAgICJ0eXBlIjogewogICAgICAgICAgICAgICAgInR5cGUiOiAiQklHSU5UIiwKICAgICAgICAgICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgICAgICAgICB9CiAgICAgICAgICAgIH0KICAgICAgICAgIF0KICAgICAgICB9CiAgICAgIF0KICAgIH0sCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiA3LAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAgICJwcmVjaXNpb24iOiAtMQogICAgICB9CiAgICB9LAogICAgewogICAgICAiZHluYW1pY1BhcmFtIjogOCwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgICAicHJlY2lzaW9uIjogLTEKICAgICAgfQogICAgfQogIF0KfQ==\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0,2,2,2,0,0,2,2,2],"DIGESTS":["age",30,"u30",30,"age","age",40,"u40","u100"]}},"missing_bucket":true,"missing_order":"first","order":"asc"}}}]},"aggregations":{"avg_age":{"avg":{"field":"age"}}}}}}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a IT to verify the generated script with skip encoding via format=extended.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have test testSkipScriptEncodingOnExtendedFormat. Do you mean set format=extended for agg_case_cannot_push?

Copy link
Collaborator Author

@qianheng-aws qianheng-aws Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in test case testRexStandardizationForScript

The script is formatted below, with several changes:

  • >= -> <=
  • SEARCH -> AND(...)
  • INT nullable=false -> BIGINT nullable=true
  • CHAR[xxx] nullable=false -> VARCHAR nullable=true
{
  "op": {
    "name": "CASE",
    "kind": "CASE",
    "syntax": "SPECIAL"
  },
  "operands": [
    {
      "op": {
        "name": "<",
        "kind": "LESS_THAN",
        "syntax": "BINARY"
      },
      "operands": [
        {
          "dynamicParam": 0,
          "type": {
            "type": "BIGINT",
            "nullable": true
          }
        },
        {
          "dynamicParam": 1,
          "type": {
            "type": "BIGINT",
            "nullable": true
          }
        }
      ]
    },
    {
      "dynamicParam": 2,
      "type": {
        "type": "VARCHAR",
        "nullable": true,
        "precision": -1
      }
    },
    {
      "op": {
        "name": "AND",
        "kind": "AND",
        "syntax": "BINARY"
      },
      "operands": [
        {
          "op": {
            "name": "<=",
            "kind": "LESS_THAN_OR_EQUAL",
            "syntax": "BINARY"
          },
          "operands": [
            {
              "dynamicParam": 3,
              "type": {
                "type": "BIGINT",
                "nullable": true
              }
            },
            {
              "dynamicParam": 4,
              "type": {
                "type": "BIGINT",
                "nullable": true
              }
            }
          ]
        },
        {
          "op": {
            "name": "<=",
            "kind": "LESS_THAN_OR_EQUAL",
            "syntax": "BINARY"
          },
          "operands": [
            {
              "dynamicParam": 5,
              "type": {
                "type": "BIGINT",
                "nullable": true
              }
            },
            {
              "dynamicParam": 6,
              "type": {
                "type": "BIGINT",
                "nullable": true
              }
            }
          ]
        }
      ]
    },
    {
      "dynamicParam": 7,
      "type": {
        "type": "VARCHAR",
        "nullable": true,
        "precision": -1
      }
    },
    {
      "dynamicParam": 8,
      "type": {
        "type": "VARCHAR",
        "nullable": true,
        "precision": -1
      }
    }
  ]
}

Signed-off-by: Heng Qian <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f1495d and ae9f6f2.

📒 Files selected for processing (2)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_extended_for_standardization.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_extended_for_standardization.json
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)

  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)

Comment on lines 2083 to 2094
@Test
public void testRexStandardizationForScript() throws IOException {
enabledOnlyWhenPushdownIsEnabled();
assertYamlEqualsIgnoreId(
loadExpectedPlan("explain_extended_for_standardization.json"),
explainQueryToString(
String.format(
"source=%s | eval age_range = case(age < 30, 'u30', age >= 30 and age <= 40, 'u40'"
+ " else 'u100') | stats avg(age) as avg_age by age_range",
TEST_INDEX_BANK),
true));
}
Copy link
Contributor

@coderabbitai coderabbitai bot Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use JSON assertion helper for extended explain output

This test loads a *.json expected plan and uses explainQueryToString(..., true), which is JSON in all other tests, but asserts via assertYamlEqualsIgnoreId. For consistency with the rest of this class and to avoid format confusion, this should use the JSON matcher:

-    assertYamlEqualsIgnoreId(
-        loadExpectedPlan("explain_extended_for_standardization.json"),
-        explainQueryToString(
+    assertJsonEqualsIgnoreId(
+        loadExpectedPlan("explain_extended_for_standardization.json"),
+        explainQueryToString(
             String.format(
                 "source=%s | eval age_range = case(age < 30, 'u30', age >= 30 and age <= 40, 'u40'"
                     + " else 'u100') | stats avg(age) as avg_age by age_range",
                 TEST_INDEX_BANK),
             true));

This keeps the new Rex standardization coverage while aligning with existing extended explain tests. Based on learnings, this strengthens Calcite SQL/script generation verification.

🤖 Prompt for AI Agents
In
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
around lines 2083 to 2094, the test constructs an extended JSON explain
(explainQueryToString(..., true)) but asserts with assertYamlEqualsIgnoreId;
replace the YAML assertion with the JSON assertion helper (use the existing
assertJsonEqualsIgnoreId or the project's JSON matcher) so the loaded expected
JSON plan is compared as JSON to the explain output, keeping the same expected
file and parameters.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid reusing FORMAT parameter for EXPLAIN OPTION and add a new parameter for it. EXPLAIN OPTION affects the content of this API while FORMAT only affects the format like JSON, CSV, YAML.

@coderabbitai Could you create an issue to track this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for explainQueryToString method definition and usage to understand FORMAT parameter
rg -n "explainQueryToString" --type java -C 3

Length of output: 54012


🏁 Script executed:

#!/bin/bash
# Search for FORMAT and EXPLAIN related code in the codebase
rg -n "FORMAT.*EXPLAIN|EXPLAIN.*FORMAT" --type java -C 2

Length of output: 5954


🏁 Script executed:

#!/bin/bash
# Look for the explain query implementation to understand current parameter usage
ast-grep --pattern 'explainQueryToString($$$) {
  $$$
}'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find explain-related API endpoints and format handling
rg -n "explain.*format|format.*explain" --type java -i -C 2

Length of output: 53486


🏁 Script executed:

#!/bin/bash
# Create GitHub issue to track separation of FORMAT from EXPLAIN OPTION
gh issue create \
  --repo opensearch-project/sql \
  --title "Separate FORMAT parameter from EXPLAIN verbosity option in explain API" \
  --body "## Description

The \`format\` query parameter in the explain API is currently overloaded to serve two distinct purposes:

1. **Output format control** (JSON, CSV, YAML) - controls how the response is serialized
2. **Explain verbosity level** (SIMPLE, STANDARD, EXTENDED, COST) - controls the content/detail level of the explain output

This violates the single responsibility principle and creates confusion, as FORMAT should only control the serialization format of the response, not the semantic content.

## Proposal

Introduce a new query parameter (e.g., \`explain_level\`, \`explain_option\`, or \`verbosity\`) specifically for controlling the explain verbosity, while keeping \`format\` strictly for output serialization format.

### Current behavior:
\`\`\`
/_plugins/_ppl/_explain?format=extended  // controls BOTH verbosity AND output format
\`\`\`

### Proposed behavior:
\`\`\`
/_plugins/_ppl/_explain?format=json&explain_level=extended  // separate concerns
/_plugins/_ppl/_explain?format=yaml&explain_level=standard
\`\`\`

## Context

This issue was identified during code review of PR #4914 (RexCall and RelDataType standardization for script push down).

**Related discussion:** https://github.com/opensearch-project/sql/pull/4914#discussion_r2608963825

## Benefits

1. **Clear separation of concerns:** FORMAT controls serialization, explain option controls content
2. **Better API design:** Each parameter has a single, well-defined responsibility
3. **Improved extensibility:** Easier to add new output formats or explain levels independently
4. **Reduced confusion:** Users understand what each parameter does

## Implementation considerations

- Maintain backward compatibility by supporting the old format parameter behavior
- Add deprecation warnings for the old usage pattern
- Update documentation to reflect the new parameter
- Update integration tests

**Reported by:** @qianheng-aws" \
  --label "enhancement" \
  --assignee "qianheng-aws"

Length of output: 53


@qianheng-aws Done! I've created issue #4940 to track the separation of the FORMAT parameter from the EXPLAIN verbosity option.

The issue proposes introducing a new query parameter (e.g., explain_level or verbosity) to control the explain detail level (SIMPLE, STANDARD, EXTENDED, COST), while keeping format strictly for output serialization (JSON, YAML, CSV). This will provide a cleaner separation of concerns where:

  • format controls how the response is serialized
  • The new parameter controls what content is included in the explain output

The issue includes context from this PR and your comment, and is assigned to you with the enhancement label.


🧬 Code Graph Analysis Results

<no_relevant_code_snippets>


🧠 Learnings used
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

return call.clone(call.getType(), clonedOperands);
} else {
return call;
// Try to expand `SEARCH` before standardization since we cannot translate literal `Sarg`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encountered a similar problem -- Sarg can be serialized but can not be properly deserialized. I addressed it by replacing Sarg sarg = sargFromJson((Map) literal) with Sarg sarg = sargFromJson((Map) literal, type) in ExtendedRelJson.java. Maybe it helps.

Copy link
Collaborator Author

@qianheng-aws qianheng-aws Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I met here is I cannot translate a Sarg literal into a java object by using RexToLixTranslator.translateLiteral. Thus, we cannot generate a parameter for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the implementation of RexToLixTranslator , it seems Sarg can only be used as literal in RexNode expression but not DynamicParams.

@LantaoJin LantaoJin merged commit bcfcd00 into opensearch-project:main Dec 12, 2025
39 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.19-dev failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4914-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 bcfcd002d5ec402f257a92f9689097d1c9bf8979
# Push it to GitHub
git push --set-upstream origin backport/backport-4914-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev

Then, create a pull request where the base branch is 2.19-dev and the compare/head branch is backport/backport-4914-to-2.19-dev.

qianheng-aws added a commit to qianheng-aws/sql that referenced this pull request Dec 12, 2025
…rch-project#4914)

* RexCall & SqlType standardization

Signed-off-by: Heng Qian <[email protected]>

* Change script in IT

Signed-off-by: Heng Qian <[email protected]>

* Fix UT

Signed-off-by: Heng Qian <[email protected]>

* Fix IT

Signed-off-by: Heng Qian <[email protected]>

* Add UT

Signed-off-by: Heng Qian <[email protected]>

* Fix IT after merging main

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>

(cherry picked from commit bcfcd00)
Signed-off-by: Heng Qian <[email protected]>
@LantaoJin LantaoJin added the backport-manually Filed a PR to backport manually. label Dec 12, 2025
LantaoJin pushed a commit that referenced this pull request Dec 12, 2025
…4951)

* RexCall & SqlType standardization



* Change script in IT



* Fix UT



* Fix IT



* Add UT



* Fix IT after merging main



* Address comments



* Address comments



---------



(cherry picked from commit bcfcd00)

Signed-off-by: Heng Qian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev backport-failed backport-manually Filed a PR to backport manually. enhancement New feature or request pushdown pushdown related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] RexNode standardization for script push down

3 participants