
Conversation


@srikanthpadakanti srikanthpadakanti commented Oct 27, 2025

Description

This pull request adds native support for the mvexpand command in PPL to OpenSearch SQL, enabling users to expand multivalue fields (arrays) into separate rows directly within queries. This functionality is analogous to Splunk's mvexpand command and streamlines analytics, dashboarding, and data preparation involving arrays or multivalue fields.

Key features introduced:

Native mvexpand command for PPL queries to expand array fields into separate rows/events.
Optional limit parameter to restrict the number of expanded values per event/document.
Robust handling of empty/null arrays, large arrays (with memory/resource limits), and non-array fields.
Streaming/distributable execution for performance and scalability.
Comprehensive documentation and edge case coverage.
This feature makes OpenSearch SQL more powerful and user-friendly for log analytics, data exploration, and migration from platforms like Splunk.
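For illustration, a minimal usage sketch (the index name, field name, and query shapes below are hypothetical and only follow the syntax described above, not this PR's actual test fixtures):

// Hypothetical PPL queries exercising mvexpand, written as Java string literals the way the
// integration tests embed queries; "employees" and "skills" are made-up names.
String expandAll = "source=employees | mvexpand skills | fields name, skills";
String expandTop3 = "source=employees | mvexpand skills limit=3 | fields name, skills";
// expandAll emits one row per element of the skills array for each document;
// expandTop3 caps the expansion at three values per document via the optional limit parameter.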

Related Issues

Resolves #4439

Check List

  • [x] New functionality includes testing.
  • [x] New functionality has been documented.
  • [x] New functionality has javadoc added.
  • [x] New functionality has a user manual doc added.
  • [x] New PPL command checklist all confirmed.
  • [x] API changes companion pull request created.
  • [x] Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@srikanthpadakanti
Author

Attaching the manual test cases and their output:
mvexpand_manual_test_results.md

@srikanthpadakanti
Author

Hi maintainers @penghuo , the "Enforce PR labels" check is failing because I can't add labels. Could you please add the required labels to this PR so the checks pass? Thank you!

@dai-chen dai-chen added the enhancement (New feature or request) and PPL (Piped processing language) labels on Oct 27, 2025
super.init();
enableCalcite();
deleteIndexIfExists(INDEX);
createIndex(
Collaborator

Should we rather use Index.MVEXPAND_EDGE_CASES? (Is it duplicate?)

Author

Fixed.

Collaborator

Looks like it is not fixed.

Author
@srikanthpadakanti srikanthpadakanti Nov 18, 2025

Changed to Index.MVEXPAND_EDGE_CASES.getName().

@srikanthpadakanti
Author

srikanthpadakanti commented Oct 30, 2025

Hello @jed326 @ykmr1224
I have addressed the PR comments and made the changes accordingly. Please do review.

@dai-chen
Collaborator

Please check CI failure.

@srikanthpadakanti
Author

Please check CI failure.

@dai-chen The CI failures are not related to mvexpand. I merged the upstream changes into my main branch. The CI failure points to testMvindexSingleElementPositive, testMvindexSingleElementNegative, testMvindexRangePositive, and testMvindexRangeNegative in CalcitePPLArrayFunctionTest.
The assertion failures are string-equality checks of generated SQL (PPL → SparkSQL text). The RelNode / logical plan printed in the local run shows ITEM(...) and ARRAY_SLICE(...) expressions — i.e. the array functions are present and being produced, but the exact string formatting (parentheses, unary +/-, expression grouping) differs from the test expectation used in CI.

My local run shows those tests pass, but CI failed. That implies one of the following:

  • a difference in environment (JDK/OS/Calcite pretty-printer) between my local machine and CI,
  • different code in CI (a different branch/commit or a cached build) than my local checkout, or
  • tests updated in one place and the translator/pretty-printing updated in another in CalcitePPLArrayFunctionTest.

@ahkcs Can you please confirm?

}

public static UnresolvedPlan mvexpand(UnresolvedPlan input, Field field, Integer limit) {
// attach the incoming child plan so the AST contains the pipeline link
Collaborator

nit: unneeded comment.

Author

done

Collaborator

It is still there.

Comment on lines 35 to 45
bulkInsert(
INDEX,
"{\"username\":\"happy\",\"skills\":[{\"name\":\"python\"},{\"name\":\"java\"},{\"name\":\"sql\"}]}",
"{\"username\":\"single\",\"skills\":[{\"name\":\"go\"}]}",
"{\"username\":\"empty\",\"skills\":[]}",
"{\"username\":\"nullskills\",\"skills\":null}",
"{\"username\":\"noskills\"}",
"{\"username\":\"missingattr\",\"skills\":[{\"name\":\"c\"},{\"level\":\"advanced\"}]}",
"{\"username\":\"complex\",\"skills\":[{\"name\":\"ml\",\"level\":\"expert\"},{\"name\":\"ai\"},{\"level\":\"novice\"}]}",
"{\"username\":\"duplicate\",\"skills\":[{\"name\":\"dup\"},{\"name\":\"dup\"}]}",
"{\"username\":\"large\",\"skills\":[{\"name\":\"s1\"},{\"name\":\"s2\"},{\"name\":\"s3\"},{\"name\":\"s4\"},{\"name\":\"s5\"},{\"name\":\"s6\"},{\"name\":\"s7\"},{\"name\":\"s8\"},{\"name\":\"s9\"},{\"name\":\"s10\"}]}");
Collaborator

Is it addressed?
I think we should add a test case for "happy" at least.


Collaborator
@dai-chen dai-chen left a comment

I found this in expand command code: https://github.com/opensearch-project/sql/blob/main/integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExpandCommandIT.java#L141.

Could you clarify why not make mvexpand an alias of expand command with enhancements? If we can do this, I think many new changes can be avoided.

@ahkcs
Contributor

ahkcs commented Nov 19, 2025

My local run shows those tests pass. But, CI failed. That implies either:
A difference in environment (JDK/OS/Calcite pretty-printer) between my local and CI, or
Different code in CI (different branch/commit or cached build) than my local

Can you try rebasing onto current main and making sure your local JDK version is the same as CI's? Currently CI is failing.

@srikanthpadakanti
Author

srikanthpadakanti commented Nov 20, 2025

I found this in expand command code: https://github.com/opensearch-project/sql/blob/main/integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExpandCommandIT.java#L141.

Could you clarify why not make mvexpand an alias of expand command with enhancements? If we can do this, I think many new changes can be avoided.

@dai-chen
mvexpand and expand solve different problems and operate on different data shapes.
expand historically works on nested/struct fields, projecting the struct into rows.
mvexpand works on multivalue/array fields, emitting one row per array element, which requires different type validation, planner rules, and runtime behavior.
mvexpand also needs to handle arrays containing primitives, objects, nulls, and mixed shapes, and must reliably produce per-element rows (including null elements). expand does not extract sub-fields from array elements and its semantics around aliasing and struct projection are different.
OpenSearch mappings complicate this further: a field typed as keyword may still contain arrays at runtime. mvexpand therefore needs stricter semantic checks and fallbacks that don’t align with the legacy expand contract. Reusing expand directly would break backward-compatible behavior expected by existing queries and tests.
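As a rough sketch of the kind of planner-side check being described (simplified and self-contained; the class and method names here are made up, and the SemanticCheckException import path is assumed from the core module, while the SqlTypeName.ARRAY check mirrors what is described above):

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.sql.type.SqlTypeName;
import org.opensearch.sql.exception.SemanticCheckException;

// Illustration only: reject non-array fields before building the mvexpand plan.
final class MvExpandTypeCheckSketch {
  static void requireArrayType(String fieldName, RelDataType fieldType) {
    if (fieldType.getSqlTypeName() != SqlTypeName.ARRAY) {
      throw new SemanticCheckException(
          String.format(
              "mvexpand expects an array field, but '%s' has type %s",
              fieldName, fieldType.getSqlTypeName()));
    }
  }
}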

Problems in alias attempt:
The alias attempt produced repeated values (java, java, java) instead of (java, python, sql) from our IT example.
The repeated output indicates the plan was projecting the same input reference for every row instead of the per-element value. This typically happens when:
  • field resolution occurs before the uncollect/correlate materializes element values,
  • the projection uses the outer input ref instead of the element ref, or
  • column indices (RexInputRef) drift when reusing expand logic, causing all projected rows to read from the same column.

In short, simply delegating to expand bypassed the per-element extraction logic, so the projection kept resolving to the same value on every row.


Refactor: both EXPAND and MVEXPAND delegate to buildExpandCore(...), which implements the shared correlate + uncollect + projection logic. visitExpand applies EXPAND semantics (remove the original array field, optional alias rename). visitMvExpand handles its two special cases — missing field (returns an empty VALUES row with a nullable placeholder) and limit= — then delegates to the shared helper.
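The sketch below illustrates that delegation structure only (the names visitExpand, visitMvExpand, and buildExpandCore come from the description above; the local stand-in types and parameter shapes are assumptions, not the real CalciteRelNodeVisitor signatures):

// Self-contained illustration of the shared-core delegation; PlanContext and FieldRef are
// local stand-ins for CalcitePlanContext and RexInputRef used by the real code.
final class ExpandDelegationSketch {

  static final class PlanContext {}

  static final class FieldRef {
    final String name;

    FieldRef(String name) {
      this.name = name;
    }
  }

  // EXPAND semantics: expand, then drop the original array field (optional alias rename).
  void visitExpand(FieldRef arrayField, String alias, PlanContext ctx) {
    buildExpandCore(arrayField, alias, /* perDocLimit= */ null, ctx);
    // ... remove the original array field / apply the alias here ...
  }

  // MVEXPAND semantics: handle the two special cases, then reuse the same core.
  void visitMvExpand(FieldRef arrayField, Integer limit, PlanContext ctx) {
    if (arrayField == null) {
      // Special case 1: missing field -> empty VALUES row with a nullable placeholder.
      return;
    }
    if (limit != null && limit <= 0) {
      // Special case 2: optional limit=<n> must be positive.
      throw new IllegalArgumentException("mvexpand limit must be positive, but got " + limit);
    }
    buildExpandCore(arrayField, arrayField.name, limit, ctx);
  }

  // Shared correlate + uncollect + projection logic (elided in this sketch).
  private void buildExpandCore(
      FieldRef arrayField, String alias, Integer perDocLimit, PlanContext ctx) {
    // Correlate the left input with an uncollect of the array field, project element values,
    // and, if present, apply the per-document limit.
  }
}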

@srikanthpadakanti
Author

Please approve the workflow @dai-chen @penghuo

Collaborator
@ykmr1224 ykmr1224 left a comment

I haven't reviewed the logic in CalciteRelNodeVisitor since it is too long to read and not well organized. Please refactor it first.
Also, I suppose the code changes are mostly directly from a coding agent. Please review them yourself first, and reflect our earlier comments as well.


Comment on lines 1057 to 1064
// Allow using INTERNAL_ITEM when the element type is unknown/undefined at planning time.
// Some datasets (or Calcite's type inference) may give the element an UNDEFINED type.
Accept an "ignore" first-argument family so INTERNAL_ITEM(elem, 'key') can still be planned
// and resolved at runtime (fallback semantics handled at execution side). - Used in MVEXPAND
registerOperator(
INTERNAL_ITEM,
SqlStdOperatorTable.ITEM,
PPLTypeChecker.family(SqlTypeFamily.IGNORE, SqlTypeFamily.CHARACTER));
Collaborator

Why do we need to define a separate one from the one above? (Can we add an OR to the above definition?)

Author

I tried, but Calcite’s OR operator requires both operands to be composite checkers.
The fallback rule I added (IGNORE, CHARACTER) is still a SqlSingleOperandTypeChecker, so Calcite rejects the merge.

/**
* Expand command visitor to handle array field expansion. 1. Unnest 2. Join with the original
* table to get all fields
* Portions of CalciteRelNodeVisitor related to EXPAND / MVEXPAND.
Collaborator

Looks like the expand/mvexpand code is very long. Please extract it to a class instead of using a code section.

Author

Minimized it to reuse the same buildExpandRelNode.

Comment on lines 689 to 693
MVEXPAND_EDGE_CASES(
"mvexpand_edge_cases",
"mvexpand_edge_cases",
getMappingFile("mvexpand_edge_cases_mapping.json"),
"src/test/resources/mvexpand_edge_cases.json"),
Collaborator

I'm not sure this (and the JSON) is really needed. It seems to be used only in CalciteExplainIT, and I doubt the data matters for ExplainIT.

@coderabbitai
Contributor

coderabbitai bot commented Nov 27, 2025

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new mvexpand command to expand array fields into individual rows in query results. Supports optional limit parameter to control maximum rows per document. Properly handles empty arrays, null values, and missing fields.
  • Documentation

    • Added comprehensive documentation with practical examples and detailed behavior specifications for the mvexpand command.


Walkthrough

Adds native PPL mvexpand: lexer/parser/AST node, visitor and analyzer hooks, Calcite planning with array validation and optional per-document limit, DSL/anonymizer updates, operator registration tweak, tests/fixtures, and documentation.

Changes

  • AST Node & Visitors (core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java, core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java, core/src/main/java/org/opensearch/sql/analysis/Analyzer.java): New MvExpand AST node (field + optional limit), visitor hook visitMvExpand, and Analyzer override delegating to Calcite-only handling.
  • Calcite Planning (core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java): Add visitMvExpand, resolve field name, validate ARRAY type, handle missing/null, reuse Expand mechanics, add optional per-document limit, update buildExpandRelNode signature, and add extractFieldName.
  • DSL & Operator Table (core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java, core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java): Add mvexpand(UnresolvedPlan, Field, Integer) DSL builder and register an INTERNAL_ITEM operator variant to allow planning with unknown element types.
  • PPL Lexer / Parser / AST Builder (ppl/src/main/antlr/OpenSearchPPLLexer.g4, ppl/src/main/antlr/OpenSearchPPLParser.g4, ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java): Add MVEXPAND token, mvexpandCommand grammar (optional limit=<int>), and AstBuilder method producing MvExpand.
  • PPL Utilities / Anonymizer (ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java, ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java): Anonymizer visitor for MvExpand (mask field and optional masked limit) and unit tests for anonymized outputs.
  • Calcite Unit Tests (ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java): New Calcite-based unit tests covering mvexpand scenarios (basic, limit, empty/null, duplicates, large arrays, primitive arrays, projections).
  • Integration Tests & Fixtures (integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java, integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java, integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java, integ-test/src/test/resources/...): Add integration test class for mvexpand edge cases, explain test additions, register test index in suites, expected explain YAML fixtures, and edge-case mappings/data.
  • Doctest Data & Mapping (doctest/test_data/mvexpand_logs.json, doctest/test_mapping/mvexpand_logs.json, doctest/test_docs.py): Add mvexpand doctest dataset, mapping, and register it in the doctest harness.
  • Docs & Index (docs/user/ppl/cmd/mvexpand.rst, docs/user/ppl/index.rst, docs/category.json, docs/user/dql/metadata.rst): Add mvexpand documentation page, update PPL command index/category, and include mvexpand in metadata examples.
  • Expected Output Fixtures (integ-test/src/test/resources/expectedOutput/calcite/..., integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/...): Add expected explain output fixtures used by mvexpand explain tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Lexer
    participant Parser
    participant AstBuilder
    participant Analyzer
    participant CalcitePlanner
    participant Executor

    User->>Lexer: "source=t | mvexpand field [limit=N]"
    Lexer->>Parser: tokens (includes MVEXPAND)
    Parser->>AstBuilder: parse mvexpandCommand
    AstBuilder->>AstBuilder: build MvExpand(field, limit)
    AstBuilder->>Analyzer: emit MvExpand node
    Analyzer->>CalcitePlanner: route node for Calcite planning
    CalcitePlanner->>CalcitePlanner: resolve field name & type
    alt missing or non-array field
        CalcitePlanner->>CalcitePlanner: produce empty projection or semantic error
    else array field present
        CalcitePlanner->>CalcitePlanner: create Expand RelNode, apply optional per-doc limit
    end
    CalcitePlanner->>Executor: return finalized RelNode / plan
    Executor->>User: execute -> rows (one per array element)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Pay extra attention to:
    • CalciteRelNodeVisitor.visitMvExpand and updated buildExpandRelNode (field extraction, ARRAY validation, missing/null handling, per-doc limit).
    • New integration tests and fixtures in integ-test/... (setup, bulkInsert, assertions).
    • Parser/AST changes for optional limit and correct AST construction.
    • Anonymizer masking logic and related unit tests.

Suggested reviewers

  • ykmr1224
  • penghuo
  • ps48
  • dai-chen
  • kavithacm
  • derek-ho
  • joshuali925
  • RyanL1997
  • GumpacG

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 7.04%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check ❓ Inconclusive: The title 'Mvexpand feature' is vague and generic, and does not clearly convey the specific nature or scope of the change. Resolution: use a more descriptive title such as 'Add native mvexpand command for PPL to expand array fields into separate rows' to clearly indicate the feature and its primary purpose.
✅ Passed checks (3 passed)
  • Description check ✅ Passed: The pull request description is well-detailed and directly related to the changeset, explaining the mvexpand feature, its purpose, key features, testing, and documentation status.
  • Linked Issues check ✅ Passed: The pull request implements all primary and secondary objectives from issue #4439: mvexpand command for PPL with array field expansion, optional limit parameter, handling of edge cases (empty/null arrays, large arrays, non-array fields), streaming execution, comprehensive tests, Javadoc, and user documentation.
  • Out of Scope Changes check ✅ Passed: All changes directly support the mvexpand implementation, including AST nodes, visitor patterns, parser grammar, integration tests, documentation, and test data. The only tangential change is the removal of the search.rst doc entry, which appears to be documentation reorganization related to the category.json addition.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf87312 and c9e2767.

📒 Files selected for processing (1)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (8)
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (1)

93-93: Mvexpand parser wiring is correct; consider using internalVisitExpression for consistency

The new visitMvexpandCommand correctly builds MvExpand(field, limit) from the grammar. For stylistic consistency with nearby visitors (e.g., visitExpandCommand), you could use internalVisitExpression(ctx.fieldExpression()) instead of calling expressionBuilder.visit directly, but functionally they are equivalent.

Also applies to: 870-876
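For reference, a minimal sketch of the suggested shape (the context and accessor names follow the grammar discussed in this review; the MvExpand constructor signature and the cast to Field are assumptions, so treat this as illustrative rather than the PR's actual code):

@Override
public UnresolvedPlan visitMvexpandCommand(OpenSearchPPLParser.MvexpandCommandContext ctx) {
  // Resolve the field through the shared expression visitor, as suggested above.
  Field field = (Field) internalVisitExpression(ctx.fieldExpression());
  // Optional limit=<int>; null when the LIMIT clause is absent.
  Integer limit =
      ctx.INTEGER_LITERAL() != null ? Integer.valueOf(ctx.INTEGER_LITERAL().getText()) : null;
  return new MvExpand(field, limit);
}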

ppl/src/main/antlr/OpenSearchPPLParser.g4 (1)

81-82: mvexpand grammar looks good; consider using integerLiteral for limit for consistency

The new mvexpandCommand is wired correctly into commands and commandName, and LIMIT EQUAL ... matches other named-arg patterns.

For consistency with options like timechart/chart/bin that use LIMIT EQUAL integerLiteral, consider changing:

mvexpandCommand
    : MVEXPAND fieldExpression (LIMIT EQUAL INTEGER_LITERAL)?
    ;

to:

mvexpandCommand
    : MVEXPAND fieldExpression (LIMIT EQUAL integerLiteral)?
    ;

so all numeric options share the same literal rule and future validation is uniform.

Also applies to: 120-121, 532-534

docs/user/ppl/index.rst (1)

51-109: Expose mvexpand in the commands table as well

The new bullets (including mvexpand command) are useful entry points, but mvexpand is not listed in the big commands table below, which summarizes version/status/descriptions.

For consistency and discoverability, consider adding a row to that table for mvexpand (with version introduced, status, and a short description) alongside expand/flatten.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)

82-83: Mvexpand anonymization behavior is correct; consider reusing visitExpression for field masking

The new visitMvExpand correctly:

  • delegates to the child plan, and
  • emits | mvexpand identifier with optional limit=***, matching the new tests.

To stay consistent with visitExpand/visitFlatten, you might prefer:

@Override
public String visitMvExpand(MvExpand node, String context) {
  String child = node.getChild().get(0).accept(this, context);
  String field = visitExpression(node.getField());
  if (node.getLimit() != null) {
    return StringUtils.format("%s | mvexpand %s limit=%s", child, field, MASK_LITERAL);
  }
  return StringUtils.format("%s | mvexpand %s", child, field);
}

This keeps any special handling in maskField/expressionAnalyzer (e.g., meta fields) consistent across commands while preserving the current output.

Also applies to: 655-664

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java (1)

117-163: Smoke tests validate planning succeeds but lack output assertions.

These tests (testMvExpandWithLimit through testMvExpandPrimitiveArray) only verify that getRelNode(ppl) succeeds without checking the resulting plan. While they confirm no planning exceptions occur, consider adding at least basic verifyLogical assertions for one or two key edge cases (e.g., testMvExpandWithLimit) to ensure the generated plan includes expected operators.

Example enhancement for testMvExpandWithLimit:

@Test
public void testMvExpandWithLimit() {
  String ppl = "source=DEPT | mvexpand EMPNOS | head 1";
  RelNode root = getRelNode(ppl);
  String expectedLogical = 
      "LogicalSort(fetch=[1])\n"
          + "  LogicalProject(DEPTNO=[$0], EMPNOS=[$2])\n"
          + "    LogicalCorrelate(...)\n";
  verifyLogical(root, expectedLogical);
}
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (3)

61-70: Consider logging suppressed exceptions for debugging.

The cleanupAfterEach method silently ignores all exceptions. While this is acceptable for cleanup, logging at debug level could help diagnose test failures:

   @AfterEach
   public void cleanupAfterEach() throws Exception {
     try {
       deleteIndexIfExists(INDEX + "_not_array");
       deleteIndexIfExists(INDEX + "_missing_field");
       deleteIndexIfExists(INDEX + "_limit_test");
       deleteIndexIfExists(INDEX + "_int_field");
     } catch (Exception ignored) {
+      // Cleanup failures are expected if indices don't exist
     }
   }

227-251: Limit test validates row count but doesn't confirm per-document semantics.

This test uses a single user document, so it cannot distinguish between per-document limiting (3 elements from this document) vs. global limiting (3 rows total). Consider adding a multi-document scenario to clarify the expected behavior:

@Test
public void testMvexpandLimitWithMultipleDocuments() throws Exception {
  // Insert two users each with 5 skills
  bulkInsert(idx, 
    "{\"username\":\"user1\",\"skills\":[{\"name\":\"a\"},{\"name\":\"b\"},{\"name\":\"c\"},{\"name\":\"d\"},{\"name\":\"e\"}]}",
    "{\"username\":\"user2\",\"skills\":[{\"name\":\"f\"},{\"name\":\"g\"},{\"name\":\"h\"},{\"name\":\"i\"},{\"name\":\"j\"}]}");
  // With limit=3:
  // - If per-document: expect 6 rows (3 per user)
  // - If global: expect 3 rows total
  // Clarify expected behavior with assertion
}

340-360: Quote _id value in bulk request JSON for correctness.

The _id field in OpenSearch bulk requests should be a JSON string. While OpenSearch may auto-convert numeric values, explicit quoting is more correct:

-      bulk.append("{\"index\":{\"_id\":").append(id).append("}}\n");
+      bulk.append("{\"index\":{\"_id\":\"").append(id).append("\"}}\n");

This ensures proper JSON formatting regardless of whether the id is numeric or string-based.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 885230f and 9508874.

📒 Files selected for processing (27)
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java (1 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1 hunks)
  • docs/category.json (1 hunks)
  • docs/user/dql/metadata.rst (3 hunks)
  • docs/user/ppl/cmd/mvexpand.rst (1 hunks)
  • docs/user/ppl/index.rst (1 hunks)
  • doctest/test_data/mvexpand_logs.json (1 hunks)
  • doctest/test_docs.py (1 hunks)
  • doctest/test_mapping/mvexpand_logs.json (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml (1 hunks)
  • integ-test/src/test/resources/mvexpand_edge_cases.json (1 hunks)
  • integ-test/src/test/resources/mvexpand_edge_cases_mapping.json (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLParser.g4 (3 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (2 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (2 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)
legacy/src/main/java/org/opensearch/sql/legacy/utils/StringUtils.java (1)
  • StringUtils (17-117)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (2)
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java (1)
  • PPLIntegTestCase (36-409)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
🪛 Biome (2.1.2)
integ-test/src/test/resources/mvexpand_edge_cases.json

[error] End of file expected. Use an array for a sequence of values: [1, 2] (parse). The same error is reported at every line boundary from 1-2 through 17-18.

doctest/test_data/mvexpand_logs.json

[error] End of file expected. Use an array for a sequence of values: [1, 2] (parse). The same error is reported at every line boundary from 1-2 through 5-6.

🔇 Additional comments (33)
core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java (1)

65-66: Mvexpand DSL helper is consistent with other unary plan builders

The new mvexpand builder cleanly mirrors expand (attach-to-input, carry field and nullable limit) and fits the existing AST DSL patterns.

Also applies to: 140-142

integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)

35-35: Adding CalciteMvExpandCommandIT to the no-pushdown suite makes sense

Including mvexpand in this suite keeps coverage consistent with other Calcite remote commands under the non-pushdown configuration.

doctest/test_mapping/mvexpand_logs.json (1)

1-24: Mvexpand doctest mapping looks appropriate

The nested tags.value and ids.value shapes are suitable for exercising mvexpand on multi-valued/nested fields, and the rest of the mapping is straightforward.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

1072-1079: Confirm scope of the broad INTERNAL_ITEM(IGNORE, CHARACTER) overload

This overload is a reasonable fallback for mvexpand when the container element type is UNDEFINED, but it effectively accepts any first-argument type as long as the second is in the CHARACTER family. That’s fine if INTERNAL_ITEM truly remains internal-only and all call sites still guarantee an array/map-like first argument, but it would allow nonsensical combinations to pass type checking if it were ever user-exposed or reused elsewhere.

Please double-check:

  • That INTERNAL_ITEM is not reachable as a normal PPL function, and
  • That runtime handling still validates the container type and fails fast for clearly invalid inputs.

If either assumption doesn’t hold, it may be worth tightening this overload further or guarding its use where mvexpand builds the calls.

core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java (1)

1-46: MvExpand AST node design matches existing unary plan patterns

Immutable field/limit, covariant attach, getChild() and visitor wiring all look consistent with other UnresolvedPlan nodes.

integ-test/src/test/resources/mvexpand_edge_cases_mapping.json (1)

1-8: Edge-case mvexpand mapping is straightforward and suitable

username as keyword plus skills as nested gives a clean shape for exercising mvexpand against nested/multivalue fields in integration tests.

integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml (1)

1-7: Explain expected output for mvexpand fits existing Calcite fixture style

The logical and physical plan snippets follow the existing explain-output conventions and surface the mvexpand projection via VALUE, which should keep the explain tests stable.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2)

45-45: LGTM: Test index loading.

The MVEXPAND_EDGE_CASES index is properly registered in the test initialization, consistent with other test indices.


319-327: The review comment references code that does not exist in the repository.

The test method testMvexpandExplain() at lines 319-327 is not present in CalciteExplainIT.java. Additionally, no mvexpand-related test data files (mvexpand_edge_cases.json, explain_mvexpand.yaml) or any references to "mvexpand" exist anywhere in the codebase. The review comment appears to be referencing code from a different version, branch, or context that is not part of the current PR.

Likely an incorrect or invalid review comment.

ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1)

53-53: LGTM: Lexer token addition.

The MVEXPAND token is correctly defined following standard ANTLR patterns and is appropriately positioned in the COMMAND KEYWORDS section.

integ-test/src/test/resources/mvexpand_edge_cases.json (1)

1-18: LGTM: Comprehensive edge case test data.

The test data covers essential mvexpand scenarios including empty arrays, null values, missing fields, and large arrays. The format follows the standard OpenSearch bulk indexing format (alternating index metadata and data lines).

Note: The static analysis errors from Biome are false positives—this file uses the correct bulk format for OpenSearch indexing, not standard JSON array format.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml (1)

1-7: Expected output structure looks correct, but verify alignment with test query.

The explain plan structure is well-formed with appropriate logical and physical sections. However, note that Line 4 references VALUE=[$10] which aligns with the test query using mvexpand VALUE. This should be verified along with the CalciteExplainIT test to ensure the field name is correct or intentionally testing missing-field behavior.

core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (2)

81-81: LGTM: Import addition.


705-708: LGTM: Calcite-only visitor implementation.

The visitMvExpand method correctly delegates to getOnlyForCalciteException, consistent with other Calcite-only commands like Bin, Expand, and Flatten. This ensures mvexpand is only processed in the Calcite execution path.

doctest/test_docs.py (1)

50-51: LGTM: Test data registration.

The mvexpand_logs test data is correctly registered in the TEST_DATA dictionary, enabling doctest support for the mvexpand documentation examples.

docs/user/ppl/cmd/mvexpand.rst (3)

1-24: LGTM: Clear documentation structure.

The introduction, description, and syntax sections are well-written and provide clear guidance on the mvexpand command. The syntax specification correctly identifies the required field parameter and optional limit parameter.


25-29: Good approach: Doctest stability notes.

The note explaining the use of where case='<name>' for deterministic results is helpful for understanding the test structure.


30-103: LGTM: Comprehensive examples with good edge case coverage.

The five examples effectively demonstrate:

  1. Basic expansion behavior
  2. Limit parameter usage
  3. Empty and null array handling
  4. Single-element arrays
  5. Missing field behavior

The doctest format with expected output makes these examples testable and maintainable. Each example uses targeted queries for reproducible results.

docs/user/dql/metadata.rst (1)

38-66: LGTM: Metadata updated for new test index.

The fetched row count correctly increases from 23 to 24, and the mvexpand_logs table entry is properly added to the SHOW TABLES output, maintaining consistent formatting with existing entries.

ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)

35-37: Updated search anonymization expectation is consistent

The updated expectation for testSearchCommand ("source=table a:***") aligns with the current anonymizer output style. No action needed here.

core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (1)

69-69: MvExpand integration into AbstractNodeVisitor is consistent

The added MvExpand import and visitMvExpand method delegate to visitChildren in the same way as other plan nodes, which is exactly what downstream visitors (Analyzer, anonymizer, Calcite planner) need.

No changes suggested.

Also applies to: 456-458

doctest/test_data/mvexpand_logs.json (1)

1-6: mvexpand doctest data shape looks appropriate (NDJSON-style)

The file uses one JSON object per line to cover basic/empty/null/single/ids/missing cases, which is a reasonable shape for doctest data loaders that iterate line by line. The Biome “End of file expected” hints can be ignored here unless your tooling explicitly requires a single JSON array wrapper.

docs/category.json (1)

71-73: Category entry for mvexpand docs is correctly wired

Adding user/ppl/cmd/mvexpand.rst (and the identifiers general doc) under ppl_cli_calcite aligns with how other PPL command docs are exposed to the docs build.

Looks good as-is.

integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1)

702-706: New MVEXPAND_EDGE_CASES test index is consistent with existing patterns

The added enum constant uses the standard (name, type, mappingFile, dataSet) pattern and will integrate cleanly with loadIndex(Index.MVEXPAND_EDGE_CASES, ...) in ITs.

No changes suggested, assuming the referenced mapping/data files exist in src/test/resources (which test runs will validate).

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (5)

81-85: Wildcard imports are acceptable here for reducing import clutter.

These wildcard imports consolidate multiple specific imports from the expression and tree packages. This aligns with the existing code style in the file.


2848-2851: Verify: limit applies globally, not per-document.

The PR objectives state the limit should "cap how many values per document are expanded" (per-document limit). However, the current implementation applies relBuilder.limit(0, limit) after the expand, which creates a global LIMIT clause—limiting total output rows rather than per-document.

For example, with 2 documents each having 5 array elements and limit=3:

  • Current behavior: returns 3 rows total
  • Expected per-document behavior: returns 6 rows (3 per document)

If global limit is the intended behavior, please clarify in documentation. Otherwise, per-document limiting requires a window function with ROW_NUMBER partitioned by document.


2806-2828: Missing-field handling returns empty result with schema including the missing field.

The approach of projecting all existing fields plus a NULL placeholder for the missing field, then filtering to 0 rows, preserves the expected output schema while returning no data. This is a valid strategy for handling missing fields gracefully.


2833-2840: Type validation for non-ARRAY fields is correctly implemented.

The code properly checks SqlTypeName.ARRAY and throws a clear SemanticCheckException with an informative message including both the field name and the actual type found.


2856-2865: Based on my comprehensive verification, I can now provide the final assessment:

The review comment references code that does not exist in this repository.

After thorough investigation, I found that:

  1. No extractFieldName(Field f) method exists at lines 2856-2865 in CalciteRelNodeVisitor.java or anywhere else in the codebase
  2. No mvexpand functionality exists anywhere in the repository
  3. CalciteRelNodeVisitor.java has 3,237 lines, but the section at lines 2856-2865 contains code related to field replacement, not an extractFieldName method
  4. The QualifiedName class (which the review references) does support multi-part names via getParts(), and other code in the repository demonstrates that multi-part qualified names are already handled correctly elsewhere

The review comment appears to be based on incorrect information or references a different codebase/branch. Since the code being reviewed does not exist, there is no actionable suggestion to verify or implement.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java (2)

43-82: TableWithArray implementation is correct and mirrors existing pattern.

The inner class properly implements the Calcite Table interface with a row type containing an INTEGER array field (EMPNOS), which is suitable for testing mvexpand functionality.


97-115: Test includes both logical plan and SparkSQL verification.

This test properly verifies both the logical plan structure and the equivalent SparkSQL, addressing prior review feedback about adding verifyPPLToSparkSQL assertions.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (2)

23-59: Comprehensive test setup with good edge case coverage.

The test initialization creates a well-structured index with nested mapping and inserts diverse test documents covering:

  • Multiple elements, single element, empty arrays
  • Null values and missing fields
  • Partial/mixed element shapes
  • Duplicate elements and large arrays
  • Heterogeneous subfield types

This aligns well with the edge cases documented in the PR objectives.


72-134: Integration tests properly verify data rows and ordering.

The tests appropriately use verifyDataRows with expected values and include sorting (sort skills.name) to ensure deterministic ordering for assertions. The coverage of single element, empty array, null array, missing field, duplicates, and happy path with multiple elements is thorough.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

321-328: Clarify or remove the misleading comment.

The comment "script pushdown" on line 324 is unclear or potentially misleading. The mvexpand command expands multivalue/array fields into separate rows; while the explain plan may show script operations, the primary purpose isn't script pushdown. Consider removing this comment or clarifying what aspect of the plan you're highlighting.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

81-81: Consider specific imports over wildcards.

While wildcard imports work, specific imports are generally preferred for clarity and to avoid potential naming conflicts. However, if the codebase consistently uses wildcards, this is acceptable.

Also applies to: 85-85

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08b56ee and 3ae2c73.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

46-46: LGTM: Test index loaded for mvexpand edge cases.

The addition of the MVEXPAND_EDGE_CASES index follows the established pattern for test data initialization.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2800-2862: Consider adding post-materialization type validation.

Based on past review comments, you mentioned validating the field type "both at resolve-time and after materialization (to catch rename/alias cases)." Currently, type validation only occurs at line 2843 before calling buildExpandRelNode. Consider whether additional validation is needed after field resolution to catch edge cases involving renames or aliases.

If post-materialization validation is necessary, you could add checks after the field has been fully resolved through any renames or transformations. Otherwise, please confirm that the current pre-resolution validation is sufficient for all cases.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2773-2785: mvexpand limit is applied globally instead of per document

visitMvExpand currently applies relBuilder.limit(0, limit) after buildExpandRelNode, i.e., on the fully expanded relation. That caps the total number of rows across the entire result set, not the number of expanded values per input document. With multiple input rows, this will drop expansions from later rows instead of limiting each document independently (e.g., two docs with large arrays and limit=3 would yield at most 3 rows total, not 3 per doc), which diverges from the PR objective that limit be per document.

Because buildExpandRelNode constructs a correlated right-hand side (LogicalValues → project → uncollect), the natural place for a per-document limit is inside that right branch so it is evaluated once per outer row.

Consider refactoring to thread the (validated) limit into buildExpandRelNode and apply it on the right side before build():

@@
   @Override
   public RelNode visitExpand(Expand expand, CalcitePlanContext context) {
@@
-    buildExpandRelNode(arrayFieldRex, arrayField.getField().toString(), alias, context);
+    buildExpandRelNode(arrayFieldRex, arrayField.getField().toString(), alias, null, context);
@@
   public RelNode visitPatterns(Patterns node, CalcitePlanContext context) {
@@
-        buildExpandRelNode(
-            context.relBuilder.field(node.getAlias()), node.getAlias(), node.getAlias(), context);
+        buildExpandRelNode(
+            context.relBuilder.field(node.getAlias()),
+            node.getAlias(),
+            node.getAlias(),
+            null,
+            context);
@@
   @Override
   public RelNode visitMvExpand(MvExpand mvExpand, CalcitePlanContext context) {
@@
-    // 2C. Valid array → expand
-    int index = matched.getIndex();
-    RexInputRef fieldRef = context.rexBuilder.makeInputRef(type, index);
-
-    buildExpandRelNode(fieldRef, fieldName, fieldName, context);
-
-    Integer limit = mvExpand.getLimit();
-    if (limit != null) {
-      if (limit <= 0) {
-        throw new SemanticCheckException(
-            String.format("mvexpand limit must be positive, but got %d", limit));
-      }
-      relBuilder.limit(0, limit);
-    }
+    // 2C. Valid array → expand (with optional per-document limit)
+    int index = matched.getIndex();
+    RexInputRef fieldRef = context.rexBuilder.makeInputRef(type, index);
+
+    Integer limit = mvExpand.getLimit();
+    if (limit != null && limit <= 0) {
+      throw new SemanticCheckException(
+          String.format("mvexpand limit must be positive, but got %d", limit));
+    }
+
+    buildExpandRelNode(fieldRef, fieldName, fieldName, limit, context);
@@
-  private void buildExpandRelNode(
-      RexInputRef arrayFieldRex, String arrayFieldName, String alias, CalcitePlanContext context) {
+  private void buildExpandRelNode(
+      RexInputRef arrayFieldRex,
+      String arrayFieldName,
+      String alias,
+      @Nullable Integer perDocLimit,
+      CalcitePlanContext context) {
@@
-    // 5. Build join right node and expand the array field using uncollect
-    RelNode rightNode =
-        context
-            .relBuilder
-            // fake input, see convertUnnest and convertExpression in Calcite SqlToRelConverter
-            .push(LogicalValues.createOneRow(context.relBuilder.getCluster()))
-            .project(List.of(correlArrayFieldAccess), List.of(arrayFieldName))
-            .uncollect(List.of(), false)
-            .build();
+    // 5. Build join right node and expand the array field using uncollect
+    context
+        .relBuilder
+        // fake input, see convertUnnest and convertExpression in Calcite SqlToRelConverter
+        .push(LogicalValues.createOneRow(context.relBuilder.getCluster()))
+        .project(List.of(correlArrayFieldAccess), List.of(arrayFieldName))
+        .uncollect(List.of(), false);
+
+    if (perDocLimit != null) {
+      context.relBuilder.limit(0, perDocLimit);
+    }
+
+    RelNode rightNode = context.relBuilder.build();
@@
-    context
-        .relBuilder
-        .push(leftNode)
-        .push(rightNode)
-        .correlate(JoinRelType.INNER, correlVariable.get().id, List.of(arrayFieldRex))
-        // 7. Remove the original array field from the output.
-        // TODO: RFC: should we keep the original array field when alias is present?
-        .projectExcept(arrayFieldRex);
+    context
+        .relBuilder
+        .push(leftNode)
+        .push(rightNode)
+        .correlate(JoinRelType.INNER, correlVariable.get().id, List.of(arrayFieldRex))
+        // 7. Remove the original array field from the output.
+        // TODO: RFC: should we keep the original array field when alias is present?
+        .projectExcept(arrayFieldRex);

This way the limit is applied inside the correlated right branch and is naturally per-document while keeping existing expand and patterns behaviour unchanged (they just pass null). As per the PR objectives, this aligns Calcite mvexpand with the intended per-document limit semantics.

Also applies to: 2799-2866, 791-793, 3125-3171

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ae2c73 and 5e616ff.

📒 Files selected for processing (1)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2927-2939: Fix nested field path handling.

The extractFieldName method currently returns only the first segment of a QualifiedName (line 2933: parts.get(0)), which drops nested path components. For example, user.skills would become just user, causing field resolution to fail for nested arrays.

This issue was previously identified in review comments. To support nested fields like user.skills, the full dotted path must be preserved.

Apply this diff to fix nested field resolution:

     if (inner instanceof QualifiedName) {
       List<String> parts = ((QualifiedName) inner).getParts();
       if (!parts.isEmpty()) {
-        return parts.get(0);
+        return String.join(".", parts);
       }
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e616ff and 709704c.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (4 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🔇 Additional comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2846-2925: LGTM with observations.

The visitMvExpand implementation correctly handles the three key cases:

  1. Missing field → returns empty result with correct schema (lines 2874-2896)
  2. Non-array type → throws clear exception (lines 2902-2907)
  3. Valid array → expands and applies validated limit (lines 2909-2922)

The limit validation (lines 2917-2920) properly checks for positive values and provides a clear error message, addressing previous review feedback.

Minor observation: The missing field handling creates a full projection with NULL before filtering to zero rows (lines 2874-2894). While this preserves the schema correctly, it adds overhead compared to directly creating an empty result. However, this appears intentional based on the PR objectives to handle missing fields gracefully.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 709704c and c45fa05.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (3 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (2 hunks)
🔇 Additional comments (2)
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (2)

36-36: Updated search anonymization expectation looks correct

The new "source=table a:***" format aligns with the anonymizer’s current behavior and keeps the test in sync; no issues here.


848-858: Mvexpand anonymizer tests are well‑aligned and restore full coverage

Both mvexpand tests match the anonymization conventions (identifier masking and limit redaction), and with @Test on testMvdedup still present, the earlier lost‑annotation issue is fully resolved.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
ppl/src/main/antlr/OpenSearchPPLParser.g4 (1)

81-81: Parser wiring for MVEXPAND looks correct; consider aligning LIMIT literal with other commands

  • Adding mvexpandCommand into commands and MVEXPAND into commandName cleanly integrates the new command into the PPL pipeline and keyword set.
  • mvexpandCommand : MVEXPAND fieldExpression (LIMIT EQUAL INTEGER_LITERAL)? matches the intended mvexpand <field> limit=<int> syntax and addresses prior feedback to use the EQUAL token.

Two minor nits:

  1. Other LIMIT-like parameters (e.g., chartOptions, timechartParameter) use integerLiteral instead of bare INTEGER_LITERAL. For consistency and to match existing patterns, you might switch to:

    mvexpandCommand
        : MVEXPAND fieldExpression (LIMIT EQUAL integerLiteral)?
        ;

    This keeps the grammar uniform and leaves sign/valid-range checks to semantic validation. If you intentionally want to forbid signed values here, the current INTEGER_LITERAL is fine; just ensure the behavior is documented.

  2. Given the new command, double-check that AstBuilder.visitMvexpandCommand and related components (e.g., MvExpand node, anonymizer, planner) stay in lockstep with this rule shape (especially the optional LIMIT) and that tests cover both with and without LIMIT; a hedged visitor sketch follows after this list. Based on learnings, keeping AST generation in sync with grammar changes is important.

Also applies to: 120-120, 532-534
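
As a companion to item 2 above, a hedged sketch of how a visitMvexpandCommand override might mirror the optional LIMIT in this rule; the MvExpand node constructor, Field cast, and internalVisitExpression helper are assumptions based on the surrounding AstBuilder pattern, not the exact classes in this PR:

    // Hypothetical visitor: build the MvExpand AST node, passing a null limit
    // when the optional "limit=<int>" clause is absent.
    @Override
    public UnresolvedPlan visitMvexpandCommand(
        OpenSearchPPLParser.MvexpandCommandContext ctx) {
      Field field = (Field) internalVisitExpression(ctx.fieldExpression());
      Integer limit =
          ctx.INTEGER_LITERAL() == null
              ? null
              : Integer.valueOf(ctx.INTEGER_LITERAL().getText());
      return new MvExpand(field, limit);
    }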

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

321-327: New testMvexpandExplain is concise and aligned with existing explain tests

This test follows the existing pattern (load expected plan + explainQueryYaml) and adds Calcite explain coverage for mvexpand. If there isn’t one already elsewhere, consider adding an additional explain IT that covers mvexpand with a limit=<n> option to lock in the planning behavior for the limit semantics as well; a hedged sketch follows below.
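
A hedged sketch of that additional limit-aware explain IT, reusing the same helpers referenced above; the expected-plan file name, field name, and comparison assertion are assumptions following the existing test pattern:

    // Hypothetical explain IT covering the limit option; the yaml file and the
    // exact comparison helper are placeholders, not the merged test code.
    @Test
    public void testMvexpandWithLimitExplain() throws IOException {
      String expected = loadExpectedPlan("explain_mvexpand_limit.yaml");
      String actual =
          explainQueryYaml("source=mvexpand_edge_cases | mvexpand tags limit=2");
      assertEquals(expected, actual);
    }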

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f3435e and 9aec421.

📒 Files selected for processing (5)
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLParser.g4 (3 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java:

  • Use PascalCase for class names (e.g., QueryExecutor)
  • Use camelCase for method and variable names (e.g., executeQuery)
  • Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
  • Keep methods under 20 lines with single responsibility
  • All public classes and methods must have proper JavaDoc
  • Use specific exception types with meaningful messages for error handling
  • Prefer Optional<T> for nullable returns in Java
  • Avoid unnecessary object creation in loops
  • Use StringBuilder for string concatenation in loops
  • Validate all user inputs, especially queries
  • Sanitize data before logging to prevent injection attacks
  • Use try-with-resources for proper resource cleanup in Java
  • Maintain Java 11 compatibility when possible for OpenSearch 2.x
  • Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

**/*.java:

  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java:

  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java:

  • Verify test coverage for new business logic
  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java:

  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧬 Code graph analysis (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (2)
ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1)

53-53: MVEXPAND lexer token is correctly defined and aligned with existing commands

Placement next to EXPAND, simple literal pattern, and caseInsensitive = true all look consistent with existing command tokens.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

30-48: The code referenced in this review comment (Index.MVEXPAND_EDGE_CASES and its integration into CalciteExplainIT.init()) does not exist in the current repository: the checked-out CalciteExplainIT.java loads 11 indices ending with Index.DATA_TYPE_ALIAS, not Index.MVEXPAND_EDGE_CASES. Manual verification against the actual pull request or merged code is needed to confirm that the Index.MVEXPAND_EDGE_CASES constant correctly maps to the "mvexpand_edge_cases" index name used in the test queries.
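
If the constant is added following the existing Index enum pattern, the explain IT could load it during init(); a hedged sketch, assuming loadIndex(...) and the enum wiring match the rest of the suite:

    // Hypothetical init() wiring: load the edge-case index via the shared enum
    // instead of creating it inline; Index.MVEXPAND_EDGE_CASES itself is an
    // assumption pending the actual PR code.
    @Override
    public void init() throws Exception {
      super.init();
      enableCalcite();
      loadIndex(Index.MVEXPAND_EDGE_CASES);
    }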

@srikanthpadakanti
Author

@dai-chen I believe the test failures in CI are unrelated to my changes.

@dai-chen
Collaborator

@dai-chen I believe the test failures in CI are unrelated to my changes.

Not sure what's going wrong. Could you see if it goes away after merging from main again?

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add native support for mvexpand command in PPL

4 participants