
Conversation


@srikanthpadakanti srikanthpadakanti commented Oct 27, 2025

Description

This pull request adds native support for the mvexpand command in PPL to OpenSearch SQL, enabling users to expand multivalue fields (arrays) into separate rows directly within queries. This functionality is analogous to Splunk's mvexpand command and streamlines analytics, dashboarding, and data preparation involving arrays or multivalue fields.

Key features introduced:

Native mvexpand command for PPL queries to expand array fields into separate rows/events.
Optional limit parameter to restrict the number of expanded values per event/document.
Robust handling of empty/null arrays, large arrays (with memory/resource limits), and non-array fields.
Streaming/distributable execution for performance and scalability.
Comprehensive documentation and edge case coverage.
This feature makes OpenSearch SQL more powerful and user-friendly for log analytics, data exploration, and migration from platforms like Splunk.
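For illustration, a minimal usage sketch (the index name, field name, and query shapes below are hypothetical and only follow the syntax described above, not this PR's actual test fixtures):

// Hypothetical PPL queries exercising mvexpand, written as Java string literals the way the
// integration tests embed queries; "employees" and "skills" are made-up names.
String expandAll = "source=employees | mvexpand skills | fields name, skills";
String expandTop3 = "source=employees | mvexpand skills limit=3 | fields name, skills";
// expandAll emits one row per element of the skills array for each document;
// expandTop3 caps the expansion at three values per document via the optional limit parameter.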

Related Issues

Resolves #4439

Check List

  • [x] New functionality includes testing.
  • [x] New functionality has been documented.
  • [x] New functionality has javadoc added.
  • [x] New functionality has a user manual doc added.
  • [x] New PPL command checklist all confirmed.
  • [x] API changes companion pull request created.
  • [x] Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@srikanthpadakanti
Author

Attaching the manual test cases and their output:
mvexpand_manual_test_results.md

@srikanthpadakanti
Author

Hi maintainers @penghuo , the "Enforce PR labels" check is failing because I can't add labels. Could you please add the required labels to this PR so the checks pass? Thank you!

@dai-chen dai-chen added the enhancement (New feature or request) and PPL (Piped processing language) labels on Oct 27, 2025
super.init();
enableCalcite();
deleteIndexIfExists(INDEX);
createIndex(
Collaborator

Should we rather use Index.MVEXPAND_EDGE_CASES? (Is it duplicate?)

Author

Fixed.

Collaborator

Looks like it is not fixed.

Author
@srikanthpadakanti srikanthpadakanti Nov 18, 2025

Changed to Index.MVEXPAND_EDGE_CASES.getName().

@srikanthpadakanti
Author

srikanthpadakanti commented Oct 30, 2025

Hello @jed326 @ykmr1224
I have addressed the PR comments and made the changes accordingly. Please do review.

@dai-chen
Collaborator

Please check CI failure.

@srikanthpadakanti
Author

Please check CI failure.

@dai-chen The CI failures are not related to mvexpand. I merged the upstream changes into my main branch. The CI failure points to testMvindexSingleElementPositive, testMvindexSingleElementNegative, testMvindexRangePositive, and testMvindexRangeNegative in CalcitePPLArrayFunctionTest.
The assertion failures are string-equality checks of generated SQL (PPL → SparkSQL text). The RelNode / logical plan printed in the local run shows ITEM(...) and ARRAY_SLICE(...) expressions — i.e. the array functions are present and being produced, but the exact string formatting (parentheses, unary +/-, expression grouping) differs from the test expectation used in CI.

My local run shows those tests pass, but CI failed. That implies one of the following:

  • a difference in environment (JDK/OS/Calcite pretty-printer) between my local machine and CI,
  • different code in CI (a different branch/commit or a cached build) than my local checkout, or
  • tests updated in one place and the translator/pretty-printing updated in another in CalcitePPLArrayFunctionTest.

@ahkcs Can you please confirm?

}

public static UnresolvedPlan mvexpand(UnresolvedPlan input, Field field, Integer limit) {
// attach the incoming child plan so the AST contains the pipeline link
Collaborator

nit: unneeded comment.

Author

done

Collaborator

It is still there.

Comment on lines 35 to 45
bulkInsert(
INDEX,
"{\"username\":\"happy\",\"skills\":[{\"name\":\"python\"},{\"name\":\"java\"},{\"name\":\"sql\"}]}",
"{\"username\":\"single\",\"skills\":[{\"name\":\"go\"}]}",
"{\"username\":\"empty\",\"skills\":[]}",
"{\"username\":\"nullskills\",\"skills\":null}",
"{\"username\":\"noskills\"}",
"{\"username\":\"missingattr\",\"skills\":[{\"name\":\"c\"},{\"level\":\"advanced\"}]}",
"{\"username\":\"complex\",\"skills\":[{\"name\":\"ml\",\"level\":\"expert\"},{\"name\":\"ai\"},{\"level\":\"novice\"}]}",
"{\"username\":\"duplicate\",\"skills\":[{\"name\":\"dup\"},{\"name\":\"dup\"}]}",
"{\"username\":\"large\",\"skills\":[{\"name\":\"s1\"},{\"name\":\"s2\"},{\"name\":\"s3\"},{\"name\":\"s4\"},{\"name\":\"s5\"},{\"name\":\"s6\"},{\"name\":\"s7\"},{\"name\":\"s8\"},{\"name\":\"s9\"},{\"name\":\"s10\"}]}");
Collaborator

Is it addressed?
I think we should add a test case for "happy" at least.


Collaborator
@dai-chen dai-chen left a comment

I found this in expand command code: https://github.com/opensearch-project/sql/blob/main/integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExpandCommandIT.java#L141.

Could you clarify why not make mvexpand an alias of expand command with enhancements? If we can do this, I think many new changes can be avoided.

@ahkcs
Contributor

ahkcs commented Nov 19, 2025

My local run shows those tests pass. But, CI failed. That implies either:
A difference in environment (JDK/OS/Calcite pretty-printer) between my local and CI, or
Different code in CI (different branch/commit or cached build) than my local

Can you try rebasing onto current main and making sure your local JDK version is the same as CI's? Currently CI is failing.

@srikanthpadakanti
Author

srikanthpadakanti commented Nov 20, 2025

I found this in expand command code: https://github.com/opensearch-project/sql/blob/main/integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExpandCommandIT.java#L141.

Could you clarify why not make mvexpand an alias of expand command with enhancements? If we can do this, I think many new changes can be avoided.

@dai-chen
mvexpand and expand solve different problems and operate on different data shapes.
expand historically works on nested/struct fields, projecting the struct into rows.
mvexpand works on multivalue/array fields, emitting one row per array element, which requires different type validation, planner rules, and runtime behavior.
mvexpand also needs to handle arrays containing primitives, objects, nulls, and mixed shapes, and must reliably produce per-element rows (including null elements). expand does not extract sub-fields from array elements and its semantics around aliasing and struct projection are different.
OpenSearch mappings complicate this further: a field typed as keyword may still contain arrays at runtime. mvexpand therefore needs stricter semantic checks and fallbacks that don’t align with the legacy expand contract. Reusing expand directly would break backward-compatible behavior expected by existing queries and tests.
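As a rough sketch of the kind of planner-side check being described (simplified and self-contained; the class and method names here are made up, and the SemanticCheckException import path is assumed from the core module, while the SqlTypeName.ARRAY check mirrors what is described above):

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.sql.type.SqlTypeName;
import org.opensearch.sql.exception.SemanticCheckException;

// Illustration only: reject non-array fields before building the mvexpand plan.
final class MvExpandTypeCheckSketch {
  static void requireArrayType(String fieldName, RelDataType fieldType) {
    if (fieldType.getSqlTypeName() != SqlTypeName.ARRAY) {
      throw new SemanticCheckException(
          String.format(
              "mvexpand expects an array field, but '%s' has type %s",
              fieldName, fieldType.getSqlTypeName()));
    }
  }
}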

Problems in alias attempt:
The alias attempt produced repeated values (java, java, java) instead of (java, python, sql) from our IT example.
The repeated output indicates the plan was projecting the same input reference for every row instead of the per-element value. This typically happens when:
  • field resolution occurs before the uncollect/correlate materializes element values,
  • the projection uses the outer input ref instead of the element ref, or
  • column indices (RexInputRef) drift when reusing expand logic, causing all projected rows to read from the same column.

In short, simply delegating to expand bypassed the per-element extraction logic, so the projection kept resolving to the same value on every row.


Refactor: both EXPAND and MVEXPAND delegate to buildExpandCore(...), which implements the shared correlate + uncollect + projection logic. visitExpand applies EXPAND semantics (remove the original array field, optional alias rename). visitMvExpand handles its two special cases — missing field (returns an empty VALUES row with a nullable placeholder) and limit= — then delegates to the shared helper.
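The sketch below illustrates that delegation structure only (the names visitExpand, visitMvExpand, and buildExpandCore come from the description above; the local stand-in types and parameter shapes are assumptions, not the real CalciteRelNodeVisitor signatures):

// Self-contained illustration of the shared-core delegation; PlanContext and FieldRef are
// local stand-ins for CalcitePlanContext and RexInputRef used by the real code.
final class ExpandDelegationSketch {

  static final class PlanContext {}

  static final class FieldRef {
    final String name;

    FieldRef(String name) {
      this.name = name;
    }
  }

  // EXPAND semantics: expand, then drop the original array field (optional alias rename).
  void visitExpand(FieldRef arrayField, String alias, PlanContext ctx) {
    buildExpandCore(arrayField, alias, /* perDocLimit= */ null, ctx);
    // ... remove the original array field / apply the alias here ...
  }

  // MVEXPAND semantics: handle the two special cases, then reuse the same core.
  void visitMvExpand(FieldRef arrayField, Integer limit, PlanContext ctx) {
    if (arrayField == null) {
      // Special case 1: missing field -> empty VALUES row with a nullable placeholder.
      return;
    }
    if (limit != null && limit <= 0) {
      // Special case 2: optional limit=<n> must be positive.
      throw new IllegalArgumentException("mvexpand limit must be positive, but got " + limit);
    }
    buildExpandCore(arrayField, arrayField.name, limit, ctx);
  }

  // Shared correlate + uncollect + projection logic (elided in this sketch).
  private void buildExpandCore(
      FieldRef arrayField, String alias, Integer perDocLimit, PlanContext ctx) {
    // Correlate the left input with an uncollect of the array field, project element values,
    // and, if present, apply the per-document limit.
  }
}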

@srikanthpadakanti
Author

Please approve the workflow @dai-chen @penghuo

Collaborator
@ykmr1224 ykmr1224 left a comment

I haven't reviewed the logic in CalciteRelNodeVisitor since it is too long to read and not well organized. Please refactor it first.
Also, I suppose the code changes are mostly directly from a coding agent. Please review them yourself first, and reflect our earlier comments as well.


Comment on lines 1057 to 1064
// Allow using INTERNAL_ITEM when the element type is unknown/undefined at planning time.
// Some datasets (or Calcite's type inference) may give the element an UNDEFINED type.
Accept an "ignore" first-argument family so INTERNAL_ITEM(elem, 'key') can still be planned
// and resolved at runtime (fallback semantics handled at execution side). - Used in MVEXPAND
registerOperator(
INTERNAL_ITEM,
SqlStdOperatorTable.ITEM,
PPLTypeChecker.family(SqlTypeFamily.IGNORE, SqlTypeFamily.CHARACTER));
Collaborator

Why do we need to define a separate one from the one above? (Can we add an OR to the above definition?)

Author

I tried, but Calcite’s OR operator requires both operands to be composite checkers.
The fallback rule I added (IGNORE, CHARACTER) is still a SqlSingleOperandTypeChecker, so Calcite rejects the merge.

/**
* Expand command visitor to handle array field expansion. 1. Unnest 2. Join with the original
* table to get all fields
* Portions of CalciteRelNodeVisitor related to EXPAND / MVEXPAND.
Collaborator

Looks like the expand/mvexpand code is very long. Please extract it to a class instead of using a code section.

Author

Minimized it to reuse the same buildExpandRelNode.

Comment on lines 689 to 693
MVEXPAND_EDGE_CASES(
"mvexpand_edge_cases",
"mvexpand_edge_cases",
getMappingFile("mvexpand_edge_cases_mapping.json"),
"src/test/resources/mvexpand_edge_cases.json"),
Collaborator

I'm not sure this (and the JSON) is really needed. It seems to be used only in CalciteExplainIT, and I doubt the data matters for ExplainIT.

@coderabbitai
Contributor

coderabbitai bot commented Nov 27, 2025

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new mvexpand command to expand array fields into individual rows in query results. Supports optional limit parameter to control maximum rows per document. Properly handles empty arrays, null values, and missing fields.
  • Documentation

    • Added comprehensive documentation with practical examples and detailed behavior specifications for the mvexpand command.


Walkthrough

Adds native PPL mvexpand: lexer/parser/AST node, visitor and analyzer hooks, Calcite planning with array validation and optional per-document limit, DSL/anonymizer updates, operator registration tweak, tests/fixtures, and documentation.

Changes

  • AST Node & Visitors (core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java, core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java, core/src/main/java/org/opensearch/sql/analysis/Analyzer.java): New MvExpand AST node (field + optional limit), visitor hook visitMvExpand, and Analyzer override delegating to Calcite-only handling.
  • Calcite Planning (core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java): Add visitMvExpand, resolve field name, validate ARRAY type, handle missing/null, reuse Expand mechanics, add optional per-document limit, update buildExpandRelNode signature, and add extractFieldName.
  • DSL & Operator Table (core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java, core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java): Add mvexpand(UnresolvedPlan, Field, Integer) DSL builder and register an INTERNAL_ITEM operator variant to allow planning with unknown element types.
  • PPL Lexer / Parser / AST Builder (ppl/src/main/antlr/OpenSearchPPLLexer.g4, ppl/src/main/antlr/OpenSearchPPLParser.g4, ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java): Add MVEXPAND token, mvexpandCommand grammar (optional limit=<int>), and AstBuilder method producing MvExpand.
  • PPL Utilities / Anonymizer (ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java, ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java): Anonymizer visitor for MvExpand (mask field and optional masked limit) and unit tests for anonymized outputs.
  • Calcite Unit Tests (ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java): New Calcite-based unit tests covering mvexpand scenarios (basic, limit, empty/null, duplicates, large arrays, primitive arrays, projections).
  • Integration Tests & Fixtures (integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java, integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java, integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java, integ-test/src/test/resources/...): Add integration test class for mvexpand edge cases, explain test additions, register test index in suites, expected explain YAML fixtures, and edge-case mappings/data.
  • Doctest Data & Mapping (doctest/test_data/mvexpand_logs.json, doctest/test_mapping/mvexpand_logs.json, doctest/test_docs.py): Add mvexpand doctest dataset, mapping, and register it in the doctest harness.
  • Docs & Index (docs/user/ppl/cmd/mvexpand.rst, docs/user/ppl/index.rst, docs/category.json, docs/user/dql/metadata.rst): Add mvexpand documentation page, update PPL command index/category, and include mvexpand in metadata examples.
  • Expected Output Fixtures (integ-test/src/test/resources/expectedOutput/calcite/..., integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/...): Add expected explain output fixtures used by mvexpand explain tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Lexer
    participant Parser
    participant AstBuilder
    participant Analyzer
    participant CalcitePlanner
    participant Executor

    User->>Lexer: "source=t | mvexpand field [limit=N]"
    Lexer->>Parser: tokens (includes MVEXPAND)
    Parser->>AstBuilder: parse mvexpandCommand
    AstBuilder->>AstBuilder: build MvExpand(field, limit)
    AstBuilder->>Analyzer: emit MvExpand node
    Analyzer->>CalcitePlanner: route node for Calcite planning
    CalcitePlanner->>CalcitePlanner: resolve field name & type
    alt missing or non-array field
        CalcitePlanner->>CalcitePlanner: produce empty projection or semantic error
    else array field present
        CalcitePlanner->>CalcitePlanner: create Expand RelNode, apply optional per-doc limit
    end
    CalcitePlanner->>Executor: return finalized RelNode / plan
    Executor->>User: execute -> rows (one per array element)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Pay extra attention to:
    • CalciteRelNodeVisitor.visitMvExpand and updated buildExpandRelNode (field extraction, ARRAY validation, missing/null handling, per-doc limit).
    • New integration tests and fixtures in integ-test/... (setup, bulkInsert, assertions).
    • Parser/AST changes for optional limit and correct AST construction.
    • Anonymizer masking logic and related unit tests.

Suggested reviewers

  • ykmr1224
  • penghuo
  • ps48
  • dai-chen
  • kavithacm
  • derek-ho
  • joshuali925
  • RyanL1997
  • GumpacG

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 7.04%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check ❓ Inconclusive: The title 'Mvexpand feature' is vague and generic, and does not clearly convey the specific nature or scope of the change. Resolution: use a more descriptive title such as 'Add native mvexpand command for PPL to expand array fields into separate rows' to clearly indicate the feature and its primary purpose.
✅ Passed checks (3 passed)
  • Description check ✅ Passed: The pull request description is well-detailed and directly related to the changeset, explaining the mvexpand feature, its purpose, key features, testing, and documentation status.
  • Linked Issues check ✅ Passed: The pull request implements all primary and secondary objectives from issue #4439: mvexpand command for PPL with array field expansion, optional limit parameter, handling of edge cases (empty/null arrays, large arrays, non-array fields), streaming execution, comprehensive tests, Javadoc, and user documentation.
  • Out of Scope Changes check ✅ Passed: All changes directly support the mvexpand implementation, including AST nodes, visitor patterns, parser grammar, integration tests, documentation, and test data. The only tangential change is the removal of the search.rst doc entry, which appears to be documentation reorganization related to the category.json addition.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf87312 and c9e2767.

📒 Files selected for processing (1)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (8)
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (1)

93-93: Mvexpand parser wiring is correct; consider using internalVisitExpression for consistency

The new visitMvexpandCommand correctly builds MvExpand(field, limit) from the grammar. For stylistic consistency with nearby visitors (e.g., visitExpandCommand), you could use internalVisitExpression(ctx.fieldExpression()) instead of calling expressionBuilder.visit directly, but functionally they are equivalent.

Also applies to: 870-876
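For reference, a minimal sketch of the suggested shape (the context and accessor names follow the grammar discussed in this review; the MvExpand constructor signature and the cast to Field are assumptions, so treat this as illustrative rather than the PR's actual code):

@Override
public UnresolvedPlan visitMvexpandCommand(OpenSearchPPLParser.MvexpandCommandContext ctx) {
  // Resolve the field through the shared expression visitor, as suggested above.
  Field field = (Field) internalVisitExpression(ctx.fieldExpression());
  // Optional limit=<int>; null when the LIMIT clause is absent.
  Integer limit =
      ctx.INTEGER_LITERAL() != null ? Integer.valueOf(ctx.INTEGER_LITERAL().getText()) : null;
  return new MvExpand(field, limit);
}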

ppl/src/main/antlr/OpenSearchPPLParser.g4 (1)

81-82: mvexpand grammar looks good; consider using integerLiteral for limit for consistency

The new mvexpandCommand is wired correctly into commands and commandName, and LIMIT EQUAL ... matches other named-arg patterns.

For consistency with options like timechart/chart/bin that use LIMIT EQUAL integerLiteral, consider changing:

mvexpandCommand
    : MVEXPAND fieldExpression (LIMIT EQUAL INTEGER_LITERAL)?
    ;

to:

mvexpandCommand
    : MVEXPAND fieldExpression (LIMIT EQUAL integerLiteral)?
    ;

so all numeric options share the same literal rule and future validation is uniform.

Also applies to: 120-121, 532-534

docs/user/ppl/index.rst (1)

51-109: Expose mvexpand in the commands table as well

The new bullets (including mvexpand command) are useful entry points, but mvexpand is not listed in the big commands table below, which summarizes version/status/descriptions.

For consistency and discoverability, consider adding a row to that table for mvexpand (with version introduced, status, and a short description) alongside expand/flatten.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)

82-83: Mvexpand anonymization behavior is correct; consider reusing visitExpression for field masking

The new visitMvExpand correctly:

  • delegates to the child plan, and
  • emits | mvexpand identifier with optional limit=***, matching the new tests.

To stay consistent with visitExpand/visitFlatten, you might prefer:

@Override
public String visitMvExpand(MvExpand node, String context) {
  String child = node.getChild().get(0).accept(this, context);
  String field = visitExpression(node.getField());
  if (node.getLimit() != null) {
    return StringUtils.format("%s | mvexpand %s limit=%s", child, field, MASK_LITERAL);
  }
  return StringUtils.format("%s | mvexpand %s", child, field);
}

This keeps any special handling in maskField/expressionAnalyzer (e.g., meta fields) consistent across commands while preserving the current output.

Also applies to: 655-664

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java (1)

117-163: Smoke tests validate planning succeeds but lack output assertions.

These tests (testMvExpandWithLimit through testMvExpandPrimitiveArray) only verify that getRelNode(ppl) succeeds without checking the resulting plan. While they confirm no planning exceptions occur, consider adding at least basic verifyLogical assertions for one or two key edge cases (e.g., testMvExpandWithLimit) to ensure the generated plan includes expected operators.

Example enhancement for testMvExpandWithLimit:

@Test
public void testMvExpandWithLimit() {
  String ppl = "source=DEPT | mvexpand EMPNOS | head 1";
  RelNode root = getRelNode(ppl);
  String expectedLogical = 
      "LogicalSort(fetch=[1])\n"
          + "  LogicalProject(DEPTNO=[$0], EMPNOS=[$2])\n"
          + "    LogicalCorrelate(...)\n";
  verifyLogical(root, expectedLogical);
}
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (3)

61-70: Consider logging suppressed exceptions for debugging.

The cleanupAfterEach method silently ignores all exceptions. While this is acceptable for cleanup, logging at debug level could help diagnose test failures:

   @AfterEach
   public void cleanupAfterEach() throws Exception {
     try {
       deleteIndexIfExists(INDEX + "_not_array");
       deleteIndexIfExists(INDEX + "_missing_field");
       deleteIndexIfExists(INDEX + "_limit_test");
       deleteIndexIfExists(INDEX + "_int_field");
     } catch (Exception ignored) {
+      // Cleanup failures are expected if indices don't exist
     }
   }

227-251: Limit test validates row count but doesn't confirm per-document semantics.

This test uses a single user document, so it cannot distinguish between per-document limiting (3 elements from this document) vs. global limiting (3 rows total). Consider adding a multi-document scenario to clarify the expected behavior:

@Test
public void testMvexpandLimitWithMultipleDocuments() throws Exception {
  // Insert two users each with 5 skills
  bulkInsert(idx, 
    "{\"username\":\"user1\",\"skills\":[{\"name\":\"a\"},{\"name\":\"b\"},{\"name\":\"c\"},{\"name\":\"d\"},{\"name\":\"e\"}]}",
    "{\"username\":\"user2\",\"skills\":[{\"name\":\"f\"},{\"name\":\"g\"},{\"name\":\"h\"},{\"name\":\"i\"},{\"name\":\"j\"}]}");
  // With limit=3:
  // - If per-document: expect 6 rows (3 per user)
  // - If global: expect 3 rows total
  // Clarify expected behavior with assertion
}

340-360: Quote _id value in bulk request JSON for correctness.

The _id field in OpenSearch bulk requests should be a JSON string. While OpenSearch may auto-convert numeric values, explicit quoting is more correct:

-      bulk.append("{\"index\":{\"_id\":").append(id).append("}}\n");
+      bulk.append("{\"index\":{\"_id\":\"").append(id).append("\"}}\n");

This ensures proper JSON formatting regardless of whether the id is numeric or string-based.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 885230f and 9508874.

📒 Files selected for processing (27)
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java (1 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1 hunks)
  • docs/category.json (1 hunks)
  • docs/user/dql/metadata.rst (3 hunks)
  • docs/user/ppl/cmd/mvexpand.rst (1 hunks)
  • docs/user/ppl/index.rst (1 hunks)
  • doctest/test_data/mvexpand_logs.json (1 hunks)
  • doctest/test_docs.py (1 hunks)
  • doctest/test_mapping/mvexpand_logs.json (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml (1 hunks)
  • integ-test/src/test/resources/mvexpand_edge_cases.json (1 hunks)
  • integ-test/src/test/resources/mvexpand_edge_cases_mapping.json (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLParser.g4 (3 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java (2 hunks)
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (2 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java (1)
legacy/src/main/java/org/opensearch/sql/legacy/utils/StringUtils.java (1)
  • StringUtils (17-117)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (2)
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java (1)
  • PPLIntegTestCase (36-409)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
🪛 Biome (2.1.2)
integ-test/src/test/resources/mvexpand_edge_cases.json

[error] End of file expected. Use an array for a sequence of values: [1, 2] (parse). The same error is reported at every line boundary from 1-2 through 17-18.

doctest/test_data/mvexpand_logs.json

[error] End of file expected. Use an array for a sequence of values: [1, 2] (parse). The same error is reported at every line boundary from 1-2 through 5-6.

🔇 Additional comments (33)
core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java (1)

65-66: Mvexpand DSL helper is consistent with other unary plan builders

The new mvexpand builder cleanly mirrors expand (attach-to-input, carry field and nullable limit) and fits the existing AST DSL patterns.

Also applies to: 140-142

integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)

35-35: Adding CalciteMvExpandCommandIT to the no-pushdown suite makes sense

Including mvexpand in this suite keeps coverage consistent with other Calcite remote commands under the non-pushdown configuration.

doctest/test_mapping/mvexpand_logs.json (1)

1-24: Mvexpand doctest mapping looks appropriate

The nested tags.value and ids.value shapes are suitable for exercising mvexpand on multi-valued/nested fields, and the rest of the mapping is straightforward.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

1072-1079: Confirm scope of the broad INTERNAL_ITEM(IGNORE, CHARACTER) overload

This overload is a reasonable fallback for mvexpand when the container element type is UNDEFINED, but it effectively accepts any first-argument type as long as the second is in the CHARACTER family. That’s fine if INTERNAL_ITEM truly remains internal-only and all call sites still guarantee an array/map-like first argument, but it would allow nonsensical combinations to pass type checking if it were ever user-exposed or reused elsewhere.

Please double-check:

  • That INTERNAL_ITEM is not reachable as a normal PPL function, and
  • That runtime handling still validates the container type and fails fast for clearly invalid inputs.

If either assumption doesn’t hold, it may be worth tightening this overload further or guarding its use where mvexpand builds the calls.

core/src/main/java/org/opensearch/sql/ast/tree/MvExpand.java (1)

1-46: MvExpand AST node design matches existing unary plan patterns

Immutable field/limit, covariant attach, getChild() and visitor wiring all look consistent with other UnresolvedPlan nodes.

integ-test/src/test/resources/mvexpand_edge_cases_mapping.json (1)

1-8: Edge-case mvexpand mapping is straightforward and suitable

username as keyword plus skills as nested gives a clean shape for exercising mvexpand against nested/multivalue fields in integration tests.

integ-test/src/test/resources/expectedOutput/calcite/explain_mvexpand.yaml (1)

1-7: Explain expected output for mvexpand fits existing Calcite fixture style

The logical and physical plan snippets follow the existing explain-output conventions and surface the mvexpand projection via VALUE, which should keep the explain tests stable.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2)

45-45: LGTM: Test index loading.

The MVEXPAND_EDGE_CASES index is properly registered in the test initialization, consistent with other test indices.


319-327: The review comment references code that does not exist in the repository.

The test method testMvexpandExplain() at lines 319-327 is not present in CalciteExplainIT.java. Additionally, no mvexpand-related test data files (mvexpand_edge_cases.json, explain_mvexpand.yaml) or any references to "mvexpand" exist anywhere in the codebase. The review comment appears to be referencing code from a different version, branch, or context that is not part of the current PR.

Likely an incorrect or invalid review comment.

ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1)

53-53: LGTM: Lexer token addition.

The MVEXPAND token is correctly defined following standard ANTLR patterns and is appropriately positioned in the COMMAND KEYWORDS section.

integ-test/src/test/resources/mvexpand_edge_cases.json (1)

1-18: LGTM: Comprehensive edge case test data.

The test data covers essential mvexpand scenarios including empty arrays, null values, missing fields, and large arrays. The format follows the standard OpenSearch bulk indexing format (alternating index metadata and data lines).

Note: The static analysis errors from Biome are false positives—this file uses the correct bulk format for OpenSearch indexing, not standard JSON array format.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_mvexpand.yaml (1)

1-7: Expected output structure looks correct, but verify alignment with test query.

The explain plan structure is well-formed with appropriate logical and physical sections. However, note that Line 4 references VALUE=[$10] which aligns with the test query using mvexpand VALUE. This should be verified along with the CalciteExplainIT test to ensure the field name is correct or intentionally testing missing-field behavior.

core/src/main/java/org/opensearch/sql/analysis/Analyzer.java (2)

81-81: LGTM: Import addition.


705-708: LGTM: Calcite-only visitor implementation.

The visitMvExpand method correctly delegates to getOnlyForCalciteException, consistent with other Calcite-only commands like Bin, Expand, and Flatten. This ensures mvexpand is only processed in the Calcite execution path.

doctest/test_docs.py (1)

50-51: LGTM: Test data registration.

The mvexpand_logs test data is correctly registered in the TEST_DATA dictionary, enabling doctest support for the mvexpand documentation examples.

docs/user/ppl/cmd/mvexpand.rst (3)

1-24: LGTM: Clear documentation structure.

The introduction, description, and syntax sections are well-written and provide clear guidance on the mvexpand command. The syntax specification correctly identifies the required field parameter and optional limit parameter.


25-29: Good approach: Doctest stability notes.

The note explaining the use of where case='<name>' for deterministic results is helpful for understanding the test structure.


30-103: LGTM: Comprehensive examples with good edge case coverage.

The five examples effectively demonstrate:

  1. Basic expansion behavior
  2. Limit parameter usage
  3. Empty and null array handling
  4. Single-element arrays
  5. Missing field behavior

The doctest format with expected output makes these examples testable and maintainable. Each example uses targeted queries for reproducible results.

docs/user/dql/metadata.rst (1)

38-66: LGTM: Metadata updated for new test index.

The fetched row count correctly increases from 23 to 24, and the mvexpand_logs table entry is properly added to the SHOW TABLES output, maintaining consistent formatting with existing entries.

ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)

35-37: Updated search anonymization expectation is consistent

The updated expectation for testSearchCommand ("source=table a:***") aligns with the current anonymizer output style. No action needed here.

core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java (1)

69-69: MvExpand integration into AbstractNodeVisitor is consistent

The added MvExpand import and visitMvExpand method delegate to visitChildren in the same way as other plan nodes, which is exactly what downstream visitors (Analyzer, anonymizer, Calcite planner) need.

No changes suggested.

Also applies to: 456-458

doctest/test_data/mvexpand_logs.json (1)

1-6: mvexpand doctest data shape looks appropriate (NDJSON-style)

The file uses one JSON object per line to cover basic/empty/null/single/ids/missing cases, which is a reasonable shape for doctest data loaders that iterate line by line. The Biome “End of file expected” hints can be ignored here unless your tooling explicitly requires a single JSON array wrapper.

docs/category.json (1)

71-73: Category entry for mvexpand docs is correctly wired

Adding user/ppl/cmd/mvexpand.rst (and the identifiers general doc) under ppl_cli_calcite aligns with how other PPL command docs are exposed to the docs build.

Looks good as-is.

integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1)

702-706: New MVEXPAND_EDGE_CASES test index is consistent with existing patterns

The added enum constant uses the standard (name, type, mappingFile, dataSet) pattern and will integrate cleanly with loadIndex(Index.MVEXPAND_EDGE_CASES, ...) in ITs.

No changes suggested, assuming the referenced mapping/data files exist in src/test/resources (which test runs will validate).

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (5)

81-85: Wildcard imports are acceptable here for reducing import clutter.

These wildcard imports consolidate multiple specific imports from the expression and tree packages. This aligns with the existing code style in the file.


2848-2851: Verify: limit applies globally, not per-document.

The PR objectives state the limit should "cap how many values per document are expanded" (per-document limit). However, the current implementation applies relBuilder.limit(0, limit) after the expand, which creates a global LIMIT clause—limiting total output rows rather than per-document.

For example, with 2 documents each having 5 array elements and limit=3:

  • Current behavior: returns 3 rows total
  • Expected per-document behavior: returns 6 rows (3 per document)

If global limit is the intended behavior, please clarify in documentation. Otherwise, per-document limiting requires a window function with ROW_NUMBER partitioned by document.


2806-2828: Missing-field handling returns empty result with schema including the missing field.

The approach of projecting all existing fields plus a NULL placeholder for the missing field, then filtering to 0 rows, preserves the expected output schema while returning no data. This is a valid strategy for handling missing fields gracefully.


2833-2840: Type validation for non-ARRAY fields is correctly implemented.

The code properly checks SqlTypeName.ARRAY and throws a clear SemanticCheckException with an informative message including both the field name and the actual type found.


2856-2865: Based on my comprehensive verification, I can now provide the final assessment:

The review comment references code that does not exist in this repository.

After thorough investigation, I found that:

  1. No extractFieldName(Field f) method exists at lines 2856-2865 in CalciteRelNodeVisitor.java or anywhere else in the codebase
  2. No mvexpand functionality exists anywhere in the repository
  3. CalciteRelNodeVisitor.java has 3,237 lines, but the section at lines 2856-2865 contains code related to field replacement, not an extractFieldName method
  4. The QualifiedName class (which the review references) does support multi-part names via getParts(), and other code in the repository demonstrates that multi-part qualified names are already handled correctly elsewhere

The review comment appears to be based on incorrect information or references a different codebase/branch. Since the code being reviewed does not exist, there is no actionable suggestion to verify or implement.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLMvExpandTest.java (2)

43-82: TableWithArray implementation is correct and mirrors existing pattern.

The inner class properly implements the Calcite Table interface with a row type containing an INTEGER array field (EMPNOS), which is suitable for testing mvexpand functionality.


97-115: Test includes both logical plan and SparkSQL verification.

This test properly verifies both the logical plan structure and the equivalent SparkSQL, addressing prior review feedback about adding verifyPPLToSparkSQL assertions.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (2)

23-59: Comprehensive test setup with good edge case coverage.

The test initialization creates a well-structured index with nested mapping and inserts diverse test documents covering:

  • Multiple elements, single element, empty arrays
  • Null values and missing fields
  • Partial/mixed element shapes
  • Duplicate elements and large arrays
  • Heterogeneous subfield types

This aligns well with the edge cases documented in the PR objectives.


72-134: Integration tests properly verify data rows and ordering.

The tests appropriately use verifyDataRows with expected values and include sorting (sort skills.name) to ensure deterministic ordering for assertions. The coverage of single element, empty array, null array, missing field, duplicates, and happy path with multiple elements is thorough.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

321-328: Clarify or remove the misleading comment.

The comment "script pushdown" on line 324 is unclear or potentially misleading. The mvexpand command expands multivalue/array fields into separate rows; while the explain plan may show script operations, the primary purpose isn't script pushdown. Consider removing this comment or clarifying what aspect of the plan you're highlighting.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

81-81: Consider specific imports over wildcards.

While wildcard imports work, specific imports are generally preferred for clarity and to avoid potential naming conflicts. However, if the codebase consistently uses wildcards, this is acceptable.

Also applies to: 85-85

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08b56ee and 3ae2c73.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

46-46: LGTM: Test index loaded for mvexpand edge cases.

The addition of the MVEXPAND_EDGE_CASES index follows the established pattern for test data initialization.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2800-2862: Consider adding post-materialization type validation.

Based on past review comments, you mentioned validating the field type "both at resolve-time and after materialization (to catch rename/alias cases)." Currently, type validation only occurs at line 2843 before calling buildExpandRelNode. Consider whether additional validation is needed after field resolution to catch edge cases involving renames or aliases.

If post-materialization validation is necessary, you could add checks after the field has been fully resolved through any renames or transformations. Otherwise, please confirm that the current pre-resolution validation is sufficient for all cases.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2773-2785: mvexpand limit is applied globally instead of per document

visitMvExpand currently applies relBuilder.limit(0, limit) after buildExpandRelNode, i.e., on the fully expanded relation. That caps the total number of rows across the entire result set, not the number of expanded values per input document. With multiple input rows, this will drop expansions from later rows instead of limiting each document independently (e.g., two docs with large arrays and limit=3 would yield at most 3 rows total, not 3 per doc), which diverges from the PR objective that limit be per document.

Because buildExpandRelNode constructs a correlated right-hand side (LogicalValues → project → uncollect), the natural place for a per-document limit is inside that right branch so it is evaluated once per outer row.

Consider refactoring to thread the (validated) limit into buildExpandRelNode and apply it on the right side before build():

@@
   @Override
   public RelNode visitExpand(Expand expand, CalcitePlanContext context) {
@@
-    buildExpandRelNode(arrayFieldRex, arrayField.getField().toString(), alias, context);
+    buildExpandRelNode(arrayFieldRex, arrayField.getField().toString(), alias, null, context);
@@
   public RelNode visitPatterns(Patterns node, CalcitePlanContext context) {
@@
-        buildExpandRelNode(
-            context.relBuilder.field(node.getAlias()), node.getAlias(), node.getAlias(), context);
+        buildExpandRelNode(
+            context.relBuilder.field(node.getAlias()),
+            node.getAlias(),
+            node.getAlias(),
+            null,
+            context);
@@
   @Override
   public RelNode visitMvExpand(MvExpand mvExpand, CalcitePlanContext context) {
@@
-    // 2C. Valid array → expand
-    int index = matched.getIndex();
-    RexInputRef fieldRef = context.rexBuilder.makeInputRef(type, index);
-
-    buildExpandRelNode(fieldRef, fieldName, fieldName, context);
-
-    Integer limit = mvExpand.getLimit();
-    if (limit != null) {
-      if (limit <= 0) {
-        throw new SemanticCheckException(
-            String.format("mvexpand limit must be positive, but got %d", limit));
-      }
-      relBuilder.limit(0, limit);
-    }
+    // 2C. Valid array → expand (with optional per-document limit)
+    int index = matched.getIndex();
+    RexInputRef fieldRef = context.rexBuilder.makeInputRef(type, index);
+
+    Integer limit = mvExpand.getLimit();
+    if (limit != null && limit <= 0) {
+      throw new SemanticCheckException(
+          String.format("mvexpand limit must be positive, but got %d", limit));
+    }
+
+    buildExpandRelNode(fieldRef, fieldName, fieldName, limit, context);
@@
-  private void buildExpandRelNode(
-      RexInputRef arrayFieldRex, String arrayFieldName, String alias, CalcitePlanContext context) {
+  private void buildExpandRelNode(
+      RexInputRef arrayFieldRex,
+      String arrayFieldName,
+      String alias,
+      @Nullable Integer perDocLimit,
+      CalcitePlanContext context) {
@@
-    // 5. Build join right node and expand the array field using uncollect
-    RelNode rightNode =
-        context
-            .relBuilder
-            // fake input, see convertUnnest and convertExpression in Calcite SqlToRelConverter
-            .push(LogicalValues.createOneRow(context.relBuilder.getCluster()))
-            .project(List.of(correlArrayFieldAccess), List.of(arrayFieldName))
-            .uncollect(List.of(), false)
-            .build();
+    // 5. Build join right node and expand the array field using uncollect
+    context
+        .relBuilder
+        // fake input, see convertUnnest and convertExpression in Calcite SqlToRelConverter
+        .push(LogicalValues.createOneRow(context.relBuilder.getCluster()))
+        .project(List.of(correlArrayFieldAccess), List.of(arrayFieldName))
+        .uncollect(List.of(), false);
+
+    if (perDocLimit != null) {
+      context.relBuilder.limit(0, perDocLimit);
+    }
+
+    RelNode rightNode = context.relBuilder.build();
@@
-    context
-        .relBuilder
-        .push(leftNode)
-        .push(rightNode)
-        .correlate(JoinRelType.INNER, correlVariable.get().id, List.of(arrayFieldRex))
-        // 7. Remove the original array field from the output.
-        // TODO: RFC: should we keep the original array field when alias is present?
-        .projectExcept(arrayFieldRex);
+    context
+        .relBuilder
+        .push(leftNode)
+        .push(rightNode)
+        .correlate(JoinRelType.INNER, correlVariable.get().id, List.of(arrayFieldRex))
+        // 7. Remove the original array field from the output.
+        // TODO: RFC: should we keep the original array field when alias is present?
+        .projectExcept(arrayFieldRex);

This way the limit is applied inside the correlated right branch and is naturally per-document while keeping existing expand and patterns behaviour unchanged (they just pass null). As per the PR objectives, this aligns Calcite mvexpand with the intended per-document limit semantics.

Also applies to: 2799-2866, 791-793, 3125-3171

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ae2c73 and 5e616ff.

📒 Files selected for processing (1)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2927-2939: Fix nested field path handling.

The extractFieldName method currently returns only the first segment of a QualifiedName (line 2933: parts.get(0)), which drops nested path components. For example, user.skills would become just user, causing field resolution to fail for nested arrays.

This issue was previously identified in review comments. To support nested fields like user.skills, the full dotted path must be preserved.

Apply this diff to fix nested field resolution:

     if (inner instanceof QualifiedName) {
       List<String> parts = ((QualifiedName) inner).getParts();
       if (!parts.isEmpty()) {
-        return parts.get(0);
+        return String.join(".", parts);
       }
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e616ff and 709704c.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (4 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🔇 Additional comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

2846-2925: LGTM with observations.

The visitMvExpand implementation correctly handles the three key cases:

  1. Missing field → returns empty result with correct schema (lines 2874-2896)
  2. Non-array type → throws clear exception (lines 2902-2907)
  3. Valid array → expands and applies validated limit (lines 2909-2922)

The limit validation (lines 2917-2920) properly checks for positive values and provides a clear error message, addressing previous review feedback.

Minor observation: The missing field handling creates a full projection with NULL before filtering to zero rows (lines 2874-2894). While this preserves the schema correctly, it adds overhead compared to directly creating an empty result. However, this appears intentional based on the PR objectives to handle missing fields gracefully.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 709704c and c45fa05.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (3 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (2 hunks)
🔇 Additional comments (2)
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (2)

36-36: Updated search anonymization expectation looks correct

The new "source=table a:***" format aligns with the anonymizer’s current behavior and keeps the test in sync; no issues here.


848-858: Mvexpand anonymizer tests are well‑aligned and restore full coverage

Both mvexpand tests match the anonymization conventions (identifier masking and limit redaction), and with @Test on testMvdedup still present, the earlier lost‑annotation issue is fully resolved.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
ppl/src/main/antlr/OpenSearchPPLParser.g4 (1)

81-81: Parser wiring for MVEXPAND looks correct; consider aligning LIMIT literal with other commands

  • Adding mvexpandCommand into commands and MVEXPAND into commandName cleanly integrates the new command into the PPL pipeline and keyword set.
  • mvexpandCommand : MVEXPAND fieldExpression (LIMIT EQUAL INTEGER_LITERAL)? matches the intended mvexpand <field> limit=<int> syntax and addresses prior feedback to use the EQUAL token.

Two minor nits:

  1. Other LIMIT-like parameters (e.g., chartOptions, timechartParameter) use integerLiteral instead of bare INTEGER_LITERAL. For consistency and to match existing patterns, you might switch to:

    mvexpandCommand
        : MVEXPAND fieldExpression (LIMIT EQUAL integerLiteral)?
        ;

    This keeps the grammar uniform and leaves sign/valid-range checks to semantic validation. If you intentionally want to forbid signed values here, the current INTEGER_LITERAL is fine; just ensure the behavior is documented.

  2. Given the new command, double-check that AstBuilder.visitMvexpandCommand and related components (e.g., MvExpand node, anonymizer, planner) stay in lockstep with this rule shape (especially the optional LIMIT) and that tests cover both with and without LIMIT; a hedged visitor sketch follows after this list. Based on learnings, keeping AST generation in sync with grammar changes is important.

Also applies to: 120-120, 532-534
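
As a companion to item 2 above, a hedged sketch of how a visitMvexpandCommand override might mirror the optional LIMIT in this rule; the MvExpand node constructor, Field cast, and internalVisitExpression helper are assumptions based on the surrounding AstBuilder pattern, not the exact classes in this PR:

    // Hypothetical visitor: build the MvExpand AST node, passing a null limit
    // when the optional "limit=<int>" clause is absent.
    @Override
    public UnresolvedPlan visitMvexpandCommand(
        OpenSearchPPLParser.MvexpandCommandContext ctx) {
      Field field = (Field) internalVisitExpression(ctx.fieldExpression());
      Integer limit =
          ctx.INTEGER_LITERAL() == null
              ? null
              : Integer.valueOf(ctx.INTEGER_LITERAL().getText());
      return new MvExpand(field, limit);
    }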

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

321-327: New testMvexpandExplain is concise and aligned with existing explain tests

This test follows the existing pattern (load expected plan + explainQueryYaml) and adds Calcite explain coverage for mvexpand. If there isn’t one already elsewhere, consider adding an additional explain IT that covers mvexpand with a limit=<n> option to lock in the planning behavior for the limit semantics as well; a hedged sketch follows below.
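
A hedged sketch of that additional limit-aware explain IT, reusing the same helpers referenced above; the expected-plan file name, field name, and comparison assertion are assumptions following the existing test pattern:

    // Hypothetical explain IT covering the limit option; the yaml file and the
    // exact comparison helper are placeholders, not the merged test code.
    @Test
    public void testMvexpandWithLimitExplain() throws IOException {
      String expected = loadExpectedPlan("explain_mvexpand_limit.yaml");
      String actual =
          explainQueryYaml("source=mvexpand_edge_cases | mvexpand tags limit=2");
      assertEquals(expected, actual);
    }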

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f3435e and 9aec421.

📒 Files selected for processing (5)
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (2 hunks)
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1 hunks)
  • ppl/src/main/antlr/OpenSearchPPLParser.g4 (3 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java:

  • Use PascalCase for class names (e.g., QueryExecutor)
  • Use camelCase for method and variable names (e.g., executeQuery)
  • Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
  • Keep methods under 20 lines with single responsibility
  • All public classes and methods must have proper JavaDoc
  • Use specific exception types with meaningful messages for error handling
  • Prefer Optional<T> for nullable returns in Java
  • Avoid unnecessary object creation in loops
  • Use StringBuilder for string concatenation in loops
  • Validate all user inputs, especially queries
  • Sanitize data before logging to prevent injection attacks
  • Use try-with-resources for proper resource cleanup in Java
  • Maintain Java 11 compatibility when possible for OpenSearch 2.x
  • Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

**/*.java:

  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java:

  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java:

  • Verify test coverage for new business logic
  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java:

  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧬 Code graph analysis (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (2)
ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1)

53-53: MVEXPAND lexer token is correctly defined and aligned with existing commands

Placement next to EXPAND, simple literal pattern, and caseInsensitive = true all look consistent with existing command tokens.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

30-48: The code referenced in this review comment (Index.MVEXPAND_EDGE_CASES and its integration into CalciteExplainIT.init()) does not exist in the current repository: the checked-out CalciteExplainIT.java loads 11 indices ending with Index.DATA_TYPE_ALIAS, not Index.MVEXPAND_EDGE_CASES. Manual verification against the actual pull request or merged code is needed to confirm that the Index.MVEXPAND_EDGE_CASES constant correctly maps to the "mvexpand_edge_cases" index name used in the test queries.
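
If the constant is added following the existing Index enum pattern, the explain IT could load it during init(); a hedged sketch, assuming loadIndex(...) and the enum wiring match the rest of the suite:

    // Hypothetical init() wiring: load the edge-case index via the shared enum
    // instead of creating it inline; Index.MVEXPAND_EDGE_CASES itself is an
    // assumption pending the actual PR code.
    @Override
    public void init() throws Exception {
      super.init();
      enableCalcite();
      loadIndex(Index.MVEXPAND_EDGE_CASES);
    }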

@srikanthpadakanti
Author

@dai-chen I believe the test failures in CI are unrelated to my changes.

@dai-chen
Collaborator

@dai-chen I believe the test failures in CI are unrelated to my changes.

Not sure what's going wrong. Could you see if it goes away after merging from main again?

Labels

enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add native support for mvexpand command in PPL

4 participants