
@ahkcs ahkcs commented Dec 4, 2025

Summary

Resolves #4896: ArrayIndexOutOfBoundsException when querying an index containing malformed field names (e.g., ".", "..", ".a", "a.", "a..b") inside disabled object fields.

Disabled objects ("enabled": false) bypass field-name validation, allowing malformed names to be indexed and subsequently causing crashes in the SQL/PPL engines.


Root Cause

OpenSearchExprValueFactory.JsonPath constructs field paths using:

rawPath.split("\\.");

For malformed field names, split("\\.") behaves unexpectedly:

| Field name | Result of split() | Issue |
|---|---|---|
| ".", ".." | [] (empty array) | dot-only → paths.get(0) crashes |
| ".a" | ["", "a"] | leading dot → empty path segment |
| "a." | ["a"] (trailing empty removed) | trailing dot silently lost |
| "a..b" | ["a", "", "b"] | consecutive dots → empty segment |

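The split results listed above can be reproduced with a few lines of plain Java. This is a minimal sketch, not code from the PR: `SplitDemo` and `segments` are hypothetical names, and the only thing taken from the description is the `split("\\.")` call.

```java
import java.util.Arrays;

// Illustrative only: SplitDemo is a hypothetical name, not a class from the PR.
final class SplitDemo {
    // Same call the PR description attributes to JsonPath.
    static String[] segments(String rawPath) {
        return rawPath.split("\\.");
    }

    public static void main(String[] args) {
        // Print the segments produced for each malformed name from the table.
        for (String name : new String[] {".", "..", ".a", "a.", "a..b"}) {
            System.out.println("\"" + name + "\" -> " + Arrays.toString(segments(name)));
        }
    }
}
```

The one-argument `String.split(regex)` behaves like `split(regex, 0)`, which discards trailing empty strings. That is why dot-only names collapse to an empty array and why the trailing segment of "a." disappears, while leading and interior empty segments survive.
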
Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved handling of queries on object fields containing malformed field names (dot-only names, leading/trailing dots, or consecutive dots). Invalid fields now return null while valid fields remain accessible.
  • Documentation

    • Added documentation describing limitations when querying object fields with malformed field names and recommendations to avoid problematic naming patterns.


ahkcs added 3 commits December 2, 2025 16:52
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>

coderabbitai bot commented Dec 4, 2025

Walkthrough

Adds validation and handling for malformed dot-containing field names during struct parsing in the SQL OpenSearch value factory, treats malformed fields as null, adds unit and integration tests covering dot-only and other malformed names, and documents the behavior for disabled object fields.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Integration test: integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml | New YAML-based integration test that enables Calcite, creates an index with a disabled object field, indexes documents containing various malformed dot-named subfields, verifies queries return expected results and that malformed fields resolve to null, then disables Calcite and deletes the index. |
| Runtime handling: opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java | Changes struct parsing to compute fieldKey and fullFieldPath; adds isFieldNameMalformed(String) detection for dot-only, leading/trailing, or consecutive-dot names; sets malformed fields to ExprNullValue instead of parsing; uses JsonPath(fieldKey) and propagates fullFieldPath for accurate type mapping. |
| Unit tests: opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java | Adds tests for malformed field-name detection and struct/tuple construction behavior, asserting malformed dot-containing names yield null while valid fields are preserved. (Note: test blocks appear duplicated in the diff.) |
| Documentation: docs/user/ppl/limitations/limitations.rst | New section "Malformed Field Names in Object Fields" describing that malformed subfield names inside enabled: false object fields are ignored by PPL (returned as null), with examples and recommendations to avoid leading/trailing/consecutive/dot-only names. |
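
The malformed-name detection described for isFieldNameMalformed(String) could look roughly like the following. This is a sketch under stated assumptions: only the method name and the four patterns (dot-only, leading dot, trailing dot, consecutive dots) come from the summary above; the enclosing class `FieldNameCheck` and the body are hypothetical, and the merged implementation may differ.

```java
// Hypothetical sketch; the merged implementation in OpenSearchExprValueFactory may differ.
final class FieldNameCheck {
    // Returns true for names that split("\\.") cannot turn into clean path segments.
    // Assumes fieldName is non-null.
    static boolean isFieldNameMalformed(String fieldName) {
        return fieldName.startsWith(".")   // ".a", and also dot-only names such as "." or ".."
            || fieldName.endsWith(".")     // "a."
            || fieldName.contains("..");   // "a..b"
    }
}
```

Checking the raw string rather than the split result catches the trailing-dot case, which split("\\.") would otherwise hide by dropping the trailing empty segment.
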

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Inspect isFieldNameMalformed implementation for correct handling of dot-only, leading/trailing, and consecutive-dot cases.
  • Verify propagation and usage of fullFieldPath vs fieldKey in recursive parsing to ensure type mapping remains correct.
  • Review new unit tests (and duplicated blocks) for correctness and sufficient coverage of edge cases.
  • Confirm treating malformed fields as ExprNullValue does not introduce regressions for other consumers of parsed structs.

"🐰 I nudged through dots where chaos lay,
Turned crashy paths to quiet hay,
Malformed keys now sleep as null,
Queries hop through fields, all whole,
A little hop — no more dismay!"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.43% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: handling errors for field names containing dots, which is the core fix for issue #4896. |
| Linked Issues check | ✅ Passed | The PR fulfills all key requirements from issue #4896: prevents ArrayIndexOutOfBoundsException for dot-containing field names in disabled objects, handles malformed field paths via isFieldNameMalformed logic, and includes comprehensive tests and documentation. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing issue #4896: code fix in OpenSearchExprValueFactory, unit tests, integration test, and documentation on limitations. No unrelated modifications detected. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a6ba71 and adba5ec.

📒 Files selected for processing (1)
  • docs/user/ppl/limitations/limitations.rst (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (1)
docs/user/ppl/limitations/limitations.rst (1)

110-132: Documentation clearly explains malformed field name handling.

The section effectively communicates the key point—that PPL ignores malformed field names—while providing helpful context via the example and recommendation. The documentation directly addresses the limitation users may encounter.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (1)

60-90: LGTM: Test validates the fix comprehensively.

The test properly verifies that:

  1. Queries no longer crash with ArrayIndexOutOfBoundsException
  2. Valid fields return correct values
  3. Invalid dot-only fields are handled gracefully (return empty object)

The test comments clearly explain the expected behavior, including JSON serialization of null values.

Consider adding test cases for additional dot-only patterns like ".." or "..." to ensure comprehensive coverage of edge cases, though the current test is sufficient to validate the core fix.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0139bf1 and 8d8103f.

📒 Files selected for processing (2)
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (5)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (1)
  • Getter (65-442)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)
  • Getter (80-531)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (2)
  • Getter (921-1070)
  • Getter (1072-1100)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java (1)
  • Getter (20-170)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializer.java (1)
  • Getter (41-137)
🔇 Additional comments (10)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (7)

35-35: LGTM: Logging infrastructure imports and setup are correct.

The added imports and Logger instance follow standard Log4j2 patterns and are necessary for the invalid field tracking feature.

Also applies to: 39-39, 43-44, 79-79


81-86: LGTM: ThreadLocal usage is appropriate for thread-safe invalid field tracking.

The ThreadLocal pattern correctly avoids log spam by collecting invalid fields per thread and logging them once per construct() call. The JavaDoc clearly explains the purpose.


180-195: LGTM: ThreadLocal cleanup is properly handled.

The construct() method correctly initializes the ThreadLocal at the start and cleans it up in the finally block, ensuring no memory leaks. The logInvalidFields() call is appropriately placed before cleanup.
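
The collect-then-log pattern with cleanup in a finally block can be sketched as follows. Everything here is an assumption for illustration: `InvalidFieldTracker`, `WARNINGS`, and `isCleared` are hypothetical names, a plain list stands in for the Log4j2 logger, and the loop is a simplified stand-in for the real parsing work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names do not come from the PR's source.
final class InvalidFieldTracker {
    // Per-thread scratch list, so concurrent construct() calls do not mix their findings.
    private static final ThreadLocal<List<String>> INVALID_FIELDS = new ThreadLocal<>();

    // Stand-in for a logger so the sketch has no external dependencies.
    static final List<String> WARNINGS = Collections.synchronizedList(new ArrayList<>());

    // Counts usable field names and records dot-only ones for a single summary warning.
    static int construct(List<String> fieldNames) {
        INVALID_FIELDS.set(new ArrayList<>());
        try {
            int valid = 0;
            for (String name : fieldNames) {
                if (name.split("\\.").length == 0) {
                    INVALID_FIELDS.get().add(name); // collect instead of logging per field
                } else {
                    valid++;
                }
            }
            return valid;
        } finally {
            logInvalidFields();      // at most one summary message per call
            INVALID_FIELDS.remove(); // always clear the thread's slot, even on exceptions
        }
    }

    private static void logInvalidFields() {
        List<String> invalid = INVALID_FIELDS.get();
        if (invalid != null && !invalid.isEmpty()) {
            WARNINGS.add("Skipped fields with dot-only names: " + invalid);
        }
    }

    // True when the current thread holds no leftover tracking list.
    static boolean isCleared() {
        return INVALID_FIELDS.get() == null;
    }
}
```

Calling remove() in the finally block matters most on pooled threads, where a value left in a ThreadLocal would otherwise outlive the request that set it.
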


197-206: LGTM: Invalid field logging is clear and helpful.

The logInvalidFields() method appropriately uses warning level and provides a clear message explaining why fields are invalid, which aids debugging.


426-436: LGTM: Invalid field name detection is correct.

The isInvalidFieldName() method correctly identifies dot-only field names using the same split logic as JsonPath, ensuring consistency.


406-422: LGTM: Invalid field handling in parseStruct() is correct and defensive.

The code properly checks for invalid field names before creating JsonPath, avoiding the IllegalArgumentException. Invalid fields are gracefully handled by substituting null and collecting them for logging. The two-layered approach (early detection + JsonPath validation) provides good defense in depth.
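
A simplified view of this substitution, using plain maps instead of the engine's value types, might look like the sketch below. `StructSketch` is hypothetical, Java null stands in for ExprNullValue, and the dot-only check mirrors the split-based detection described for isInvalidFieldName() in this revision of the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch using plain maps; the real code builds ExprValue instances.
final class StructSketch {
    // Dot-only names split to an empty array, which is what made paths.get(0) fail.
    static boolean isInvalidFieldName(String fieldName) {
        return fieldName.split("\\.").length == 0;
    }

    // Copies valid entries as-is and maps invalid field names to null.
    static Map<String, Object> parseStruct(Map<String, Object> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            if (isInvalidFieldName(entry.getKey())) {
                result.put(entry.getKey(), null); // checked before any path is built
            } else {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```
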


464-472: LGTM: JsonPath constructor validation provides good defense in depth.

The constructor now validates that the split result is not empty and throws a clear IllegalArgumentException if it is. This acts as a safety net, though with the parseStruct() changes, this exception should not be reached during normal operation. The validation ensures the constructor's preconditions are met.
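
The constructor-level guard can be illustrated with a stripped-down path class. This is a sketch only: `JsonPathSketch`, `getRootPath`, and the exception message are assumptions, not the PR's actual JsonPath code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a path holder that rejects inputs with no segments.
final class JsonPathSketch {
    private final List<String> paths;

    JsonPathSketch(String rawPath) {
        String[] parts = rawPath.split("\\.");
        if (parts.length == 0) {
            // Fail with a descriptive error instead of an index error later on.
            throw new IllegalArgumentException(
                "Field path '" + rawPath + "' contains no path segments");
        }
        this.paths = Arrays.asList(parts);
    }

    // Safe because the constructor guarantees at least one segment.
    String getRootPath() {
        return paths.get(0);
    }
}
```
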

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (3)

1-14: LGTM: Excellent test documentation.

The header comments clearly explain the issue, root cause, and fix, which will help future maintainers understand this test case.


15-48: LGTM: Test setup correctly reproduces the issue.

The setup properly enables Calcite, creates an index with a disabled object field (which bypasses field name validation), and indexes a document with a dot-only field name that would trigger the original bug.


49-59: LGTM: Teardown properly cleans up test resources.

The teardown correctly disables Calcite and deletes the test index, ensuring no side effects for subsequent tests.

ahkcs added 2 commits December 4, 2025 11:23
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
@ahkcs ahkcs changed the title from "Fix parsing of dot-containing field names" to "Error handling for dot-containing field names" Dec 4, 2025
@penghuo penghuo added the bugFix, PPL (Piped processing language), and backport 2.19-dev labels Dec 4, 2025

OpenSearch normally rejects field names containing problematic dot patterns (such as ``.``, ``..``, ``.a``, ``a.``, or ``a..b``). However, when an object field has ``enabled: false``, OpenSearch bypasses field name validation and allows storing documents with any field names.

If a document contains malformed field names inside a disabled object field, PPL queries will return ``null`` for those specific fields. Other valid fields in the document are returned normally.
Collaborator

> disabled object field,

Remove "disabled".

> PPL queries will return null for those specific fields.

PPL ignores malformed field names.

Contributor Author

Updated

}
}
When ``log`` is a disabled object field (``enabled: false``), all subfields with malformed names will resolve to ``null``.
Collaborator

Fields with malformed names are ignored.

Contributor Author

Updated

Signed-off-by: Kai Huang <[email protected]>
@Swiddis Swiddis merged commit 8126367 into opensearch-project:main Dec 5, 2025
37 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 5, 2025
(cherry picked from commit 8126367)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
LantaoJin pushed a commit that referenced this pull request Dec 5, 2025
(cherry picked from commit 8126367)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025
asifabashar pushed a commit to asifabashar/sql that referenced this pull request Dec 10, 2025