Skip to content

Conversation

@ahkcs
Copy link
Contributor

@ahkcs ahkcs commented Dec 3, 2025

Description

Resolves #4896ArrayIndexOutOfBoundsException when querying an index containing dot-only field names (e.g., ".", "..") inside disabled object fields.

Root Cause

In OpenSearchExprValueFactory.JsonPath, the constructor uses:

rawPath.split("\\.")

For field names that consist only of dots (e.g., ".", ".."), split("\\.") returns an empty array [].
Subsequent access to paths.get(0) results in ArrayIndexOutOfBoundsException.

This occurs frequently when an index contains disabled objects ("enabled": false), because disabled objects bypass field name validation and allow invalid field names such as ".".

Fix

If split("\\.") returns an empty array, treat the entire raw path as a literal field name instead of a parsed path:

if (parts.length == 0) {
    this.paths = List.of(rawPath);   // Use "." as a literal field name
} else {
    this.paths = List.of(parts);
}

Result

Input JSON Before After
{"log": {".": "value"}} Crash (ArrayIndexOutOfBounds) Successfully returns the field value

Summary by CodeRabbit

  • Bug Fixes

    • Field names containing dots (leading, trailing, middle, multiple, or malformed) are preserved and remain queryable instead of being split.
  • Tests

    • Added extensive unit tests validating dot-containing field-name parsing and retrieval.
    • Added an integration test exercising end-to-end queries and behavior for various dot-containing field-name scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Fixes JsonPath parsing to preserve dot-only and dot-containing field names as literal path elements; adds static factory methods and tests, and an integration test exercising documents with such field names inside a disabled object mapping.

Changes

Cohort / File(s) Summary
Integration test
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml
New YAML REST integration test: creates an index with a disabled object mapping, indexes seven documents containing various dot-containing field names (e.g., ., .., a..., .a, a..b, log.), toggles the calcite plugin in setup/teardown, runs PPL queries against those fields, and asserts expected values and total count.
Core parsing change
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
Reworked inner JsonPath: removed public raw-string constructor; added static factories JsonPath.literal(String) and JsonPath.fromPath(String); made the List-based constructor private; updated internal usages to centralize path parsing and preserve literal field names containing dots.
Unit tests
opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
Updated tests to use JsonPath.fromPath(...) and added tests covering dot-only, multiple-dots, leading/trailing/middle-dot field names, malformed top-level names, coexistence with normal fields, and explicit JsonPath.literal(...) vs fromPath(...) behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus:
    • Confirm JsonPath.literal vs JsonPath.fromPath semantics and edge-case handling for inputs like ".", "..", and leading/trailing dots.
    • Ensure call-site updates preserve intended nested-splitting behavior where appropriate.
    • Validate new and updated unit tests for correctness and coverage.
    • Verify integration test setup/teardown toggles plugin and cleans up index.

Poem

🐰 I hopped through keys both shy and dotty,
Found tiny names that once made things spotty.
Paths now hold firm, no slices in sight,
Queries skip crashes and dance through the night.
🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Fix parsing of dot-containing field names' clearly and concisely summarizes the main change: addressing how field names containing dots are parsed.
Linked Issues check ✅ Passed The PR implements all key requirements: (1) fixes ArrayIndexOutOfBoundsException by treating dot-only names as literals, (2) handles empty-key segments edge cases, (3) adds comprehensive unit tests validating dot-name preservation, (4) includes integration tests verifying query correctness with disabled objects containing dot-containing field names.
Out of Scope Changes check ✅ Passed All changes are within scope: factory method refactoring centralizes path parsing logic, test additions validate the fix across unit and integration levels, and changes align directly with issue #4896's requirements.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4dd36f and 0139bf1.

📒 Files selected for processing (1)
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (1 hunks)

Comment @coderabbitai help to get the list of available commands and usage tips.

Swiddis
Swiddis previously approved these changes Dec 3, 2025
// This can happen with disabled object fields that bypass field name validation
// Log warning to hint users about corrupt data
LOG.warn(
"Field '{}' has invalid name and will return null: {}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be a lot of logging if the user ingests a lot of data like this. Maybe not too bad after compression though?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should only log one message like this per query (e.g. listing every field in one shot), instead of doing it for every call to parseStruct. Not sure how much effort that is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to

This is going to be a lot of logging if the user ingests a lot of data like this. Maybe not too bad after compression though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Now using ThreadLocal to collect invalid fields and log once per query

@ahkcs ahkcs changed the title Error handling for ArrayIndexOutOfBoundsException Error handling for ArrayIndexOutOfBoundsException with corrupt data Dec 3, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (1)

428-437: Add JavaDoc for validation behavior.

The new validation logic in the JsonPath constructor should be documented with JavaDoc to explain when IllegalArgumentException is thrown and why, especially since this is a critical part of the fix.

Apply this diff:

+  /**
+   * Constructs a JsonPath by splitting the raw path on dots.
+   *
+   * @param rawPath The raw field path string
+   * @throws IllegalArgumentException if the field name consists only of dots
+   */
   public JsonPath(String rawPath) {
     String[] parts = rawPath.split("\\.");
     // Handle edge case where split returns empty array (e.g., rawPath = "." or "..")
     if (parts.length == 0) {
       throw new IllegalArgumentException(
           String.format(
               "Invalid field name '%s': field names cannot consist only of dots", rawPath));
     }
     this.paths = List.of(parts);
   }

Based on coding guidelines: "All public classes and methods must have proper JavaDoc"

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52fe8aa and bca5a72.

📒 Files selected for processing (2)
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
🔇 Additional comments (6)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (2)

36-48: LGTM!

The test correctly sets up the problematic scenario by indexing a document with a dot-only field name (".") inside a disabled object field, which bypasses OpenSearch's normal field name validation.


19-20: Verify necessity of Calcite plugin setting.

The test enables plugins.calcite.enabled: true, but the fix targets field name validation in OpenSearchExprValueFactory, not query parsing or Calcite-specific functionality. This setting may be unnecessary for this test.

Run the following script to check if this setting is required for PPL queries or if it's specific to this issue:

#!/bin/bash
# Description: Check if Calcite plugin is required for PPL queries or this specific fix

# Search for other PPL tests to see if they use this setting
rg -n "plugins.calcite.enabled" --type=yaml -C 3

# Search for references to Calcite in the OpenSearchExprValueFactory changes
rg -n "calcite|Calcite" opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (4)

41-42: LGTM!

Logger setup follows standard Log4j2 patterns and naming conventions (UPPER_SNAKE_CASE for constant).

Also applies to: 77-77


388-397: Clarify exception handling approach and verify against PR objectives.

The implementation catches IllegalArgumentException, logs a warning, and returns null for invalid fields. The PR objectives state the plugin should "return a clear, user-facing validation error instead of an internal exception." The current approach returns null silently (only logged), which may not surface the validation issue to users in the query response.

Additionally, the warning message refers to "corrupt data," but invalid field names are not corrupted data—they are field names that OpenSearch allowed despite disabled object field validation.

Verify:

  1. Whether the null-return approach aligns with the intended user experience, or if validation errors should propagate to the query response as specified in issue #4896
  2. Accuracy of the warning message: consider "Field '{}' has an invalid name (field names cannot consist only of dots) and will return null: {}"

429-436: Verify downstream behavior of empty strings in path array.

The validation correctly identifies that parts.length == 0 will never occur with Java's default split("\\.") behavior—for inputs like "." or "..", the result is [""] or ["", ""], not an empty array. However, this means the current check cannot catch dot-only field names.

The actual issue is empty strings within the array. Confirm whether line 444's code (e.g., paths.getFirst()) fails with empty string values and whether the suggested fix to check (parts.length == 1 && parts[0].isEmpty()) properly handles all dot-only edge cases like ".", "..", and "...".


392-395: Parameterized logging in Log4j2 is safe from log injection attacks; no changes needed.

Log4j2's parameterized logging (compile-time constant format string with parameters) prevents log injection by separating the format from user data, preventing format-string attacks. The code correctly uses this pattern: LOG.warn("... {}", fullFieldPath, e.getMessage()). This is the recommended approach and does not require escaping.

While field names may contain control characters or special characters, Log4j2 parameterized logging handles them safely. No additional sanitization is required for this logging statement.

@Swiddis Swiddis added bugFix PPL Piped processing language labels Dec 3, 2025
@ahkcs ahkcs requested a review from RyanL1997 December 3, 2025 18:38
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (2)

81-86: ThreadLocal memory leak risk in long-lived threads.

ThreadLocal values can cause memory leaks if not properly cleaned up, especially in thread pools where threads are reused. While the finally block in construct() clears the set, if an exception occurs before construct() is called or in a different code path that creates threads, the set may persist.

Consider using try-finally with remove() instead of just clear() to fully release the ThreadLocal entry:

     } finally {
-      INVALID_FIELDS.get().clear();
+      INVALID_FIELDS.remove();
     }

Using remove() instead of clear() ensures the ThreadLocal entry is fully released from the thread's map, preventing potential memory leaks in thread pool environments.


197-206: Consider sanitizing field names before logging.

Per coding guidelines, data should be sanitized before logging to prevent injection attacks. The invalidFields set contains user-controlled field names from indexed documents that are directly interpolated into the log message. While Log4j2 in OpenSearch is typically patched against known injection attacks, sanitizing untrusted input is a defense-in-depth measure.

Consider limiting the size of logged field names or escaping special characters:

   private void logInvalidFields() {
     Set<String> invalidFields = INVALID_FIELDS.get();
     if (!invalidFields.isEmpty()) {
+      // Limit field name length to prevent log flooding with extremely long names
+      Set<String> sanitized = invalidFields.stream()
+          .map(f -> f.length() > 100 ? f.substring(0, 100) + "..." : f)
+          .collect(java.util.stream.Collectors.toSet());
       LOG.warn(
           "The following field(s) have invalid names and will return null: {}. "
               + "This can happen with disabled object fields that bypass field name validation.",
-          invalidFields);
+          sanitized);
     }
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bca5a72 and 8d8103f.

📒 Files selected for processing (2)
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (6)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (4)

179-195: LGTM! Good approach to address log spam concerns.

The pattern of clearing at start, collecting during parsing, logging once at end, and cleaning up in finally effectively addresses the past review feedback about excessive logging. This ensures invalid fields are logged once per construct() call rather than per field.


406-422: Good fix for the exception masking concern.

This implementation correctly addresses the past review feedback about catching all IllegalArgumentExceptions. By validating field names with isInvalidFieldName() before creating the JsonPath, the code avoids catching exceptions from unrelated issues while still handling the specific dot-only field name problem gracefully.


464-472: LGTM! Good defensive validation.

The constructor validation provides defense-in-depth. Even though parseStruct now filters invalid field names before creating JsonPath, this constructor validation ensures any other code paths that might create JsonPath directly are also protected. The error message is clear and meaningful per coding guidelines.


426-436: The empty string check is likely unnecessary due to OpenSearch SQL identifier rules.

The original isInvalidFieldName() method correctly identifies dot-only field names that cause String.split("\\.") to return an empty array. However, the suggested empty string check is defensive rather than necessary: OpenSearch SQL grammar requires field identifiers to either follow standard naming rules (starting with a letter, then alphanumeric/underscore) or use delimited identifiers with backticks. Empty strings cannot be valid field names and would be rejected during parsing before reaching this validation method.

The current implementation is sufficient for its intended purpose of catching dot-only field names.

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (2)

86-90: LGTM! Test correctly verifies the fix behavior.

The test appropriately validates that:

  1. Queries with valid fields succeed (lines 74-77)
  2. Querying the log field doesn't crash and returns an empty object because the invalid "." subfield becomes null and null values are omitted from JSON serialization (lines 88-90)

This addresses the past review feedback about asserting the null/empty behavior for invalid fields.


1-14: Well-documented test header.

The comment block provides excellent context explaining the root cause, when this scenario occurs (disabled object fields), and the expected behavior after the fix. This documentation will help future maintainers understand why this test exists.

"Field '{}' has invalid name and will return null: {}",
fullFieldPath,
e.getMessage());
result.tupleValue().put(fieldKey, ExprNullValue.of());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should document this somewhere. I see Spark CSV reader for example, actually has PERMISSIVE, DROPMALFORMED and FAILFAST mode to let user decide. Ref: https://spark.apache.org/docs/latest/sql-data-sources-csv.html

// Check for invalid field names (e.g., fields consisting only of dots like "." or
// "..")
// before creating JsonPath to avoid masking other IllegalArgumentExceptions
if (isInvalidFieldName(fieldKey)) {
Copy link
Collaborator

@dai-chen dai-chen Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this and logging logic to higher level? I thought this should be general handling for any malformed field we cannot process? Ref: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala#L62

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

3 questions

  1. Why is defintion of corrupted dataset? The example in PR description is valid json.
  2. How should the query engine handle corrupted data? Should it skip the malformed record, replace it with null, or use a user-defined placeholder such as "CORRUPTED"?
  3. How can error details be surfaced to users who do not have permission to access server-side logs?

Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update UT OpenSearchExprValueFactoryTest.

RyanL1997
RyanL1997 previously approved these changes Dec 3, 2025
Copy link
Collaborator

@RyanL1997 RyanL1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM. Please also address Chen's comments.

// Check for invalid field names (e.g., fields consisting only of dots like "." or
// "..")
// before creating JsonPath to avoid masking other IllegalArgumentExceptions
if (isInvalidFieldName(fieldKey)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

3 questions

  1. Why is defintion of corrupted dataset? The example in PR description is valid json.
  2. How should the query engine handle corrupted data? Should it skip the malformed record, replace it with null, or use a user-defined placeholder such as "CORRUPTED"?
  3. How can error details be surfaced to users who do not have permission to access server-side logs?

}

@Test
public void constructWithMultipleDotsFieldNameReturnsValue() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np: Could you verify what if the malformed field is not on top level? E.g., a...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to have more comprehensive test cases for both UT and yaml test

@ahkcs
Copy link
Contributor Author

ahkcs commented Dec 3, 2025

Updated PR to treat dot-containing field names as literal field names (returning actual data) instead of treating them as invalid.

@ahkcs ahkcs changed the title Error handling for ArrayIndexOutOfBoundsException with corrupt data Error handling for ArrayIndexOutOfBoundsException with malformed data Dec 3, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (1)

435-441: Consider adding defensive validation in fromPath().

path.split("\\.") returns an empty array when path consists only of dots (e.g., ., ..). While current production code only uses literal() for JSON field names, fromPath() is a public API that could fail unexpectedly.

     public static JsonPath fromPath(String path) {
-      return new JsonPath(List.of(path.split("\\.")));
+      String[] parts = path.split("\\.");
+      if (parts.length == 0) {
+        // Treat dot-only paths as literal field names
+        return new JsonPath(List.of(path));
+      }
+      return new JsonPath(List.of(parts));
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c1eabb and 7d0b0c3.

📒 Files selected for processing (3)
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (1 hunks)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (2 hunks)
  • opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
🧠 Learnings (6)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Validate all user inputs, especially queries

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Maintain Java 11 compatibility when possible for OpenSearch 2.x

Applied to files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Include edge cases and boundary conditions in PPL parser tests

Applied to files:

  • opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
🔇 Additional comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (2)

379-386: Good fix for the root cause.

Using JsonPath.literal() for JSON object keys correctly treats field names like ., .., a... as literal field names rather than path separators. This directly addresses the ArrayIndexOutOfBoundsException from the linked issue.


416-441: Well-designed factory method pattern with clear documentation.

The separation between literal() for JSON field names and fromPath() for nested paths is clean and the JavaDoc clearly explains when to use each method.

opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java (2)

1001-1028: Test correctly updated to use the new API.

The test correctly uses JsonPath.fromPath() for dot-separated nested paths like "log.json.time", which aligns with the intended usage of the new factory method.


1044-1102: Comprehensive test coverage for dot-containing field names.

The new tests effectively cover the edge cases from the linked issue:

  • Single dot (.) and multiple dots (..)
  • Dots in various positions (leading, trailing, middle)
  • Mixed scenarios with valid fields alongside dot fields

These tests verify that values are correctly retrieved using the original field name, confirming the fix works as intended.

Signed-off-by: Kai Huang <[email protected]>
@ahkcs ahkcs changed the title Error handling for ArrayIndexOutOfBoundsException with malformed data Fix parsing of dot-containing field names Dec 3, 2025
Signed-off-by: Kai Huang <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (1)

445-461: The parts.length == 0 check is unreachable dead code.

With split("\\.", -1), an empty array is never returned for any non-empty string input. For example:

  • ".".split("\\.", -1)["", ""] (length 2)
  • "..".split("\\.", -1)["", "", ""] (length 3)

The empty-string loop check on lines 455-458 correctly handles these cases, making line 452-454 unreachable.

Consider removing the dead check or updating the comment:

-      // Handle edge cases where splitting would produce problematic results:
-      // 1. Empty array (dot-only field names like "." or "..")
-      // 2. Array with empty strings (e.g., ".a" -> ["", "a"], "a..." -> ["a", "", "", ""])
+      // Handle edge cases where splitting produces empty strings:
+      // e.g., "." -> ["", ""], ".a" -> ["", "a"], "a..." -> ["a", "", "", ""]
       // In these cases, treat the original string as a literal field name
-      if (parts.length == 0) {
-        return new JsonPath(List.of(path));
-      }
       for (String part : parts) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dd6f41 and b4dd36f.

📒 Files selected for processing (1)
  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
🧠 Learnings (1)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Validate all user inputs, especially queries

Applied to files:

  • opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (3)

379-387: LGTM! The fix correctly addresses the parsing issue.

The comment clearly explains the rationale for using fromPath(), and the implementation correctly separates path construction (for nested structure building) from the raw key (for type mapping lookup).


417-427: Well-designed factory method with clear documentation.

The literal() method provides a clean API for treating field names as literal values without path interpretation. The JavaDoc clearly explains the intended use case.


463-465: Good encapsulation by making the constructor private.

This ensures callers use the semantically clear factory methods (literal() or fromPath()) rather than directly constructing JsonPath objects.

*/
public static JsonPath fromPath(String path) {
// Use -1 limit to preserve trailing empty strings (e.g., "a..." -> ["a", "", "", ""])
String[] parts = path.split("\\.", -1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Is this the right place for this logic?

As a wider design point (not just this PR): We've had bugs in different features that boil down to people rolling their own logic for field parsing. Spath was a victim too, and I've called out similar cases in past PRs.

Ideally we should be trying to move this logic into a central module so every part of the code is using the same logic with appropriate edge-case handling. I know QualifiedName defines DELIMITER = "." in a full constant and contains some logic like this, while other places still do these kinds of array operations directly.

Especially since there's more-or-less an implicit assumption in many code paths that . defines nesting, I'm expecting this isn't the last we'll see of this issue. Having the code be reusable for later seems like a good investment.

Swiddis
Swiddis previously approved these changes Dec 3, 2025
Copy link
Collaborator

@Swiddis Swiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm functionally, consider following up with docs

type: text

# Document 1: Single dot field name "."
- do:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change L35-L112 single doc index to bulk index

  - do:
      bulk:
        index: test
        refresh: true
        body:
          - '{"index": {}}'
          - '{"id": 1}'
          - '{"index": {}}'
          - '{"id": 2}'
          - '{"index": {}}'
          - '{"id": 3}'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Content-Type: 'application/json'
ppl:
body:
query: source=test_disabled_object_4896 | where message = 'leading dot' | fields log
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is results if using fields log.a? should be log.a not exist, right?, user should use fields log.'.a'

Copy link
Contributor Author

@ahkcs ahkcs Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have added test cases to verify this behavior
fields log.a → returns null
fields log.'.a'→ returns"leading dot value"

Signed-off-by: Kai Huang <[email protected]>
@ahkcs ahkcs closed this Dec 4, 2025
@ahkcs ahkcs deleted the fix/fidelity_data branch December 4, 2025 17:58
@ahkcs
Copy link
Contributor Author

ahkcs commented Dec 4, 2025

Moved to #4907

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugFix PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] ArrayIndexOutOfBoundsException when querying index with disabled objects containing dot-only field names

5 participants