-
Notifications
You must be signed in to change notification settings - Fork 180
Fix parsing of dot-containing field names #4897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughFixes JsonPath parsing to preserve dot-only and dot-containing field names as literal path elements; adds static factory methods and tests, and an integration test exercising documents with such field names inside a disabled object mapping. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
Comment |
| // This can happen with disabled object fields that bypass field name validation | ||
| // Log warning to hint users about corrupt data | ||
| LOG.warn( | ||
| "Field '{}' has invalid name and will return null: {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be a lot of logging if the user ingests a lot of data like this. Maybe not too bad after compression though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should only log one message like this per query (e.g. listing every field in one shot), instead of doing it for every call to parseStruct. Not sure how much effort that is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to
This is going to be a lot of logging if the user ingests a lot of data like this. Maybe not too bad after compression though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Now using ThreadLocal to collect invalid fields and log once per query
ArrayIndexOutOfBoundsExceptionArrayIndexOutOfBoundsException with corrupt data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (1)
428-437: Add JavaDoc for validation behavior.The new validation logic in the
JsonPathconstructor should be documented with JavaDoc to explain whenIllegalArgumentExceptionis thrown and why, especially since this is a critical part of the fix.Apply this diff:
+ /** + * Constructs a JsonPath by splitting the raw path on dots. + * + * @param rawPath The raw field path string + * @throws IllegalArgumentException if the field name consists only of dots + */ public JsonPath(String rawPath) { String[] parts = rawPath.split("\\."); // Handle edge case where split returns empty array (e.g., rawPath = "." or "..") if (parts.length == 0) { throw new IllegalArgumentException( String.format( "Invalid field name '%s': field names cannot consist only of dots", rawPath)); } this.paths = List.of(parts); }Based on coding guidelines: "All public classes and methods must have proper JavaDoc"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
🔇 Additional comments (6)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (2)
36-48: LGTM!The test correctly sets up the problematic scenario by indexing a document with a dot-only field name (
".") inside a disabled object field, which bypasses OpenSearch's normal field name validation.
19-20: Verify necessity of Calcite plugin setting.The test enables
plugins.calcite.enabled: true, but the fix targets field name validation inOpenSearchExprValueFactory, not query parsing or Calcite-specific functionality. This setting may be unnecessary for this test.Run the following script to check if this setting is required for PPL queries or if it's specific to this issue:
#!/bin/bash # Description: Check if Calcite plugin is required for PPL queries or this specific fix # Search for other PPL tests to see if they use this setting rg -n "plugins.calcite.enabled" --type=yaml -C 3 # Search for references to Calcite in the OpenSearchExprValueFactory changes rg -n "calcite|Calcite" opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.javaopensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (4)
41-42: LGTM!Logger setup follows standard Log4j2 patterns and naming conventions (UPPER_SNAKE_CASE for constant).
Also applies to: 77-77
388-397: Clarify exception handling approach and verify against PR objectives.The implementation catches
IllegalArgumentException, logs a warning, and returns null for invalid fields. The PR objectives state the plugin should "return a clear, user-facing validation error instead of an internal exception." The current approach returns null silently (only logged), which may not surface the validation issue to users in the query response.Additionally, the warning message refers to "corrupt data," but invalid field names are not corrupted data—they are field names that OpenSearch allowed despite disabled object field validation.
Verify:
- Whether the null-return approach aligns with the intended user experience, or if validation errors should propagate to the query response as specified in issue #4896
- Accuracy of the warning message: consider
"Field '{}' has an invalid name (field names cannot consist only of dots) and will return null: {}"
429-436: Verify downstream behavior of empty strings in path array.The validation correctly identifies that
parts.length == 0will never occur with Java's defaultsplit("\\.")behavior—for inputs like"."or"..", the result is[""]or["", ""], not an empty array. However, this means the current check cannot catch dot-only field names.The actual issue is empty strings within the array. Confirm whether line 444's code (e.g.,
paths.getFirst()) fails with empty string values and whether the suggested fix to check(parts.length == 1 && parts[0].isEmpty())properly handles all dot-only edge cases like".","..", and"...".
392-395: Parameterized logging in Log4j2 is safe from log injection attacks; no changes needed.Log4j2's parameterized logging (compile-time constant format string with parameters) prevents log injection by separating the format from user data, preventing format-string attacks. The code correctly uses this pattern:
LOG.warn("... {}", fullFieldPath, e.getMessage()). This is the recommended approach and does not require escaping.While field names may contain control characters or special characters, Log4j2 parameterized logging handles them safely. No additional sanitization is required for this logging statement.
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (2)
81-86: ThreadLocal memory leak risk in long-lived threads.
ThreadLocalvalues can cause memory leaks if not properly cleaned up, especially in thread pools where threads are reused. While thefinallyblock inconstruct()clears the set, if an exception occurs beforeconstruct()is called or in a different code path that creates threads, the set may persist.Consider using
try-finallywithremove()instead of justclear()to fully release the ThreadLocal entry:} finally { - INVALID_FIELDS.get().clear(); + INVALID_FIELDS.remove(); }Using
remove()instead ofclear()ensures the ThreadLocal entry is fully released from the thread's map, preventing potential memory leaks in thread pool environments.
197-206: Consider sanitizing field names before logging.Per coding guidelines, data should be sanitized before logging to prevent injection attacks. The
invalidFieldsset contains user-controlled field names from indexed documents that are directly interpolated into the log message. While Log4j2 in OpenSearch is typically patched against known injection attacks, sanitizing untrusted input is a defense-in-depth measure.Consider limiting the size of logged field names or escaping special characters:
private void logInvalidFields() { Set<String> invalidFields = INVALID_FIELDS.get(); if (!invalidFields.isEmpty()) { + // Limit field name length to prevent log flooding with extremely long names + Set<String> sanitized = invalidFields.stream() + .map(f -> f.length() > 100 ? f.substring(0, 100) + "..." : f) + .collect(java.util.stream.Collectors.toSet()); LOG.warn( "The following field(s) have invalid names and will return null: {}. " + "This can happen with disabled object fields that bypass field name validation.", - invalidFields); + sanitized); } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (6)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (4)
179-195: LGTM! Good approach to address log spam concerns.The pattern of clearing at start, collecting during parsing, logging once at end, and cleaning up in
finallyeffectively addresses the past review feedback about excessive logging. This ensures invalid fields are logged once perconstruct()call rather than per field.
406-422: Good fix for the exception masking concern.This implementation correctly addresses the past review feedback about catching all
IllegalArgumentExceptions. By validating field names withisInvalidFieldName()before creating theJsonPath, the code avoids catching exceptions from unrelated issues while still handling the specific dot-only field name problem gracefully.
464-472: LGTM! Good defensive validation.The constructor validation provides defense-in-depth. Even though
parseStructnow filters invalid field names before creatingJsonPath, this constructor validation ensures any other code paths that might createJsonPathdirectly are also protected. The error message is clear and meaningful per coding guidelines.
426-436: The empty string check is likely unnecessary due to OpenSearch SQL identifier rules.The original
isInvalidFieldName()method correctly identifies dot-only field names that causeString.split("\\.")to return an empty array. However, the suggested empty string check is defensive rather than necessary: OpenSearch SQL grammar requires field identifiers to either follow standard naming rules (starting with a letter, then alphanumeric/underscore) or use delimited identifiers with backticks. Empty strings cannot be valid field names and would be rejected during parsing before reaching this validation method.The current implementation is sufficient for its intended purpose of catching dot-only field names.
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (2)
86-90: LGTM! Test correctly verifies the fix behavior.The test appropriately validates that:
- Queries with valid fields succeed (lines 74-77)
- Querying the log field doesn't crash and returns an empty object because the invalid "." subfield becomes null and null values are omitted from JSON serialization (lines 88-90)
This addresses the past review feedback about asserting the null/empty behavior for invalid fields.
1-14: Well-documented test header.The comment block provides excellent context explaining the root cause, when this scenario occurs (disabled object fields), and the expected behavior after the fix. This documentation will help future maintainers understand why this test exists.
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml
Outdated
Show resolved
Hide resolved
| "Field '{}' has invalid name and will return null: {}", | ||
| fullFieldPath, | ||
| e.getMessage()); | ||
| result.tupleValue().put(fieldKey, ExprNullValue.of()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should document this somewhere. I see Spark CSV reader for example, actually has PERMISSIVE, DROPMALFORMED and FAILFAST mode to let user decide. Ref: https://spark.apache.org/docs/latest/sql-data-sources-csv.html
| // Check for invalid field names (e.g., fields consisting only of dots like "." or | ||
| // "..") | ||
| // before creating JsonPath to avoid masking other IllegalArgumentExceptions | ||
| if (isInvalidFieldName(fieldKey)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this and logging logic to higher level? I thought this should be general handling for any malformed field we cannot process? Ref: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala#L62
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
3 questions
- Why is defintion of corrupted dataset? The example in PR description is valid json.
- How should the query engine handle corrupted data? Should it skip the malformed record, replace it with null, or use a user-defined placeholder such as "CORRUPTED"?
- How can error details be surfaced to users who do not have permission to access server-side logs?
dai-chen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update UT OpenSearchExprValueFactoryTest.
RyanL1997
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. Please also address Chen's comments.
| // Check for invalid field names (e.g., fields consisting only of dots like "." or | ||
| // "..") | ||
| // before creating JsonPath to avoid masking other IllegalArgumentExceptions | ||
| if (isInvalidFieldName(fieldKey)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
3 questions
- Why is defintion of corrupted dataset? The example in PR description is valid json.
- How should the query engine handle corrupted data? Should it skip the malformed record, replace it with null, or use a user-defined placeholder such as "CORRUPTED"?
- How can error details be surfaced to users who do not have permission to access server-side logs?
Signed-off-by: Kai Huang <[email protected]>
8d8103f to
2c1eabb
Compare
| } | ||
|
|
||
| @Test | ||
| public void constructWithMultipleDotsFieldNameReturnsValue() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np: Could you verify what if the malformed field is not on top level? E.g., a...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to have more comprehensive test cases for both UT and yaml test
|
Updated PR to treat dot-containing field names as literal field names (returning actual data) instead of treating them as invalid. |
ArrayIndexOutOfBoundsException with corrupt dataArrayIndexOutOfBoundsException with malformed data
Signed-off-by: Kai Huang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (1)
435-441: Consider adding defensive validation infromPath().
path.split("\\.")returns an empty array whenpathconsists only of dots (e.g.,.,..). While current production code only usesliteral()for JSON field names,fromPath()is a public API that could fail unexpectedly.public static JsonPath fromPath(String path) { - return new JsonPath(List.of(path.split("\\."))); + String[] parts = path.split("\\."); + if (parts.length == 0) { + // Treat dot-only paths as literal field names + return new JsonPath(List.of(path)); + } + return new JsonPath(List.of(parts)); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java(2 hunks)opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.javaopensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
**/*Test.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*Test.java: All new business logic requires unit tests
Name unit tests with*Test.javasuffix in OpenSearch SQL
Files:
opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
🧠 Learnings (6)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Validate all user inputs, especially queries
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL
Applied to files:
opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Maintain Java 11 compatibility when possible for OpenSearch 2.x
Applied to files:
opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Include edge cases and boundary conditions in PPL parser tests
Applied to files:
opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, integration)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
🔇 Additional comments (4)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (2)
379-386: Good fix for the root cause.Using
JsonPath.literal()for JSON object keys correctly treats field names like.,..,a...as literal field names rather than path separators. This directly addresses theArrayIndexOutOfBoundsExceptionfrom the linked issue.
416-441: Well-designed factory method pattern with clear documentation.The separation between
literal()for JSON field names andfromPath()for nested paths is clean and the JavaDoc clearly explains when to use each method.opensearch/src/test/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactoryTest.java (2)
1001-1028: Test correctly updated to use the new API.The test correctly uses
JsonPath.fromPath()for dot-separated nested paths like"log.json.time", which aligns with the intended usage of the new factory method.
1044-1102: Comprehensive test coverage for dot-containing field names.The new tests effectively cover the edge cases from the linked issue:
- Single dot (
.) and multiple dots (..)- Dots in various positions (leading, trailing, middle)
- Mixed scenarios with valid fields alongside dot fields
These tests verify that values are correctly retrieved using the original field name, confirming the fix works as intended.
Signed-off-by: Kai Huang <[email protected]>
ArrayIndexOutOfBoundsException with malformed dataSigned-off-by: Kai Huang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (1)
445-461: Theparts.length == 0check is unreachable dead code.With
split("\\.", -1), an empty array is never returned for any non-empty string input. For example:
".".split("\\.", -1)→["", ""](length 2)"..".split("\\.", -1)→["", "", ""](length 3)The empty-string loop check on lines 455-458 correctly handles these cases, making line 452-454 unreachable.
Consider removing the dead check or updating the comment:
- // Handle edge cases where splitting would produce problematic results: - // 1. Empty array (dot-only field names like "." or "..") - // 2. Array with empty strings (e.g., ".a" -> ["", "a"], "a..." -> ["a", "", "", ""]) + // Handle edge cases where splitting produces empty strings: + // e.g., "." -> ["", ""], ".a" -> ["", "a"], "a..." -> ["a", "", "", ""] // In these cases, treat the original string as a literal field name - if (parts.length == 0) { - return new JsonPath(List.of(path)); - } for (String part : parts) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
🧠 Learnings (1)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Validate all user inputs, especially queries
Applied to files:
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (3)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (3)
379-387: LGTM! The fix correctly addresses the parsing issue.The comment clearly explains the rationale for using
fromPath(), and the implementation correctly separates path construction (for nested structure building) from the raw key (for type mapping lookup).
417-427: Well-designed factory method with clear documentation.The
literal()method provides a clean API for treating field names as literal values without path interpretation. The JavaDoc clearly explains the intended use case.
463-465: Good encapsulation by making the constructor private.This ensures callers use the semantically clear factory methods (
literal()orfromPath()) rather than directly constructingJsonPathobjects.
| */ | ||
| public static JsonPath fromPath(String path) { | ||
| // Use -1 limit to preserve trailing empty strings (e.g., "a..." -> ["a", "", "", ""]) | ||
| String[] parts = path.split("\\.", -1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: Is this the right place for this logic?
As a wider design point (not just this PR): We've had bugs in different features that boil down to people rolling their own logic for field parsing. Spath was a victim too, and I've called out similar cases in past PRs.
Ideally we should be trying to move this logic into a central module so every part of the code is using the same logic with appropriate edge-case handling. I know QualifiedName defines DELIMITER = "." in a full constant and contains some logic like this, while other places still do these kinds of array operations directly.
Especially since there's more-or-less an implicit assumption in many code paths that . defines nesting, I'm expecting this isn't the last we'll see of this issue. Having the code be reusable for later seems like a good investment.
Swiddis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm functionally, consider following up with docs
| type: text | ||
|
|
||
| # Document 1: Single dot field name "." | ||
| - do: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change L35-L112 single doc index to bulk index
- do:
bulk:
index: test
refresh: true
body:
- '{"index": {}}'
- '{"id": 1}'
- '{"index": {}}'
- '{"id": 2}'
- '{"index": {}}'
- '{"id": 3}'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
| Content-Type: 'application/json' | ||
| ppl: | ||
| body: | ||
| query: source=test_disabled_object_4896 | where message = 'leading dot' | fields log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is results if using fields log.a? should be log.a not exist, right?, user should use fields log.'.a'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have added test cases to verify this behavior
fields log.a → returns null
fields log.'.a'→ returns"leading dot value"
Signed-off-by: Kai Huang <[email protected]>
|
Moved to #4907 |
Description
Resolves #4896 —
ArrayIndexOutOfBoundsExceptionwhen querying an index containing dot-only field names (e.g.,".","..") inside disabled object fields.Root Cause
In
OpenSearchExprValueFactory.JsonPath, the constructor uses:For field names that consist only of dots (e.g.,
".",".."),split("\\.")returns an empty array[].Subsequent access to
paths.get(0)results inArrayIndexOutOfBoundsException.This occurs frequently when an index contains disabled objects (
"enabled": false), because disabled objects bypass field name validation and allow invalid field names such as".".Fix
If
split("\\.")returns an empty array, treat the entire raw path as a literal field name instead of a parsed path:Result
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.