-
Notifications
You must be signed in to change notification settings - Fork 180
Error handling for dot-containing field names #4907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
WalkthroughAdds validation and handling for malformed dot-containing field names during struct parsing in the SQL OpenSearch value factory, treats malformed fields as null, adds unit and integration tests covering dot-only and other malformed names, and documents the behavior for disabled object fields. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (1)📓 Common learnings⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
🔇 Additional comments (1)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (1)
60-90: LGTM: Test validates the fix comprehensively.The test properly verifies that:
- Queries no longer crash with ArrayIndexOutOfBoundsException
- Valid fields return correct values
- Invalid dot-only fields are handled gracefully (return empty object)
The test comments clearly explain the expected behavior, including JSON serialization of null values.
Consider adding test cases for additional dot-only patterns like ".." or "..." to ensure comprehensive coverage of edge cases, though the current test is sufficient to validate the core fix.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml(1 hunks)opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
🧬 Code graph analysis (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (5)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java (1)
Getter(65-442)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)
Getter(80-531)opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (2)
Getter(921-1070)Getter(1072-1100)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownContext.java (1)
Getter(20-170)opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RelJsonSerializer.java (1)
Getter(41-137)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (10)
opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java (7)
35-35: LGTM: Logging infrastructure imports and setup are correct.The added imports and Logger instance follow standard Log4j2 patterns and are necessary for the invalid field tracking feature.
Also applies to: 39-39, 43-44, 79-79
81-86: LGTM: ThreadLocal usage is appropriate for thread-safe invalid field tracking.The ThreadLocal pattern correctly avoids log spam by collecting invalid fields per thread and logging them once per construct() call. The JavaDoc clearly explains the purpose.
180-195: LGTM: ThreadLocal cleanup is properly handled.The construct() method correctly initializes the ThreadLocal at the start and cleans it up in the finally block, ensuring no memory leaks. The logInvalidFields() call is appropriately placed before cleanup.
197-206: LGTM: Invalid field logging is clear and helpful.The logInvalidFields() method appropriately uses warning level and provides a clear message explaining why fields are invalid, which aids debugging.
426-436: LGTM: Invalid field name detection is correct.The isInvalidFieldName() method correctly identifies dot-only field names using the same split logic as JsonPath, ensuring consistency.
406-422: LGTM: Invalid field handling in parseStruct() is correct and defensive.The code properly checks for invalid field names before creating JsonPath, avoiding the IllegalArgumentException. Invalid fields are gracefully handled by substituting null and collecting them for logging. The two-layered approach (early detection + JsonPath validation) provides good defense in depth.
464-472: LGTM: JsonPath constructor validation provides good defense in depth.The constructor now validates that the split result is not empty and throws a clear IllegalArgumentException if it is. This acts as a safety net, though with the parseStruct() changes, this exception should not be reached during normal operation. The validation ensures the constructor's preconditions are met.
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4896.yml (3)
1-14: LGTM: Excellent test documentation.The header comments clearly explain the issue, root cause, and fix, which will help future maintainers understand this test case.
15-48: LGTM: Test setup correctly reproduces the issue.The setup properly enables Calcite, creates an index with a disabled object field (which bypasses field name validation), and indexes a document with a dot-only field name that would trigger the original bug.
49-59: LGTM: Teardown properly cleans up test resources.The teardown correctly disables Calcite and deletes the test index, ensuring no side effects for subsequent tests.
...earch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
|
|
||
| OpenSearch normally rejects field names containing problematic dot patterns (such as ``.``, ``..``, ``.a``, ``a.``, or ``a..b``). However, when an object field has ``enabled: false``, OpenSearch bypasses field name validation and allows storing documents with any field names. | ||
|
|
||
| If a document contains malformed field names inside a disabled object field, PPL queries will return ``null`` for those specific fields. Other valid fields in the document are returned normally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disabled object field,
remove disabled.
PPL queries will return
nullfor those specific fields.
PPL ignore malformed fieldname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
| } | ||
| } | ||
| When ``log`` is a disabled object field (``enabled: false``), all subfields with malformed names will resolve to ``null``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
malformed names field are ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
Signed-off-by: Kai Huang <[email protected]>
(cherry picked from commit 8126367) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 8126367) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Asif Bashar <[email protected]>
Signed-off-by: Asif Bashar <[email protected]>
Summary
Resolves #4896 —
ArrayIndexOutOfBoundsExceptionwhen querying an index containing malformed field names (e.g.,".","..",".a","a.","a..b") inside disabled object fields.Disabled objects (
"enabled": false) bypass field-name validation, allowing malformed names to be indexed and subsequently causing crashes in the SQL/PPL engines.Root Cause
OpenSearchExprValueFactory.JsonPathconstructs field paths using:For malformed field names,
split("\\.")behaves unexpectedly:Summary by CodeRabbit
Release Notes
Bug Fixes
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.