fix: Query rewriter truncate#27115
Conversation
Reviewer's GuideAdds a JsonParseSafetyWrapper utility and wires it into QueryRewriter so that any json_parse() introduced by rewrites is post-processed and wrapped in TRY(), using a regex- and parenthesis-aware string pass that avoids SQL parsing errors on malformed JSON, with comprehensive unit tests for edge cases. Sequence diagram for query rewrite with JsonParseSafetyWrappersequenceDiagram
actor Verifier
participant QueryRewriter
participant FunctionCallRewriter
participant SqlParser
participant JsonParseSafetyWrapper
participant Logger
Verifier->>QueryRewriter: rewriteQuery(sql, queryConfiguration, clusterType, controlQuery)
QueryRewriter->>SqlParser: createStatement(sql, PARSING_OPTIONS)
SqlParser-->>QueryRewriter: Query
QueryRewriter->>FunctionCallRewriter: rewrite(Query)
FunctionCallRewriter-->>QueryRewriter: RewriterResult
QueryRewriter->>QueryRewriter: extract rewritten Query
QueryRewriter->>QueryRewriter: applyJsonParseSafetyWrapper(Query)
activate QueryRewriter
QueryRewriter->>SqlParser: formatSql(Query)
SqlParser-->>QueryRewriter: formattedSql
QueryRewriter->>JsonParseSafetyWrapper: wrapUnsafeJsonParse(formattedSql)
JsonParseSafetyWrapper-->>QueryRewriter: fixedSql
alt fixedSql differs from formattedSql
QueryRewriter->>SqlParser: createStatement(fixedSql, PARSING_OPTIONS)
SqlParser-->>QueryRewriter: QueryWithTryWrappedJsonParse
QueryRewriter-->>QueryRewriter: return QueryWithTryWrappedJsonParse
else fixedSql equals formattedSql
QueryRewriter-->>QueryRewriter: return original Query
end
deactivate QueryRewriter
QueryRewriter-->>Verifier: QueryObjectBundle
Class diagram for JsonParseSafetyWrapper integration with QueryRewriterclassDiagram
class QueryRewriter {
- com.facebook.airlift.log.Logger log
- SqlParser sqlParser
- TypeManager typeManager
- BlockEncodingSerde blockEncodingSerde
- Optional~FunctionCallRewriter~ functionCallRewriter
+ QueryRewriter(SqlParser sqlParser, TypeManager typeManager, BlockEncodingSerde blockEncodingSerde, Optional~FunctionCallRewriter~ functionCallRewriter)
- Query applyJsonParseSafetyWrapper(Query query)
+ QueryObjectBundle rewriteQuery(String query, QueryConfiguration queryConfiguration, ClusterType clusterType)
+ QueryObjectBundle rewriteQuery(String query, QueryConfiguration queryConfiguration, ClusterType clusterType, boolean controlQuery)
}
class JsonParseSafetyWrapper {
- Logger log
- Pattern JSON_PARSE_PATTERN
+ String wrapUnsafeJsonParse(String sql)
- int findMatchingParen(String sql, int openParenPos)
- boolean isEscaped(String sql, int pos)
}
class SqlParser {
+ Statement createStatement(String sql, ParsingOptions parsingOptions)
}
class Logger {
+ void warn(String message, Object arg1)
+ void warn(Throwable throwable, String message)
+ void debug(String message)
}
QueryRewriter ..> JsonParseSafetyWrapper : uses
QueryRewriter ..> SqlParser : uses
QueryRewriter ..> Logger : uses
JsonParseSafetyWrapper ..> Logger : uses
JsonParseSafetyWrapper ..> Pattern : uses
JsonParseSafetyWrapper ..> Matcher : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- The
JSON_PARSE_PATTERNregex uses a variable-length lookbehind(?<![Tt][Rr][Yy]\(\s*), which is not allowed by Java's regex engine and will cause aPatternSyntaxExceptionat class initialization; consider removing the lookbehind and instead detecting precedingTRY((with optional whitespace) in the Java logic around each match. - The current approach for avoiding double-wrapping
TRY(json_parse(...))only looks immediately beforejson_parse; this can easily miss or mis-handle edge cases (e.g., nested function calls, unusual whitespace), so it may be more robust to explicitly inspect the characters before each match inwrapUnsafeJsonParseto determine whether it's already inside aTRY(rather than encoding this in the regex.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `JSON_PARSE_PATTERN` regex uses a variable-length lookbehind `(?<![Tt][Rr][Yy]\(\s*)`, which is not allowed by Java's regex engine and will cause a `PatternSyntaxException` at class initialization; consider removing the lookbehind and instead detecting preceding `TRY(` (with optional whitespace) in the Java logic around each match.
- The current approach for avoiding double-wrapping `TRY(json_parse(...))` only looks immediately before `json_parse`; this can easily miss or mis-handle edge cases (e.g., nested function calls, unusual whitespace), so it may be more robust to explicitly inspect the characters before each match in `wrapUnsafeJsonParse` to determine whether it's already inside a `TRY(` rather than encoding this in the regex.
## Individual Comments
### Comment 1
<location> `presto-verifier/src/main/java/com/facebook/presto/verifier/framework/JsonParseSafetyWrapper.java:40-41` </location>
<code_context>
+ // Pattern to match json_parse( that is NOT already wrapped in TRY(
+ // Uses negative lookbehind to avoid double-wrapping.
+ // Handles case-insensitive TRY with optional whitespace before json_parse.
+ private static final Pattern JSON_PARSE_PATTERN = Pattern.compile(
+ "(?<![Tt][Rr][Yy]\\(\\s*)\\b(json_parse)\\s*\\(",
+ Pattern.CASE_INSENSITIVE);
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The regex uses a variable-length lookbehind (`\s*`), which is not supported by Java regex and will throw a PatternSyntaxException at class load time.
Lookbehinds in Java must be fixed-length, so the `\s*` inside `(?<![Tt][Rr][Yy]\(\s*)` makes this pattern invalid. Instead of relying on this lookbehind, consider either (a) removing it and doing the `TRY(...)` detection procedurally in `wrapUnsafeJsonParse` (e.g., checking characters before each match), or (b) redesigning the regex so the lookbehind contains only fixed-length constructs.
</issue_to_address>
### Comment 2
<location> `presto-verifier/src/test/java/com/facebook/presto/verifier/framework/TestJsonParseSafetyWrapper.java:194-43` </location>
<code_context>
+ }
+
+ @Test
+ public void testMixedTryWrappedAndUnwrappedWithWhitespace()
+ {
+ String sql = "SELECT TRY( json_parse(a)), json_parse(b), try(json_parse(c)) FROM table1";
+ String expected = "SELECT TRY( json_parse(a)), TRY(json_parse(b)), try(json_parse(c)) FROM table1";
+ assertEquals(wrapUnsafeJsonParse(sql), expected);
+ }
+}
</code_context>
<issue_to_address>
**issue (testing):** Add a test where TRY and the opening parenthesis are separated by whitespace (e.g., `TRY (json_parse(...))`) to cover a likely real-world edge case.
The current cases only vary whitespace after the opening parenthesis, not between `TRY` and `(`. With the current regex, `TRY (json_parse(...))` is treated as unwrapped and will likely be rewritten as `TRY(TRY (json_parse(...)))`. A test for this pattern will both capture the current behavior and surface this potential bug.
</issue_to_address>
### Comment 3
<location> `presto-verifier/src/test/java/com/facebook/presto/verifier/framework/TestJsonParseSafetyWrapper.java:77-85` </location>
<code_context>
+ }
+
+ @Test
+ public void testNestedParentheses()
+ {
+ String sql = "SELECT json_parse(concat(a, b, (SELECT c FROM t))) FROM table1";
+ String expected = "SELECT TRY(json_parse(concat(a, b, (SELECT c FROM t)))) FROM table1";
+ assertEquals(wrapUnsafeJsonParse(sql), expected);
+ }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for malformed or unbalanced parentheses to ensure the wrapper is a no-op instead of partially rewriting the SQL.
Since `findMatchingParen` returning -1 logs a warning and leaves `json_parse` unchanged, please add tests with malformed SQL like `"SELECT json_parse(data"` and `"SELECT json_parse((data"` to confirm `wrapUnsafeJsonParse` returns the original string (or at least doesn’t partially rewrite it), guarding against regressions in the parenthesis-matching logic.
```suggestion
@Test
public void testJsonParseWithCast()
{
String sql = "SELECT CAST(json_parse(data) AS MAP(VARCHAR, VARCHAR)) FROM table1";
String expected = "SELECT CAST(TRY(json_parse(data)) AS MAP(VARCHAR, VARCHAR)) FROM table1";
assertEquals(wrapUnsafeJsonParse(sql), expected);
}
@Test
public void testMalformedParenthesesMissingClosing()
{
String sql = "SELECT json_parse(data";
assertEquals(wrapUnsafeJsonParse(sql), sql);
}
@Test
public void testMalformedParenthesesUnbalancedOpening()
{
String sql = "SELECT json_parse((data";
assertEquals(wrapUnsafeJsonParse(sql), sql);
}
@Test
```
</issue_to_address>
### Comment 4
<location> `presto-verifier/src/test/java/com/facebook/presto/verifier/framework/TestJsonParseSafetyWrapper.java:133-139` </location>
<code_context>
+ }
+
+ @Test
+ public void testWhitespaceHandling()
+ {
+ String sql = "SELECT json_parse ( data ) FROM table1";
+ String expected = "SELECT TRY(json_parse ( data )) FROM table1";
+ assertEquals(wrapUnsafeJsonParse(sql), expected);
+ }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a multi-line SQL test (including newlines between TRY and json_parse) to validate the regex handling of `\s*` across lines.
Right now we only cover single-line whitespace. Because the regex uses `\s*` in the negative lookbehind and around `json_parse`, a multi-line query like:
```sql
SELECT
TRY(
json_parse(data)
)
FROM table1
```
should be considered already wrapped. Please add a test with newlines between `TRY(` and `json_parse`, and/or between `json_parse` and its argument, to confirm this behavior and protect against future regex or matcher changes.
```suggestion
@Test
public void testWhitespaceHandling()
{
String sql = "SELECT json_parse ( data ) FROM table1";
String expected = "SELECT TRY(json_parse ( data )) FROM table1";
assertEquals(wrapUnsafeJsonParse(sql), expected);
}
@Test
public void testMultilineWhitespaceHandlingAlreadyWrapped()
{
String sql = "SELECT\n" +
" TRY(\n" +
" json_parse(data)\n" +
" )\n" +
"FROM table1";
String expected = sql;
assertEquals(wrapUnsafeJsonParse(sql), expected);
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...to-verifier/src/main/java/com/facebook/presto/verifier/framework/JsonParseSafetyWrapper.java
Outdated
Show resolved
Hide resolved
| { | ||
| String sql = "SELECT json_parse(column1) FROM table1"; | ||
| String expected = "SELECT TRY(json_parse(column1)) FROM table1"; | ||
| assertEquals(wrapUnsafeJsonParse(sql), expected); |
There was a problem hiding this comment.
issue (testing): Add a test where TRY and the opening parenthesis are separated by whitespace (e.g., TRY (json_parse(...))) to cover a likely real-world edge case.
The current cases only vary whitespace after the opening parenthesis, not between TRY and (. With the current regex, TRY (json_parse(...)) is treated as unwrapped and will likely be rewritten as TRY(TRY (json_parse(...))). A test for this pattern will both capture the current behavior and surface this potential bug.
| @Test | ||
| public void testJsonParseWithCast() | ||
| { | ||
| String sql = "SELECT CAST(json_parse(data) AS MAP(VARCHAR, VARCHAR)) FROM table1"; | ||
| String expected = "SELECT CAST(TRY(json_parse(data)) AS MAP(VARCHAR, VARCHAR)) FROM table1"; | ||
| assertEquals(wrapUnsafeJsonParse(sql), expected); | ||
| } | ||
|
|
||
| @Test |
There was a problem hiding this comment.
suggestion (testing): Consider adding tests for malformed or unbalanced parentheses to ensure the wrapper is a no-op instead of partially rewriting the SQL.
Since findMatchingParen returning -1 logs a warning and leaves json_parse unchanged, please add tests with malformed SQL like "SELECT json_parse(data" and "SELECT json_parse((data" to confirm wrapUnsafeJsonParse returns the original string (or at least doesn’t partially rewrite it), guarding against regressions in the parenthesis-matching logic.
| @Test | |
| public void testJsonParseWithCast() | |
| { | |
| String sql = "SELECT CAST(json_parse(data) AS MAP(VARCHAR, VARCHAR)) FROM table1"; | |
| String expected = "SELECT CAST(TRY(json_parse(data)) AS MAP(VARCHAR, VARCHAR)) FROM table1"; | |
| assertEquals(wrapUnsafeJsonParse(sql), expected); | |
| } | |
| @Test | |
| @Test | |
| public void testJsonParseWithCast() | |
| { | |
| String sql = "SELECT CAST(json_parse(data) AS MAP(VARCHAR, VARCHAR)) FROM table1"; | |
| String expected = "SELECT CAST(TRY(json_parse(data)) AS MAP(VARCHAR, VARCHAR)) FROM table1"; | |
| assertEquals(wrapUnsafeJsonParse(sql), expected); | |
| } | |
| @Test | |
| public void testMalformedParenthesesMissingClosing() | |
| { | |
| String sql = "SELECT json_parse(data"; | |
| assertEquals(wrapUnsafeJsonParse(sql), sql); | |
| } | |
| @Test | |
| public void testMalformedParenthesesUnbalancedOpening() | |
| { | |
| String sql = "SELECT json_parse((data"; | |
| assertEquals(wrapUnsafeJsonParse(sql), sql); | |
| } | |
| @Test |
| @Test | ||
| public void testWhitespaceHandling() | ||
| { | ||
| String sql = "SELECT json_parse ( data ) FROM table1"; | ||
| String expected = "SELECT TRY(json_parse ( data )) FROM table1"; | ||
| assertEquals(wrapUnsafeJsonParse(sql), expected); | ||
| } |
There was a problem hiding this comment.
suggestion (testing): Add a multi-line SQL test (including newlines between TRY and json_parse) to validate the regex handling of \s* across lines.
Right now we only cover single-line whitespace. Because the regex uses \s* in the negative lookbehind and around json_parse, a multi-line query like:
SELECT
TRY(
json_parse(data)
)
FROM table1should be considered already wrapped. Please add a test with newlines between TRY( and json_parse, and/or between json_parse and its argument, to confirm this behavior and protect against future regex or matcher changes.
| @Test | |
| public void testWhitespaceHandling() | |
| { | |
| String sql = "SELECT json_parse ( data ) FROM table1"; | |
| String expected = "SELECT TRY(json_parse ( data )) FROM table1"; | |
| assertEquals(wrapUnsafeJsonParse(sql), expected); | |
| } | |
| @Test | |
| public void testWhitespaceHandling() | |
| { | |
| String sql = "SELECT json_parse ( data ) FROM table1"; | |
| String expected = "SELECT TRY(json_parse ( data )) FROM table1"; | |
| assertEquals(wrapUnsafeJsonParse(sql), expected); | |
| } | |
| @Test | |
| public void testMultilineWhitespaceHandlingAlreadyWrapped() | |
| { | |
| String sql = "SELECT\n" + | |
| " TRY(\n" + | |
| " json_parse(data)\n" + | |
| " )\n" + | |
| "FROM table1"; | |
| String expected = sql; | |
| assertEquals(wrapUnsafeJsonParse(sql), expected); | |
| } |
d3c4e84 to
fd60337
Compare
Summary: Some query rewrites (e.g., typeof() compatibility rewrites) generate json_parse() calls without TRY() wrappers, causing skip_control_failures on malformed/truncated JSON strings. Adds JsonParseSafetyWrapper utility that wraps unprotected json_parse() calls with TRY() as a post-processing step in QueryRewriter. Handles nested parentheses, quoted strings, escaped characters, case-insensitive TRY matching, and whitespace between TRY( and json_parse. # Releas Notes ``` == NO RELEASE NOTE == ``` Reviewed By: tanjialiang Differential Revision: D92548510
Summary: Some query rewrites (e.g., typeof() compatibility rewrites) generate json_parse() calls without TRY() wrappers, causing skip_control_failures on malformed/truncated JSON strings. Adds JsonParseSafetyWrapper utility that wraps unprotected json_parse() calls with TRY() as a post-processing step in QueryRewriter. Handles nested parentheses, quoted strings, escaped characters, case-insensitive TRY matching, and whitespace between TRY( and json_parse. # Releas Notes ``` == NO RELEASE NOTE == ``` Reviewed By: tanjialiang Differential Revision: D92548510
0111b2c to
085396c
Compare
Summary:
Some query rewrites (e.g., typeof() compatibility rewrites) generate json_parse() calls without TRY() wrappers, causing skip_control_failures on malformed/truncated JSON strings.
Adds JsonParseSafetyWrapper utility that wraps unprotected json_parse() calls with TRY() as a post-processing step in QueryRewriter. Handles nested parentheses, quoted strings, escaped characters, case-insensitive TRY matching, and whitespace between TRY( and json_parse.
Releas Notes
Differential Revision: D92548510
Summary by Sourcery
Add a post-processing step in the query rewriter to safely wrap json_parse() calls with TRY(), preventing query failures on malformed JSON in rewritten queries.
Bug Fixes:
Enhancements:
Tests: