
fix: Query rewriter truncate#27115

Merged
feilong-liu merged 1 commit into prestodb:master from han-yan01:export-D92548510
Feb 11, 2026

Conversation

@han-yan01
Contributor

@han-yan01 han-yan01 commented Feb 9, 2026

Summary:
Some query rewrites (e.g., typeof() compatibility rewrites) generate json_parse() calls without TRY() wrappers, causing skip_control_failures on malformed/truncated JSON strings.

Adds JsonParseSafetyWrapper utility that wraps unprotected json_parse() calls with TRY() as a post-processing step in QueryRewriter. Handles nested parentheses, quoted strings, escaped characters, case-insensitive TRY matching, and whitespace between TRY( and json_parse.

Release Notes

== NO RELEASE NOTE ==

Differential Revision: D92548510

Summary by Sourcery

Add a post-processing step in the query rewriter to safely wrap json_parse() calls with TRY(), preventing query failures on malformed JSON in rewritten queries.

Bug Fixes:

  • Prevent verifier query failures caused by rewritten json_parse() calls on malformed or truncated JSON by wrapping them in TRY().

Enhancements:

  • Introduce JsonParseSafetyWrapper utility to detect unsafe json_parse() usages in SQL strings and wrap them in TRY() while handling nesting, quoting, escaping, and whitespace.
  • Apply the json_parse safety wrapper to CREATE TABLE AS SELECT, INSERT, and plain SELECT rewrites in QueryRewriter.
  • Add logging around failures to apply the json_parse safety wrapper during query rewriting.

Tests:

  • Add comprehensive unit tests for JsonParseSafetyWrapper covering multiple json_parse() patterns, nested expressions, case variations, string literals, escaping, and mixed TRY-wrapped and unwrapped calls.
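The nesting- and quoting-aware matching mentioned under Enhancements could look roughly like the following. This is a minimal sketch, not the code from the PR: the class name `ParenScanSketch` is hypothetical, and only the method name `findMatchingParen` is borrowed from the PR. It assumes SQL-style quoting where a doubled quote inside a literal is the escape, and it does not handle backslash escapes, which the PR says it also covers.

```java
final class ParenScanSketch
{
    private ParenScanSketch() {}

    /**
     * Returns the index of the ')' that closes the '(' at openPos,
     * or -1 if openPos is not a '(' or no balanced close exists.
     */
    static int findMatchingParen(String sql, int openPos)
    {
        if (openPos < 0 || openPos >= sql.length() || sql.charAt(openPos) != '(') {
            return -1;
        }
        int depth = 0;
        char quote = 0; // 0 means "not inside a quoted region"
        for (int i = openPos; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (quote != 0) {
                // Inside a literal or quoted identifier: ignore everything
                // until the same quote character appears again. A doubled
                // quote ('') closes and immediately reopens the region,
                // which has the same net effect as treating it as an escape.
                if (c == quote) {
                    quote = 0;
                }
                continue;
            }
            if (c == '\'' || c == '"') {
                quote = c;
            }
            else if (c == '(') {
                depth++;
            }
            else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return i;
                }
            }
        }
        return -1; // unbalanced input: the caller should leave the SQL untouched
    }
}
```

Returning -1 rather than throwing lets the caller skip a single occurrence and keep processing the rest of the string, which matches the "log a warning and continue" behavior the review describes.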

@han-yan01 han-yan01 requested a review from a team as a code owner February 9, 2026 22:22
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Feb 9, 2026
@sourcery-ai
Contributor

sourcery-ai bot commented Feb 9, 2026

Reviewer's Guide

Adds a JsonParseSafetyWrapper utility and wires it into QueryRewriter so that any json_parse() introduced by rewrites is post-processed and wrapped in TRY(). The wrapper is a regex- and parenthesis-aware pass over the formatted SQL string, so malformed JSON no longer fails the rewritten query. Comprehensive unit tests cover the edge cases.

Sequence diagram for query rewrite with JsonParseSafetyWrapper

sequenceDiagram
    actor Verifier
    participant QueryRewriter
    participant FunctionCallRewriter
    participant SqlParser
    participant JsonParseSafetyWrapper
    participant Logger

    Verifier->>QueryRewriter: rewriteQuery(sql, queryConfiguration, clusterType, controlQuery)
    QueryRewriter->>SqlParser: createStatement(sql, PARSING_OPTIONS)
    SqlParser-->>QueryRewriter: Query

    QueryRewriter->>FunctionCallRewriter: rewrite(Query)
    FunctionCallRewriter-->>QueryRewriter: RewriterResult
    QueryRewriter->>QueryRewriter: extract rewritten Query

    QueryRewriter->>QueryRewriter: applyJsonParseSafetyWrapper(Query)
    activate QueryRewriter
    QueryRewriter->>SqlParser: formatSql(Query)
    SqlParser-->>QueryRewriter: formattedSql

    QueryRewriter->>JsonParseSafetyWrapper: wrapUnsafeJsonParse(formattedSql)
    JsonParseSafetyWrapper-->>QueryRewriter: fixedSql

    alt fixedSql differs from formattedSql
        QueryRewriter->>SqlParser: createStatement(fixedSql, PARSING_OPTIONS)
        SqlParser-->>QueryRewriter: QueryWithTryWrappedJsonParse
        QueryRewriter-->>QueryRewriter: return QueryWithTryWrappedJsonParse
    else fixedSql equals formattedSql
        QueryRewriter-->>QueryRewriter: return original Query
    end
    deactivate QueryRewriter

    QueryRewriter-->>Verifier: QueryObjectBundle
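The flow in the diagram (format, wrap, reparse only on change, fall back on any failure) can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: the type parameter `Q` stands in for `Query`, the three function arguments are hypothetical stand-ins for the SQL formatter, `wrapUnsafeJsonParse`, and `SqlParser.createStatement`, and `java.util.logging` substitutes for the airlift `Logger`.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.logging.Level;
import java.util.logging.Logger;

final class SafetyWrapperFlowSketch
{
    private static final Logger LOG = Logger.getLogger(SafetyWrapperFlowSketch.class.getName());

    private SafetyWrapperFlowSketch() {}

    static <Q> Q applySafetyWrapper(
            Q query,
            Function<Q, String> formatSql,             // stand-in for the SQL formatter
            UnaryOperator<String> wrapUnsafeJsonParse, // stand-in for the wrapper utility
            Function<String, Q> createStatement)       // stand-in for the SQL parser
    {
        try {
            String formattedSql = formatSql.apply(query);
            String fixedSql = wrapUnsafeJsonParse.apply(formattedSql);
            if (fixedSql.equals(formattedSql)) {
                // Nothing was wrapped: keep the existing AST and skip the reparse.
                return query;
            }
            return createStatement.apply(fixedSql);
        }
        catch (RuntimeException e) {
            // The safety pass is best effort: on any failure, log and
            // continue with the query exactly as it was before the pass.
            LOG.log(Level.WARNING, "Failed to apply json_parse safety wrapper; keeping original query", e);
            return query;
        }
    }
}
```

Comparing the strings before reparsing means queries that contain no unprotected json_parse() pay only for one formatting pass, and the catch block keeps a bug in the string pass from turning into a verification failure.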

Class diagram for JsonParseSafetyWrapper integration with QueryRewriter

classDiagram
    class QueryRewriter {
        - com.facebook.airlift.log.Logger log
        - SqlParser sqlParser
        - TypeManager typeManager
        - BlockEncodingSerde blockEncodingSerde
        - Optional~FunctionCallRewriter~ functionCallRewriter
        + QueryRewriter(SqlParser sqlParser, TypeManager typeManager, BlockEncodingSerde blockEncodingSerde, Optional~FunctionCallRewriter~ functionCallRewriter)
        - Query applyJsonParseSafetyWrapper(Query query)
        + QueryObjectBundle rewriteQuery(String query, QueryConfiguration queryConfiguration, ClusterType clusterType)
        + QueryObjectBundle rewriteQuery(String query, QueryConfiguration queryConfiguration, ClusterType clusterType, boolean controlQuery)
    }

    class JsonParseSafetyWrapper {
        - Logger log
        - Pattern JSON_PARSE_PATTERN
        + String wrapUnsafeJsonParse(String sql)
        - int findMatchingParen(String sql, int openParenPos)
        - boolean isEscaped(String sql, int pos)
    }

    class SqlParser {
        + Statement createStatement(String sql, ParsingOptions parsingOptions)
    }

    class Logger {
        + void warn(String message, Object arg1)
        + void warn(Throwable throwable, String message)
        + void debug(String message)
    }

    QueryRewriter ..> JsonParseSafetyWrapper : uses
    QueryRewriter ..> SqlParser : uses
    QueryRewriter ..> Logger : uses
    JsonParseSafetyWrapper ..> Logger : uses
    JsonParseSafetyWrapper ..> Pattern : uses
    JsonParseSafetyWrapper ..> Matcher : uses

File-Level Changes

Change: Introduce JsonParseSafetyWrapper utility to wrap unprotected json_parse() calls with TRY() using a regex-driven, parenthesis- and string-literal-aware scan.
Details:
  • Add JsonParseSafetyWrapper class in verifier framework to post-process SQL strings and wrap json_parse() calls that are not already inside TRY().
  • Use a negative-lookbehind, case-insensitive regex to detect json_parse( occurrences not preceded by TRY( plus optional whitespace.
  • Implement a custom findMatchingParen scanner that tracks nesting depth and respects single/double-quoted strings and escaped characters to locate the correct closing parenthesis for each json_parse call.
  • Guard against malformed input by returning the original SQL if no json_parse is found and logging a warning when matching parentheses cannot be found, while still processing other occurrences.
File: presto-verifier/src/main/java/com/facebook/presto/verifier/framework/JsonParseSafetyWrapper.java

Change: Apply JsonParseSafetyWrapper to rewritten queries in QueryRewriter with defensive error handling and logging.
Details:
  • Add a logger to QueryRewriter and a private applyJsonParseSafetyWrapper helper that formats the Query to SQL, applies wrapUnsafeJsonParse, and reparses only when the SQL string changes.
  • Catch and log any exceptions thrown during wrapping or reparsing and fall back to the original Query AST to avoid impacting verification flows.
  • Invoke applyJsonParseSafetyWrapper after FunctionCallRewriter rewrites for CREATE TABLE AS SELECT, INSERT, and plain SELECT query paths so any introduced json_parse() is protected.
File: presto-verifier/src/main/java/com/facebook/presto/verifier/rewrite/QueryRewriter.java

Change: Add comprehensive unit tests covering wrapping behavior and edge cases for JsonParseSafetyWrapper.
Details:
  • Create TestJsonParseSafetyWrapper with cases for null/empty input, no-op queries, simple and multiple json_parse calls, nested functions and parentheses, and json_parse in WHERE clauses and subqueries.
  • Verify that already TRY-wrapped json_parse invocations in any TRY casing and with whitespace are not double-wrapped while adjacent unwrapped calls are wrapped.
  • Test case-insensitive JSON_PARSE detection, handling of string literals and identifiers with parentheses or json_parse-like substrings, whitespace variations, and escaped quotes/backslashes within strings.
File: presto-verifier/src/test/java/com/facebook/presto/verifier/framework/TestJsonParseSafetyWrapper.java

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 4 issues, and left some high level feedback:

  • The JSON_PARSE_PATTERN regex uses a variable-length lookbehind (?<![Tt][Rr][Yy]\(\s*), which is not allowed by Java's regex engine and will cause a PatternSyntaxException at class initialization; consider removing the lookbehind and instead detecting preceding TRY( (with optional whitespace) in the Java logic around each match.
  • The current approach for avoiding double-wrapping TRY(json_parse(...)) only looks immediately before json_parse; this can easily miss or mis-handle edge cases (e.g., nested function calls, unusual whitespace), so it may be more robust to explicitly inspect the characters before each match in wrapUnsafeJsonParse to determine whether it's already inside a TRY( rather than encoding this in the regex.
## Individual Comments

### Comment 1
<location> `presto-verifier/src/main/java/com/facebook/presto/verifier/framework/JsonParseSafetyWrapper.java:40-41` </location>
<code_context>
+    // Pattern to match json_parse( that is NOT already wrapped in TRY(
+    // Uses negative lookbehind to avoid double-wrapping.
+    // Handles case-insensitive TRY with optional whitespace before json_parse.
+    private static final Pattern JSON_PARSE_PATTERN = Pattern.compile(
+            "(?<![Tt][Rr][Yy]\\(\\s*)\\b(json_parse)\\s*\\(",
+            Pattern.CASE_INSENSITIVE);
+
</code_context>

<issue_to_address>
**issue (bug_risk):** The regex uses a variable-length lookbehind (`\s*`), which is not supported by Java regex and will throw a PatternSyntaxException at class load time.

Lookbehinds in Java must be fixed-length, so the `\s*` inside `(?<![Tt][Rr][Yy]\(\s*)` makes this pattern invalid. Instead of relying on this lookbehind, consider either (a) removing it and doing the `TRY(...)` detection procedurally in `wrapUnsafeJsonParse` (e.g., checking characters before each match), or (b) redesigning the regex so the lookbehind contains only fixed-length constructs.
</issue_to_address>
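
One way to do the procedural detection suggested in option (a) is sketched below. It is an assumption-laden illustration rather than the PR's code: the class and method names are hypothetical, the regex carries no lookbehind at all, and the backward scan accepts whitespace on either side of the opening parenthesis. It only recognizes a TRY that directly encloses the call, and it does not skip string literals, which a full implementation would need to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TryDetectionSketch
{
    // No lookbehind: match json_parse, optional whitespace, then '('.
    private static final Pattern JSON_PARSE =
            Pattern.compile("\\bjson_parse\\s*\\(", Pattern.CASE_INSENSITIVE);

    private TryDetectionSketch() {}

    /** Start offsets of json_parse calls that are not directly wrapped in TRY(. */
    static List<Integer> findUnwrappedJsonParse(String sql)
    {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = JSON_PARSE.matcher(sql);
        while (matcher.find()) {
            if (!isDirectlyInsideTry(sql, matcher.start())) {
                starts.add(matcher.start());
            }
        }
        return starts;
    }

    /** True if the text before matchStart ends with TRY, whitespace*, '(', whitespace*. */
    static boolean isDirectlyInsideTry(String sql, int matchStart)
    {
        int i = matchStart - 1;
        while (i >= 0 && Character.isWhitespace(sql.charAt(i))) {
            i--;
        }
        if (i < 0 || sql.charAt(i) != '(') {
            return false;
        }
        i--;
        while (i >= 0 && Character.isWhitespace(sql.charAt(i))) {
            i--;
        }
        // i now points at the last character of the candidate keyword.
        if (i < 2 || !sql.regionMatches(true, i - 2, "try", 0, 3)) {
            return false;
        }
        // Require a word boundary so that e.g. retry( is not mistaken for TRY(.
        int before = i - 3;
        return before < 0 || !isIdentifierChar(sql.charAt(before));
    }

    private static boolean isIdentifierChar(char c)
    {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

Because `Character.isWhitespace` also matches newlines, a multi-line `TRY(` followed by an indented `json_parse(...)` is treated as already wrapped, and `TRY (json_parse(...))` is recognized as well.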

### Comment 2
<location> `presto-verifier/src/test/java/com/facebook/presto/verifier/framework/TestJsonParseSafetyWrapper.java:194-43` </location>
<code_context>
+    }
+
+    @Test
+    public void testMixedTryWrappedAndUnwrappedWithWhitespace()
+    {
+        String sql = "SELECT TRY( json_parse(a)), json_parse(b), try(json_parse(c)) FROM table1";
+        String expected = "SELECT TRY( json_parse(a)), TRY(json_parse(b)), try(json_parse(c)) FROM table1";
+        assertEquals(wrapUnsafeJsonParse(sql), expected);
+    }
+}
</code_context>

<issue_to_address>
**issue (testing):** Add a test where TRY and the opening parenthesis are separated by whitespace (e.g., `TRY (json_parse(...))`) to cover a likely real-world edge case.

The current cases only vary whitespace after the opening parenthesis, not between `TRY` and `(`. With the current regex, `TRY (json_parse(...))` is treated as unwrapped and will likely be rewritten as `TRY(TRY (json_parse(...)))`. A test for this pattern will both capture the current behavior and surface this potential bug.
</issue_to_address>

### Comment 3
<location> `presto-verifier/src/test/java/com/facebook/presto/verifier/framework/TestJsonParseSafetyWrapper.java:77-85` </location>
<code_context>
+    }
+
+    @Test
+    public void testNestedParentheses()
+    {
+        String sql = "SELECT json_parse(concat(a, b, (SELECT c FROM t))) FROM table1";
+        String expected = "SELECT TRY(json_parse(concat(a, b, (SELECT c FROM t)))) FROM table1";
+        assertEquals(wrapUnsafeJsonParse(sql), expected);
+    }
+
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for malformed or unbalanced parentheses to ensure the wrapper is a no-op instead of partially rewriting the SQL.

Since `findMatchingParen` returning -1 logs a warning and leaves `json_parse` unchanged, please add tests with malformed SQL like `"SELECT json_parse(data"` and `"SELECT json_parse((data"` to confirm `wrapUnsafeJsonParse` returns the original string (or at least doesn’t partially rewrite it), guarding against regressions in the parenthesis-matching logic.

```suggestion
    @Test
    public void testJsonParseWithCast()
    {
        String sql = "SELECT CAST(json_parse(data) AS MAP(VARCHAR, VARCHAR)) FROM table1";
        String expected = "SELECT CAST(TRY(json_parse(data)) AS MAP(VARCHAR, VARCHAR)) FROM table1";
        assertEquals(wrapUnsafeJsonParse(sql), expected);
    }

    @Test
    public void testMalformedParenthesesMissingClosing()
    {
        String sql = "SELECT json_parse(data";
        assertEquals(wrapUnsafeJsonParse(sql), sql);
    }

    @Test
    public void testMalformedParenthesesUnbalancedOpening()
    {
        String sql = "SELECT json_parse((data";
        assertEquals(wrapUnsafeJsonParse(sql), sql);
    }

    @Test
```
</issue_to_address>

### Comment 4
<location> `presto-verifier/src/test/java/com/facebook/presto/verifier/framework/TestJsonParseSafetyWrapper.java:133-139` </location>
<code_context>
+    }
+
+    @Test
+    public void testWhitespaceHandling()
+    {
+        String sql = "SELECT json_parse  (  data  ) FROM table1";
+        String expected = "SELECT TRY(json_parse  (  data  )) FROM table1";
+        assertEquals(wrapUnsafeJsonParse(sql), expected);
+    }
+
</code_context>

<issue_to_address>
**suggestion (testing):** Add a multi-line SQL test (including newlines between TRY and json_parse) to validate the regex handling of `\s*` across lines.

Right now we only cover single-line whitespace. Because the regex uses `\s*` in the negative lookbehind and around `json_parse`, a multi-line query like:

```sql
SELECT
  TRY(
    json_parse(data)
  )
FROM table1
```
should be considered already wrapped. Please add a test with newlines between `TRY(` and `json_parse`, and/or between `json_parse` and its argument, to confirm this behavior and protect against future regex or matcher changes.

```suggestion
    @Test
    public void testWhitespaceHandling()
    {
        String sql = "SELECT json_parse  (  data  ) FROM table1";
        String expected = "SELECT TRY(json_parse  (  data  )) FROM table1";
        assertEquals(wrapUnsafeJsonParse(sql), expected);
    }

    @Test
    public void testMultilineWhitespaceHandlingAlreadyWrapped()
    {
        String sql = "SELECT\n" +
                "  TRY(\n" +
                "    json_parse(data)\n" +
                "  )\n" +
                "FROM table1";
        String expected = sql;
        assertEquals(wrapUnsafeJsonParse(sql), expected);
    }
```
</issue_to_address>



han-yan01 added a commit to han-yan01/presto that referenced this pull request Feb 10, 2026
Summary:

Some query rewrites (e.g., typeof() compatibility rewrites) generate json_parse() calls without TRY() wrappers, causing skip_control_failures on malformed/truncated JSON strings.

Adds JsonParseSafetyWrapper utility that wraps unprotected json_parse() calls with TRY() as a post-processing step in QueryRewriter. Handles nested parentheses, quoted strings, escaped characters, case-insensitive TRY matching, and whitespace between TRY( and json_parse.


# Release Notes
```
== NO RELEASE NOTE ==
```

Reviewed By: tanjialiang

Differential Revision: D92548510
@han-yan01 han-yan01 force-pushed the export-D92548510 branch 2 times, most recently from 0111b2c to 085396c Compare February 10, 2026 07:56
@han-yan01 han-yan01 changed the title from "fix query rewriter truncate" to "fix: Query rewriter truncate" Feb 10, 2026
@feilong-liu feilong-liu merged commit fc589a5 into prestodb:master Feb 11, 2026
81 of 84 checks passed

3 participants