Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -301,7 +301,7 @@ else if (ignoredRules.contains(rule)) {
if (transition instanceof RuleTransition) {
RuleTransition ruleTransition = (RuleTransition) transition;
for (int endToken : process(new ParsingState(ruleTransition.target, tokenIndex, suppressed, parser), ruleTransition.precedence)) {
activeStates.push(new ParsingState(ruleTransition.followState, endToken, suppressed, parser));
activeStates.push(new ParsingState(ruleTransition.followState, endToken, suppressed && endToken == currentToken, parser));
}
}
else if (transition instanceof PrecedencePredicateTransition) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,6 @@ public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int
.specialRule(SqlBaseParser.RULE_query, "<query>")
.specialRule(SqlBaseParser.RULE_type, "<type>")
.specialToken(SqlBaseLexer.INTEGER_VALUE, "<integer>")
.ignoredRule(SqlBaseParser.RULE_nonReserved)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can remove ignoredRule from ErrorHandler now?
AFAIK, nonReserved was the only ignored rule.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can remove that.

Comment thread
martint marked this conversation as resolved.
Outdated
.build();

private final BiConsumer<SqlBaseLexer, SqlBaseParser> initializer;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -99,9 +99,9 @@ private static Stream<Arguments> statements()
Arguments.of("SELECT grouping(a+2) FROM (VALUES (1)) AS t (a) GROUP BY a+2",
"line 1:18: mismatched input '+'. Expecting: ')', ',', '.'"),
Arguments.of("SELECT x() over (ROWS select) FROM t",
"line 1:23: mismatched input 'select'. Expecting: 'BETWEEN', 'CURRENT', 'UNBOUNDED', <expression>"),
"line 1:23: mismatched input 'select'. Expecting: ')', 'BETWEEN', 'CURRENT', 'GROUPS', 'MEASURES', 'ORDER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', <expression>"),
Arguments.of("SELECT X() OVER (ROWS UNBOUNDED) FROM T",
"line 1:32: mismatched input ')'. Expecting: 'FOLLOWING', 'PRECEDING'"),
"line 1:32: mismatched input ')'. Expecting: '%', '(', '*', '+', '-', '->', '.', '/', 'AND', 'AT', 'FOLLOWING', 'OR', 'OVER', 'PRECEDING', '[', '||', <predicate>, <string>"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. The error position is reported as the end of UNBOUNDED:

SELECT X() OVER (ROWS UNBOUNDED) FROM T
                               ^

The only valid tokens at that point are FOLLOWING and PRECEDING, so we shouldn't be reporting all of these extra tokens.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know. I’m trying to figure out why it’s doing that. There’s some other bug lurking in there.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out these are all valid tokens per the grammar. It's because UNBOUNDED is a non-reserved word, so it can take the place of an identifier. In particular, it's considering alternatives that include the following productions:

over
    : OVER ([...] | '(' windowSpecification ')')
    ;

windowSpecification
    : [...]
      windowFrame?
    ;

windowFrame
    : [...]
      frameExtent
    ;

frameExtent
    : [...]
    | frameType=ROWS start=frameBound
    | [...]
    ;

frameBound
    : [...]
    | expression boundType=(PRECEDING | FOLLOWING)  #boundedFrame
    ;

And <expression> can be an <identifier> followed by a number of things, including ., [, OVER, etc.

For instance, this goes through the parser successfully even though it's clearly wrong if we consider UNBOUNDED a special keyword:

SELECT X() OVER (ROWS UNBOUNDED -> 1 PRECEDING) FROM t

It's treating UNBOUNDED -> 1 as a lambda expression, which is a valid (syntax-wise) expression in the "#boundedFrame" frame bound above.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I mark UNBOUNDED as a reserved word, then it produces the expected message:

line 1:32: mismatched input ')'. Expecting: 'FOLLOWING', 'PRECEDING'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make the error handler treat UNBOUNDED as a reserved word without actually making it reserved? In my opinion, printing all alternatives for the case of UNBOUNDED being an identifier is unnecessary and confusing, even if those are all correct per grammar.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue I can spot here is that the error handler suggests GROUPS, RANGE, and ROWS at position 23:

Arguments.of("SELECT x() over (ROWS select) FROM t",
                        "line 1:23: mismatched input 'select'. Expecting: 'BETWEEN', 'CURRENT', 'UNBOUNDED', <expression>"),
                        "line 1:23: mismatched input 'select'. Expecting: ')', 'BETWEEN', 'CURRENT', 'GROUPS', 'MEASURES', 'ORDER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', <expression>"),

It is not correct, as position 23 is after the word ROWS. It seems that the reported error position is the innermost one, but the suggestions come from an outer rule.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is correct (grammar-wise) because ROWS can be interpreted as an identifier representing a named window. After that, any of the terms PARTITION, ORDER, or any of the prefix tokens in windowFrame are valid.

At the end of the day, this fixes a bug in the parser error handler that should've never existed in the initial implementation, so we should get it in. I can't think of any way to avoid the spurious tokens due to the non-reserved keyword handling without adding semantic knowledge to the grammar about the behavior of each rule, but I have some ideas that are out of the scope of this PR.

Arguments.of("SELECT a FROM x ORDER BY (SELECT b FROM t WHERE ",
"line 1:49: mismatched input '<EOF>'. Expecting: <expression>"),
Arguments.of("SELECT a FROM a AS x TABLESAMPLE x ",
Expand Down Expand Up @@ -160,7 +160,13 @@ private static Stream<Arguments> statements()
Arguments.of("SELECT * FROM t FOR TIMESTAMP AS OF TIMESTAMP WHERE",
"line 1:52: mismatched input '<EOF>'. Expecting: <expression>"),
Arguments.of("SELECT * FROM t FOR VERSION AS OF TIMESTAMP WHERE",
"line 1:50: mismatched input '<EOF>'. Expecting: <expression>"));
"line 1:50: mismatched input '<EOF>'. Expecting: <expression>"),
Arguments.of("SELECT ROW(DATE '2022-10-10', DOUBLE 12.0)",
Comment thread
martint marked this conversation as resolved.
Outdated
Comment thread
martint marked this conversation as resolved.
Outdated
"line 1:38: mismatched input '12.0'. Expecting: '%', '(', ')', '*', '+', ',', '-', '->', '.', '/', 'AND', 'AT', 'OR', 'ORDER', 'OVER', 'PRECISION', '[', '||', <predicate>, <string>"),
Arguments.of("VALUES(DATE 2)",
"line 1:13: mismatched input '2'. Expecting: '%', '(', ')', '*', '+', ',', '-', '->', '.', '/', 'AND', 'AT', 'OR', 'OVER', '[', '||', <predicate>, <string>"),
Arguments.of("SELECT count(DISTINCT *) FROM (VALUES 1)",
"line 1:23: mismatched input '*'. Expecting: <expression>"));
}

@Test
Expand All @@ -178,7 +184,7 @@ public void testPossibleExponentialBacktracking()
"1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * " +
"1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9 * " +
"1 * 2 * 3 * 4 * 5 * 6 * 7 * 8 * 9",
"line 1:375: mismatched input '<EOF>'. Expecting: '%', '*', '+', '-', '/', 'AT', 'THEN', '||'");
"line 1:375: mismatched input '<EOF>'. Expecting: '%', '*', '+', '-', '.', '/', 'AND', 'AT', 'OR', 'THEN', '[', '||', <predicate>");
}

@Test
Expand Down Expand Up @@ -208,7 +214,7 @@ public void testPossibleExponentialBacktracking2()
"OR (f()\n" +
"OR (f()\n" +
"GROUP BY id",
"line 24:1: mismatched input 'GROUP'. Expecting: ')', ',', '.', 'FILTER', 'IGNORE', 'OVER', 'RESPECT', '['");
"line 24:1: mismatched input 'GROUP'. Expecting: '%', ')', '*', '+', ',', '-', '.', '/', 'AND', 'AT', 'FILTER', 'IGNORE', 'OR', 'OVER', 'RESPECT', '[', '||', <predicate>");
}

@ParameterizedTest
Expand Down