[postgresql] Ambiguity with ROLLUP. #4320

kaby76 · 2024-11-11T00:29:30Z

We're now getting into the more difficult ambiguities with the grammar. And, we're making great progress in the speed of the parse: It now takes ~9s to parse all the input, compared to ~42s (note, both tests are without parsing function bodies). Before disabling function body parsing, parsing the entire test suite was ~45s.

Consider the input SELECT c, sum(a) FROM pagg_tab GROUP BY rollup(c) ORDER BY 1, 2;. This can be parsed two ways.

11/10-19:25:02 ~/issues/g4-more-postgresql/sql/postgresql/Generated-CSharp
$ trparse -i 'SELECT c, sum(a) FROM pagg_tab GROUP BY rollup(c) ORDER BY 1, 2;' --ambig | trtree -a
CSharp 0 string success 0.1997469
(root (stmtblock (stmtmulti (stmt (selectstmt (select_no_parens (select_clause (simple_select_intersect (simple_select_pramary (SELECT "SELECT") (target_list_ (target_list (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (COMMA ",") (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (identifier (Identifier "sum")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "a")))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))))))))))))))))))))))) (from_clause (FROM "FROM") (from_list (table_ref (relation_expr (qualified_name (colid (identifier (Identifier "pagg_tab")))))))) (group_clause (GROUP_P "GROUP") (BY "BY") (group_by_list (group_by_item (rollup_clause (ROLLUP "rollup") (OPEN_PAREN "(") (expr_list (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))) (sort_clause_ (sort_clause (ORDER "ORDER") (BY "BY") (sortby_list (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "1")))))))))))))))))))))))))) (COMMA ",") (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "2")))))))))))))))))))))))))))))))) (SEMI ";"))) (EOF ""))
(root (stmtblock (stmtmulti (stmt (selectstmt (select_no_parens (select_clause (simple_select_intersect (simple_select_pramary (SELECT "SELECT") (target_list_ (target_list (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (COMMA ",") (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (identifier (Identifier "sum")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "a")))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))))))))))))))))))))))) (from_clause (FROM "FROM") (from_list (table_ref (relation_expr (qualified_name (colid (identifier (Identifier "pagg_tab")))))))) (group_clause (GROUP_P "GROUP") (BY "BY") (group_by_list (group_by_item (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (unreserved_keyword (ROLLUP "rollup")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c")))))))))))))))))))))))))))) (CLOSE_PAREN ")"))))))))))))))))))))))))))))))) (sort_clause_ (sort_clause (ORDER "ORDER") (BY "BY") (sortby_list (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "1")))))))))))))))))))))))))) (COMMA ",") (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "2")))))))))))))))))))))))))))))))) (SEMI ";"))) (EOF ""))

The problem here is described in the comments of gram.y:

To support CUBE and ROLLUP in GROUP BY without reserving them, we give them an explicit priority lower than '(', so that a rule with CUBE '(' will shift rather than reducing a conflicting rule that takes CUBE as a function name. Using the same precedence as IDENT seems right for the reasons given above.

The text was updated successfully, but these errors were encountered:

kaby76 · 2024-11-11T13:03:00Z

The parse of the ~200 files in the test suite takes about 8s. Of those files, 22 account for about 4s of the parse time, or roughly half of the total time. Each of these files has at least 2 ambiguities, but some with many more.

$ trperf -c aFT `./bin/Debug/net8.0/Test.exe ../examples/*.sql 2>&1 | sort -k5 -n | tail -22 | awk '{ print $3 }'` | grep -v '^0' | a
wk '{sum[$2] += $1; t[$2] = $3} END {for (key in sum) print sum[key], key, t[key]}'
Time to parse: 00:00:00.9262392
Time to parse: 00:00:00.1319337
Time to parse: 00:00:00.5084517
Time to parse: 00:00:00.3196001
Time to parse: 00:00:00.0623989
Time to parse: 00:00:00.0606131
Time to parse: 00:00:00.2406571
Time to parse: 00:00:00.0075491
Time to parse: 00:00:00.3437588
Time to parse: 00:00:00.2272951
Time to parse: 00:00:00.1225716
Time to parse: 00:00:00.1394496
Time to parse: 00:00:00.3754469
Time to parse: 00:00:00.3107465
Time to parse: 00:00:00.5222603
Time to parse: 00:00:00.3044987
Time to parse: 00:00:00.1419571
Time to parse: 00:00:00.1310350
Time to parse: 00:00:00.0001130
Time to parse: 00:00:00.0012748
Time to parse: 00:00:00.0001265
Time to parse: 00:00:00.0002339
2 ../examples/with.sql 0.014354
1 ../examples/partition_aggregate.sql 0.01336
73 ../examples/window.sql 0.179842
8 ../examples/rowsecurity.sql 0.798974
3 ../examples/opr_sanity.sql 0.990225
51 ../examples/groupingsets.sql 0.707481
7 ../examples/subselect.sql 0.601385
32 ../examples/join.sql 3.245541
6 ../examples/tsrf.sql 0.090786
10 ../examples/partition_join.sql 0.853429

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[postgresql] Ambiguity with ROLLUP. #4320

[postgresql] Ambiguity with ROLLUP. #4320

kaby76 commented Nov 11, 2024 •

edited

Loading

kaby76 commented Nov 11, 2024 •

edited

Loading

[postgresql] Ambiguity with ROLLUP. #4320

[postgresql] Ambiguity with ROLLUP. #4320

Comments

kaby76 commented Nov 11, 2024 • edited Loading

kaby76 commented Nov 11, 2024 • edited Loading

kaby76 commented Nov 11, 2024 •

edited

Loading

kaby76 commented Nov 11, 2024 •

edited

Loading