Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[postgresql] Ambiguity with ROLLUP. #4320

Open
kaby76 opened this issue Nov 11, 2024 · 1 comment
Open

[postgresql] Ambiguity with ROLLUP. #4320

kaby76 opened this issue Nov 11, 2024 · 1 comment

Comments

@kaby76
Copy link
Contributor

kaby76 commented Nov 11, 2024

We're now getting into the more difficult ambiguities with the grammar. And, we're making great progress in the speed of the parse: It now takes ~9s to parse all the input, compared to ~42s (note, both tests are without parsing function bodies). Before disabling function body parsing, parsing the entire test suite was ~45s.

Consider the input SELECT c, sum(a) FROM pagg_tab GROUP BY rollup(c) ORDER BY 1, 2;. This can be parsed two ways.

11/10-19:25:02 ~/issues/g4-more-postgresql/sql/postgresql/Generated-CSharp
$ trparse -i 'SELECT c, sum(a) FROM pagg_tab GROUP BY rollup(c) ORDER BY 1, 2;' --ambig | trtree -a
CSharp 0 string success 0.1997469
(root (stmtblock (stmtmulti (stmt (selectstmt (select_no_parens (select_clause (simple_select_intersect (simple_select_pramary (SELECT "SELECT") (target_list_ (target_list (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (COMMA ",") (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (identifier (Identifier "sum")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "a")))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))))))))))))))))))))))) (from_clause (FROM "FROM") (from_list (table_ref (relation_expr (qualified_name (colid (identifier (Identifier "pagg_tab")))))))) (group_clause (GROUP_P "GROUP") (BY "BY") (group_by_list (group_by_item (rollup_clause (ROLLUP "rollup") (OPEN_PAREN "(") (expr_list (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))) (sort_clause_ (sort_clause (ORDER "ORDER") (BY "BY") (sortby_list (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "1")))))))))))))))))))))))))) (COMMA ",") (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "2")))))))))))))))))))))))))))))))) (SEMI ";"))) (EOF ""))
(root (stmtblock (stmtmulti (stmt (selectstmt (select_no_parens (select_clause (simple_select_intersect (simple_select_pramary (SELECT "SELECT") (target_list_ (target_list (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c"))))))))))))))))))))))))))) (COMMA ",") (target_el (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (identifier (Identifier "sum")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "a")))))))))))))))))))))))))))) (CLOSE_PAREN ")")))))))))))))))))))))))))))) (from_clause (FROM "FROM") (from_list (table_ref (relation_expr (qualified_name (colid (identifier (Identifier "pagg_tab")))))))) (group_clause (GROUP_P "GROUP") (BY "BY") (group_by_list (group_by_item (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (func_expr (func_application (func_name (type_function_name (unreserved_keyword (ROLLUP "rollup")))) (OPEN_PAREN "(") (func_arg_list (func_arg_expr (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (columnref (colid (identifier (Identifier "c")))))))))))))))))))))))))))) (CLOSE_PAREN ")"))))))))))))))))))))))))))))))) (sort_clause_ (sort_clause (ORDER "ORDER") (BY "BY") (sortby_list (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "1")))))))))))))))))))))))))) (COMMA ",") (sortby (a_expr (a_expr_qual (a_expr_lessless (a_expr_or (a_expr_and (a_expr_between (a_expr_in (a_expr_unary_not (a_expr_isnull (a_expr_is_not (a_expr_compare (a_expr_like (a_expr_qual_op (a_expr_unary_qualop (a_expr_add (a_expr_mul (a_expr_caret (a_expr_unary_sign (a_expr_at_time_zone (a_expr_collate (a_expr_typecast (c_expr (aexprconst (iconst (Integral "2")))))))))))))))))))))))))))))))) (SEMI ";"))) (EOF ""))

The problem here is described in the comments of gram.y:

To support CUBE and ROLLUP in GROUP BY without reserving them, we give them an explicit priority lower than '(', so that a rule with CUBE '(' will shift rather than reducing a conflicting rule that takes CUBE as a function name. Using the same precedence as IDENT seems right for the reasons given above.

@kaby76
Copy link
Contributor Author

kaby76 commented Nov 11, 2024

The parse of the ~200 files in the test suite takes about 8s. Of those files, 22 account for about 4s of the parse time, or roughly half of the total time. Each of these files has at least 2 ambiguities, but some with many more.

$ trperf -c aFT `./bin/Debug/net8.0/Test.exe ../examples/*.sql 2>&1 | sort -k5 -n | tail -22 | awk '{ print $3 }'` | grep -v '^0' | a
wk '{sum[$2] += $1; t[$2] = $3} END {for (key in sum) print sum[key], key, t[key]}'
Time to parse: 00:00:00.9262392
Time to parse: 00:00:00.1319337
Time to parse: 00:00:00.5084517
Time to parse: 00:00:00.3196001
Time to parse: 00:00:00.0623989
Time to parse: 00:00:00.0606131
Time to parse: 00:00:00.2406571
Time to parse: 00:00:00.0075491
Time to parse: 00:00:00.3437588
Time to parse: 00:00:00.2272951
Time to parse: 00:00:00.1225716
Time to parse: 00:00:00.1394496
Time to parse: 00:00:00.3754469
Time to parse: 00:00:00.3107465
Time to parse: 00:00:00.5222603
Time to parse: 00:00:00.3044987
Time to parse: 00:00:00.1419571
Time to parse: 00:00:00.1310350
Time to parse: 00:00:00.0001130
Time to parse: 00:00:00.0012748
Time to parse: 00:00:00.0001265
Time to parse: 00:00:00.0002339
2 ../examples/with.sql 0.014354
1 ../examples/partition_aggregate.sql 0.01336
73 ../examples/window.sql 0.179842
8 ../examples/rowsecurity.sql 0.798974
3 ../examples/opr_sanity.sql 0.990225
51 ../examples/groupingsets.sql 0.707481
7 ../examples/subselect.sql 0.601385
32 ../examples/join.sql 3.245541
6 ../examples/tsrf.sql 0.090786
10 ../examples/partition_join.sql 0.853429

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant