Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnest UNION plans (in addition to UNION ALL) #7786

Closed
alamb opened this issue Oct 10, 2023 · 3 comments · Fixed by #7788
Closed

Unnest UNION plans (in addition to UNION ALL) #7786

alamb opened this issue Oct 10, 2023 · 3 comments · Fixed by #7788
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Oct 10, 2023

Is your feature request related to a problem or challenge?

The current the plan for a nested UNION is still nested resulting in unecessary GROUPing

❯ explain SELECT 1 UNION (SELECT 1 UNION SELECT 1);
+---------------+-----------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                    |
+---------------+-----------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                              |
|               |   Union                                                                                 |
|               |     Projection: Int64(1)                                                                |
|               |       EmptyRelation                                                                     |
|               |     Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                          |
|               |       Union                                                                             |
|               |         Projection: Int64(1)                                                            |
|               |           EmptyRelation                                                                 |
|               |         Projection: Int64(1)                                                            |
|               |           EmptyRelation                                                                 |

Describe the solution you'd like

As suggested by @jackwener: #7695 (comment), ideally the plan should look like the following (the unions should be unnested and run through a single group by):

❯ explain SELECT 1 UNION (SELECT 1 UNION SELECT 1);
+---------------+-----------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                    |
+---------------+-----------------------------------------------------------------------------------------+
| logical_plan  | Aggregate: groupBy=[[Int64(1)]], aggr=[[]]                                              |
|               |   Union                                                                                 |
|               |     Projection: Int64(1)                                                                |
|               |       EmptyRelation                                                                     |
|               |     Projection: Int64(1)                                                                |
|               |       EmptyRelation                                                                     |
|               |     Projection: Int64(1)                                                                |
|               |       EmptyRelation                                                                     |

Describe alternatives you've considered

No response

Additional context

@maruschin added a nice optimization (as well as very nice tests) to unnest UNION ALL in #7695

❯ explain SELECT 1 UNION ALL (SELECT 1 UNION ALL SELECT 1);
+---------------+----------------------------------------+
| plan_type     | plan                                   |
+---------------+----------------------------------------+
| logical_plan  | Union                                  |
|               |   Projection: Int64(1) AS Int64(1)     |
|               |     EmptyRelation                      |
|               |   Projection: Int64(1) AS Int64(1)     |
|               |     EmptyRelation                      |
|               |   Projection: Int64(1) AS Int64(1)     |
|               |     EmptyRelation                      |
@maruschin
Copy link
Contributor

maruschin commented Oct 10, 2023

I added draft PR. There's a lot of code duplication. I’m thinking about how best to do it: combine this rule with the previous one or separate the common code into a function.

@alamb
Copy link
Contributor Author

alamb commented Oct 11, 2023

combine this rule with the previous one i

This seems natural to me if possible

@maruschin
Copy link
Contributor

maruschin commented Oct 11, 2023

Done, refactored nested union optimization, added tests.
Generalized nested distinct optimization: all distinct operations removed from distinct union, even select distinct.
I think that this case should not be done separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants