Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize nested unions #7481

Closed
universalmind303 opened this issue Sep 5, 2023 · 6 comments · Fixed by #7695
Closed

Optimize nested unions #7481

universalmind303 opened this issue Sep 5, 2023 · 6 comments · Fixed by #7695
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@universalmind303
Copy link
Contributor

universalmind303 commented Sep 5, 2023

Is your feature request related to a problem or challenge?

if your query contains many nested unions, it could result in an inefficient plan. If it is a union of unions, we can easily simplify that to a single union node.

Describe the solution you'd like

nested union nodes should be rewritten as a single union node

flowchart TD
    A[union] --> B[4]
    A --> C[union]
    C --> D[2]
    C --> E[1]
Loading
flowchart TD
    A[union] --> B[4]
    A --> D[2]
    A --> E[1]
Loading

Describe alternatives you've considered

None come to mind.

Additional context

For reference, polars optimizes these away.

pola-rs/polars#7855
pola-rs/polars#7861

https://github.com/influxdata/influxdb_iox/issues/7412

@universalmind303 universalmind303 added the enhancement New feature or request label Sep 5, 2023
@universalmind303
Copy link
Contributor Author

Additionally, a union of 1 item could always just drop the Union

flowchart TD
    A[union] --> B[scan]
Loading
flowchart TD
  B[scan]
Loading

@alamb
Copy link
Contributor

alamb commented Sep 19, 2023

@crepererum actually implemented these two physical optimizations in IOx: https://github.com/influxdata/influxdb_iox/tree/main/iox_query/src/physical_optimizer/union

It is probably a fairly straightforward exercise to copy them (and their tests) into DataFusion

Thus marking this as a good first issue as the code already exists

@alamb alamb added the good first issue Good for newcomers label Sep 19, 2023
@alamb
Copy link
Contributor

alamb commented Sep 19, 2023

@crepererum also notes that this could be a logical optimizer pass or a physical optimizer pass. IOx implemented it as a physical pass.

@maruschin
Copy link
Contributor

maruschin commented Sep 24, 2023

Hi, I can do it.

@universalmind303 can we create unary union after parsing sql query? I think that SELECT * FROM FOO UNION is not a valid query.

@alamb this logical optimization already exists in LogicalPlanBuilder::union method, should it be rewriten as logical optimizer step and removed from LogicalPlanBuilder::union?
https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/logical_plan/builder.rs#L1227

@alamb
Copy link
Contributor

alamb commented Sep 25, 2023

@alamb this logical optimization already exists in LogicalPlanBuilder::union method, should it be rewriten as logical optimizer step and removed from LogicalPlanBuilder::union?

@maruschin -- I think that sounds like an excellent idea

@maruschin
Copy link
Contributor

@alamb Everything is done, please look at the PR.
And, do I need to add a optimization of distinct nested unions, or should I do this in another PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants