Implement broadcast join optimization #348

Open
Dandandan opened this issue Oct 14, 2022 · 0 comments
Labels
enhancement New feature or request


Dandandan commented Oct 14, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Once broadcast exchanges (#342) are supported, we can transform certain joins to use them.

Describe the solution you'd like

Currently, all plans involving hash joins look like the following:

HashJoin <- RemoteExchange (partitioned) <- build side input
         <- RemoteExchange (partitioned) <- probe side input

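When the build side is small, it can be broadcast to every partition instead of being repartitioned. As a reference point, Spark uses 10MB * number of partitions as its default threshold, but in my experience a larger threshold generally helps as well.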

The new plan after optimization looks like this (note the missing exchange on the probe side; that side no longer requires shuffling):

HashJoin <- BroadcastExchange <- build side input
         <- probe side input
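
The rewrite described above could be sketched roughly as follows. This is only an illustration on a simplified, hypothetical plan type: `Plan`, `estimated_bytes`, `broadcast_join`, and the `threshold_bytes` parameter are made-up names for this sketch, not DataFusion or Ballista APIs, and it only looks at the top-level join rather than walking the whole plan.

```rust
// Hypothetical, simplified plan representation used only for illustration.
// None of these types exist in DataFusion or Ballista.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    // A leaf input with an estimated output size in bytes, if known.
    Input { name: String, est_bytes: Option<u64> },
    // Hash-partitioned shuffle of the child (the "RemoteExchange (partitioned)" above).
    PartitionedExchange(Box<Plan>),
    // Sends the full child output to every consumer partition.
    BroadcastExchange(Box<Plan>),
    HashJoin { build: Box<Plan>, probe: Box<Plan> },
}

// Size estimate of a plan; exchanges pass through the estimate of their child.
fn estimated_bytes(plan: &Plan) -> Option<u64> {
    match plan {
        Plan::Input { est_bytes, .. } => *est_bytes,
        Plan::PartitionedExchange(child) | Plan::BroadcastExchange(child) => {
            estimated_bytes(child)
        }
        // Join output size is not estimated in this sketch.
        Plan::HashJoin { .. } => None,
    }
}

// If both join inputs are shuffled and the build side is known to be at most
// `threshold_bytes`, broadcast the build side and drop the probe-side exchange.
// Any other plan is returned unchanged.
fn broadcast_join(plan: Plan, threshold_bytes: u64) -> Plan {
    match plan {
        Plan::HashJoin { build, probe } => match (*build, *probe) {
            (Plan::PartitionedExchange(b), Plan::PartitionedExchange(p))
                if estimated_bytes(&b).map_or(false, |n| n <= threshold_bytes) =>
            {
                Plan::HashJoin {
                    build: Box::new(Plan::BroadcastExchange(b)),
                    probe: p,
                }
            }
            // Unknown or too-large build side: keep the partitioned plan.
            (b, p) => Plan::HashJoin {
                build: Box::new(b),
                probe: Box::new(p),
            },
        },
        other => other,
    }
}
```

An unknown size estimate is treated as "too large" here, so the rewrite only fires when statistics are available; a real rule might make the threshold configurable.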

Describe alternatives you've considered

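Implement the (physical) optimization rule. The rule should run after the HashBuildProbeOrder rule from DataFusion, so that the build and probe sides have presumably already been chosen by the time the broadcast decision is made.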

Additional context

@Dandandan Dandandan added the enhancement New feature or request label Oct 14, 2022