-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-27225][SQL] Implement join strategy hints #24164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1426294
e77c9f3
f198dfb
d25822d
4a13ffe
407c63f
6bd7f56
c535d36
e533ac2
7342fbd
0912997
c0b217c
4a48286
a9634c4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -66,17 +66,94 @@ object JoinHint { | |
| /** | ||
| * The hint attributes to be applied on a specific node. | ||
| * | ||
| * @param broadcast If set to true, it indicates that the broadcast hash join is the preferred join | ||
| * strategy and the node with this hint is preferred to be the build side. | ||
| * @param strategy The preferred join strategy. | ||
| */ | ||
| case class HintInfo(broadcast: Boolean = false) { | ||
| case class HintInfo(strategy: Option[JoinStrategyHint] = None) { | ||
|
|
||
| override def toString: String = { | ||
| val hints = scala.collection.mutable.ArrayBuffer.empty[String] | ||
| if (broadcast) { | ||
| hints += "broadcast" | ||
| /** | ||
| * Combine this [[HintInfo]] with another [[HintInfo]] and return the new [[HintInfo]]. | ||
| * @param other the other [[HintInfo]] | ||
| * @param hintOverriddenCallback a callback to notify if any [[HintInfo]] has been overridden | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we create a hint merging strategy framework, I think it will not be an arbitrary callback. Shall we make it simple now and leave it for future design? Then we can just log message inside this method. |
||
| * in this merge. | ||
| * | ||
| * Currently, for join strategy hints, the new [[HintInfo]] will contain the strategy in this | ||
| * [[HintInfo]] if defined, otherwise the strategy in the other [[HintInfo]]. The | ||
| * `hintOverriddenCallback` will be called if this [[HintInfo]] and the other [[HintInfo]] | ||
| * both have a strategy defined but the join strategies are different. | ||
| */ | ||
| def merge(other: HintInfo, hintOverriddenCallback: HintInfo => Unit): HintInfo = { | ||
| if (this.strategy.isDefined && | ||
| other.strategy.isDefined && | ||
| this.strategy.get != other.strategy.get) { | ||
| hintOverriddenCallback(other) | ||
| } | ||
|
|
||
| if (hints.isEmpty) "none" else hints.mkString("(", ", ", ")") | ||
| HintInfo(strategy = this.strategy.orElse(other.strategy)) | ||
| } | ||
|
|
||
| override def toString: String = strategy.map(s => s"(strategy=$s)").getOrElse("none") | ||
| } | ||
|
|
||
| sealed abstract class JoinStrategyHint { | ||
|
|
||
| def displayName: String | ||
| def hintAliases: Set[String] | ||
|
|
||
| override def toString: String = displayName | ||
| } | ||
|
|
||
| /** | ||
| * The enumeration of join strategy hints. | ||
| * | ||
| * The hinted strategy will be used for the join with which it is associated if doable. In case | ||
| * of contradicting strategy hints specified for each side of the join, hints are prioritized as | ||
| * BROADCAST over SHUFFLE_MERGE over SHUFFLE_HASH over SHUFFLE_REPLICATE_NL. | ||
| */ | ||
| object JoinStrategyHint { | ||
|
|
||
| val strategies: Set[JoinStrategyHint] = Set( | ||
| BROADCAST, | ||
| SHUFFLE_MERGE, | ||
| SHUFFLE_HASH, | ||
| SHUFFLE_REPLICATE_NL) | ||
| } | ||
|
|
||
| /** | ||
| * The hint for broadcast hash join or broadcast nested loop join, depending on the availability of | ||
| * equi-join keys. | ||
| */ | ||
| case object BROADCAST extends JoinStrategyHint { | ||
| override def displayName: String = "broadcast" | ||
| override def hintAliases: Set[String] = Set( | ||
| "BROADCAST", | ||
| "BROADCASTJOIN", | ||
| "MAPJOIN") | ||
| } | ||
|
|
||
| /** | ||
| * The hint for shuffle sort merge join. | ||
| */ | ||
| case object SHUFFLE_MERGE extends JoinStrategyHint { | ||
| override def displayName: String = "merge" | ||
| override def hintAliases: Set[String] = Set( | ||
| "SHUFFLE_MERGE", | ||
| "MERGE", | ||
| "MERGEJOIN") | ||
| } | ||
|
|
||
| /** | ||
| * The hint for shuffle hash join. | ||
| */ | ||
| case object SHUFFLE_HASH extends JoinStrategyHint { | ||
| override def displayName: String = "shuffle_hash" | ||
| override def hintAliases: Set[String] = Set( | ||
| "SHUFFLE_HASH") | ||
| } | ||
|
|
||
| /** | ||
| * The hint for shuffle-and-replicate nested loop join, a.k.a. cartesian product join. | ||
| */ | ||
| case object SHUFFLE_REPLICATE_NL extends JoinStrategyHint { | ||
| override def displayName: String = "shuffle_replicate_nl" | ||
| override def hintAliases: Set[String] = Set( | ||
| "SHUFFLE_REPLICATE_NL") | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This hint for cartesian products is useful for users?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. In the default logic, broadcast-nl is prioritized over shuffle-replicate-nl (cartesian-product), so this can be used for special cases where shuffle-replicate-nl is favored.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we might need a code comment to explain SHUFFLE_REPLICATE_NL is |
||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a little weird to see this method being defined twice. Can we just log the message inside
HintInfo.merge?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking to have a centralized handler for all kinds of hint events/errors, and the action, whether to log warnings/errors or to throw exceptions, can be configurable. WDYT?