Commit 55e9dd6

mateiz authored and marmbrus committed
[SPARK-3084] [SQL] Collect broadcasted tables in parallel in joins
BroadcastHashJoin has a broadcastFuture variable that tries to collect the broadcasted table in a separate thread, but this doesn't help because it's a lazy val that only gets initialized when you attempt to build the RDD. Thus queries that broadcast multiple tables would collect and broadcast them sequentially. I changed this to a val to let it start collecting right when the operator is created.

Author: Matei Zaharia <[email protected]>

Closes apache#1990 from mateiz/spark-3084 and squashes the following commits:

f468766 [Matei Zaharia] [SPARK-3084] Collect broadcasted tables in parallel in joins

(cherry picked from commit 6a13dca)
Signed-off-by: Michael Armbrust <[email protected]>
1 parent ec0b91e commit 55e9dd6

1 file changed: +1 -1 lines changed

sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala

Lines changed: 1 addition & 1 deletion
@@ -424,7 +424,7 @@ case class BroadcastHashJoin(
     UnspecifiedDistribution :: UnspecifiedDistribution :: Nil
 
   @transient
-  lazy val broadcastFuture = future {
+  val broadcastFuture = future {
     sparkContext.broadcast(buildPlan.executeCollect())
   }

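The effect of the one-word change can be illustrated outside Spark. The following is a minimal sketch, not code from this commit: `LazyOperator`, `EagerOperator`, and `startedWithoutAccess` are hypothetical names, the collected table is replaced by a small `Seq`, and it uses `Future { ... }` from `scala.concurrent` rather than the lowercase `future { ... }` helper seen in the diff. It checks whether the body of the future begins running when the object is merely constructed and its field is never read.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for the operator before the change:
// the Future is only created on the first read of `table`.
class LazyOperator(started: CountDownLatch) {
  lazy val table: Future[Seq[Int]] = Future { started.countDown(); Seq(1, 2, 3) }
}

// Hypothetical stand-in for the operator after the change:
// the Future is created, and its body scheduled, by the constructor.
class EagerOperator(started: CountDownLatch) {
  val table: Future[Seq[Int]] = Future { started.countDown(); Seq(1, 2, 3) }
}

object LazyVsEagerFuture {
  /** Constructs an operator and reports whether its future body started
    * within `millis` milliseconds, without ever reading the `table` field. */
  def startedWithoutAccess(make: CountDownLatch => Any, millis: Long): Boolean = {
    val latch = new CountDownLatch(1)
    make(latch) // construct only; the field is never accessed here
    latch.await(millis, TimeUnit.MILLISECONDS)
  }

  def main(args: Array[String]): Unit = {
    // lazy val: nothing runs until someone reads the field.
    println(startedWithoutAccess(l => new LazyOperator(l), 200))   // false
    // val: the work starts as soon as the object exists.
    println(startedWithoutAccess(l => new EagerOperator(l), 5000)) // true

    // The lazy variant does run once the field is read.
    val latch = new CountDownLatch(1)
    val op = new LazyOperator(latch)
    println(Await.result(op.table, 5.seconds)) // List(1, 2, 3)
  }
}
```

With several such operators in one plan, the eager variant lets all of them start collecting at construction time, whereas the lazy variant starts each one only when its field is first read, which matches the sequential behavior the commit message describes.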