Skip to content

Conversation

@xieshuaihu
Copy link

What changes were proposed in this pull request?

Similar with SPARK-44794, propagate JobArtifactState to broadcast/subquery thread.

This is an example:

val add1 = udf((i: Long) => i + 1)
val tableA = spark.range(2).alias("a")
val tableB = broadcast(spark.range(2).select(add1(col("id")).alias("id"))).alias("b")
tableA.join(tableB).
  where(col("a.id")===col("b.id")).
  select(col("a.id").alias("a_id"), col("b.id").alias("b_id")).
  collect().
  mkString("[", ", ", "]")

Before this pr, this example will throw exception ClassNotFoundException. Subquery and Broadcast execution use a separate ThreadPool which don't have the JobArtifactState.

Why are the changes needed?

Fix bug. Make Subquery/Broadcast thread work with Connect's artifact management.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Add a new test to ReplE2ESuite

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jan 16, 2024

Merged to master.

@HyukjinKwon
Copy link
Member

It has a conflict in branch-3.5. Mind creating a backporting PR please?

@xieshuaihu
Copy link
Author

@HyukjinKwon Thanks. And a new backport pr has been created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants