
@vicennial (Contributor) commented Nov 9, 2023

What changes were proposed in this pull request?

The significant changes in this PR include:

  • SparkConnectArtifactManager is renamed to ArtifactManager and moved out of Spark Connect and into sql/core (available in SparkSession through SessionState) along with all corresponding tests and confs.
  • While ArtifactManager is now part of SparkSession, we keep the legacy behaviour for non-Connect Spark while utilising the ArtifactManager in Connect pathways:
    • This is done by exposing a new method withResources in the artifact manager that sets the context class loader (for driver-side operations) and propagates the JobArtifactState such that the resources reach the executor.
    • Spark Connect pathways utilise this method through the SessionHolder#withActive
    • When withResources is not used, neither the custom context classloader nor the JobArtifactState is propagated; hence, non-Spark-Connect pathways retain the legacy behaviour.
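
The flow described above can be sketched as follows. The `withResources` name comes from this PR; the accessor path `sessionState.artifactManager` and the query are illustrative assumptions, not a confirmed public API:

```scala
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder().getOrCreate()

// Illustrative accessor: the PR exposes the ArtifactManager via SessionState.
val artifactManager = session.sessionState.artifactManager

// Inside withResources, the context class loader is swapped to the
// session-scoped one (for driver-side operations) and the JobArtifactState
// is propagated so executors resolve the same session-scoped jars/classfiles.
artifactManager.withResources {
  session.sql("SELECT 1").collect()
}
// Outside the block, the default (legacy) classloader behaviour applies.
```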

Why are the changes needed?

The ArtifactManager that currently lives in the connect package can be moved into the wider sql/core package (e.g. SparkSession) to expand its scope. This is possible because the ArtifactManager is tied solely to SparkSession#sessionUUID and can therefore be cleanly detached from Spark Connect and made generally available.

Does this PR introduce any user-facing change?

No. Existing behaviour is kept intact for both non-Connect and Connect Spark.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@vicennial vicennial marked this pull request as ready for review November 10, 2023 13:42
@github-actions github-actions bot added the BUILD label Nov 10, 2023
@vicennial (Contributor Author)

PTAL @hvanhovell @HyukjinKwon

@hvanhovell (Contributor) left a comment

LGTM

@dongjoon-hyun (Member) left a comment

According to the JIRA, this is only for Apache Spark 4.0.0, right?

@dongjoon-hyun (Member) left a comment

We still need to support this configuration in some form because we haven't deprecated it yet in the Apache Spark community:

spark.connect.copyFromLocalToFs.allowDestLocal

* Returns an `ArtifactManager` that supports adding, managing and using session-scoped artifacts
* (jars, classfiles, etc).
*
* @since 3.5.1

This should be 4.0.0 because this PR is for Apache Spark 4.0.0, @vicennial .

@dongjoon-hyun (Member)

Could you re-trigger the failed pipelines?

@vicennial (Contributor Author)

@dongjoon-hyun Thank you for the review!

I've updated the version. For the deprecated conf spark.connect.copyFromLocalToFs.allowDestLocal, I've added it to the deprecatedSQLConfigs list so that it generates a warning when used. Currently, both configs work, but if the new config is set (it is currently optional), it overrides the deprecated config. Eventually, we should remove the deprecated conf and give the new config a default value instead of being optional.
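
The precedence described above (the new config, when set, overrides the deprecated one) amounts to roughly the following resolution logic. The deprecated key is from this thread; the new key name and the helper are illustrative assumptions:

```scala
// Hedged sketch only: the new config key name below is an assumption.
def resolveAllowDestLocal(get: String => Option[String]): Boolean = {
  val newKey = "spark.sql.artifact.copyFromLocalToFs.allowDestLocal" // assumed new key
  val oldKey = "spark.connect.copyFromLocalToFs.allowDestLocal"      // deprecated key
  // If the (optional) new config is set it wins; otherwise fall back to
  // the deprecated config, whose use triggers a deprecation warning.
  get(newKey).orElse(get(oldKey)).exists(_.toBoolean)
}
```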

@vicennial (Contributor Author)

Hmm, JavaDocGeneration is failing in 2 tests and, from the logs, it's not clear why...
I will merge master and see if that helps.

@vicennial (Contributor Author)

@dongjoon-hyun The CI is green now :)

@HyukjinKwon (Member)

Merged to master.

@fhalde commented Nov 22, 2023

Hi, we are super interested in having the isolated-classloader-per-SparkSession ability for our use case. I believe this is only achievable today if jobs are run from a Connect client, and we want to avoid using the Connect client.

But with this PR merged, it should be possible to have isolated classloaders per Spark session on the executors, right?

Our use case involves starting a Spark driver, dynamically loading/adding jars, and running transformations contained within those jars. Without isolated classloaders per session, we would risk classpath conflicts on the executor side.

@HyukjinKwon if your PR addresses my concern, I can backport it to 3.5.0 in my fork.

@vicennial (Contributor Author)

@fhalde Yes, with this PR it is possible to have isolated classloaders per Spark session on the executors without going through Spark Connect.

The withResources method should be used to wrap all executions (as Spark Connect does here). Note that all artifacts (i.e. jars, classfiles) need to be added through the ArtifactManager (adding them directly to SparkContext will not work); refer to Spark Connect's AddArtifactsHandler to see how the API is used.
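
Putting this together, a minimal sketch of the non-Connect usage. The accessor path and the addArtifact call are assumptions based on the PR description, not a confirmed public API, and the jar path and SQL function are hypothetical:

```scala
import java.nio.file.Paths
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder().getOrCreate()
val artifactManager = session.sessionState.artifactManager // illustrative accessor

// Jars must go through the ArtifactManager rather than SparkContext.addJar,
// so that they end up in the session-scoped classloader.
artifactManager.addArtifact(Paths.get("/path/to/transformations.jar").toUri)

// Wrap execution so both the session classloader and the JobArtifactState
// reach driver-side code and the executors.
artifactManager.withResources {
  session.sql("SELECT transform_from_jar(col) FROM input").show()
}
```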

ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.streaming.StreamingQueryException"),

// SPARK-45856: Move ArtifactManager from Spark Connect into SparkSession (sql/core)
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.CacheId.apply"),
@LuciferYang (Contributor) commented Dec 13, 2023

@vicennial I would like to reconfirm: will the ProblemFilters added by SPARK-45856 never need to undergo a MiMa check in versions after Spark 4.0, or are these just the ProblemFilters added for the MiMa check between Spark 4.0 and Spark 3.5? I found that they have been placed in defaultExcludes.

grundprinzip pushed a commit that referenced this pull request Jul 12, 2024
### What changes were proposed in this pull request?

This jar was added in #42069 but moved in #43735.

### Why are the changes needed?

To clean up a jar not used.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests should check

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47315 from HyukjinKwon/minor-cleanup-jar-2.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Martin Grund <[email protected]>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024