[SPARK-45856] Move ArtifactManager from Spark Connect into SparkSession (sql/core) #43735
Conversation
PTAL @hvanhovell @HyukjinKwon
hvanhovell
left a comment
LGTM
dongjoon-hyun
left a comment
According to the JIRA, this is only for Apache Spark 4.0.0, right?
dongjoon-hyun
left a comment
We still need to support the configuration in some way because we haven't deprecated it yet in the Apache Spark community:
spark.connect.copyFromLocalToFs.allowDestLocal
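As a sketch of the backwards-compatible handling discussed here, a resolver could read a new key first and fall back to the not-yet-removed legacy key. Note this is illustrative only: the "new" key name below is an assumption; the thread only names `spark.connect.copyFromLocalToFs.allowDestLocal`.

```scala
// Hedged sketch of conf resolution with a legacy-key fallback. The "new"
// key name is hypothetical; only the legacy key appears in this thread.
object ConfFallback {
  val LegacyKey = "spark.connect.copyFromLocalToFs.allowDestLocal"
  val AssumedNewKey = "spark.sql.artifact.copyFromLocalToFs.allowDestLocal" // hypothetical

  def allowDestLocal(conf: Map[String, String]): Boolean =
    conf.get(AssumedNewKey)
      .orElse(conf.get(LegacyKey)) // honour the legacy key until it is removed
      .exists(_.toBoolean)
}
```

The new key wins when both are set, which matches the usual pattern for renamed configurations.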
 * Returns an `ArtifactManager` that supports adding, managing and using session-scoped artifacts
 * (jars, classfiles, etc).
 *
 * @since 3.5.1
This should be 4.0.0 because this PR is for Apache Spark 4.0.0, @vicennial .
Could you re-trigger the failed pipelines?
@dongjoon-hyun Thank you for the review! I've updated the version and added handling for the deprecated conf.
Hmm, JavaDocGeneration is failing in 2 tests and, from the logs, it's not clear why...
@dongjoon-hyun The CI is green now :)
Merged to master.
Hi, we are super interested in having the isolated-classloader-per-SparkSession ability for our use case. I believe this is today only achievable if jobs are run from a Connect client. We want to avoid using the Connect client, but with this PR merged, it should be possible to have isolated classloaders per Spark session on the executors, right? Our use case involves starting a Spark driver and dynamically loading/adding jars and running transformations present within the jars. Without isolated classloaders per session, we would risk classpath conflicts on the executor side. @HyukjinKwon, if your PR addresses my concern, I can backport it to 3.5.0 in my fork.
@fhalde Yes, with this PR, it would be possible to have isolated classloaders per Spark session on the executors without going through Spark Connect. The
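To illustrate what per-session classloader isolation means mechanically, here is a hand-rolled sketch, not Spark's actual implementation: `SessionIsolation` and `classLoaderFor` are made-up names, and the real `ArtifactManager` additionally handles state propagation to executors.

```scala
import java.net.{URL, URLClassLoader}
import scala.collection.mutable

// Hypothetical sketch: one classloader per session UUID, so classes loaded
// for one session are invisible to another. Spark's real ArtifactManager is
// considerably more involved (artifact storage, executor-side handling).
object SessionIsolation {
  private val loaders = mutable.Map.empty[String, URLClassLoader]

  def classLoaderFor(sessionUUID: String, jars: Seq[URL]): URLClassLoader =
    synchronized {
      loaders.getOrElseUpdate(
        sessionUUID,
        // Child-first delegation would strengthen isolation; this sketch
        // uses the JVM's default parent-first delegation for brevity.
        new URLClassLoader(jars.toArray, getClass.getClassLoader))
    }
}
```

Each distinct session UUID gets its own loader instance, while repeated lookups for the same session return the same loader.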
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.streaming.StreamingQueryException"),

// SPARK-45856: Move ArtifactManager from Spark Connect into SparkSession (sql/core)
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.CacheId.apply"),
@vicennial I would like to reconfirm: the ProblemFilters added by SPARK-45856 will never need to undergo a MiMa check in versions after Spark 4.0, is that correct? Or are these just the ProblemFilters added for the MiMa check between Spark 4.0 and Spark 3.5? I found that they have been placed in defaultExcludes.
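For context on the question above, a simplified sketch of the distinction between version-specific MiMa excludes and defaultExcludes (which apply to every comparison) might look like the fragment below. This is an illustrative build-config sketch, not the actual contents of Spark's `project/MimaExcludes.scala`.

```scala
// Illustrative fragment only. Filters in defaultExcludes are folded into
// every MiMa comparison, while filters in a versioned list apply only when
// checking that release against its predecessor.
object MimaExcludes {
  val defaultExcludes = Seq(
    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.CacheId.apply")
  )
  lazy val v40excludes = defaultExcludes ++ Seq(
    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.streaming.StreamingQueryException")
  )
}
```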
### What changes were proposed in this pull request?
This jar was added in #42069 but moved in #43735.
### Why are the changes needed?
To clean up a jar not used.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests should check
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #47315 from HyukjinKwon/minor-cleanup-jar-2.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Martin Grund <[email protected]>
What changes were proposed in this pull request?
The significant changes in this PR include:
- `SparkConnectArtifactManager` is renamed to `ArtifactManager` and moved out of Spark Connect and into `sql/core` (available in `SparkSession` through `SessionState`) along with all corresponding tests and confs.
- With `ArtifactManager` as part of `SparkSession`, we keep the legacy behaviour for non-connect Spark while utilising the `ArtifactManager` in connect pathways.
- This works through a new method `withResources` in the artifact manager that sets the context class loader (for driver-side operations) and propagates the `JobArtifactState` such that the resources reach the executor; it is used in `SessionHolder#withActive`.
- When `withResources` is not used, neither the custom context classloader nor the `JobArtifactState` is propagated and hence, non Spark Connect pathways remain with legacy behaviour.

Why are the changes needed?
The `ArtifactManager` that currently lies in the connect package can be moved into the wider `sql/core` package (e.g. `SparkSession`) to expand the scope. This is possible because the `ArtifactManager` is tied solely to the `SparkSession#sessionUUID` and hence can be cleanly detached from Spark Connect and be made generally available.

Does this PR introduce any user-facing change?
No. Existing behaviour is kept intact for both non-connect and connect spark.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.
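As a closing illustration, the `withResources` mechanism summarised in the change list above can be sketched in simplified form. `JobArtifactState` here is a stand-in case class and `ArtifactScope` a made-up name; Spark's real types and plumbing differ.

```scala
import scala.util.DynamicVariable

// Stand-in for Spark's session-scoped artifact state (hypothetical shape).
final case class JobArtifactState(sessionUUID: String)

object ArtifactScope {
  // Tracks the active state for the current thread, mimicking propagation.
  val activeState = new DynamicVariable[Option[JobArtifactState]](None)

  // The pattern described in the PR summary: swap in the session's
  // classloader, expose the artifact state for the block, then restore
  // everything so non-Connect code paths keep legacy behaviour.
  def withResources[T](sessionLoader: ClassLoader, state: JobArtifactState)(block: => T): T = {
    val thread = Thread.currentThread()
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(sessionLoader)
    try activeState.withValue(Some(state))(block)
    finally thread.setContextClassLoader(previous)
  }
}
```

Outside a `withResources` block the context classloader and state are untouched, mirroring the PR's point that non-Connect pathways retain legacy behaviour unless they opt in.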