[SPARK-30134][SQL] Support DELETE JAR feature in SPARK #26777
|
@wangyum @HyukjinKwon @dongjoon-hyun please review this |
|
Can one of the admins verify this patch? |
|
There was a similar PR before: #13506 |
|
@srowen @cloud-fan could you guys please review this feature |
|
I don't really see the use case for this, and would prefer not to add yet another API method, as per the last PR. The semantics are kind of funny as it's not clear whether the classes are unloaded and when |
|
Use case: if the jar's function definition changes, the user can drop the jar and add it back without restarting the thrift server. Previously, any change to the jar required restarting the thrift server, because the jar had already been added to the classpath. |
bcf1239 to 68e7330
If the UDF jar path does not change and only the jar file is overwritten, can the update take effect in real time without restarting the thriftserver? |
Yes, once you update the jar, restarting the thriftserver will work.
Restarting makes it take effect, but can it take effect without restarting? |
|
Let me make it clear. This PR aims to solve the following use case: the user can update the jar definition and load it into Spark without restarting the thrift server. Let's say I have a jar myfunction.jar and I loaded it into Spark using
Before this PR
After this PR |
|
@sandeep-katta |
|
@sandeep-katta |
|
@diaolimin unfortunately this fix is not merged, so you need to restart the thrift server to update the jar definition |
|
@sandeep-katta I want to ask what is the problem? Can it be used in a production environment? |
|
@diaolimin The consensus was not to add a new API such as delete jar, so it was not merged. It is also not production ready, as this PR targets only one part of the umbrella JIRA. That said, if you follow the design above, you should be able to implement it yourself.
What changes were proposed in this pull request?
Support DELETE JAR functionality in Spark. On deletion the jar will be removed from the IsolatedClientLoader classpath and from the SharedState classpath. It also removes the jar from the addedJars map, so that the next TaskSets won't get these jars.
Sequence Diagram
IsolatedClientLoader.deleteJar: deletes the jar from hiveClassLoader and recreates the class loader
SparkContext.deleteJar: removes the jar from the addedJars list
SharedState.deleteJar: removes the jar from the sessionState class loader
Why are the changes needed?
If the jar definition is changed, the user can delete the jar and add the new one. This process does not require the service to be restarted. Hive also supports the DELETE JAR feature.
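As a rough illustration of why delete-then-add can pick up a changed jar without a restart, the sketch below keeps a map of added jars and rebuilds a URLClassLoader whenever the set of jars changes, since URLClassLoader offers no way to remove a URL once it has been added. JarRegistry and its methods are hypothetical names used only for illustration; they are not the classes touched by this PR, and only the java.net and java.util calls are real JDK APIs.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Spark code: tracks added jars and swaps in a
// fresh class loader whenever the jar set changes.
final class JarRegistry {
    // Jar path -> time it was added (loosely modelled on the addedJars map
    // mentioned in the PR description; the real structure is an assumption).
    private final Map<String, Long> addedJars = new ConcurrentHashMap<>();
    private final ClassLoader parent;
    private volatile URLClassLoader loader;

    JarRegistry(ClassLoader parent) {
        this.parent = parent;
        this.loader = new URLClassLoader(new URL[0], parent);
    }

    // Registers the jar and rebuilds the loader so it includes the new URL.
    synchronized void addJar(String path) {
        addedJars.put(path, System.currentTimeMillis());
        rebuildLoader();
    }

    // Removes the jar and, if it was present, rebuilds the loader without it.
    // Returns false when the jar was never added.
    synchronized boolean deleteJar(String path) {
        boolean removed = addedJars.remove(path) != null;
        if (removed) {
            rebuildLoader();
        }
        return removed;
    }

    // The loader that new lookups should use.
    URLClassLoader currentLoader() {
        return loader;
    }

    // Creates a new loader from the remaining jars. The old loader is not
    // closed here: classes it already loaded stay usable until the loader
    // itself becomes unreachable and is garbage collected.
    private void rebuildLoader() {
        List<URL> urls = new ArrayList<>();
        for (String path : addedJars.keySet()) {
            urls.add(toUrl(path));
        }
        loader = new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    private static URL toUrl(String path) {
        try {
            return Paths.get(path).toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad jar path: " + path, e);
        }
    }
}
```

In this sketch, calling addJar, then deleteJar, then addJar again on the same path produces a fresh loader each time, so a jar overwritten on disk would be read again by the new loader. Classes loaded through an older loader remain alive for as long as anything still references them, which is the unloading ambiguity raised earlier in the thread.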
Does this PR introduce any user-facing change?
Yes, a new feature is introduced to the user. The documentation will be updated under the JIRA SPARK-30135.
How was this patch tested?
Added unit tests and also tested manually with the following test cases