
@sandeep-katta (Contributor) commented Dec 6, 2019

What changes were proposed in this pull request?

Support a DELETE JAR command in Spark. On deletion, the jar is removed from the IsolatedClientLoader classpath and from the SharedState classpath. It is also removed from the addedJars map, so subsequent task sets will not receive it.

Sequence Diagram: DeleteJarFlow

IsolatedClientLoader.deleteJar: deletes the jar from hiveClassLoader and recreates the class loader
SparkContext.deleteJar: removes the jar from the addedJars map
SharedState.deleteJar: removes the jar from the sessionState class loader
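The three steps above can be pictured in a simplified, single-JVM form. This is a minimal sketch under stated assumptions, not the PR's code: `JarRegistry`, `addJar`, `deleteJar`, `jars`, and `classLoader` are hypothetical names. It relies on the fact that `java.net.URLClassLoader` offers no way to remove a URL once added, so deletion rebuilds a fresh loader from the remaining jars (the "recreates the class loader" step), and it keeps an added-jars map from path to timestamp in the spirit of SparkContext's addedJars.

```scala
import java.net.{URI, URLClassLoader}
import scala.collection.mutable

// Hypothetical registry illustrating the DELETE JAR flow; not Spark's actual API.
class JarRegistry(parent: ClassLoader) {
  // In the spirit of SparkContext.addedJars: jar path -> time it was added.
  private val addedJars = mutable.LinkedHashMap.empty[String, Long]

  // URLClassLoader cannot drop URLs, so the loader is replaced on every change.
  @volatile private var loader: URLClassLoader = newLoader()

  private def newLoader(): URLClassLoader =
    new URLClassLoader(addedJars.keys.map(p => new URI(p).toURL).toArray, parent)

  def classLoader: ClassLoader = loader

  def jars: Seq[String] = synchronized { addedJars.keys.toSeq }

  def addJar(path: String): Unit = synchronized {
    addedJars(path) = System.currentTimeMillis()
    loader = newLoader()
  }

  /** Removes the jar and recreates the class loader; returns false if it was never added. */
  def deleteJar(path: String): Boolean = synchronized {
    if (addedJars.remove(path).isEmpty) false
    else {
      val old = loader
      loader = newLoader()
      // Closing releases open jar file handles; classes already loaded through
      // `old` remain usable for as long as something still references them.
      old.close()
      true
    }
  }
}
```

The comment in `deleteJar` marks the subtle part of this design: recreating the loader stops new lookups from seeing the jar, but it does not unload classes that were already loaded from it.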

Why are the changes needed?

If a jar's definition changes, the user can delete the jar and add the new one without restarting the service. Hive also supports DELETE JAR.

Does this PR introduce any user-facing change?

Yes, this introduces a new user-facing command. The documentation will be updated as part of JIRA SPARK-30135.

How was this patch tested?

Added unit tests and also tested manually with the following test cases:

  1. Tested in spark-shell, spark-sql, and beeline
  2. With paths that have no scheme, like /opt/somepath/somejar.jar, and with hdfs paths
  3. With the full path or with only the jar name
  4. Dropping an invalid jar
  5. Tested with Hive-2.3.6
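The path-handling cases in the list above (no scheme, hdfs, full path vs. jar name only, invalid jar) can be illustrated with a small matcher. This is a hypothetical sketch, not the PR's logic: `JarMatcher`, `normalize`, and `findMatches` are illustrative names, and treating a scheme-less path as a local `file:` URI is an assumption.

```scala
import java.net.URI

// Hypothetical helper for resolving the argument of a DELETE JAR command.
object JarMatcher {
  // Assumption: a path with no scheme (e.g. /opt/somepath/somejar.jar) is a local file.
  def normalize(path: String): String = {
    val uri = new URI(path)
    if (uri.getScheme == null) new java.io.File(path).toURI.toString else uri.toString
  }

  // Matches by full path when the argument contains a '/', otherwise by jar file name.
  def findMatches(arg: String, added: Seq[String]): Seq[String] =
    if (arg.contains("/")) {
      val target = normalize(arg)
      added.filter(a => normalize(a) == target)
    } else {
      added.filter(a => new URI(normalize(a)).getPath.split('/').last == arg)
    }
}
```

An empty result corresponds to the "invalid jar" case, which a caller could report as an error rather than silently ignore.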

@sandeep-katta (Contributor, Author)

@wangyum @HyukjinKwon @dongjoon-hyun please review this

@AmplabJenkins

Can one of the admins verify this patch?

@yaooqinn (Member) commented Dec 6, 2019

There was a similar PR before: #13506.

@sandeep-katta (Contributor, Author)

@srowen @cloud-fan could you please review this feature?

@srowen (Member) commented Dec 9, 2019

I don't really see the use case for this, and would prefer not to add yet another API method, as per the last PR. The semantics are kind of funny, as it's not clear whether and when the classes are unloaded.

@sandeep-katta (Contributor, Author) commented Dec 9, 2019

Use case: if the function definition in a jar changes, the user can drop the jar and add it back without restarting the Thrift server.

Previously, any change to the jar required restarting the Thrift server, because the jar had already been added to the classpath.

@melin commented Dec 13, 2019

If the UDF jar path does not change and only the jar file is overwritten with the updated version, can the change take effect immediately without restarting the Thrift server?

@sandeep-katta (Contributor, Author)

Yes, once you update the jar, restarting the Thrift server will work.

@melin commented Dec 16, 2019

Restarting makes it take effect, but can it take effect without restarting?

@sandeep-katta (Contributor, Author)

Let me make it clear.

This PR aims to solve the following use case:

The user can update a jar's definition and load it into Spark without restarting the Thrift server.

Let's say I have a jar, myfunction.jar, which I loaded into Spark using the ADD JAR command, and the jar's definition then changes.

Before this PR
I need to update the jar and restart the Thrift server for the change to take effect.

After this PR
Execute the DELETE JAR command, update the jar, and execute the ADD JAR command. There is no need to restart the Thrift server.

@AngersZhuuuu (Contributor)

@sandeep-katta
In your PR, you also need to remove the deleted jar in the Executor.updateDependencies() method.
After running DELETE JAR, the user may delete the jar file itself. Then, when the executor calls updateDependencies(),
it will fail with a FileNotFound error and none of the subsequent tasks can run.
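The executor-side concern above can be sketched as a reconciliation step: compare the jars the driver currently advertises with the ones the executor has already fetched, fetch new or updated jars, and drop the rest instead of re-fetching a file that may no longer exist. This is a hypothetical sketch, not Spark's `Executor.updateDependencies()`: `DependencyState`, `reconcile`, and the `fetch` callback are illustrative names, and representing both sides as maps from jar path to timestamp is an assumption.

```scala
import scala.collection.mutable

// Hypothetical executor-side record of fetched jars (path -> timestamp).
final class DependencyState {
  val currentJars = mutable.HashMap.empty[String, Long]

  /**
   * Reconciles against the driver's current jar map.
   * Returns the jars that were removed so the caller can rebuild its class loader.
   */
  def reconcile(driverJars: Map[String, Long], fetch: String => Unit): Set[String] = {
    // Jars deleted on the driver are dropped locally rather than re-fetched,
    // so a jar file the user has already removed is never requested again.
    val removed = currentJars.keySet.toSet -- driverJars.keySet
    removed.foreach(p => currentJars.remove(p))
    // Fetch jars that are new or whose timestamp is newer than the local copy.
    for ((path, ts) <- driverJars if currentJars.get(path).forall(_ < ts)) {
      fetch(path)
      currentJars(path) = ts
    }
    removed
  }
}
```

Returning the removed set leaves the class-loader rebuild to the caller, which keeps the bookkeeping separate from class loading.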

@srowen closed this on Apr 28, 2020
@diaolimin

@sandeep-katta
I encountered this problem. In which version is this planned to be merged?

@sandeep-katta (Contributor, Author)

@diaolimin unfortunately this fix was not merged, so you need to restart the Thrift server to update the jar definition.

@diaolimin

@sandeep-katta What was the problem with it? Can it be used in a production environment?

@sandeep-katta (Contributor, Author)

@diaolimin consensus was reached not to add a new API like DELETE JAR, so it was not merged. It is also not production ready, since this PR targets only one part of the umbrella JIRA. That said, if you follow the design above, you should be able to implement it.
