[SPARK-5801] [core] Avoid creating nested directories. #4747
Conversation
Cache the value of the local root dirs to use for storing local data, so that the same directory is reused. Also, to avoid an extra level of nesting, use a different env variable to propagate the local dirs from the Worker to the executors. And make the executor directory use a different name.
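For illustration only (not code from this PR), here is a minimal Scala sketch of the caching idea described above; the object, method, and field names are hypothetical and do not mirror the actual Utils.scala code:

```scala
import java.io.File
import java.util.UUID

// Hypothetical sketch: cache the resolved local root dirs so every caller in
// the same JVM reuses the same directories instead of creating new ones.
object LocalDirsSketch {
  @volatile private var cachedRootDirs: Array[String] = null

  // Returns the same directories on every call once they have been created.
  def getOrCreateLocalRootDirs(configuredDirs: Seq[String]): Array[String] = synchronized {
    if (cachedRootDirs == null) {
      cachedRootDirs = configuredDirs.map { root =>
        val dir = new File(root, "spark-" + UUID.randomUUID().toString)
        dir.mkdirs()
        dir.getAbsolutePath
      }.toArray
    }
    cachedRootDirs
  }
}
```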
Test build #27903 has started for PR 4747 at commit
I think these are dumb questions, but why does the fix entail setting a different env variable? And should the dirs be joined with a path separator?
Without this, Utils.scala would always create a subdirectory for the executor under the directory already created by the Worker, so you'd always have "spark-xxxxx/spark-yyyyy" for every executor. It would always stay at that single extra level (no further nesting), but since I'm changing this code anyway, removing it seemed like a good enhancement.
And dirs should be joined with a path separator because that's the right way to do it.
Oh right, this is path.separator rather than file.separator, so `:` and not `/`.
Yes that's exactly the fix that needs to be made.
EDIT: I see it. I swear it didn't show the last file changed earlier. But it was probably user error.
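To make the separator distinction above concrete (an illustration, not code from the PR): the path separator joins several directories into a single list value, such as an environment variable, while the file separator joins components within one path.

```scala
import java.io.File

object SeparatorDemo extends App {
  val localDirs = Seq("/tmp/spark-local-1", "/tmp/spark-local-2")

  // path.separator (":" on Unix) joins multiple paths into one list value.
  println(localDirs.mkString(File.pathSeparator))
  // -> /tmp/spark-local-1:/tmp/spark-local-2

  // file.separator ("/" on Unix) joins components within a single path.
  println(Seq("/tmp/spark-local-1", "executor-dir").mkString(File.separator))
  // -> /tmp/spark-local-1/executor-dir
}
```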
Hold off a little bit on this; let me see how to handle the SparkConf issue (I think Josh added that for the unit tests).
Test build #27903 has finished for PR 4747 at commit
Test FAILed.
retest this please. We really need to fix the
Test build #27908 has started for PR 4747 at commit
Add a way for the test to work around the cache (even though the test was passing without this). Also add a comment explaining the new behavior of `getOrCreateLocalRootDirs()`.
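A sketch of what such a test workaround might look like, continuing the hypothetical LocalDirsSketch object above; the real hook in Utils, and its name, may differ:

```scala
  // Visible for tests: clears the cached dirs so they are re-resolved on the
  // next call. Hypothetical name, shown only to illustrate the workaround.
  def clearLocalRootDirsForTest(): Unit = synchronized {
    cachedRootDirs = null
  }
```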
Test build #27909 has started for PR 4747 at commit
Test build #27908 has finished for PR 4747 at commit
Test PASSed.
Test build #27909 has finished for PR 4747 at commit
Test PASSed.
Maybe we can call this function to delete the local root directory in non-YARN mode when the application exits or the SparkContext is stopped.
Note this method doesn't delete anything. There's a separate PR to clean up the root directories (#4759).
Do we need to register these for cleanup at shutdown? cf. #4759 (comment)
Not needed (see SPARK-4834). That's done by `maybeCleanupApplication` in this file when the application finishes.