HADOOP-17386. Change default fs.s3a.buffer.dir to be under Yarn container path on yarn applications #3908
Conversation
Rebased and tested in Test result

💔 -1 overall

This message was automatically generated.
aajisaka left a comment:
LGTM. Thank you @monthonk
Hi @steveloughran, would you review this PR?

@aajisaka Thank you for your time reviewing my PR!
💔 -1 overall

This message was automatically generated.
It is important that HADOOP-17631 is in first, otherwise AM applications running in secure mode won't get a valid path here. Given that is the case on this branch, all is good here.
HADOOP-17386. Change default fs.s3a.buffer.dir to be under Yarn container path on yarn applications (apache#3908). Co-authored-by: Monthon Klongklaew <[email protected]> Signed-off-by: Akira Ajisaka <[email protected]>

HADOOP-17386. Change default fs.s3a.buffer.dir to be under Yarn container path on yarn applications (#3908). Co-authored-by: Monthon Klongklaew <[email protected]> Signed-off-by: Akira Ajisaka <[email protected]>
Description of PR
fs.s3a.buffer.dir defaults to hadoop.tmp.dir, which is /tmp or similar. Many systems don't clean up /tmp until reboot, and if they stay up for a long time they accrue files written through the S3A staging committer by Spark containers which fail.
Fix: set the option to ${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a, so that if env.LOCAL_DIRS is set it is used instead of hadoop.tmp.dir. YARN-deployed apps will then use the container's local directories for the buffer dir, and when the app container is destroyed, so is the directory.
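For reference, the new default would look roughly like this in core-default.xml (the value is taken from the description above; the description text inside the property is illustrative, not copied from the patch):

```xml
<property>
  <name>fs.s3a.buffer.dir</name>
  <value>${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a</value>
  <description>
    Local directory used for buffering S3A uploads. Inside a YARN container
    the LOCAL_DIRS environment variable is set, so buffered data lands under
    the container's local directories and is deleted with the container;
    otherwise it falls back to ${hadoop.tmp.dir}/s3a.
  </description>
</property>
```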
How was this patch tested?
Injected the LOCAL_DIRS environment variable and verified that it was picked up by S3A. Also verified that when it is not set, hadoop.tmp.dir is used as a fallback.
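As a rough illustration of that check, the resolved value can be printed with a small standalone class (hypothetical code, not part of the patch; it assumes core-default.xml with the new default is on the classpath and that Configuration expands ${env.VAR:-fallback} as the patch relies on):

```java
import org.apache.hadoop.conf.Configuration;

public class BufferDirCheck {
  public static void main(String[] args) {
    // Loads core-default.xml (and core-site.xml, if present) from the classpath.
    Configuration conf = new Configuration();

    // With LOCAL_DIRS set (as YARN does inside a container, or injected manually,
    // e.g. LOCAL_DIRS=/data/yarn/usercache/app_01 -- a hypothetical path), this
    // resolves to "${LOCAL_DIRS}/s3a"; with it unset, it falls back to
    // "${hadoop.tmp.dir}/s3a".
    System.out.println("fs.s3a.buffer.dir = " + conf.get("fs.s3a.buffer.dir"));
  }
}
```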
For code changes:
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?