[BLAZE-1149] Fix shuffle file permission issue when using BlazeShuffleManager #1148
// Set the shuffle file permissions to 0644 to keep it consistent with the permissions of
// the built-in shuffler manager in Spark.
std::fs::set_permissions(path_ref, std::fs::Permissions::from_mode(0o644))?;
The code below is affected by the process umask, so the file can end up with fewer permission bits than requested:
OpenOptions::new()
    .write(true)
    .create(true)
    .truncate(true)
    .mode(0o644)
    .open(path_ref)?;
So we have to use std::fs::set_permissions.
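To make the difference concrete, here is a minimal sketch (Unix only, standard library only). The helper name `create_with_mode_0644` is hypothetical and is not part of this patch; it only illustrates why an explicit `set_permissions` call is needed on top of `.mode(0o644)`.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper (not from the patch): create or truncate a file and
/// force its mode to 0644 regardless of the process umask.
fn create_with_mode_0644(path: &Path) -> io::Result<fs::File> {
    // The mode passed to open() is only a request: it applies when the file
    // is newly created, and the kernel clears any bits set in the umask.
    // Under umask 0077, for example, the new file ends up as 0600.
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o644)
        .open(path)?;

    // set_permissions performs a chmod, which the umask does not filter,
    // so the final mode is exactly 0644 even for a pre-existing file.
    fs::set_permissions(path, fs::Permissions::from_mode(0o644))?;
    Ok(file)
}
```

Calling `create_with_mode_0644(Path::new("shuffle.data"))` (the file name is only an example) should leave the file at mode 0644 whatever umask the executor process runs with.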
I have tested this in our cluster; shuffle data can be fetched after this patch. cc @richox
Gentle ping @richox
I didn't see this setPermission logic in Spark, so why do we need a different implementation here?
I have updated the PR description.
The permissions of a directory created by C++/Rust code on a specific file system or operating system may be inconsistent with the permissions of directories created by Spark through Java.



Which issue does this PR close?
Closes #1149
We hit a similar issue with Gluten before: apache/incubator-gluten#9156
Rationale for this change
The purpose of this change is to ensure that shuffle data files are readable by others, which fixes shuffle fetch failures.
The error message on NodeManager:
Set the shuffle file permissions to 0644 to keep them consistent with the permissions used by the built-in shuffle manager in Spark.
Related spark PR: apache/spark#35085
What changes are included in this PR?
Are there any user-facing changes?
No.
How was this patch tested?
Tested in our cluster: the file permissions look correct now, and shuffle data can be fetched after this patch.
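For anyone who wants to repeat the permission check locally, a small sketch of how it could be done (Unix only). The function name `is_readable_by_others` is hypothetical and is not part of this PR; it simply inspects the "others read" bit that a reader running as a different user would depend on.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical check (not from the patch): returns true when the "others"
/// class has read access to the file, i.e. the 0o004 bit is set.
fn is_readable_by_others(path: &Path) -> io::Result<bool> {
    let mode = fs::metadata(path)?.permissions().mode();
    Ok((mode & 0o004) != 0)
}
```

A shuffle file at 0600 would fail this check, while one at 0644 would pass.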
