HIVE-26887 Make sure dirPath has the correct permissions #3894
Conversation
cnauroth
left a comment
Hello @skysiders . Thank you for sending a pull request.
I would not expect this patch to be necessary. As you said, the permissions will generally be 700 (governed by hive.scratch.dir.permission). The typical umask is 022 (governed by fs.permissions.umask-mode), though a few spots in the Hadoop code use something more restrictive. This umask will have no effect on these permissions. I can't think of any reasonable values of hive.scratch.dir.permission and fs.permissions.umask-mode for which we should force the permissions instead of respecting the umask. Have you seen any problems related to this in practice that needed this patch?
The proposed fix is potentially incomplete from a security perspective, because there is still a small window of time in which the directory has been created but the additional setPermission hasn't run yet. If the initial permissions are weak enough, then an unauthorized user could slip in and access the directory.
On a side note, a shorter way to code this is the single-line call to static method FileSystem.mkdirs(fs, dir, permission). However, the internal implementation is still 2 steps: 1) create directory, 2) set permissions, so it's subject to the same problems. FileSystem doesn't offer any way to circumvent umask atomically.
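The umask arithmetic behind this discussion can be sketched with plain permission bits (a self-contained model, not the Hadoop FsPermission API; the class name UmaskModel is made up for illustration): the effective mode of a created directory is the requested mode with the umask bits cleared.

```java
// Minimal model of fs.permissions.umask-mode semantics: effective = requested & ~umask.
// Octal values mirror the discussion above (hive.scratch.dir.permission=700, umask=022).
public class UmaskModel {
    // Same rule HDFS applies when a directory is created via mkdirs(path, perm).
    static int apply(int requested, int umask) {
        return requested & ~umask;
    }

    public static void main(String[] args) {
        // umask 022 clears only group/other write bits, so 700 is untouched...
        System.out.println(Integer.toOctalString(apply(0700, 0022))); // 700
        // ...while a hypothetical 770 would be reduced to 750.
        System.out.println(Integer.toOctalString(apply(0770, 0022))); // 750
    }
}
```

This is why a umask of 022 "will have no effect" on a 700 scratch directory: the masked bits were never requested in the first place.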
Hi @cnauroth , thanks for your review! I have mentioned similar flaws in STORM-3862, TEZ-4412, and HBASE-26994. Regarding the FileSystem.mkdirs(fs, path, perm) you mentioned, I agree with your point of view, but what I still want to say is that the idea behind this kind of fix is not to blindly set permissions on files, but to check the file permissions first. As shown in the code, if the permissions are already correct, we don't need to set them again.
Hi @cnauroth , I thought about this question again, and I think that compared with the FileSystem.mkdirs(fs, path, perms) you mentioned, my patch may be a little faster. The static mkdirs(fs, path, perms) does two things: it first creates the directory with default permissions and then assigns the requested permissions to it. What I do can be divided into three steps: first create the directory, then check whether its permissions are correct, and only set the permissions if they are not. The reason this can be faster is that the permission check is cheaper than unconditionally setting the permission; when the directory already has the requested permissions, it can be about 4 times faster. Following your suggestion, this patch would change every mkdirs(path, perms) into mkdirs(fs, path, perms), but as I just said, I think that would be a little slower than my current patch, although that style of directory creation is also rigorous. I'm looking forward to your reply.
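The check-then-set pattern described here can be sketched on a local POSIX filesystem with java.nio.file (not the Hadoop FileSystem API used by the actual patch; the class and method names are made up for illustration): create the directory, compare its actual permissions with the requested ones, and pay for the second metadata operation only on a mismatch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class CheckThenSet {
    // Ensure dir is owner-only (700): check first, and call the setter only
    // if the process umask (or anything else) left different permissions.
    static String ensureOwnerOnly(Path dir) throws IOException {
        Set<PosixFilePermission> wanted = PosixFilePermissions.fromString("rwx------");
        Set<PosixFilePermission> actual = Files.getPosixFilePermissions(dir);
        if (!actual.equals(wanted)) {
            Files.setPosixFilePermissions(dir, wanted); // extra call only on mismatch
        }
        return PosixFilePermissions.toString(Files.getPosixFilePermissions(dir));
    }

    public static void main(String[] args) throws IOException {
        // createTempDirectory already restricts the directory to the owner on
        // POSIX, so the check above should find nothing to fix.
        Path dir = Files.createTempDirectory("scratch");
        System.out.println(ensureOwnerOnly(dir)); // rwx------
        Files.delete(dir);
    }
}
```

The same shape applies to the Hadoop version in the patch, with getFileStatus/getPermission standing in for getPosixFilePermissions and fs.setPermission for the setter.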
cnauroth
left a comment
@skysiders , thank you for sharing the thought process, and you have a good point about the potential performance difference.
I still think these changes are unnecessary in any reasonable deployment model. In practice, it will only make a difference for deployments with non-zero in the first digit (owner permissions) of fs.permissions.umask-mode. I can't think of a good reason an admin would restrict the owner's access to their own directory.
You mentioned that perhaps this code could evolve someday to use 770 instead of 700, in which case the default fs.permissions.umask-mode would restrict it to 750. I think that's unlikely, as these code paths by design involve directories isolated to the individual user, not intended for sharing to the group or other levels. Even if that did happen, I'd argue that the umask is serving its intended purpose: providing additional restrictions according to the admin's security policies.
If an application really wants to make a design choice to circumvent umask (and make sure it carefully covers all of the associated risks with that), then I think it should be approached as initializing a FileSystem with a Configuration override of fs.permissions.umask-mode=000, ignoring whatever is configured in core-site.xml. Then, the application can be confident that it's going to get exactly the permissions it requested. Even better, this will be guaranteed correct with a single RPC: just mkdirs instead of mkdirs + setPermission.
Thank you for sharing the links to pull requests in other projects too. I would have had the same feedback there, but I see those are already committed.
Thank you for the patch! The code looks like a correct implementation of what you described. (Have you looked into whether or not it's possible to add tests?) I just entered one more minor comment about formatting.
I am -0 for this, meaning I disagree but I won't oppose it. I'm also not a Hive committer, so my vote is non-binding. You should look for a review from a Hive committer if you want to keep pursuing this.
fs.mkdirs(cacheDirPath, fsPermission);
if (!fsPermission.equals(fsPermission.applyUMask(FsPermission.getUMask(conf)))) {
fs.setPermission(cacheDirPath, fsPermission);
Indentation is off on this line.
cnauroth
left a comment
@skysiders , I had some more thoughts on this today. There is some established precedent for overriding permissions exactly like this elsewhere in the ecosystem, e.g. the code we are looking at in MAPREDUCE-7375 / apache/hadoop#4237. I forgot about that while looking at the Hive code in isolation.
I still have some disagreements with the concept as described in my last comment, but I'm converting to +1 for the sake of consistency across projects. Again, that's non-binding, so you'll need to seek out a committer for review too. (Reminder: I also pointed out an indentation problem to correct.)
Thank you for the patch, and thanks for bearing with me while I reason through it.
Hi @cnauroth , thanks for your review. I'm glad to see your reply. In the MAPREDUCE-7375 you mentioned, I saw your approval of this fix, as well as your earlier point about how unlikely the extreme umask values are, and I very much agree with these. The fixes I submitted in other projects are based on the same line of thinking: after we create a file, we should check its permissions and grant the correct ones if needed. I think this is necessary because, as an upper-level application, it is difficult to know the umask setting of the underlying filesystem (for example, the umask under Linux is set by the system administrator root, while the umask of a distributed filesystem is decided by the user who starts HDFS, and Hive or other systems are run by yet other program administrators; these may not be the same user). This is my most fundamental point, and it is the reason I insist on this fix. I would like to discuss this issue with you further: do you think this fix should apply only to permissions like 770, rather than permissions like 700 (because 770 can be affected by the filesystem's default umask of 022)? Finally, I am very happy to have such an in-depth discussion with you on the issue of file permissions. Thank you for your views on my point of view.
Hi @abstractdog @ayushtkn , could you please have a look at this?
Kudos, SonarCloud Quality Gate passed!
Hi @cnauroth @abstractdog @ayushtkn , I reviewed this part of the code again today, and I found that an earlier patch here was HIVE-4487. We can clearly see that the developer deliberately set permissions here, and then this code was modified again in HIVE-8015. In HBASE-26994 I mentioned the concept of API misuse, and I think the second patch here is such a misuse of the API.
Hi @zabetak , could you please have a look at this?
cnauroth
left a comment
+1 (non-binding)
Thank you for responding to the code review feedback, @skysiders .
Thanks for the elaborate analysis and discussion @skysiders @cnauroth ! Looking into the changes, it seems that this is kind of a breaking change, since depending on the configuration, permissions will be set differently. Moreover, the proposed changes make the code more verbose and less straightforward. Furthermore, I am not sure we want to enforce a programming pattern where every mkdirs is followed by a permission check and setPermission. For the reasons above, I would prefer if we didn't merge these changes.
Hi @zabetak , thanks for your review. Regarding the second point about the "programming pattern" you mentioned, it is in fact also possible to use Hadoop's underlying FileSystem.create(fs, path, perm) here. I now think such a "programming pattern" should be adopted, because it is safe and more reliable than fs.create(path, perm). This kind of fix is mainly aimed at API misuse. I mentioned this problem in HBASE-26994: the developer originally intended to set special permissions, but mistakenly believed that fs.create(path, perm) would set the special permissions perm on the path, which is wrong. In the discussion with the HBase developer, I pointed out this mistake and got his approval. Finally, I want to say that this fix is necessary in my opinion. I have searched Hive and found these four API misuse problems, so I point out the problem here.
@skysiders Apologies for not following up on this, but I am still pretty busy with various things.
If this pattern is preferred, then why not adopt it here? I saw your comments above about a potential perf benefit. If that is true, then it would be better to propose this perf improvement in the "Hadoop HDFS" project. It would be better to have this
Other than that, I don't really like the fact that
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.








HIVE-26887: In the QueryResultsCache constructor of class QueryResultsCache, there is the following code segment.
It can be seen that the function uses mkdirs to create cacheDirPath, passing in the path variable cacheDirPath and the permission 700. But we have not confirmed whether that permission is actually applied to the directory.
The question arises because Hadoop has two mkdirs methods, the instance method mkdirs(Path f, FsPermission permission) and the static helper mkdirs(FileSystem fs, Path dir, FsPermission permission), and the first one is used here. The permissions set by this method are affected by the underlying umask. Although 700 will hardly be affected by the umask here, from a rigorous point of view we should add a permission check and, if necessary, a permission grant. So I check whether the permission was granted correctly by computing the effect of the umask on the requested permission, and if it was not, I restore the correct permission through setPermission. And I found the same issue in three other methods here.
In class Context
In class SessionState
and in class TezSessionState
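The check described above, applied at each of the four call sites, can be modeled with plain permission bits (a self-contained sketch with no Hadoop dependency; the class and method names are made up): mimic FsPermission.applyUMask and only call setPermission when the umask would have altered the requested mode.

```java
public class MkdirsCheck {
    // Mirrors the patch's predicate:
    //   !fsPermission.equals(fsPermission.applyUMask(FsPermission.getUMask(conf)))
    static boolean needsSetPermission(int requested, int umask) {
        int effective = requested & ~umask; // what mkdirs(path, perm) actually yields
        return effective != requested;
    }

    public static void main(String[] args) {
        // Default case from the discussion: umask 022 cannot touch 700.
        System.out.println(needsSetPermission(0700, 0022)); // false
        // A restrictive owner-masking umask would strip owner bits from 700.
        System.out.println(needsSetPermission(0700, 0177)); // true
        // And a hypothetical 770 is clipped to 750 by umask 022.
        System.out.println(needsSetPermission(0770, 0022)); // true
    }
}
```

When the predicate is false, the directory already has the requested permissions and the extra setPermission RPC is skipped.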