-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added archive validation for all available algorithms in hashlib. #8851
Conversation
Some thoughts:
|
This certainly has nothing to do with #7778 |
Hi @radoering, When calculating the known hashes, we filter for What would impact performance is if we had multiple hashes types for the same given file (i.e. sha256, sha512, ...). This can be seen on the line were we generate |
Thanks for the hint, I overlooked this. So in theory, the same archive could appear several times (with different hash types) in the list. (I'm not sure whether this can happen in practice but let's assume it can.)
I don't think picking the first hash type is good enough. We do not want to rely on md5 if there are more secure hashes. I suppose, we have to put a list of preferred hash types somewhere (maybe in I assume normally there shouldn't be other hashes than these in the lockfile anyway so this might all be a theoretical issue after all: poetry/src/poetry/repositories/http_repository.py Lines 227 to 232 in 022308e
|
I've created a I've also added tests for multiple hash types of the same file, multiple files in the same package, and an unsupported hash type. |
I think the approach makes sense in general. We just have to clarify some details. I don't really like the repetetiveness of
We definitively want a parameterized unit test for this function that checks the special cases (regardless of the exact implementation).
That's probably fine. I don't expect performance to be an issue but if it was that would also be a factor to consider...
LGTM but I'm not an expert. I suppose we can change it later if someone with more expertise comes along. Just one thing: I'd probably omit md5 (and maybe sha1?) because they are known to be insecure. If we omit them from the preferred/priority list, we still have a chance to choose a better algorithm, which is unknown to us. Not something you should rely on but anyway... |
I didn't particularly like that big block of if statements either. I did originally write it as a match\case, but then realised poetry supports versions of python older than 3.10. I think I got the gist of where you were going in your code snippet. I pulled out the creation of the prioritised hash types into a global, as I didn't like the idea of doing all that collection manipulation for every archive and to only get the same answer every time. Also python As to your MD5 and SHA1 comment, preferring other algorithms is a must (which is what this implementation does). Omitting them from the list will remove the ability to ever use them. I took it that was not your intent, but if you did want to prohibit the use of archives with only an MD5 or SHA1 hash then I will remove them. That does tend to force people to point to git repositories to avoid any checks (or to lesser dependency management tools 😝), which is obviously worse. Is there a good mechanism to warn users if we have had to check an archive using MD5 or SHA1? Or perhaps create a feature to add a configuration option? |
Makes sense.
Not sure what your are referring to. The preferred/prioritized hashes must be ordered, but the
Not when using my code snippet. 😉 My idea is that we still choose an available hash type even if none is in our preferred list. That part is missing now. I'll add a comment in the code to make it clearer.
We could have another list/set of deprecated hash types. However, I'm not sure if it's worth it. |
So I've:
I didn't include the warning log you had as this catered for in |
I thought |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor nitpicks.
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Resolves: #4578
May also contribute to #7881 , as it's unclear if the known hashes being checked are all sha256.