SHA1 sum of package dists should be more reliable than a SHA1 on the zip result #2540
Is there a solution until now?
I'm not an expert in the ZIP format, but I guess the file access times (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) are changing and thereby messing up the checksum.
@david0 it's a good theory/idea, either that or even modification times. I am not sure if these metadata bits are part of the headers or if they are compressed together with the file contents. If they are in the headers, then maybe we could more or less easily mask them and hash the outcome. If anyone is up for digging further into the spec and possibly building a small proof of concept (it really does not need to be part of Composer as a first step), that'd be great.
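The masking idea can be sketched outside Composer. Here is a rough Python proof of concept, purely my own illustration: it blanks the DOS mod time/date fields at their fixed offsets in the local file headers and central directory headers before hashing. Note the caveats: it does not strip the optional "extra" fields (which can carry high-resolution timestamps), and a naive byte scan could in principle mis-fire on signature bytes occurring inside compressed data.

```python
import hashlib


def masked_sha1(zip_bytes: bytes) -> str:
    """SHA-1 of a ZIP archive with per-entry timestamps zeroed out.

    Sketch only: scans for the local file header signature (PK\x03\x04,
    mod time/date at offset 10) and the central directory header
    signature (PK\x01\x02, mod time/date at offset 12) and blanks
    those four bytes before hashing. Optional "extra" fields are NOT
    stripped, and compressed data containing these signatures by
    chance would be corrupted by this naive scan.
    """
    buf = bytearray(zip_bytes)
    i = 0
    while i < len(buf) - 4:
        sig = bytes(buf[i:i + 4])
        if sig == b"PK\x03\x04":
            buf[i + 10:i + 14] = b"\x00\x00\x00\x00"  # local header time/date
            i += 4
        elif sig == b"PK\x01\x02":
            buf[i + 12:i + 16] = b"\x00\x00\x00\x00"  # central dir time/date
            i += 4
        else:
            i += 1
    return hashlib.sha1(buf).hexdigest()
```

With this, two archives of the same content created at different times hash identically, while their plain SHA-1s differ.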
My proof of concept. I also noticed that file access times etc. are stored in an optional "extra" field. It is possible to omit this field when creating zip files using
@david0 ideally we should support this for zips created by GitHub as well, though, where we can't control the process, so stripping the headers would be best. I am not sure about the proof of concept, because I get the same shasums anyway for the testdata1/2 and 3/4 pairs. So the sha.php code does not prove much, but it does indeed produce different shas that are also consistent. It'd be nice if we could reproduce the exact issue (two different zips of the same content) and get the same sha.
@Seldaek must be some difference on your system (with atime). I updated the PoC and included my test files. It would also be possible to make testdata 1-4 have the same hash by also ignoring the central directory structure.
OK, I get the same hashes as you do, so that's a good start ;) Not sure what the central directory structure is; does it include the paths/directory structure of the unzipped output? Because ignoring that might be risky.
The central directory structure seems to be a repetition of the file headers, used for random access.
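For illustration (my own example, not from the thread), Python's zipfile module reads entry metadata from exactly this central directory. The sketch below shows that the per-entry timestamp is an ordinary header field that survives independently of the file contents:

```python
import io
import zipfile

# Build a tiny archive with a fixed timestamp, then read it back.
# The date_time we get from infolist() comes from the central
# directory headers, not from the (compressed) file data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    info = zipfile.ZipInfo("hello.txt", date_time=(2016, 5, 4, 12, 0, 0))
    z.writestr(info, "hello world")

with zipfile.ZipFile(buf) as z:
    for entry in z.infolist():
        print(entry.filename, entry.date_time)
```

Since the same metadata is duplicated in the local file headers and the central directory, any masking approach has to handle both copies.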
@sroze just haven't had time to look into this in more detail, but yes, in theory it's a good way to do hashing for zip files; it's probably not very hard to add it as an optional hashing mechanism to verify the shasum of zips.
What about the ability to disable the checksum until a real solution can be found?
AFAIK Packagist doesn't have checksums and Toran Proxy doesn't either. I am not sure about Satis, but overall checksums aren't really used atm due to the issues they caused.
Well, Packagist does not use checksums because the GitHub API does not provide one for its archives (probably because they change over time, as archives are generated on demand for commits).
We could also download the archives on Packagist to get a checksum; that's not really the issue.
So if I submit a PR with a config option in satis.json to allow disabling Satis checksums inside packages.json, there'd be no initial objections?
@Seldaek that would be both inefficient (downloading all archives in the GithubDriver just to get the checksum) and a bad idea (GitHub archives are generated on the fly and so can have different metadata over time).
I would agree to having an option (a command-line one could be sufficient) to disable checksum checks. At the same time, a real solution like this one would be far better. @Seldaek if you're in with this idea, I can PR that.
Any updates here? I regularly see issues like this, which slow down our deployments because Composer falls back to Git cloning instead of downloading dist archives:
How are you ending up with a checksum for GitHub downloads? Packagist and Composer don't add them, as GitHub does not provide checksums (and it cannot, since it generates archives on demand, meaning the checksum is not stable over time).
@stof Good question, I cannot say what causes these checksums to show up. Could the Composer cache be causing this? I am unable to reproduce the issue when pulling
Any update to this? We're using Satis and keep seeing this same issue...
We also had the issue, as we tried to keep the lockfile clean of any checksums. Following @mbrodala's hint I deleted them; it turned out that those checksums were also present in
Whilst that may work, it doesn't seem like a solution to me. The real question is why they are different...
@oligriffiths Seems to be a known issue in Satis but not really related to the issue here. ;-)
Cool, just saying that a direct download from GitHub would be a solution.
I'm having this issue with
Is there an existing issue about moving the hashing to SHA-2 instead of MD5 or SHA-1? I'm a bit surprised that Composer, as a relatively new tool, is using an outdated and deprecated hash for verification.
Regarding stripping metadata from ZIP files, look at the implementation in:
So I have wasted hours on finding out why I am getting these warnings:
Are there any plans to fix this?
Don't know if anyone finds this interesting, but I usually use "faketime" to create consistent archives: http://manpages.ubuntu.com/manpages/trusty/man1/faketime.1.html
Could somebody please add a little more context about when exactly this problem occurs? What are these "multiple servers" from the OP, and how do they create the archives? For the record, I am able to create a ZIP file from the https://github.com/symfony/symfony repo that has the same checksum as the ZIP archive downloadable from the GitHub releases page. Try:
Edit: To be a bit more elaborate myself, here's an example for doctrine/cache 1.10.0 as a random choice. In a
So:
Yes, the
No, it does not. For GitHub repos, the GithubDriver extracts all data from the API.
Hey @peff 👋🏼, you were very helpful back in 2017 over at Homebrew/homebrew-core#18044, where the Homebrew team had to deal with checksums for GitHub repo archives that suddenly changed. To my understanding, this was due to an internal fix in Git and how it exports archives. Counting on your expertise, hoping that you're still in some way affiliated with GitHub, and referring to your comment regarding byte-stable tarballs, can you tell us...
Thanks! Update for clarification: Obviously I gave one reason myself why the checksum might change, namely changes to the way Git exports and packages archives. But since you mentioned that GitHub might take measures to guarantee byte-stable archives, and given that Homebrew also seems to rely on checksums of GitHub archives (arbitrary example), we might consider the possible fall-out of a checksum change big enough that it's not going to happen by accident 😉. Does that make sense?
Not much (nothing?) has changed since that comment. Zipfiles are treated the same as tarballs: generated by
Yeah. I think the situation remains "changes should be very rare and avoided if possible, but technically there are no promises".
Thanks for the clarification. To me this seems like a no-go, as it means there is a chance that a Git or GitHub change would render virtually every composer.lock file in existence invalid and uninstallable due to security warnings. Even if it's very rare, it seems like a pretty bad disruption to the whole ecosystem that I'd rather avoid.
Jordi, thank you for getting back to this (c)old case so quickly. The reason why I was browsing this and related issues (#5940, #4022) is that I was hoping there would be an (easy?) way of making sure that the code we see and install is the same on our local development machines, on CI and in production. I cannot really say if this is already addressed by the

I have no experience when it comes to designing security for systems like this, so I cannot even tell what kind of attacks this would prevent or at least mitigate... Maybe a misbehaving package owner trying a targeted attack, serving different

On the other hand: As long as we're downloading GitHub-provided ZIP archives over HTTPS directly from GitHub, and since we can trust GitHub as a platform, and since they create the ZIPs from the corresponding commit ref, just having the

So, Jordi, do you still recall what exactly your motivation was for opening this issue? Was it security/integrity concerns?

Regarding possible checksum changes: Yes, I see the risk, and I understand that users would probably be yelling at Composer/Packagist. OTOH, since at least Homebrew would be affected as well, there would probably be a lot of visibility and support by all parties involved to sort such issues out. And anyway, in my mind, all this would be opt-in (or out?) by a switch like

But again, before we discuss such technical details, I'd need to understand what benefits a simple checksum-based approach would bring security-wise.
This isn't opt-in or opt-out: all existing Composer versions will verify a checksum if one is defined, and throw if verification fails. That's why we disabled it for GitHub early on; it wasn't stable enough.
That is correct. I'd say security-wise it doesn't add a ton for GitHub, but for other, less reputable sources it wouldn't hurt to have. Then again, last I checked ~97% of packages were on GitHub, so all in all I'm not sure it's worth spending resources on.
Gave this some more thought yesterday, and I'm thinking maybe what we can do is this:
That should allow us to roll this out smoothly and in a future-proof manner. That said, before proceeding any further we should first assess what can be done to share/integrate this with the TUF work going on in #9740. These are just notes; I don't want to promise real progress on this any time soon, as we still have tons of other work already planned.
This should mostly work, but I'll mention one case where it might not: if the dependent package uses Git's export-subst attribute. This should be quite rare, I'd think (again, I've seen only one report total of it messing up Homebrew). And it can perhaps be solved by politely explaining to the project you're hashing why export-subst causes trouble.
There is always a way things can go wrong :) Thanks for the pointer, but I would indeed expect this to be rare enough that the combination of using export-subst and GitHub changing its output would only impact a small number of people.
The purpose of a checksum is to lock the integrity of a not-yet-downloaded file. We could implement this as an opt-in feature and let it work as slowly as it needs to, as proposed by @Seldaek: generate a checksum for each file in the package, sort the checksums, and take the SHA of that... well, not sure if this raises the possibility of collision; I hope it's rare. Speaking up here because
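That scheme can be sketched in a few lines; here is a minimal Python illustration (Composer itself is PHP, and the per-file SHA-256 choice and the name/content separator byte are my own assumptions, not from the thread). Because only file names and uncompressed contents are hashed, archive metadata such as timestamps, extra fields and entry order never affect the result:

```python
import hashlib
import zipfile


def content_hash(archive) -> str:
    """Content-based hash of a ZIP: per-file digests, sorted, rehashed.

    Sketch of the sort-then-hash scheme proposed above. `archive` may
    be a path or a file-like object. Timestamps, "extra" fields and
    entry order do not enter the calculation.
    """
    digests = []
    with zipfile.ZipFile(archive) as z:
        for info in z.infolist():
            h = hashlib.sha256()
            h.update(info.filename.encode("utf-8"))
            h.update(b"\x00")  # separator so name/content pairs can't collide
            h.update(z.read(info))
            digests.append(h.hexdigest())
    outer = hashlib.sha256()
    for d in sorted(digests):  # sorting makes entry order irrelevant
        outer.update(d.encode("ascii"))
    return outer.hexdigest()
```

Two ZIPs of the same file tree, created at different times and with entries in different orders, then yield the same hash.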
I am actively working on this integration. A notable challenge arises from the lack of checksums in

Regarding stability concerns with GitHub-produced archives, it's true that relying on them "as-is" is not advisable (GitHub said we shouldn't do that because the format might change without notice). However, the Nix package manager bypasses this issue by computing hashes based on the content of the archives rather than the archive bytes themselves. This approach ensures that potential changes in GitHub's archive format won't affect the hashes.

A few months ago, I experimented with creating stable hashes using PHP (https://gist.github.com/drupol/2bb45818f4ccb73362d26b4d9aee9ec2). Unfortunately, the process was quite slow and I didn't pursue it further. Currently, I don't have a definitive solution for this specific issue, but I am convinced that Nix excels in many areas, particularly in generating stable hashes.
(Replacing #1496 which has become a mess, references #5940)
If multiple servers create archives, then those archives can have different SHA1s, which is problematic. Potential solutions: