Non-deterministic output due to non-deterministic os.walk and unix file permissions of original files #2155
Comments
I think 1 is fine to solve as you suggest. Pex has a suite of determinism tests, but they assume you're trying to build the PEX again at some other time on the same filesystem / OS. I think it's reasonable to expect the filesystem / OS should not matter. I'm pretty sure the issue here is not that As for permissions, those should be what you see is what you get, umasks aside. If I clone a git repo, say, and that contains the code to be PEXed, then the final permissions on the files of the clone that go into the PEX should be the permissions preserved in the PEX. Likewise, when the PEX extracts itself, those exact permissions should be what the corresponding files have in their resting locations. In short, I'm not understanding the second issue and it would be great to have a repro if you can provide one - perhaps using ~docker.
Re 1: Sounds good. Re 2: Here's a Dockerfile that reproduces the issue: https://gist.github.com/ento/ee0e7ca020d85a4f58c74be8d093502b#file-dockerfile.
So, given the same repo, pex may produce different outputs depending on the environment.
Hrm, ok. I'm not sure I like the umask munging angle. I need to think on that a bit. Catering to differing OS / filesystem makes sense - it's prevalent. Folks like to develop on X but deploy to Y (I think that's crazy but it's a thing and sometimes actually unavoidable). Umask though gets weird. You're handing Pex different inputs in that case, so different outputs make complete sense and your build process should probably control your inputs better. Stepping back a bit, I think only these permissions matter for a PEX, a wheel, etc:
In other words, maybe Pex could have a mode where it normalizes perms, only preserving the execute bit for files. I can find no spec guidance here but there is this related wheel spec thread: https://discuss.python.org/t/clarifications-to-the-wheel-specification/8141
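To make the "normalize perms, only preserving the execute bit" idea concrete, here is a minimal sketch. It is not Pex code: the function name `normalize_mode`, the choice of the owner execute bit as the signal, and the default umask of `0o022` are all assumptions made for illustration.

```python
import stat


def normalize_mode(mode: int, umask: int = 0o022) -> int:
    """Return permission bits that depend only on file type and executability.

    Hypothetical helper: the name and the default umask are illustrative
    assumptions, not anything Pex defines.
    """
    if stat.S_ISDIR(mode):
        # Directories need the execute bit to be traversable.
        base = 0o777
    elif mode & stat.S_IXUSR:
        # Assumption: the owner execute bit decides whether a file is executable.
        base = 0o777
    else:
        base = 0o666
    return base & ~umask
```

With this sketch, a file cloned as `rw-rw-r--` under one umask and as `rw-------` under another would both map to the same bits, while a script with the execute bit set would stay executable.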
Addresses one part of #2155. Adds a wrapper for `os.walk` that sorts directory entries as it traverses directories. As discussed in the issue, `os.walk` may yield directory entries in a different order depending on (possibly) the file system. I don't know if the two `os.walk` calls that this PR replaces are enough for addressing this class of non-determinism in pex, but these were sufficient for solving the issue in my case. Happy to expand the use of the wrapper if there are other places that it should be used.
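A wrapper of the kind described could look roughly like the following. This is a sketch under assumptions, not the code from the PR: the name `deterministic_walk` is illustrative, and the real wrapper may differ in signature and behavior.

```python
import os
from typing import Iterator, List, Tuple


def deterministic_walk(top: str) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield the same tuples as os.walk, with entries in sorted order.

    Hypothetical name; a sketch of the idea rather than the PR's implementation.
    """
    for root, dirs, files in os.walk(top, topdown=True):
        # Sorting dirs in place controls the order in which os.walk descends
        # into subdirectories; this is only honored when topdown=True.
        dirs.sort()
        files.sort()
        yield root, dirs, files
```

Sorting in place matters here: `os.walk` consults the same `dirs` list when deciding where to recurse next, so replacing it with a new sorted list would not change the traversal order.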
Context
I'm using Pants to package up a GitHub Actions action I started writing in Python. The packaged pex file is checked in to the repo, so that when the action gets run, all dependencies get pulled down as part of the action itself, similar to how actions/typescript-action uses ncc.
Problem and current workaround
It appeared to work fine until I added a CI check that builds a pex file in the CI environment and verifies that the checked-in pex file is identical to the one that just got built: even though neither the source files nor the dependencies had changed, the check didn't pass.
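The actual CI check is not shown in this issue. As a rough sketch of what such a comparison could look like, assuming the two pex files are available as local paths (the function names below are hypothetical):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large pex files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(checked_in: Path, rebuilt: Path) -> bool:
    """Return True when the checked-in file and the rebuilt file are byte-identical."""
    return sha256_of(checked_in) == sha256_of(rebuilt)
```

A byte-level comparison like this is strict on purpose: any difference in entry order, timestamps, or stored permissions inside the zip changes the digest.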
I tracked down the causes and was able to use a forked pex repo with a few changes to make the check pass: main...ento:pex:deterministic-bootstrap
`os.walk` was yielding directories in a non-deterministic order, apparently. Fixed by applying `.sort()` in a few places.
Questions
I can work on opening a PR or two to address these sources of non-determinism if that's something good to incorporate into pex.
`os.walk`, I'm thinking of adding a wrapper around `os.walk` like `deterministic_os_walk` and using it at least in the two places I needed to patch. Would it be a good idea to use it everywhere, or should it be limited to just these two places? Or is this something out of scope of guarantees that pex is willing to provide?
`--use-system-time` flag for allowing non-deterministic timestamps. With file permissions, we could similarly have a non-deterministic mode and a deterministic mode, where deterministic mode creates new zipfile entries with `u=rw,g=rw,o=rw` for files and `u=rwx,g=rwx,o=rwx` for directories and with a configurable umask applied, but that will mean files that were originally executable will lose that bit. The deterministic mode could, instead, limit what it controls to `r` and `w` permissions and let `x` pass through, which might work well enough in practice.