Create .tar.gz package deterministically using Python's "tarfile" #12244
Thanks for the pull request, and welcome! The Servo team is excited to review your changes, and you should hear from @jdm (or someone else) soon.
Heads up! This PR modifies the following files:
|
r? @aneeshusa
tarinfo.mtime = 0
return tarinfo

with cd(dir_to_package):
How hard would it be to get rid of this `with cd(...)` block? I've never been a fan of `cd`ing during builds, but it's OK to keep if it's complicated otherwise.
One advantage of `with cd()` is that we don't need to deal with relative and non-relative path prefixes, which we would otherwise have to remove before adding the paths to the archive. I guess doing it without `cd()` is possible, but it will look quite ugly.
Why do you not like `cd()` during builds, if I may ask?
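For illustration, a minimal sketch of the prefix-stripping alternative mentioned above, done without `cd()`. The helper name `add_tree` and its arguments are hypothetical and are not taken from this PR; it only shows where the manual `os.path.relpath` step would have to go.

```python
import os
import tarfile


def add_tree(tar, root):
    """Add every file under `root` to `tar`, storing paths relative to `root`.

    Hypothetical helper for illustration only; not code from this PR.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk descend in a stable order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            # Strip the `root` prefix by hand; this is the step `with cd()` avoids.
            arcname = os.path.relpath(full_path, root)
            tar.add(full_path, arcname=arcname, recursive=False)
```

With `cd()`, the walk could start at `"."` and the paths would already be relative to the package root, which is the simplification argued for in the comment above.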
I didn't realize we'd need to strip the prefixes; it does sound like `cd` is the better option here.
Using `cd` during builds adds another piece of state to keep track of, which makes it hard to interactively run commands from the middle of a script to debug/test them, since you may have changed directories much earlier; it's also easy to forget to `cd` back. Arguably, this is mostly a problem for shell scripts, and not as much of an issue for Python, both due to the easy-to-see and automatic (safe) scoping of `with cd()` and because I'm less likely to interactively run things like `with gzip.GzipFile(..):`.
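The "automatic (safe) scoping" point can be sketched as follows. This is an assumed, minimal version of a `cd` context manager written for illustration; it is not necessarily how the helper used in this PR is implemented.

```python
import contextlib
import os


@contextlib.contextmanager
def cd(new_dir):
    """Temporarily change the working directory, restoring it on exit.

    Illustrative sketch only; the real helper in the build scripts may differ.
    """
    previous_dir = os.getcwd()
    os.chdir(new_dir)
    try:
        yield
    finally:
        # Runs even if the body raises, so the caller never stays in new_dir.
        os.chdir(previous_dir)
```

Because the directory is restored in `finally`, the changed state is confined to the indented block, unlike a bare `cd` in a shell script.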
Looks pretty good so far, I didn't think to use native Python facilities like sorting and just looks at the options to |
527ba89 to 4b511ae
@aneeshusa updated, except for a few comments (about |
See my comment above, I think we can get the temporary package name fix working. |
I did some testing locally and it seems that passing a relative path for |
4b511ae to 168755f
@aneeshusa updated, fixed the issue with |
☔ The latest upstream changes (presumably #11967) made this pull request unmergeable. Please resolve the merge conflicts.
168755f to c9518df
Rebased the changes.
@aneeshusa I think this is waiting on you?
@@ -6,13 +6,16 @@
# <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
# option. This file may not be copied, modified, or distributed
# except according to those terms.

import gzip
nit: Please put the empty line back.
Sorry about the delay, lgtm aside from a few nits.
c9518df to 0a056a9
Thanks! @bors-servo r+
📌 Commit 0a056a9 has been approved by |
Create .tar.gz package deterministically using Python's "tarfile"

A development of #12108, creates a .tar.gz package using the `tarfile` and `gzip` modules, without external dependencies. Fixes #11981.

Also fixes an issue where an existing `resources/` directory prevented a new package from being created, failing with a "File exists" error.

- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes fix #11981 (github issue number if applicable).
- [ ] There are tests for these changes OR
- [X] These changes do not require tests because "more general approach to reproducibility testing is needed"
☀️ Test successful - arm32, arm64, linux-dev, linux-rel, mac-dev-unit, mac-rel-css, mac-rel-wpt, windows-dev
Summary:
We plan to promote TensorBoard’s Pip packages to Bazel targets, but the filenames of these targets can’t actually be statically determined: they must include the TensorBoard version number, which is defined in a Python module. Instead, we’ll build a tarball containing the two wheels.

Building tarballs deterministically with system tools is famously troublesome (you can do it with GNU `tar` and a lot of flags, but not portably), and it seems that the standard approach is to use Python’s built-in `tarfile` library.

Implementation based off of <servo/servo#12244>, but with changes to make it Python 3-compatible, to use our code style, and to clean up some minor issues and fix some TODOs from the original. The original source is Apache-2 licensed.

Test Plan:
End-to-end tests included. Replacing the body of `_run_tool` with

```python
return subprocess.check_output([
    "/bin/sh",
    "-c",
    'x="$(readlink -f "$1")" && cd "$2" && tar czf "$x" .',
    "unused",
] + args)
```

causes the `test_invariant_under_mtime` test case to fail for the right reason.

wchargin-branch: deterministic-tgz
Summary:
We plan to promote TensorBoard’s Pip packages to Bazel targets, but the filenames of these targets can’t actually be statically determined: they must include the TensorBoard version number, which is defined in a Python module. Instead, we’ll build a tarball containing the two wheels.

Building tarballs deterministically with system tools is famously troublesome (you can do it with GNU `tar` and a lot of flags, but not portably), and it seems that the standard approach is to use Python’s built-in `tarfile` library.

Implementation based off of <servo/servo#12244>, but with changes to make it Python 3-compatible, to use our code style, and to remove some unneeded machinery. Original is Apache-2 licensed.

Test Plan:
End-to-end tests included. Replacing the body of `_run_tool` with

```python
return subprocess.check_output(["tar", "czf"] + args)
```

causes the `test_invariant_under_mtime` test case to fail for the right reason.

wchargin-branch: deterministic-tgz
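As a rough sketch of the `tarfile`-based approach described in these commit messages (not the actual code from the Servo PR or from TensorBoard), the following builds a `.tar.gz` whose bytes do not depend on file modification times. The names `deterministic_tgz` and `_normalize` are hypothetical, and resetting the owner fields goes beyond the `tarinfo.mtime = 0` line visible in the review; it is an assumption about what a reproducible archive usually needs.

```python
import gzip
import io
import os
import tarfile


def _normalize(tarinfo):
    """Drop metadata that varies between machines and runs (assumed set)."""
    tarinfo.mtime = 0
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = ""
    return tarinfo


def deterministic_tgz(src_dir):
    """Return the bytes of a .tar.gz of `src_dir`, independent of file mtimes."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(src_dir):
        dirnames.sort()  # fix the traversal order
        paths.extend(os.path.join(dirpath, name) for name in sorted(filenames))
    out = io.BytesIO()
    # mtime=0 pins the timestamp in the gzip header; filename="" omits the name.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for path in paths:
                tar.add(path, arcname=os.path.relpath(path, src_dir),
                        recursive=False, filter=_normalize)
    return out.getvalue()
```

A check in the spirit of `test_invariant_under_mtime` would build the archive, change a file's timestamp with `os.utime`, build again, and compare the two byte strings for equality.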
A development of #12108, creates a .tar.gz package using the `tarfile` and `gzip` modules, without external dependencies. Fixes #11981.

Also this fixes the issue when the existing `resources/` directory didn't allow to create a new package and failed with "File exists" error.

- `./mach build -d` does not report any errors
- `./mach test-tidy` does not report any errors