Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create .tar.gz package deterministically using Python's "tarfile" #12244

Merged
merged 1 commit into from
Aug 5, 2016

Conversation

tonyo
Copy link
Contributor

@tonyo tonyo commented Jul 4, 2016

A development of #12108, creates a .tar.gz package using the tarfile and gzip modules, without external dependencies. Fixes #11981.
Also this fixes the issue when the existing resources/ directory didn't allow to create a new package and failed with "File exists" error.


  • There are tests for these changes OR
  • These changes do not require tests because "more general approach to reproducibility testing is needed"

This change is Reviewable

@highfive
Copy link

highfive commented Jul 4, 2016

Thanks for the pull request, and welcome! The Servo team is excited to review your changes, and you should hear from @jdm (or someone else) soon.

@highfive
Copy link

highfive commented Jul 4, 2016

Heads up! This PR modifies the following files:

  • @wafflespeanut: python/servo/command_base.py, python/servo/package_commands.py

@highfive highfive added the S-awaiting-review There is new code that needs to be reviewed. label Jul 4, 2016
@jdm
Copy link
Member

jdm commented Jul 6, 2016

r? @aneeshusa

@highfive highfive assigned aneeshusa and unassigned jdm Jul 6, 2016
tarinfo.mtime = 0
return tarinfo

with cd(dir_to_package):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to get rid of this with cd(...) block? I've never been a fan of cding during builds, but it's OK to keep if it's complicated otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage of with cd() is that we don't need to deal with relative and non-relative path prefixes which we need to remove before adding the paths to the archive. I guess doing it without cd() is possible, but will look quite ugly.
Why do you not like cd() during builds, if I may ask?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize we'd need to strip the prefixes, it does sound like cd is the better option here.

Using cd during builds adds another piece of state to keep track of, which makes it hard to interactively run commands from the middle of a script to debug/test them, since you may have changed directories much earlier; it's also easy to forget to cd back. Arguably, this is mostly a problem for shell scripts, and not as much of an issue for Python, both due to the easy-to-see and automatic (safe) scoping of with cd() and because I'm less likely to interactively run things like with gzip.GzipFile(..):.

@aneeshusa aneeshusa added S-needs-code-changes Changes have not yet been made that were requested by a reviewer. and removed S-awaiting-review There is new code that needs to be reviewed. labels Jul 18, 2016
@aneeshusa
Copy link
Contributor

Looks pretty good so far, I didn't think to use native Python facilities like sorting and just looks at the options to gzip and tarfile earlier. Thanks for taking the time to rewrite this PR!

@highfive highfive added S-awaiting-review There is new code that needs to be reviewed. and removed S-needs-code-changes Changes have not yet been made that were requested by a reviewer. labels Jul 18, 2016
@tonyo
Copy link
Contributor Author

tonyo commented Jul 18, 2016

@aneeshusa updated, except for a few comments (about with cd() and the temporary package name).

@aneeshusa aneeshusa added S-needs-code-changes Changes have not yet been made that were requested by a reviewer. and removed S-awaiting-review There is new code that needs to be reviewed. labels Jul 20, 2016
@aneeshusa
Copy link
Contributor

See my comment above, I think we can get the temporary package name fix working.

@aneeshusa
Copy link
Contributor

I did some testing locally and it seems that passing a relative path for dest_archive causes the archive to be created in the dir_to_package, not the PWD when archive_deterministically is called. Looks like this is being caused by the cd() - please fix in whichever way is most ergonomic.

@highfive highfive added S-awaiting-review There is new code that needs to be reviewed. and removed S-needs-code-changes Changes have not yet been made that were requested by a reviewer. labels Jul 21, 2016
@tonyo
Copy link
Contributor Author

tonyo commented Jul 21, 2016

@aneeshusa updated, fixed the issue with cd() and temporary package handling.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #11967) made this pull request unmergeable. Please resolve the merge conflicts.

@highfive highfive added the S-needs-rebase There are merge conflict errors. label Jul 25, 2016
@tonyo
Copy link
Contributor Author

tonyo commented Jul 27, 2016

Rebased the changes.

@jdm jdm removed the S-needs-rebase There are merge conflict errors. label Jul 27, 2016
@Ms2ger
Copy link
Contributor

Ms2ger commented Jul 28, 2016

@aneeshusa I think this is waiting on you?

@@ -6,13 +6,16 @@
# <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
# option. This file may not be copied, modified, or distributed
# except according to those terms.

import gzip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please put the empty line back.

@aneeshusa
Copy link
Contributor

Sorry about the delay, lgtm aside from a few nits.

@aneeshusa aneeshusa added S-needs-code-changes Changes have not yet been made that were requested by a reviewer. and removed S-awaiting-review There is new code that needs to be reviewed. labels Aug 4, 2016
@highfive highfive added S-awaiting-review There is new code that needs to be reviewed. and removed S-needs-code-changes Changes have not yet been made that were requested by a reviewer. labels Aug 4, 2016
@aneeshusa
Copy link
Contributor

Thanks! @bors-servo r+

@bors-servo
Copy link
Contributor

📌 Commit 0a056a9 has been approved by aneeshusa

@highfive highfive added S-awaiting-merge The PR is in the process of compiling and running tests on the automated CI. and removed S-awaiting-review There is new code that needs to be reviewed. labels Aug 4, 2016
@bors-servo
Copy link
Contributor

⌛ Testing commit 0a056a9 with merge 1086df6...

bors-servo pushed a commit that referenced this pull request Aug 4, 2016
Create .tar.gz package deterministically using Python's "tarfile"

<!-- Please describe your changes on the following line: -->
A development of #12108, creates a .tar.gz package using the `tarfile` and `gzip` modules, without external dependencies. Fixes #11981.
Also this fixes the issue when the existing `resources/` directory didn't allow to create a new package and failed with "File exists" error.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [X] `./mach build -d` does not report any errors
- [X] `./mach test-tidy` does not report any errors
- [X] These changes fix #11981 (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [X] These changes do not require tests because "more general approach to reproducibility testing is needed"

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/12244)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

☀️ Test successful - arm32, arm64, linux-dev, linux-rel, mac-dev-unit, mac-rel-css, mac-rel-wpt, windows-dev

@bors-servo bors-servo merged commit 0a056a9 into servo:master Aug 5, 2016
@highfive highfive removed the S-awaiting-merge The PR is in the process of compiling and running tests on the automated CI. label Aug 5, 2016
wchargin added a commit to tensorflow/tensorboard that referenced this pull request Jun 19, 2019
Summary:
We plan to promote TensorBoard’s Pip packages to Bazel targets, but the
filenames of these targets can’t actually be statically determined: they
must include the TensorBoard version number, which is defined in a
Python module. Instead, we’ll build a tarball containing the two wheels.
Building tarballs deterministically with system tools is famously
troublesome (you can do it with GNU `tar` and a lot of flags, but not
portably), and it seems that the standard approach is to use Python’s
built-in `tarfile` library.

Implementation based off of <servo/servo#12244>,
but with changes to make it Python 3-compatible, to use our code style,
and to clean up some minor issues and fix some TODOs from the original.
The original source is Apache-2 licensed.

Test Plan:
End-to-end tests included. Replacing the body of `_run_tool` with

```python
    return subprocess.check_output([
        "/bin/sh",
        "-c",
        'x="$(readlink -f "$1")" && cd "$2" && tar czf "$x" .',
        "unused",
    ] + args)
```

causes the `test_invariant_under_mtime` test case to fail for the right
reason.

wchargin-branch: deterministic-tgz
wchargin added a commit to tensorflow/tensorboard that referenced this pull request Jun 21, 2019
Summary:
We plan to promote TensorBoard’s Pip packages to Bazel targets, but the
filenames of these targets can’t actually be statically determined: they
must include the TensorBoard version number, which is defined in a
Python module. Instead, we’ll build a tarball containing the two wheels.
Building tarballs deterministically with system tools is famously
troublesome (you can do it with GNU `tar` and a lot of flags, but not
portably), and it seems that the standard approach is to use Python’s
built-in `tarfile` library.

Implementation based off of <servo/servo#12244>,
but with changes to make it Python 3-compatible, to use our code style,
and to remove some unneeded machinery. Original is Apache-2 licensed.

Test Plan:
End-to-end tests included. Replacing the body of `_run_tool` with

```python
    return subprocess.check_output(["tar", "czf"] + args)
```

causes the `test_invariant_under_mtime` test case to fail for the right
reason.

wchargin-branch: deterministic-tgz
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create Linux .tar.gz package archive deterministically
6 participants