-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: promote Pip wheels to Bazel target #2363
Conversation
Summary: We plan to promote TensorBoard’s Pip packages to Bazel targets, but the filenames of these targets can’t actually be statically determined: they must include the TensorBoard version number, which is defined in a Python module. Instead, we’ll build a tarball containing the two wheels. Building tarballs deterministically with system tools is famously troublesome (you can do it with GNU `tar` and a lot of flags, but not portably), and it seems that the standard approach is to use Python’s built-in `tarfile` library. Implementation based off of <servo/servo#12244>, but with changes to make it Python 3-compatible, to use our code style, and to clean up some minor issues and fix some TODOs from the original. The original source is Apache-2 licensed. Test Plan: End-to-end tests included. Replacing the body of `_run_tool` with ```python return subprocess.check_output([ "/bin/sh", "-c", 'x="$(readlink -f "$1")" && cd "$2" && tar czf "$x" .', "unused", ] + args) ``` causes the `test_invariant_under_mtime` test case to fail for the right reason. wchargin-branch: deterministic-tgz
Summary: Running `bazel build //tensorboard/pip_package` now actually builds the Pip package. Other targets can depend on the built wheels, which are provided in a tarball. Smoke tests have been separated to a new `:test_pip_package` target. Test Plan: Check that everything builds: ``` $ bazel build //tensorboard/pip_package 2>/dev/null $ echo $? 0 $ tar tzf bazel-bin/tensorboard/pip_package/pip_packages.tar.gz tensorboard-1.15.0a0-py2-none-any.whl tensorboard-1.15.0a0-py3-none-any.whl $ bazel run //tensorboard/pip_package:test_pip_package >/dev/null 2>&1 $ echo $? 0 ``` Then: - Add an `assert False` to the `pb` function in `scalar/summary.py`, and verify that `:test_pip_package` fails with default options but passes with `--tf-version ""`. - Add an `assert False` to `tensorboard/main.py` and verify that the test fails even with `--tf-version ""`. - Add `def foo(): yield from None` to `tensorboard/main.py`, and verify that, when run in a Python 3 virtualenv, the test fails by default but passes with `--default-python-only`. wchargin-branch: bazel-pip-package
wchargin-source: 309c6d9b6b7c08c652bcf87a224e299ebbae64e1 wchargin-branch: bazel-pip-package
wchargin-source: 2d3424900b3e3c7839a4f9a67999f75b8fe2ed0d wchargin-branch: bazel-pip-package
wchargin-source: 991fdaa7a849ba49d062e9f640d6ddd12ffc703e wchargin-branch: deterministic-tgz
wchargin-source: 33bf6658a701a3348e21fea5418d9825e4dc21b9 wchargin-branch: bazel-pip-package # Conflicts: # tensorboard/pip_package/build_pip_package.sh
wchargin-source: 33bf6658a701a3348e21fea5418d9825e4dc21b9 wchargin-branch: bazel-pip-package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice cleanup.
tensorboard/pip_package/BUILD
Outdated
# target can extract the tarball and glob its contents. | ||
outs = ["pip_packages.tar.gz"], | ||
cmd = "\n".join([ | ||
"tmpdir=\"$$(mktemp -d)\" &&", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to read this code embedded in the genrule definition like this.
What if we just made deterministic_tar_gz a data dep for build_pip_package, so that if the argument (currently named OUTPUT_DIR) ends in .tar.gz
it emits the pip packages as an archive in the given file instead?
If we did that and also ensured that pip package building doesn't emit extraneous output, then it seems like we could make the entire genrule here just a one-liner: $(location :build_pip_package) $@
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay; that’s totally fine with me if we just make it always emit a
.tar.gz
, and stomachable if it cases on the filename. Will change
tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, with the filename-casing per your request.
wchargin-source: 30e936b83964ab4b78b9c9755fc62849b182c3d2 wchargin-branch: deterministic-tgz
wchargin-source: 30e936b83964ab4b78b9c9755fc62849b182c3d2 wchargin-branch: deterministic-tgz
wchargin-source: 30e936b83964ab4b78b9c9755fc62849b182c3d2 wchargin-branch: bazel-pip-package
wchargin-source: 54daa09db50cca2bc6b41c933fc40e7b124495ac wchargin-branch: deterministic-tgz
wchargin-source: 54daa09db50cca2bc6b41c933fc40e7b124495ac wchargin-branch: bazel-pip-package
wchargin-source: 065ea1bdcfbd0621f5998a8e310b9e2e7f1f6a5f wchargin-branch: deterministic-tgz
wchargin-source: 065ea1bdcfbd0621f5998a8e310b9e2e7f1f6a5f wchargin-branch: bazel-pip-package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
tensorboard/pip_package/BUILD
Outdated
# target can extract the tarball and glob its contents. | ||
outs = ["pip_packages.tar.gz"], | ||
cmd = "\n".join([ | ||
"tmpdir=\"$$(mktemp -d)\" &&", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay; that’s totally fine with me if we just make it always emit a
.tar.gz
, and stomachable if it cases on the filename. Will change
tomorrow.
wchargin-source: a410c619117892bb8945f20c5c277d748801e342 wchargin-branch: deterministic-tgz
wchargin-source: a410c619117892bb8945f20c5c277d748801e342 wchargin-branch: deterministic-tgz
wchargin-source: a410c619117892bb8945f20c5c277d748801e342 wchargin-branch: bazel-pip-package
wchargin-source: 639878d4e6349a2977efc1ff82d6f6c3d103eaf5 wchargin-branch: bazel-pip-package
wchargin-source: 639878d4e6349a2977efc1ff82d6f6c3d103eaf5 wchargin-branch: bazel-pip-package
wchargin-source: d75820f22a6622d33b775213271b94f8f15ae9a4 wchargin-branch: bazel-pip-package
wchargin-source: c16653dbed2cb17f4909fe0b568fb9fb009bf4ee wchargin-branch: bazel-pip-package
usage: build_pip_package OUTPUT_DIR | ||
|
||
Build TensorBoard Pip packages and store the resulting wheel files | ||
into OUTPUT (a \`.tar.gz\` path or directory). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd say a directory or a \`.tar.gz\` path
to make it clear that that there is no need for the directory name to end in .tar.gz
since it's otherwise strictly speaking ambiguous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, absolutely. (I usually try to do this, but missed this one.)
;; | ||
"0") | ||
printf "\nSkipping smoke test\n\n" | ||
cd "${original_wd}" # Bazel gives "${output}" as a relative path >_> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just resolve output
to an absolute path while we're still in the original working directory, rather than having to cd
back to it later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know a portable way to do that. GNU readlink -e
/-f
works
perfectly on Linux, but on macOS readlink
(1) is basically just
readlink
(2). realpath
(1) is also not POSIX. (Here is a handy list
of portable utilities—it’s not large, especially when you look
at (e.g.) xargs
and see that there’s no portable -0
option!)
I mean, there’s always
path="$(python -c 'print(__import__("os").path.realpath(__import__("sys").argv[1]))' "${path}")"
which has the downside of being a bit hideous (and also not correct
because command substitutions strip whitespace); just cd
ing back is
significantly simpler imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If there is a portable way to do this, I would love to know!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about
case "${output}" in
/*) ;;
*) output="${PWD}/${output}" ;;
esac
echo >&2 'fatal: unknown smoke value:' "${smoke}" | ||
exit 1 | ||
if ! [ -d "${output}" ]; then | ||
printf >&2 'fatal: no such output directory: %s\n' "${output}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems more straightforward to just mkdir -p "${output}"
here if it doesn't exist. I don't see any particular reason why it should require the directory to already exist, and this makes it more consistent with the .tar.gz
behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requiring the output directory to exist is more consistent with other applications
that write files to specified directories—
$ tar czf - a.c | tar xzf - -C ./nonexistent/
tar: nonexistent: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
$ cp a.c ./nonexistent/
cp: cannot create regular file './nonexistent/': Not a directory
$ mv a.c ./nonexistent/
mv: cannot move 'a.c' to './nonexistent/': Not a directory
$ ln -s a.c ./nonexistent/
ln: target './nonexistent/' is not a directory: No such file or directory
$ gcc a.c -o ./nonexistent/
/usr/bin/ld: cannot open output file ./nonexistent/: Is a directory
collect2: error: ld returned 1 exit status
$ gcc a.c -o ./nonexistent/a.c
/usr/bin/ld: cannot open output file ./nonexistent/a.c: No such file or directory
collect2: error: ld returned 1 exit status
$ printf 'shell redirections\n' >./nonexistent/out
bash: ./nonexistent/out: No such file or directory
—so I’d prefer to keep it as is unless you feel strongly. (The explicit
error is just to have a nicer message.)
wchargin-source: 243330e9aadb74dd31360a8bcae168c13953b205 wchargin-branch: bazel-pip-package
Summary: We don’t use `set -x` indiscriminately because build tool output appears in Bazel logs, and that would be a lot of spam. But we should print output on failures, to clarify the cause of issues like #2458. The original version of #2363 had this functionality in the BUILD rule itself, but @nfelt found that a bit hard to read (fair), so we include the relevant functionality in `build_pip_package.sh`. Recommend ignoring whitespace for reading this patch. Test Plan: Building `//tensorboard/pip_package` on a machine with no `virtualenv` binary now prints: ``` ERROR: /home/wchargin/tensorboard/tensorboard/pip_package/BUILD:8:1: Executing genrule //tensorboard/pip_package:pip_package failed (Exit 127) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped) Use --sandbox_debug to see verbose messages from the sandbox + uname + [ Linux = Darwin ] + sedi=sed -i + command -v virtualenv Target //tensorboard/pip_package:extract_pip_package failed to build Use --verbose_failures to see the command lines of failed build steps. FAILED: Build did NOT complete successfully ``` which is pretty clear that the problem is that `virtualenv` is not installed. After installing `virtualenv`, the build completes successfully with no output, as desired. Running ``` bazel run //tensorboard/pip_package:build_pip_package -- /tmp/nonexistent ``` prints simply ``` fatal: no such output directory: /tmp/nonexistent ``` and exits with `1` rather than printing the whole log. wchargin-branch: pip-package-error-handling
Summary: We don’t use `set -x` indiscriminately because build tool output appears in Bazel logs, and that would be a lot of spam. But we should print output on failures, to clarify the cause of issues like #2458. The original version of #2363 had this functionality in the BUILD rule itself, but @nfelt found that the multiple layers of escaping made this a bit hard to read (fair enough), so we now include the relevant functionality in `build_pip_package.sh`. Recommend ignoring whitespace for reading this patch. Test Plan: Building `//tensorboard/pip_package` on a machine with no `virtualenv` binary now prints: ``` ERROR: /home/wchargin/tensorboard/tensorboard/pip_package/BUILD:8:1: Executing genrule //tensorboard/pip_package:pip_package failed (Exit 127) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped) Use --sandbox_debug to see verbose messages from the sandbox ++ uname + [ Linux = Darwin ] + sedi=sed -i + command -v virtualenv Target //tensorboard/pip_package:pip_package failed to build Use --verbose_failures to see the command lines of failed build steps. FAILED: Build did NOT complete successfully ``` which is pretty clear that the problem is that `virtualenv` is not installed. After installing `virtualenv`, the build completes successfully with no output, as desired. wchargin-branch: pip-package-error-handling
Summary:
Running
bazel build //tensorboard/pip_package
now actually builds thePip package. Other targets can depend on the built wheels, which are
provided in a tarball. Smoke tests have been separated to a new
:test_pip_package
target.Test Plan:
Check that everything builds:
Then:
assert False
to thepb
function inscalar/summary.py
,and verify that
:test_pip_package
fails with default options butpasses with
--tf-version ""
.assert False
totensorboard/main.py
and verify that thetest fails even with
--tf-version ""
.def foo(): yield from None
totensorboard/main.py
, andverify that, when run in a Python 3 virtualenv, the test fails by
default but passes with
--default-python-only
.wchargin-branch: bazel-pip-package