Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: promote Pip wheels to Bazel target #2363

Merged
merged 22 commits into from
Jun 24, 2019
Merged

Conversation

wchargin
Copy link
Contributor

@wchargin wchargin commented Jun 19, 2019

Summary:
Running bazel build //tensorboard/pip_package now actually builds the
Pip package. Other targets can depend on the built wheels, which are
provided in a tarball. Smoke tests have been separated to a new
:test_pip_package target.

Test Plan:
Check that everything builds:

$ bazel build //tensorboard/pip_package 2>/dev/null
$ echo $?
0
$ tar tzf bazel-bin/tensorboard/pip_package/pip_packages.tar.gz
tensorboard-1.15.0a0-py2-none-any.whl
tensorboard-1.15.0a0-py3-none-any.whl
$ bazel run //tensorboard/pip_package:test_pip_package >/dev/null 2>&1
$ echo $?
0
$ tmpdir="$(mktemp -d)"
$ bazel run //tensorboard/pip_package:build_pip_package -- "${tmpdir}" \
> >/dev/null 2>/dev/null
$ ls -1 "${tmpdir}"
tensorboard-1.15.0a0-py2-none-any.whl
tensorboard-1.15.0a0-py3-none-any.whl

Then:

  • Add an assert False to the pb function in scalar/summary.py,
    and verify that :test_pip_package fails with default options but
    passes with --tf-version "".
  • Add an assert False to tensorboard/main.py and verify that the
    test fails even with --tf-version "".
  • Add def foo(): yield from None to tensorboard/main.py, and
    verify that, when run in a Python 3 virtualenv, the test fails by
    default but passes with --default-python-only.

wchargin-branch: bazel-pip-package

Summary:
We plan to promote TensorBoard’s Pip packages to Bazel targets, but the
filenames of these targets can’t actually be statically determined: they
must include the TensorBoard version number, which is defined in a
Python module. Instead, we’ll build a tarball containing the two wheels.
Building tarballs deterministically with system tools is famously
troublesome (you can do it with GNU `tar` and a lot of flags, but not
portably), and it seems that the standard approach is to use Python’s
built-in `tarfile` library.

Implementation based off of <servo/servo#12244>,
but with changes to make it Python 3-compatible, to use our code style,
and to clean up some minor issues and fix some TODOs from the original.
The original source is Apache-2 licensed.

Test Plan:
End-to-end tests included. Replacing the body of `_run_tool` with

```python
    return subprocess.check_output([
        "/bin/sh",
        "-c",
        'x="$(readlink -f "$1")" && cd "$2" && tar czf "$x" .',
        "unused",
    ] + args)
```

causes the `test_invariant_under_mtime` test case to fail for the right
reason.

wchargin-branch: deterministic-tgz
Summary:
Running `bazel build //tensorboard/pip_package` now actually builds the
Pip package. Other targets can depend on the built wheels, which are
provided in a tarball. Smoke tests have been separated to a new
`:test_pip_package` target.

Test Plan:
Check that everything builds:

```
$ bazel build //tensorboard/pip_package 2>/dev/null
$ echo $?
0
$ tar tzf bazel-bin/tensorboard/pip_package/pip_packages.tar.gz
tensorboard-1.15.0a0-py2-none-any.whl
tensorboard-1.15.0a0-py3-none-any.whl
$ bazel run //tensorboard/pip_package:test_pip_package >/dev/null 2>&1
$ echo $?
0
```

Then:

  - Add an `assert False` to the `pb` function in `scalar/summary.py`,
    and verify that `:test_pip_package` fails with default options but
    passes with `--tf-version ""`.
  - Add an `assert False` to `tensorboard/main.py` and verify that the
    test fails even with `--tf-version ""`.
  - Add `def foo(): yield from None` to `tensorboard/main.py`, and
    verify that, when run in a Python 3 virtualenv, the test fails by
    default but passes with `--default-python-only`.

wchargin-branch: bazel-pip-package
wchargin-source: 309c6d9b6b7c08c652bcf87a224e299ebbae64e1
wchargin-branch: bazel-pip-package
wchargin-source: 2d3424900b3e3c7839a4f9a67999f75b8fe2ed0d
wchargin-branch: bazel-pip-package
wchargin-source: 991fdaa7a849ba49d062e9f640d6ddd12ffc703e
wchargin-branch: deterministic-tgz
wchargin-source: 33bf6658a701a3348e21fea5418d9825e4dc21b9
wchargin-branch: bazel-pip-package

# Conflicts:
#	tensorboard/pip_package/build_pip_package.sh
wchargin-source: 33bf6658a701a3348e21fea5418d9825e4dc21b9
wchargin-branch: bazel-pip-package
Copy link
Contributor

@nfelt nfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice cleanup.

# target can extract the tarball and glob its contents.
outs = ["pip_packages.tar.gz"],
cmd = "\n".join([
"tmpdir=\"$$(mktemp -d)\" &&",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to read this code embedded in the genrule definition like this.

What if we just made deterministic_tar_gz a data dep for build_pip_package, so that if the argument (currently named OUTPUT_DIR) ends in .tar.gz it emits the pip packages as an archive in the given file instead?

If we did that and also ensured that pip package building doesn't emit extraneous output, then it seems like we could make the entire genrule here just a one-liner: $(location :build_pip_package) $@.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay; that’s totally fine with me if we just make it always emit a
.tar.gz, and stomachable if it cases on the filename. Will change
tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, with the filename-casing per your request.

tensorboard/pip_package/test_pip_package.sh Outdated Show resolved Hide resolved
wchargin-source: 30e936b83964ab4b78b9c9755fc62849b182c3d2
wchargin-branch: deterministic-tgz
wchargin-source: 30e936b83964ab4b78b9c9755fc62849b182c3d2
wchargin-branch: deterministic-tgz
wchargin-source: 30e936b83964ab4b78b9c9755fc62849b182c3d2
wchargin-branch: bazel-pip-package
wchargin-source: 54daa09db50cca2bc6b41c933fc40e7b124495ac
wchargin-branch: deterministic-tgz
wchargin-source: 54daa09db50cca2bc6b41c933fc40e7b124495ac
wchargin-branch: bazel-pip-package
wchargin-source: 065ea1bdcfbd0621f5998a8e310b9e2e7f1f6a5f
wchargin-branch: deterministic-tgz
wchargin-source: 065ea1bdcfbd0621f5998a8e310b9e2e7f1f6a5f
wchargin-branch: bazel-pip-package
Copy link
Contributor Author

@wchargin wchargin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

tensorboard/pip_package/test_pip_package.sh Outdated Show resolved Hide resolved
# target can extract the tarball and glob its contents.
outs = ["pip_packages.tar.gz"],
cmd = "\n".join([
"tmpdir=\"$$(mktemp -d)\" &&",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay; that’s totally fine with me if we just make it always emit a
.tar.gz, and stomachable if it cases on the filename. Will change
tomorrow.

wchargin-source: a410c619117892bb8945f20c5c277d748801e342
wchargin-branch: deterministic-tgz
wchargin-source: a410c619117892bb8945f20c5c277d748801e342
wchargin-branch: deterministic-tgz
wchargin-source: a410c619117892bb8945f20c5c277d748801e342
wchargin-branch: bazel-pip-package
@wchargin wchargin changed the base branch from wchargin-deterministic-tgz to master June 21, 2019 21:10
wchargin-source: 639878d4e6349a2977efc1ff82d6f6c3d103eaf5
wchargin-branch: bazel-pip-package
wchargin-source: 639878d4e6349a2977efc1ff82d6f6c3d103eaf5
wchargin-branch: bazel-pip-package
wchargin-source: d75820f22a6622d33b775213271b94f8f15ae9a4
wchargin-branch: bazel-pip-package
wchargin-source: c16653dbed2cb17f4909fe0b568fb9fb009bf4ee
wchargin-branch: bazel-pip-package
usage: build_pip_package OUTPUT_DIR

Build TensorBoard Pip packages and store the resulting wheel files
into OUTPUT (a \`.tar.gz\` path or directory).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd say a directory or a \`.tar.gz\` path to make it clear that that there is no need for the directory name to end in .tar.gz since it's otherwise strictly speaking ambiguous

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, absolutely. (I usually try to do this, but missed this one.)

;;
"0")
printf "\nSkipping smoke test\n\n"
cd "${original_wd}" # Bazel gives "${output}" as a relative path >_>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just resolve output to an absolute path while we're still in the original working directory, rather than having to cd back to it later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t know a portable way to do that. GNU readlink -e/-f works
perfectly on Linux, but on macOS readlink(1) is basically just
readlink(2). realpath(1) is also not POSIX. (Here is a handy list
of portable utilities
—it’s not large, especially when you look
at (e.g.) xargs and see that there’s no portable -0 option!)

I mean, there’s always

path="$(python -c 'print(__import__("os").path.realpath(__import__("sys").argv[1]))' "${path}")"

which has the downside of being a bit hideous (and also not correct
because command substitutions strip whitespace); just cding back is
significantly simpler imho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If there is a portable way to do this, I would love to know!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

case "${output}" in 
  /*) ;;
  *) output="${PWD}/${output}" ;;
esac

echo >&2 'fatal: unknown smoke value:' "${smoke}"
exit 1
if ! [ -d "${output}" ]; then
printf >&2 'fatal: no such output directory: %s\n' "${output}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems more straightforward to just mkdir -p "${output}" here if it doesn't exist. I don't see any particular reason why it should require the directory to already exist, and this makes it more consistent with the .tar.gz behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requiring the output directory to exist is more consistent with other applications
that write files to specified directories—

$ tar czf - a.c | tar xzf - -C ./nonexistent/
tar: nonexistent: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
$ cp a.c ./nonexistent/
cp: cannot create regular file './nonexistent/': Not a directory
$ mv a.c ./nonexistent/
mv: cannot move 'a.c' to './nonexistent/': Not a directory
$ ln -s a.c ./nonexistent/
ln: target './nonexistent/' is not a directory: No such file or directory
$ gcc a.c -o ./nonexistent/
/usr/bin/ld: cannot open output file ./nonexistent/: Is a directory
collect2: error: ld returned 1 exit status
$ gcc a.c -o ./nonexistent/a.c
/usr/bin/ld: cannot open output file ./nonexistent/a.c: No such file or directory
collect2: error: ld returned 1 exit status
$ printf 'shell redirections\n' >./nonexistent/out
bash: ./nonexistent/out: No such file or directory

—so I’d prefer to keep it as is unless you feel strongly. (The explicit
error is just to have a nicer message.)

wchargin-source: 243330e9aadb74dd31360a8bcae168c13953b205
wchargin-branch: bazel-pip-package
@wchargin wchargin requested a review from nfelt June 21, 2019 23:23
@wchargin wchargin merged commit 0783b9c into master Jun 24, 2019
@wchargin wchargin deleted the wchargin-bazel-pip-package branch June 24, 2019 17:04
wchargin added a commit that referenced this pull request Jul 25, 2019
Summary:
We don’t use `set -x` indiscriminately because build tool output appears
in Bazel logs, and that would be a lot of spam. But we should print
output on failures, to clarify the cause of issues like #2458.

The original version of #2363 had this functionality in the BUILD rule
itself, but @nfelt found that a bit hard to read (fair), so we include
the relevant functionality in `build_pip_package.sh`.

Recommend ignoring whitespace for reading this patch.

Test Plan:
Building `//tensorboard/pip_package` on a machine with no `virtualenv`
binary now prints:

```
ERROR: /home/wchargin/tensorboard/tensorboard/pip_package/BUILD:8:1: Executing genrule //tensorboard/pip_package:pip_package failed (Exit 127) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
+ uname
+ [ Linux = Darwin ]
+ sedi=sed -i
+ command -v virtualenv
Target //tensorboard/pip_package:extract_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
FAILED: Build did NOT complete successfully
```

which is pretty clear that the problem is that `virtualenv` is not
installed. After installing `virtualenv`, the build completes
successfully with no output, as desired.

Running

```
bazel run //tensorboard/pip_package:build_pip_package -- /tmp/nonexistent
```

prints simply

```
fatal: no such output directory: /tmp/nonexistent
```

and exits with `1` rather than printing the whole log.

wchargin-branch: pip-package-error-handling
wchargin added a commit that referenced this pull request Jul 26, 2019
Summary:
We don’t use `set -x` indiscriminately because build tool output appears
in Bazel logs, and that would be a lot of spam. But we should print
output on failures, to clarify the cause of issues like #2458.

The original version of #2363 had this functionality in the BUILD rule
itself, but @nfelt found that the multiple layers of escaping made this
a bit hard to read (fair enough), so we now include the relevant
functionality in `build_pip_package.sh`.

Recommend ignoring whitespace for reading this patch.

Test Plan:
Building `//tensorboard/pip_package` on a machine with no `virtualenv`
binary now prints:

```
ERROR: /home/wchargin/tensorboard/tensorboard/pip_package/BUILD:8:1: Executing genrule //tensorboard/pip_package:pip_package failed (Exit 127) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
++ uname
+ [ Linux = Darwin ]
+ sedi=sed -i
+ command -v virtualenv
Target //tensorboard/pip_package:pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
FAILED: Build did NOT complete successfully
```

which is pretty clear that the problem is that `virtualenv` is not
installed. After installing `virtualenv`, the build completes
successfully with no output, as desired.

wchargin-branch: pip-package-error-handling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants