Improve travis.yml config for faster CI #2278

Merged: 4 commits, May 30, 2019

138 changes: 40 additions & 98 deletions .travis.yml
@@ -1,14 +1,9 @@
dist: trusty
sudo: required

language: python
python:
- "2.7"
- "3.6"

os:
- linux

branches:
only:
- master
@@ -29,97 +24,55 @@ env:
- TF_VERSION_ID= # Do not install TensorFlow in this case

cache:
pip: true
Contributor

Okay, it looks like this just caches the Pip cache directory and not the
actual virtualenv contents, which is good. (If it cached the virtualenv,
we might be relatively safe anyway, given that all our Pip dependencies
are either pinned or installed with -I… but we could still be hit by
any Pip problems w.r.t. reinstalling packages in the same env.)

Contributor Author

Added a comment. But yes, this is fine, though we fairly often get weird pip cache deserialization errors for reasons that are unclear to me.

Contributor

ಠ_ಠ

🙈

lgtm

Contributor Author

The set of changes left after removing all of them from view trivially looks good, huh?

directories:
- $HOME/.bazel-output-base
- $HOME/.cache/tb-bazel-repo
- $HOME/.cache/tb-bazel-disk

# Each bullet point is displayed in the Travis log as one collapsed line, which
# indicates how long it took. Travis will check the return code at the end. We
# can't use `set -e` in the YAML file since it might impact Travis internals.
# If inline scripts get too long, Travis surprisingly prints them twice.

before_install:
# Travis pre-installs an old version of numpy. We uninstall it to
# reduce the potential for strange behavior when upgrading in-place
# for TensorFlow's numpy dependency.
- pip uninstall -y numpy
- pip freeze # print installed distributions, for debugging purposes
- |
# Download Bazel
bazel_binary="$(mktemp)" &&
bazel_checksum_file="$(mktemp)" &&
printf >"${bazel_checksum_file}" \
'%s %s\n' "${BAZEL_SHA256SUM}" "${bazel_binary}" &&
for url in \
"http://mirror.tensorflow.org/github.com/bazelbuild/bazel/releases/download/${BAZEL}/bazel-${BAZEL}-linux-x86_64" \
"https://github.com/bazelbuild/bazel/releases/download/${BAZEL}/bazel-${BAZEL}-linux-x86_64" \
; do
if \
wget -t 3 -O "${bazel_binary}" "${url}" &&
shasum -a 256 --check "${bazel_checksum_file}"; then
break
else
rm -f "${bazel_binary}"
fi
done &&
rm "${bazel_checksum_file}" &&
[ -f "${bazel_binary}" ]
- chmod +x "${bazel_binary}"
- sudo mv "${bazel_binary}" /usr/local/bin/bazel

# Fix Boto and Travis issue https://github.com/travis-ci/travis-ci/issues/7940
- sudo rm -f /etc/boto.cfg

# Storing build artifacts in this directory helps Travis cache them. This
# will sometimes cut latency in half, when we're lucky.
- echo "startup --output_base=${HOME}/.bazel-output-base" >>~/.bazelrc

# Travis Trusty Sudo GCE VMs have 2 cores and 7.5 GB RAM. These settings
# help Bazel go faster and not OOM the system.
- echo "startup --host_jvm_args=-Xms500m" >>~/.bazelrc
- echo "startup --host_jvm_args=-Xmx500m" >>~/.bazelrc
- echo "startup --host_jvm_args=-XX:-UseParallelGC" >>~/.bazelrc
- echo "build --local_resources=400,2,1.0" >>~/.bazelrc
- echo "build --worker_max_instances=2" >>~/.bazelrc

# Make Bazel as strict as possible, so TensorBoard will build correctly
# for users, regardless of their Bazel configuration.
- echo "build --worker_verbose" >>~/.bazelrc
- echo "build --worker_sandboxing" >>~/.bazelrc
- echo "build --spawn_strategy=sandboxed" >>~/.bazelrc
- echo "build --genrule_strategy=sandboxed" >>~/.bazelrc
- echo "test --test_verbose_timeout_warnings" >>~/.bazelrc

# It's helpful to see the errors on failure.
- echo "build --verbose_failures" >>~/.bazelrc
- echo "test --test_output=errors" >>~/.bazelrc

# We need to pass the PATH from our virtualenv down into our tests,
# which is non-hermetic and so disabled by default in Bazel 0.21.0+.
- echo "test --action_env=PATH" >>~/.bazelrc
- elapsed() { TZ=UTC printf "Time %(%T)T %s\n" "$SECONDS" "$@"; }
Contributor

Perhaps you mean "$*" here?

$ bash -c 'TZ=UTC printf "Time %(%T)T %s\n" "$SECONDS" one two'
Time 00:00:00 one
bash: line 0: printf: two: invalid number
Time 00:00:00 

(Or, from usage, maybe just "${1-}"?)

Contributor Author

Good point. Changed to just ${1}.
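A minimal sketch of the corrected helper, using the reviewer's `"${1-}"` variant (the author went with `${1}`, which behaves the same unless `set -u` is active, where an absent argument would be an error). It assumes bash 4.2 or later for printf's `%(fmt)T` directive. `$SECONDS` counts seconds since the shell started; formatting it as a UTC timestamp yields the elapsed `HH:MM:SS` (for runs under 24 hours). Passing only the first argument means extra words can no longer be consumed as a second, invalid time value.

```shell
#!/usr/bin/env bash
# Print the time elapsed since the shell started, followed by an optional label.
# Only the first argument is passed, so `elapsed one two` cannot feed "two"
# into the %(%T)T slot on a second pass of the format string.
# Requires bash >= 4.2 for printf's %(fmt)T directive.
elapsed() {
  TZ=UTC printf "Time %(%T)T %s\n" "$SECONDS" "${1-}"
}

elapsed "before_install"
elapsed one two   # prints only "one", with no "invalid number" error
```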

- elapsed "before_install"
- ci/download_bazel.sh "${BAZEL}" "${BAZEL_SHA256SUM}" ~/bazel
- sudo mv ~/bazel /usr/local/bin/bazel
- cp ci/bazelrc ~/.bazelrc
- elapsed "before_install (done)"

install:
- pip install boto3==1.9.86
- elapsed "install"
# Lint check deps.
- pip install flake8==3.5.0
- pip install futures==3.1.1
- pip install grpcio==1.6.3
- pip install moto==1.3.7
- pip install yamllint==1.5.0
# TensorBoard deps.
- pip install futures==3.1.1
- pip install grpcio==1.0
Contributor

@wchargin, May 24, 2019

Why the version downgrade? It was grpcio==1.6.3 before.

Contributor Author

Reverted, thanks for the catch. I have no idea how that line ended up changing... I can't trace it to any other change. I must have just clobbered it somehow.

# Uninstall older Travis numpy to avoid upgrade-in-place issues.
- pip uninstall -y numpy
- |
# Install TensorFlow if requested
if [ -n "${TF_VERSION_ID}" ]; then
pip install -I "${TF_VERSION_ID}"
else
# Requirements typically found through TensorFlow
pip install "absl-py>=0.7.0"
pip install "numpy<2.0,>=1.14.5"
# Requirements typically found through TensorFlow.
pip install "absl-py>=0.7.0" \
Contributor

Ooh, good catch. Would be nice to have some kind of --chain-lint.
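To illustrate why the `&&` matters: the `.travis.yml` comments note that Travis checks each entry's return code only at the end and that `set -e` can't be used, so inside a `- |` block a failing command is masked by a later one that succeeds. The sketch below models a block as a shell function whose status is that of its last command (an assumption about how Travis evaluates the block); `false` and `true` stand in for the two `pip install` calls.

```shell
#!/usr/bin/env bash
# Model of a multi-line `- |` entry: its status is that of its last command.
# `false` stands in for a failing `pip install "absl-py>=0.7.0"`,
# `true` for a succeeding `pip install "numpy<2.0,>=1.14.5"`.

unchained_block() {
  false
  true
}

chained_block() {
  false \
    && true
}

status=0; unchained_block || status=$?
echo "unchained block status: ${status}"   # 0: the failure is hidden
status=0; chained_block || status=$?
echo "chained block status: ${status}"     # 1: the failure propagates
```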

&& pip install "numpy<2.0,>=1.14.5"
fi
# Deps for gfile S3 test.
- pip install boto3==1.9.86
- pip install moto==1.3.7
# Workaround for https://github.com/travis-ci/travis-ci/issues/7940
- sudo rm -f /etc/boto.cfg
- elapsed "install (done)"

before_script:
# fail the build if there are Python syntax errors or undefined names
- elapsed "before_script"
# Do a fail-fast check for Python syntax errors or undefined names.
# Use the comment '# noqa: <error code>' to suppress.
- flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
# a comment of '# noqa' or better yet '# noqa: <error code>' added to the code to silence flake8
- flake8 . --count --exit-zero --ignore=E111,E114 --max-complexity=10 --max-line-length=127 --statistics
# Lint .yaml docs files. Use '# yamllint disable-line rule:foo' to suppress.
- yamllint -c docs/.yamllint docs docs/.yamllint
# Make sure we aren't accidentally including work-in-progress code.
@@ -128,42 +81,31 @@ before_script:
- tensorboard/tools/license_test.sh
# Make sure that IPython notebooks have valid Markdown.
- tensorboard/tools/docs_list_format_test.sh
- elapsed "before_script (done)"

# Commands in this section should only fail if it's our fault. Travis will
# categorize them as 'failed', rather than 'error' for other sections.
script:
- elapsed "script"
# Note: bazel test implies fetch+build, but this gives us timing.
- bazel fetch //tensorboard/...
- bazel build //tensorboard/...
- elapsed && bazel fetch //tensorboard/...
- elapsed && bazel build //tensorboard/...
- |
# When TensorFlow is not installed, run a restricted subset of tests.
if [ -z "${TF_VERSION_ID}" ]; then
test_tag_filters=support_notf
# Run tests (only a restricted subset if TensorFlow is not installed).
Contributor

Just a note that this makes the string “bazel test … exited with 3” no
longer show up on one line of the build log, which was the original
reason for pulling out test_tag_filters:
#2075 (comment)

If you still prefer it this way, fine with me.

Contributor Author

Ah, OK, so that's why it's structured like this. I reverted the change, but moved the variable setting up to before_script so that if that part fails, the build will error rather than proceed to the test command.
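A sketch of that arrangement, assuming (as the change does) that shell variables set in `before_script` remain visible in `script`. Deciding the filter up front means a problem there is reported by Travis as an "error" rather than a test "failure", and the single `bazel test` line keeps the whole invocation on one line of the log. The function name `compute_test_tag_filters` is hypothetical; the config itself can set the variable inline.

```shell
#!/usr/bin/env bash
# before_script: choose the Bazel test tag filter once, up front.
# An empty TF_VERSION_ID means TensorFlow was not installed, so only
# tests tagged `support_notf` can run.
compute_test_tag_filters() {
  if [ -z "${TF_VERSION_ID-}" ]; then
    test_tag_filters=support_notf
  else
    test_tag_filters=
  fi
}

# script: a single command line, so the full invocation shows up on one log
# line. Printed rather than run here, since this sketch does not assume Bazel.
compute_test_tag_filters
echo bazel test //tensorboard/... --test_tag_filters="${test_tag_filters}"
```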

elapsed && if [ -n "${TF_VERSION_ID}" ]; then
bazel test //tensorboard/...
else
test_tag_filters=
bazel test //tensorboard/... --test_tag_filters=support_notf
fi
- bazel test //tensorboard/... --test_tag_filters="${test_tag_filters}"
- elapsed && bazel run //tensorboard/pip_package:build_pip_package -- --tf-version "${TF_VERSION_ID}" --smoke
# Run manual S3 test
- bazel test //tensorboard/compat/tensorflow_stub:gfile_s3_test
- bazel run //tensorboard/pip_package:build_pip_package -- --tf-version "${TF_VERSION_ID}" --smoke
- elapsed && bazel test //tensorboard/compat/tensorflow_stub:gfile_s3_test
- elapsed "script (done)"

after_script:
# Bazel launches daemons unless --batch is used.
- elapsed "after_script"
- bazel shutdown

before_cache:
- |
# Scrub tiny build artifacts not worth caching.
find "${HOME}/.bazel-output-base" \
-name \*.runfiles -print0 \
-or -name \*.tar.gz -print0 \
-or -name \*-execroot.json -print0 \
-or -name \*-tsc.json -print0 \
-or -name \*-params.pbtxt -print0 \
-or -name \*-args.txt -print0 \
-or -name \*.runfiles_manifest -print0 \
-or -name \*.server_params.pbtxt -print0 \
| xargs -0 rm -rf

notifications:
email: false
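The `before_cache` step repeats `-print0` after every `-name` test for a reason worth spelling out: `find` joins adjacent primaries with an implicit `-and`, which binds more tightly than `-or`, so a single trailing `-print0` would apply only to the last pattern. The sketch below shows this on a throwaway directory (an assumption standing in for `${HOME}/.bazel-output-base`) using three of the original patterns.

```shell
#!/usr/bin/env bash
# Sketch of the before_cache scrub against a throwaway directory (an assumption
# standing in for ${HOME}/.bazel-output-base), using three of its patterns.
set -eu

root="$(mktemp -d)"
mkdir -p "${root}/a/foo.runfiles"
touch "${root}/a/bundle.tar.gz" "${root}/a/x-args.txt" "${root}/a/keep.o"

# Pitfall: the implicit -and between `-name '*-args.txt'` and `-print0` binds
# tighter than -or, so only the last pattern is printed; *.tar.gz is dropped.
naive="$(find "${root}" -name '*.tar.gz' -or -name '*-args.txt' -print0 | tr '\0' '\n')"
echo "single -print0 matched: ${naive}"

# Repeating -print0 on every branch, as .travis.yml does, prints each match,
# and xargs -0 passes the NUL-separated paths safely to rm.
find "${root}" \
  -name '*.runfiles' -print0 \
  -or -name '*.tar.gz' -print0 \
  -or -name '*-args.txt' -print0 \
  | xargs -0 rm -rf

ls "${root}/a"   # only keep.o is left
```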
34 changes: 34 additions & 0 deletions ci/bazelrc
@@ -0,0 +1,34 @@
# Limit resources since Travis Trusty GCE VMs have 2 cores and 7.5 GB RAM.
build --local_resources=4000,2,1.0
build --worker_max_instances=2

# Ensure sandboxing is on to increase hermeticity.
build --spawn_strategy=sandboxed
build --worker_sandboxing

# Ensure the PATH env var from our virtualenv propagates into tests, which is
# no longer on by default in Bazel 0.21.0 and possibly again in the future.
# We set this flag for "build" since "test" inherits it, but if we don't set
# it for build too, this causes a rebuild at test time, and if we set it for
# both we hit https://github.com/bazelbuild/bazel/issues/8237.
#
# See also:
# https://github.com/bazelbuild/bazel/issues/7095 (protobuf PATH sensitivity)
# https://github.com/bazelbuild/bazel/issues/7026 (future of action_env)
build --action_env=PATH

# Set up caching on local disk so incremental builds are faster.
# See https://bazel.build/designs/2016/09/30/repository-cache.html
build --repository_cache=~/.cache/tb-bazel-repo
fetch --repository_cache=~/.cache/tb-bazel-repo
query --repository_cache=~/.cache/tb-bazel-repo
# See https://docs.bazel.build/versions/master/remote-caching.html#disk-cache
build --disk_cache=~/.cache/tb-bazel-disk

# Log more information to help with debugging, and disable curses output which
# just adds more clutter to the log. (Travis spoofs an interactive terminal.)
common --curses=no
build --verbose_failures
build --worker_verbose
test --test_output=errors
test --test_verbose_timeout_warnings
49 changes: 49 additions & 0 deletions ci/download_bazel.sh
@@ -0,0 +1,49 @@
#!/bin/sh
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Script to download Bazel binary directly onto a build machine.

set -e

die() {
printf >&2 "%s\n" "$1"
exit 1
}

if [ "$#" -ne 3 ]; then
die "Usage: ${0} <version> <sha256sum> <destination-file>"
Contributor

nit: Google style is to omit braces on positional parameters:
https://google.github.io/styleguide/shell.xml?showone=Variable_expansion#Variable_expansion

Contributor Author

Done.

fi

version="$1"
checksum="$2"
dest="$3"

temp_dest="$(mktemp)"

mirror_url="https://mirror.bazel.build/github.com/bazelbuild/bazel/releases/download/${version}/bazel-${version}-linux-x86_64"
Contributor

Use http://mirror.tensorflow.org/ instead?

Contributor Author

Done.

github_url="https://github.com/bazelbuild/bazel/releases/download/${version}/bazel-${version}-linux-x86_64"

for url in "${mirror_url}" "${github_url}"; do
wget -t 3 -O "${temp_dest}" "${url}" \
&& printf "%s %s\n" "${checksum}" "${temp_dest}" | shasum -a 256 --check \
|| { rm -f "${temp_dest}"; continue; }
mv "${temp_dest}" "${dest}"
break
done

[ -f "${dest}" ]
chmod +x "${dest}"
ls -l "${dest}"
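The download loop in `ci/download_bazel.sh` tries the mirror first, falls back to GitHub, and keeps a file only if its SHA-256 matches. The sketch below isolates that fallback-and-verify logic so it runs offline. The `fetch_verified` helper is hypothetical: it copies from local paths with `cp` in place of `wget -t 3 -O`, and uses `sha256sum -c` (GNU coreutils or BusyBox) in place of `shasum -a 256 --check`, with the two-space `checksum  file` line format that both checkers accept.

```shell
#!/usr/bin/env bash
# Hypothetical helper modelled on the loop in ci/download_bazel.sh:
#   fetch_verified <sha256> <dest> <source>...
# Tries each source in order and moves the first copy whose SHA-256 matches
# into <dest>. Returns non-zero, leaving <dest> untouched, if none match.
fetch_verified() {
  local checksum="$1" dest="$2" tmp src
  shift 2
  tmp="$(mktemp)"
  for src in "$@"; do
    # `cp` stands in for `wget -t 3 -O "${temp_dest}" "${url}"`.
    if cp "$src" "$tmp" 2>/dev/null &&
      printf '%s  %s\n' "$checksum" "$tmp" | sha256sum -c >/dev/null 2>&1; then
      mv "$tmp" "$dest"
      return 0
    fi
  done
  rm -f "$tmp"
  return 1
}
```

As in the original script, the destination is written only after verification, so a corrupt or truncated download never replaces it.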