Overhaul CI workflows to improve efficiency #392

Merged 28 commits on Sep 13, 2021

@ChrisCummins commented Sep 10, 2021

This patch set overhauls the CI workflow to achieve a 59.4% reduction in total compute time and a 19.4% reduction in wall time.

Background

The "CI" workflow is run on every pull request and update to the stable and development branches of CompilerGym. It is responsible for running the test suite on a range of supported Python versions and operating systems to catch regressions and test new features. The CI workflow had grown to be very computationally hungry because of the large number of tests, lengthy build process, and large number of runtime configurations.

In the previous configuration, the CI workflow would spawn four job types:

  • bazel_test (example run): these jobs would run the test suite using bazel.
  • install_test (example run): these jobs would build the compiler_gym Python wheel and then run the test suite using pytest.
  • llvm-service-asan (example run): the same as install_test, except that it would build the LLVM compiler service with address sanitizer support and run only the LLVM test suite with leak checking enabled.
  • pytest-cov (example run): the same as install_test, except that it would run the test suite with code coverage enabled and upload the report to codecov.com.

In total, 12 jobs were spawned and processed independently and in parallel, requiring almost 11 hours of compute time for every single change or PR update:

| Job | OS | Python | Runtime |
| --- | --- | --- | --- |
| bazel_test | Linux | 3.9 | 53:06 |
| bazel_test | macOS | 3.9 | 1:05:17 |
| install_test | Linux | 3.6 | 58:05 |
| install_test | Linux | 3.7 | 56:36 |
| install_test | Linux | 3.8 | 49:35 |
| install_test | Linux | 3.9 | 54:26 |
| install_test | macOS | 3.6 | 52:19 |
| install_test | macOS | 3.7 | 47:56 |
| install_test | macOS | 3.8 | 53:58 |
| install_test | macOS | 3.9 | 43:07 |
| llvm-service-asan | Linux | 3.9 | 1:07:35 |
| pytest-cov | Linux | 3.9 | 53:13 |
| Total | - | - | 10:55:04 |
| Wall time | - | - | 1:07:35 |
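
As a quick check of how the two summary rows relate to the per-job rows, the sketch below sums the runtimes to get the total compute time and takes the longest job as the wall time, since the jobs ran in parallel. The runtimes are copied from the table above; the helper names `to_seconds` and `to_clock` are illustrative only. Summing the rounded per-job values lands within a few seconds of the reported total.

```python
# Runtimes copied from the table above, formatted as MM:SS or H:MM:SS.
OLD_RUNTIMES = [
    "53:06", "1:05:17",                  # bazel_test (Linux, macOS)
    "58:05", "56:36", "49:35", "54:26",  # install_test, Linux 3.6 to 3.9
    "52:19", "47:56", "53:58", "43:07",  # install_test, macOS 3.6 to 3.9
    "1:07:35",                           # llvm-service-asan
    "53:13",                             # pytest-cov
]


def to_seconds(runtime: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in runtime.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def to_clock(seconds: int) -> str:
    """Format a number of seconds as H:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


durations = [to_seconds(runtime) for runtime in OLD_RUNTIMES]
total_compute = sum(durations)  # every job consumes runner time
wall_time = max(durations)      # parallel jobs finish when the slowest one does

print("total compute:", to_clock(total_compute))
print("wall time:    ", to_clock(wall_time))
```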

Much of this compute time is redundant and wasteful:

  • Each of the 8 install_test jobs builds a Python wheel from scratch even though the Python wheel is version insensitive.
  • The bazel_test and install_test jobs run the same test suite redundantly.
  • Only a single instance of the tests needs to run on macOS. Running multiple test suites on macOS using different Python versions is unlikely to reveal anything that the same tests on Linux do not.

This is made worse because there is no caching on account of GitHub's cache size limits.

New approach

This pull request uses the workflow artifacts mechanism to make the CI workflow more efficient by breaking it into a graph of smaller, dependent jobs:

(Image: graph of the CI jobs and their dependencies.)

The jobs have the following types:

  • build: Build a pair of compiler_gym Python wheels for Linux and macOS and upload them as artifacts for use by other jobs.
  • build-asan-llvm-service: Build the LLVM service with address sanitizer support (and build nothing else). Upload the artifact for use in other jobs.
  • test: Once the build job has completed, download the wheel artifact and run the pytest suite on it, excluding the examples/ and llvm/ directories. Upload a code coverage report artifact.
  • test-examples: Once the build job has completed, download the wheel artifact and run the pytest suite on it from the examples/ directory. Upload a code coverage report artifact.
  • test-llvm-env: Once the build job has completed, download the wheel artifact and run the pytest suite on it from the llvm/ directory. Upload a code coverage report artifact.
  • test-llvm-env-linux-asan: Once the build and build-asan-llvm-service jobs have completed, download the wheel artifact, repack it using the asan LLVM service build, and run the pytest suite on it from the llvm/ directory.
  • upload-coverage-reports: Download all code coverage report artifacts and combine them into a single upload to codecov.com.
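
The dependencies between these job types can be sketched as a small graph. The snippet below is a minimal model, not the workflow definition itself: the edges are read off the descriptions above, the assumption that upload-coverage-reports waits on exactly the three jobs that upload a coverage report is an inference, and `scheduling_waves` is a hypothetical helper. It uses the standard library `graphlib` module (Python 3.9+) to group jobs into waves that could run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Job type -> job types it waits on, as read from the descriptions above.
# The upload-coverage-reports edges are an assumption: they are the three
# jobs described as uploading a coverage report artifact.
JOB_DEPENDENCIES = {
    "build": set(),
    "build-asan-llvm-service": set(),
    "test": {"build"},
    "test-examples": {"build"},
    "test-llvm-env": {"build"},
    "test-llvm-env-linux-asan": {"build", "build-asan-llvm-service"},
    "upload-coverage-reports": {"test", "test-examples", "test-llvm-env"},
}


def scheduling_waves(dependencies):
    """Group jobs into waves; all jobs within a wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


for number, wave in enumerate(scheduling_waves(JOB_DEPENDENCIES), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this model the two build jobs form the first wave, the four test jobs the second, and the coverage upload the last. A real runner does not wait for whole waves, so a test job can start as soon as its own dependencies finish.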

Splitting the building and testing into separate jobs enables a single build to be shared across test runners.

Sharding the test suite into three subsets ("core", examples, and llvm-env) achieves greater test parallelism, reducing the wall time by 13 minutes.
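
One way to picture the three shards is as different argument lists passed to pytest. The sketch below is an assumption about how such a split could be expressed, not the workflow's actual commands: the directory names come from the job descriptions above and may not match the repository's real paths, and `pytest_args` is a hypothetical helper. It relies only on pytest's `--ignore` option.

```python
# Shards that map onto a single directory. The directory names are taken from
# the job descriptions and are assumptions about the real repository layout.
DIRECTORY_SHARDS = {
    "examples": "examples",
    "llvm-env": "llvm",
}


def pytest_args(shard: str) -> list:
    """Return the pytest arguments that select the tests for one shard."""
    if shard == "core":
        # The core shard runs everything except the directories owned by
        # the other shards.
        ignores = [f"--ignore={path}" for path in sorted(DIRECTORY_SHARDS.values())]
        return ["."] + ignores
    return [DIRECTORY_SHARDS[shard]]


for name in ["core", "examples", "llvm-env"]:
    print(name, "->", "pytest " + " ".join(pytest_args(name)))
```

In this sketch the core shard is defined by exclusion, so a new test directory lands in it by default, while a new shard needs both a new entry here and a new job.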

Further, each of the build / test jobs is defined independently for macOS and Linux so that the tests for one platform do not block on the build for another. In total 20 jobs are spawned:

| Job | OS | Python | Runtime |
| --- | --- | --- | --- |
| build-linux | Linux | 3.9 | 18:21 |
| build-macos | macOS | 3.9 | 30:34 |
| build-asan-llvm-service | Linux | 3.9 | 17:56 |
| test-linux | Linux | 3.6 | 3:09 |
| test-linux | Linux | 3.7 | 3:04 |
| test-linux | Linux | 3.8 | 3:23 |
| test-linux | Linux | 3.9 | 3:25 |
| test-macos | macOS | 3.8 | 4:08 |
| test-macos | macOS | 3.9 | 2:20 |
| test-examples-linux | Linux | 3.6 | 2:19 |
| test-examples-linux | Linux | 3.7 | 2:07 |
| test-examples-linux | Linux | 3.8 | 2:00 |
| test-examples-linux | Linux | 3.9 | 2:10 |
| test-examples-macos | macOS | 3.9 | 3:29 |
| test-llvm-env-linux | Linux | 3.6 | 31:32 |
| test-llvm-env-linux | Linux | 3.7 | 26:21 |
| test-llvm-env-linux | Linux | 3.8 | 26:30 |
| test-llvm-env-linux | Linux | 3.9 | 31:38 |
| test-llvm-env-macos | macOS | 3.9 | 15:25 |
| test-llvm-env-linux-asan | Linux | 3.9 | 34:33 |
| upload-coverage-reports | - | - | 0:06 |
| Total | - | - | 4:24:31 |
| Wall time | - | - | 54:07 |
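
The savings can be cross-checked against the summary rows of the two tables. The sketch below is only arithmetic on those published totals, and `percent_reduction` is a name made up for illustration. Computed this way, the reductions come out within about a percentage point of the figures quoted at the top of this description, and the wall time saving is roughly the 13 minutes mentioned above.

```python
from datetime import timedelta

# Values copied from the "Total" and "Wall time" rows of the two tables.
OLD_TOTAL = timedelta(hours=10, minutes=55, seconds=4)
NEW_TOTAL = timedelta(hours=4, minutes=24, seconds=31)
OLD_WALL = timedelta(hours=1, minutes=7, seconds=35)
NEW_WALL = timedelta(minutes=54, seconds=7)


def percent_reduction(before: timedelta, after: timedelta) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * ((before - after) / before)


print(f"compute time reduction: {percent_reduction(OLD_TOTAL, NEW_TOTAL):.1f}%")
print(f"wall time reduction:    {percent_reduction(OLD_WALL, NEW_WALL):.1f}%")
print(f"wall time saved:        {OLD_WALL - NEW_WALL}")
```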

Differences with old CI config

The functionality of the new CI workflow is not exactly equivalent to the old config:

  • Code coverage reports are now aggregated from all test runners, covering all Python and OS versions.
  • The 5 C++ tests are no longer run on the CI. To run them, use bazel test locally.
  • Python versions 3.6 and 3.7 are no longer tested on macOS. On macOS with Python 3.8, only the core test suite is run, not the LLVM or example tests.

Drawbacks

  • Implementation complexity: The new CI workflow is much more efficient, but this is achieved by having a more complex workflow configuration file. The CI workflow definition YAMLs have grown from 178 LOC spread across 3 files to 477 LOC in a single file. Worse, the YAML contains large amounts of duplicate code, as there is no mechanism for templating jobs and each of the test-<foo> and build-<foo> job definitions typically differ only in one or two lines.

  • Maintenance burden: Sharding the test suite so that test runners execute only a subset of the test suite is an entirely manual process and introduces a maintenance burden. Now any changes to the test suite will require updating the CI configs. Additionally, the test job shards have an unbalanced amount of work to do. For example, the LLVM tests take approximately 30 minutes, whereas the example tests take only 2 minutes. Further PRs may be required to achieve a more granular breakdown of test jobs.

  • Increased number of macOS jobs: GitHub's runners permit a maximum of 5 macOS jobs to run in parallel. The new CI configuration specifies 5 macOS jobs, so it is already at that limit. Any macOS jobs we add in the future will not achieve greater parallelism.

Issue #385.

@facebook-github-bot added the CLA Signed label Sep 10, 2021
@ChrisCummins force-pushed the ci-jobs branch 4 times, most recently from 756c2f0 to db3d161, September 10, 2021 18:11

@codecov-commenter commented Sep 10, 2021

Codecov Report

Merging #392 (a72efb8) into development (a8408e8) will decrease coverage by 0.73%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           development     #392      +/-   ##
===============================================
- Coverage        85.91%   85.17%   -0.74%     
===============================================
  Files               87       87              
  Lines             4743     4743              
===============================================
- Hits              4075     4040      -35     
- Misses             668      703      +35     

| Impacted Files | Coverage Δ |
| --- | --- |
| compiler_gym/envs/llvm/datasets/poj104.py | 37.70% <0.00%> (-39.35%) ⬇️ |
| compiler_gym/envs/llvm/datasets/anghabench.py | 55.55% <0.00%> (-25.00%) ⬇️ |
| compiler_gym/envs/llvm/datasets/cbench.py | 77.99% <0.00%> (-0.78%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a8408e8...a72efb8.

@ChrisCummins force-pushed the ci-jobs branch 4 times, most recently from 231f1dd to 62aa880, September 11, 2021 12:48
@ChrisCummins changed the title from "🏗️ WIP: Overhaul CI workflows to improve efficiency" to "Overhaul CI workflows to improve efficiency" Sep 11, 2021
@ChrisCummins force-pushed the ci-jobs branch 6 times, most recently from 46e8ade to 2f4cef4, September 12, 2021 22:23
@ChrisCummins marked this pull request as ready for review September 13, 2021 08:30