Overhaul CI workflows to improve efficiency #392

ChrisCummins · 2021-09-10T15:57:24Z

This patch set overhauls the CI workflow to achieve a 59.4% reduction in total compute time and a 19.4% reduction in wall time.

Background

The "CI" workflow is run on every pull request and update to the stable and development branches of CompilerGym. It is responsible for running the test suite on a range of supported Python versions and operating systems to catch regressions and test new features. The CI workflow had grown to be very computationally hungry because of the large number of tests, lengthy build process, and large number of runtime configurations.

In the previous configuration, the CI workflow would spawn four job types:

bazel_test (example run): these jobs would run the test suite using bazel.
install_test (example run): these jobs would build the compiler_gym Python wheel and then run the test suite using pytest.
llvm-service-asan (example run): the same as install_test, except that it would build the LLVM compiler service with address sanitizer support and run only the LLVM test suite with leak checking enabled.
pytest-cov (example run): the same as install_test, except that it would run the test suite with code coverage enabled and upload the report to codecov.com.

In total, 12 jobs were spawned and processed independently and in parallel, requiring nearly 11 hours of compute time for every change or PR update:

Job	OS	Python	Runtime
bazel_test	Linux	3.9	53:06
bazel_test	macOS	3.9	1:05:17
install_test	Linux	3.6	58:05
install_test	Linux	3.7	56:36
install_test	Linux	3.8	49:35
install_test	Linux	3.9	54:26
install_test	macOS	3.6	52:19
install_test	macOS	3.7	47:56
install_test	macOS	3.8	53:58
install_test	macOS	3.9	43:07
llvm-service-asan	Linux	3.9	1:07:35
pytest-cov	Linux	3.9	53:13
Total	-	-	10:55:04
Wall time	-	-	1:07:35

Much of this compute time is redundant and wasteful:

Each of the 8 install_test jobs builds a Python wheel from scratch even though the Python wheel is version insensitive.
The bazel_test and install_test jobs run the same test suite redundantly.
Only a single instance of the tests needs to be run on macOS. Running the test suite on macOS under several Python versions is unlikely to reveal anything that the same tests on Linux do not.

This is made worse because there is no caching on account of GitHub's cache size limits.

New approach

This pull request uses the workflow artifacts mechanism to make the CI workflow more efficient by breaking it into a graph of smaller, dependent jobs.

The jobs have the following types:

build: Build a pair of compiler_gym Python wheels for Linux and macOS and upload them as artifacts for use by other jobs.
build-asan-llvm-service: Build the LLVM service with address sanitizer support (and build nothing else). Upload the artifact for use in other jobs.
test: Once the build job has completed, download the wheel artifact and run the pytest suite on it, excluding the examples/ and llvm/ directories. Upload a code coverage report artifact.
test-examples: Once the build job has completed, download the wheel artifact and run the pytest suite on it from the examples/ directory. Upload a code coverage report artifact.
test-llvm-env: Once the build job has completed, download the wheel artifact and run the pytest suite on it from the llvm/ directory. Upload a code coverage report artifact.
test-llvm-env-linux-asan: Once the build and build-asan-llvm-service jobs have completed, download the wheel artifact, repack it using the asan LLVM service build, and run the pytest suite on it from the llvm/ directory.
upload-coverage-reports: Download all code coverage report artifacts and combine them into a single upload to codecov.com.

Splitting the building and testing into separate jobs enables a single build to be shared across test runners.

Sharding the test suite into three subsets ("core", examples, and llvm-env) achieves greater test parallelism, reducing the wall time by 13 minutes.
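As a rough illustration of the sharding, the three subsets could map onto pytest invocations as sketched below. The directory paths (`tests`, `tests/llvm`, `examples`) and the `SHARDS` mapping are assumptions made for this example, not the layout used by the actual workflow file; `--ignore` is pytest's standard option for skipping a path during collection.

```python
from typing import Dict, List

# Hypothetical shard definitions: the paths to collect tests from and the
# paths to skip. These paths are illustrative only.
SHARDS: Dict[str, Dict[str, List[str]]] = {
    "core": {"paths": ["tests"], "ignore": ["tests/llvm"]},
    "examples": {"paths": ["examples"], "ignore": []},
    "llvm-env": {"paths": ["tests/llvm"], "ignore": []},
}


def pytest_command(shard: str) -> List[str]:
    """Build the pytest command line for one shard of the test suite."""
    spec = SHARDS[shard]
    cmd = ["python", "-m", "pytest", *spec["paths"]]
    # Each ignored path becomes its own --ignore=<path> argument.
    cmd += [f"--ignore={path}" for path in spec["ignore"]]
    return cmd


if __name__ == "__main__":
    for name in SHARDS:
        print(name, "->", " ".join(pytest_command(name)))
```

Because the shard boundaries are plain path lists, adding or moving a test directory means updating this mapping by hand, which is the maintenance cost noted under Drawbacks.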

Further, each of the build / test jobs is defined independently for macOS and Linux so that the tests for one platform do not block on the build for another. In total 20 jobs are spawned:

Job	OS	Python	Runtime
build-linux	Linux	3.9	18:21
build-macos	macOS	3.9	30:34
build-asan-llvm-service	Linux	3.9	17:56
test-linux	Linux	3.6	3:09
test-linux	Linux	3.7	3:04
test-linux	Linux	3.8	3:23
test-linux	Linux	3.9	3:25
test-macos	macOS	3.8	4:08
test-macos	macOS	3.9	2:20
test-examples-linux	Linux	3.6	2:19
test-examples-linux	Linux	3.7	2:07
test-examples-linux	Linux	3.8	2:00
test-examples-linux	Linux	3.9	2:10
test-examples-macos	macOS	3.9	3:29
test-llvm-env-linux	Linux	3.6	31:32
test-llvm-env-linux	Linux	3.7	26:21
test-llvm-env-linux	Linux	3.8	26:30
test-llvm-env-linux	Linux	3.9	31:38
test-llvm-env-macos	macOS	3.9	15:25
test-llvm-env-linux-asan	Linux	3.9	34:33
upload-coverage-reports	-	-	0:06
Total	-	-	4:24:31
Wall time	-	-	54:07
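To see where the wall time comes from, the sketch below estimates the finish time of each job as its own runtime plus the latest finish time of the jobs it depends on. It uses only a subset of the Linux jobs from the table; the dependency edges are read off the job descriptions above, and runner start-up and queueing time are ignored, so the result is a lower bound rather than a reproduction of the measured figure.

```python
from functools import lru_cache


def to_seconds(runtime: str) -> int:
    """Convert a runtime written as "M:SS" or "H:MM:SS" into seconds."""
    seconds = 0
    for part in runtime.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def format_seconds(seconds: int) -> str:
    """Format a number of seconds as "M:SS"."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# job name -> (runtime from the table, jobs that must finish first).
# The dependency lists are an assumption based on the job descriptions.
JOBS = {
    "build-linux": ("18:21", ()),
    "build-asan-llvm-service": ("17:56", ()),
    "test-linux (3.9)": ("3:25", ("build-linux",)),
    "test-examples-linux (3.9)": ("2:10", ("build-linux",)),
    "test-llvm-env-linux (3.9)": ("31:38", ("build-linux",)),
    "test-llvm-env-linux-asan": (
        "34:33",
        ("build-linux", "build-asan-llvm-service"),
    ),
}


@lru_cache(maxsize=None)
def finish_time(job: str) -> int:
    """Earliest finish time of a job, in seconds after the workflow starts."""
    runtime, deps = JOBS[job]
    start = max((finish_time(dep) for dep in deps), default=0)
    return start + to_seconds(runtime)


last_job = max(JOBS, key=finish_time)

if __name__ == "__main__":
    print(last_job, format_seconds(finish_time(last_job)))
```

With these runtimes the longest chain is build-linux followed by test-llvm-env-linux-asan, which finishes at 52:54. The measured wall time of 54:07 is slightly higher, presumably because of the overheads this sketch leaves out.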

Differences with old CI config

The functionality of the new CI workflow is not exactly equivalent to the old config:

Code coverage reports are now aggregated from all test runners, covering all Python and OS versions.
The 5 C++ tests are no longer run on the CI. To run them, use bazel test locally.
Python versions 3.6 and 3.7 are no longer tested on macOS. Only the core test suite is run on Python 3.8, not the LLVM / example tests.
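The resulting test matrix can be enumerated with a short sketch. The suite names ("core", "examples", "llvm-env") follow the shard names used above; the function itself is illustrative and not part of the workflow.

```python
from itertools import product
from typing import List, Tuple

LINUX_PYTHONS = ["3.6", "3.7", "3.8", "3.9"]
SUITES = ["core", "examples", "llvm-env"]


def test_matrix() -> List[Tuple[str, str, str]]:
    """Return the (OS, Python version, suite) combinations that are tested."""
    # Linux runs every suite on every supported Python version.
    jobs = [("Linux", py, suite) for py, suite in product(LINUX_PYTHONS, SUITES)]
    # macOS runs every suite on Python 3.9 only ...
    jobs += [("macOS", "3.9", suite) for suite in SUITES]
    # ... plus the core suite on Python 3.8.
    jobs.append(("macOS", "3.8", "core"))
    return jobs


if __name__ == "__main__":
    for job in test_matrix():
        print(*job)
```

This yields 12 Linux and 4 macOS test jobs, matching the test-* rows of the table (the asan test job is not included).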

Drawbacks

Implementation complexity: The new CI workflow is much more efficient, but this is achieved by having a more complex workflow configuration file. The CI workflow definition YAMLs have grown from 178 LOC spread across 3 files to 477 LOC in a single file. Worse, the YAML contains large amounts of duplicate code, as there is no mechanism for templating jobs and each of the test-<foo> and build-<foo> job definitions typically differ only in one or two lines.
Maintenance burden: Sharding the test suite so that test runners execute only a subset of the test suite is an entirely manual process and introduces a maintenance burden. Now any changes to the test suite will require updating the CI configs. Additionally, the test job shards have an unbalanced amount of work to do. For example, the LLVM tests take approximately 30 minutes, whereas the example tests take only 2 minutes. Further PRs may be required to achieve a more granular breakdown of test jobs.
Increased number of macOS jobs: GitHub's runners permit a maximum of 5 macOS jobs to run in parallel. The new CI configuration specifies 5 jobs. In the future as we add more macOS jobs we will not be able to achieve greater parallelism.
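The effect of the macOS concurrency limit can be approximated with a small calculation. This is a simplification that assumes all macOS jobs are ready to run at the same time (in practice the test jobs wait for the build job); the limit of 5 is the figure quoted above.

```python
import math

MACOS_CONCURRENCY_LIMIT = 5  # limit quoted in the text above


def macos_waves(num_jobs: int, limit: int = MACOS_CONCURRENCY_LIMIT) -> int:
    """Number of sequential batches needed to run num_jobs macOS jobs."""
    return math.ceil(num_jobs / limit)


if __name__ == "__main__":
    for n in (5, 6, 10, 11):
        print(n, "jobs ->", macos_waves(n), "wave(s)")
```

With the current 5 macOS jobs everything fits in one wave; a sixth job would have to queue until a runner becomes free.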

Issue #385.

codecov-commenter · 2021-09-10T19:00:42Z

Codecov Report

Merging #392 (a72efb8) into development (a8408e8) will decrease coverage by 0.73%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           development     #392      +/-   ##
===============================================
- Coverage        85.91%   85.17%   -0.74%     
===============================================
  Files               87       87              
  Lines             4743     4743              
===============================================
- Hits              4075     4040      -35     
- Misses             668      703      +35

Impacted Files	Coverage Δ
compiler_gym/envs/llvm/datasets/poj104.py	`37.70% <0.00%> (-39.35%)`	⬇️
compiler_gym/envs/llvm/datasets/anghabench.py	`55.55% <0.00%> (-25.00%)`	⬇️
compiler_gym/envs/llvm/datasets/cbench.py	`77.99% <0.00%> (-0.78%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a8408e8...a72efb8.

Issue facebookresearch#385.

The asan CI job started failing with a permission error on the Csmith binary. Try setting the executable bit: PermissionError: [Errno 13] Permission denied: '/opt/hostedtoolcache/Python/3.9.6/x64/lib/python3.9/site-packages/compiler_gym/third_party/csmith/csmith/bin/csmith'
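For reference, setting the executable bit on a file can be done from Python as sketched below. This is a generic illustration of the idea, not the change made in this PR; the `make_executable` helper is hypothetical.

```python
import stat
from pathlib import Path


def make_executable(path: Path) -> None:
    """Add the execute permission for user, group, and others."""
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

The existing permission bits are preserved; only the execute bits are added, which is equivalent to `chmod +x` when no umask restrictions are applied.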

facebook-github-bot added the CLA Signed label Sep 10, 2021

ChrisCummins force-pushed the ci-jobs branch 4 times, most recently from 756c2f0 to db3d161 Compare September 10, 2021 18:11

ChrisCummins force-pushed the ci-jobs branch 4 times, most recently from 231f1dd to 62aa880 Compare September 11, 2021 12:48

ChrisCummins changed the title from "🏗️ WIP: Overhaul CI workflows to improve efficiency" to "Overhaul CI workflows to improve efficiency" Sep 11, 2021

ChrisCummins force-pushed the ci-jobs branch 6 times, most recently from 46e8ade to 2f4cef4 Compare September 12, 2021 22:23

ChrisCummins marked this pull request as ready for review September 13, 2021 08:30

ChrisCummins and others added 11 commits September 13, 2021 14:52

[ci] Install and upgrade pip and wheel packages.

b58aef6

Issue facebookresearch#385.

[ci] Decompose the CI workflow into setup/build/test jobs.

fe1b720

Issue facebookresearch#385.

[llvm] Skip dataset pickle tests on CI.

753c1fe

[tests] Run fewer dataset tests on CI.

fc2fc3e

[ci] Split LLVM tests into a separate job.

a8a8955

Issue facebookresearch#385.

[ci] Don't redundantly install compiler gym requirements.

25c2260

[ci] Merge asan tests into CI workflow.

04f4c52

[ci] Rename jobs.

6fb08ed

[Makefile] Run example tests on make install-test.

1a0cffc

[tests] Merge and de-duplicate flag definitions.

dd0a417

ChrisCummins and others added 17 commits September 13, 2021 14:53

[ci] Split examples tests into separate jobs.

d9960c8

[tests] Always clear FLAGS state before manual_env tests.

f630b54

This is because other tests can clobber the shared FLAGS state, changing the behavior of the tests.

[Makefile] Install compiler gym dependencies on make install.

df55343

This adds a `pip install -r compiler_gym/requirements.txt` step to the `make install` target, as otherwise the package can be installed without resolving the required deps.

[ci] Simplify job naming scheme.

0ad6e37

Issue facebookresearch#385.

[ci] Collect coverage reports from regular test jobs.

ce6aa3d

Don't run a dedicated coverage test job; instead, collect coverage reports from all test jobs and merge the results.

[ci] Fixes for examples tests.

959ad18

Issue facebookresearch#385.

[Makefile] Enable coverage report path to be changed.

0bc8ead

[ci] Version the build artifacts by run ID.

40f61d9

[ci] Report all test coverage from all jobs.

0153772

[ci] Fix coverage path setter.

2da218b

Issue facebookresearch#385.

[ci] Run fewer jobs on macOS.

e282b4c

[ci] Fix artifact names.

ada610e

[ci] Compress asan build artifact to reduce size.

1cdf76c

[ci] Fix asan job.

230eb98

[ci] Remove double-os name in CI artifact.

1eaa72f

[ci] Fix CI job names.

d8832ce

[ci] Use more consistent job name.

9fb7432

ChrisCummins force-pushed the ci-jobs branch from 4260e72 to 9fb7432 Compare September 13, 2021 13:54

ChrisCummins mentioned this pull request Sep 13, 2021

Improve the CI efficiency #385

Closed

5 tasks

ChrisCummins merged commit 08ab58c into facebookresearch:development Sep 13, 2021

ChrisCummins deleted the ci-jobs branch September 13, 2021 23:28

ChrisCummins mentioned this pull request Sep 16, 2021

[ci] Test coverage reports are no longer working #401

Closed

This was referenced Sep 28, 2021

CompilerGym Release v0.2.0 #434

Merged

CompilerGym Release v0.2.0 #439

Merged

ChrisCummins mentioned this pull request May 31, 2022

Run all bazel tests in CI? #693

Open
