Take advantage of larger job runners in CI build tests #921
Thanks to help from Google's ML Velocity team, the TensorFlow Quantum project now has access to larger job runners. We can use them for the most time-consuming jobs in the CI build checks workflow: the library build and the tutorial tests.
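As a sketch of what this change looks like in the workflow file (the runner label and step commands below are placeholders, not the actual values from the TFQ repository), moving a job to a larger runner is just a change to its `runs-on` key:

```yaml
# Hypothetical fragment of a CI workflow; "ubuntu-20.04-16core" stands in
# for whatever larger-runner label the ML Velocity team provisioned.
jobs:
  lib-build:
    runs-on: ubuntu-20.04-16core   # larger runner instead of ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the library
        run: ./scripts/build_all.sh   # placeholder build command
```

Only the jobs that dominate the workflow's wall-clock time need this change; the rest can stay on standard runners.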
In addition, this PR makes a few other small adjustments:

- The `tutorial-tests` job does not need to depend on `wheel-build` finishing. (Maybe it did in the past?) Consequently, we can remove the dependency and let all 3 jobs run in parallel, which speeds up the overall workflow.
- In two places, an invocation of `./configure.sh` was immediately followed by a step that runs `./scripts/build_pip_package_test.sh`, which itself runs `./configure.sh`. We can remove the redundant `./configure.sh` invocations from this workflow.

Note: I didn't change the `wheel-build` job to use the new runners because it is not the bottleneck; the most time-consuming job here is the tutorial tests. The larger runners are more expensive (per minute of run time), so if a job can't benefit from them, it doesn't make sense to use them.

Here is an example of changed workflow run-times. First, a sample from before the changes:
And now with the workflow changes:
A typical run of the build tests has gone from ~22 minutes to ~7 minutes, roughly 1/3 of what it used to be; that speedup comes from the new ML team runners. The overall workflow time has gone from ~24 minutes to ~16 minutes, about 2/3 of the previous time. The bottleneck is now the tutorial tests, whose run time has barely improved because the tutorial test script does not take advantage of parallelism. (Something to improve in the future.)
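As one possible future direction for that remaining bottleneck (a sketch only; the notebook path and per-notebook test command are assumptions, not the actual contents of the TFQ tutorial test script), the tutorial tests could launch one subprocess per notebook and run several concurrently:

```python
# Sketch of parallelizing per-notebook tutorial tests. The test command and
# notebook location below are placeholders, not the actual TFQ tooling.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_test(cmd: list[str]) -> int:
    """Run one test command in a subprocess and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def run_parallel(cmds: list[list[str]], workers: int = 4) -> list[int]:
    """Run test commands concurrently. Threads suffice here because the
    actual work happens in the child processes, not in Python."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_test, cmds))


if __name__ == "__main__":
    # Placeholder: one command per tutorial notebook.
    notebooks = ["docs/tutorials/hello_many_worlds.ipynb"]  # hypothetical path
    cmds = [["python", "-m", "pytest", "--nbmake", nb] for nb in notebooks]
    sys.exit(max(run_parallel(cmds), default=0))
```

Since each notebook already runs as an independent process, the main design question is picking a worker count that fits the runner's core count and memory.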