-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Parallelise provider tests execution in order to speed them up #30305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c08b2f5 to
2582a0d
Compare
|
Example output: Also - it allows to run all the tests locally (including all providers) much faster (if you have multiple CPUS and a lot of memory) if you want to see if you have not broken any of the providers. Plus as a side effect, if any of the provider tests fail, you know exactly which one and seeing the output of that test is much easier (unfolding the result of single provider is much faster and it's easier to scroll through the output of only that provider. |
c548f3b to
6ad546d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will make Providers tests safe to run for Public runners - each of them will require much less of memory and the way it is done where each "provider" test will run in their separate docker container that will be torn down immediately when the test type completes, we will continuously reclaim resources from each test type while we are running the tests on Public runners.
|
I had to clean the networks a bit less aggressively (selectively each network rather than by prunning all unused networks - due to race conditions that were caused by massive parallelism we have :D |
6ad546d to
972b9e5
Compare
|
BTW. It seems we might save a LOT on more parallelisation. The full tests take ~ 13 minutes instead of ~ 17 which is whooping ~25%. I just need to work a bit on stability, as it seems we might put a bit too much pressure on the machines (and I need to check it with public runners) |
972b9e5 to
8dec067
Compare
|
OK. For public runners, it's as expected - slower than running all Provider tests together (34m with the split vs 26m without) - the overhead of startting docker-composes (including the DB) there is significantly bigger than parallelism gain (there are just 2 CPUS). So I Will disable this split for public runners, and see if I can stabilize it for the self-hosted ones. |
Previously all tests for providers were executed as single test type,
even if only subset of provider tests were executed. However when
tests for all providers are executed (for example when main tests
are run or when you change some important files and add new
dependencies) "Providers" test run all the tests for all providers
sequentially which makes them longest running test type by far.
This has the side effect that often for those cases Providers test
type is running alone where all the other test types had already
completed. This means that parallelism of the CI instances is not
used to a full extent.
This change leverages the current "parallel" framework of breeze that
runs tests and rather than running single "Providers[...]" test type,
it breakes the Provider test type into separate test type for each
provider (if all providers are tested, this means that there are
70+ provider tests/docker composes running - not all in parallel, but
up to CPU number ones are run in parallel via pooling mechanism).
This required a number of small changes, that resulted not only
faster runs in CI and better use of multi-CPU instances, but also
it allows much easier to utilise parallelism of local machines
of develiopers to run complete set of Airflow tests using as many
CPUS as they have allocated to their docker engine.
The changes include:
* splitting the Provider type into as many providers are being
tested ("Providers[nnn]")
* escaping test type to avoid rich treating the provider names
as not-defined markup and allows to display status in coloured
form
* pruning of docker-compose networks before, after and during
the tests in order to avoid network exhaustion - each test type
has its own docker-compose created network
* avoiding unnecessary environment checks before each test in
case of parallel test execution (produced unnecessary output)
* amazon provider, general providers and WWW are always put at the
beginning of the list if they are in the list - they are by far the
longest so by starting them early we give them a chance to not be
the last running test type, using parallelism / multiple CPUS
more efficiently.
8dec067 to
18b6e3c
Compare
|
Actually it turned out that just sorting the test types (and having Providers and WWW first) is enough to get similar gains (with much more stable results). Closing in favour of that . |

Previously all tests for providers were executed as single test type, even if only subset of provider tests were executed. However when tests for all providers are executed (for example when main tests are run or when you change some important files and add new dependencies) "Providers" test run all the tests for all providers sequentially which makes them longest running test type by far.
This has the side effect that often for those cases Providers test type is running alone where all the other test types had already completed. This means that parallelism of the CI instances is not used to a full extent.
This change leverages the current "parallel" framework of breeze that runs tests and rather than running single "Providers[...]" test type, it breakes the Provider test type into separate test type for each provider (if all providers are tested, this means that there are 70+ provider tests/docker composes running - not all in parallel, but up to CPU number ones are run in parallel via pooling mechanism).
This required a number of small changes, that resulted not only faster runs in CI and better use of multi-CPU instances, but also it allows much easier to utilise parallelism of local machines of develiopers to run complete set of Airflow tests using as many CPUS as they have allocated to their docker engine.
The changes include:
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.