Introduce a way to run test in parallel #1025

benoit-cty · 2021-06-25T19:18:19Z

Hi there!

I really enjoy OpenFisca, but I recently encountered an issue.

Here is what I did:

openfisca test --country-package openfisca_france tests

Here is what I expected to happen:

Use of all the core of my CPU to run test so that it took less than 3 minutes.

Here is what actually happened:

Only one core is used, so test took 17 minutes.

Context

I identify more as a:

Developer (I create tools that use the existing OpenFisca code at https://LexImpact.an.fr).

Experimentation I have made

Use GNU parallel

find tests/**/*.{yaml,yml} | parallel -j 32 --progress openfisca test --country-package openfisca_france | grep FAILED
Pros:

it is really fast, less than 3 minutes.
Cons:
All the test output is printed to the console, so you need to process them to know error, with grep FAILED for example.
Need an install at OS level sudo apt-get install parallel.

Use of Xdist

pytest-xdist is a Pytest plugins that run test in parallel.
To use it:
pip install pytest-xdist[psutil]
Add in setup.cfg:

[tool:pytest]
addopts      = -n auto`

Run the test
openfisca test --country-package openfisca_france tests

openfisca test  --country-package openfisca_france tests
=================================================================================================== test session starts ====================================================================================================
platform linux -- Python 3.8.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /LEXIMPACT/openfisca-france, configfile: setup.cfg
plugins: forked-1.3.0, xdist-2.3.0
gw0 [42] / gw1 [42] / gw2 [42] / gw3 [42] / gw4 [42] / gw5 [42] / gw6 [42] / gw7 [42] / gw8 [42] / gw9 [42] / gw10 [42] / gw11 [42] / gw12 [42] / gw13 [42] / gw14 [42] / gw15 [42]
..........................................
============================================================================================ 42 passed, 1522 warnings in 16.44s ============================================================================================

Pros:

Use all CPU core
Nice pytest output
Cons:
Need to upgrade pytest version dependencies
Don't work as expected : it run the test of the project like a normal pytest, not the OpenFisca YAML testfiles.

It seems that the OpenFisca pytest plugin https://github.com/openfisca/openfisca-core/blob/master/openfisca_core/tools/test_runner.py need to override some xdist method to give it the right test.

Use pytest-parallel

https://github.com/browsertron/pytest-parallel
Not tested, it seems to work like Xdist, so may have the same problem.

The text was updated successfully, but these errors were encountered:

bonjourmauko · 2021-06-27T09:12:04Z

Hi @benoit-cty and thanks for this issue.

I see there is an option overlap regarding -n https://github.com/openfisca/openfisca-core/blob/master/openfisca_core/scripts/openfisca_command.py#L34-L43.

So I've tried this:

PYTEST_ADDOPTS="--numprocesses=auto" openfisca test --country-package openfisca_france tests

And got:

platform darwin -- Python 3.7.0, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 rootdir: /Users/hyperion/Sites/openfisca/openfisca-france, configfile: setup.cfg plugins: xdist-2.3.0, forked-1.3.0 gw0 [42] / gw1 [42] / gw2 [42] / gw3 [42] .......................................... ==================================================================================== 42 passed, 1510 warnings in 34.30s ====================================================================================

~~We should further investigate why all those warnings.~~

The same result you had, so I confirm that it actually overrides OpenFisca's test collector.

bonjourmauko · 2021-06-27T09:50:13Z

Tried directly adding the option to openfisca test.

# openfisca_command.py

    def build_test_parser(parser):
        parser.add_argument('path', help = "paths (files or directories) of tests to execute", nargs = '+')
        parser = add_tax_benefit_system_arguments(parser)
        parser.add_argument('-n', '--name_filter', default = None, help = "partial name of tests to execute. Only tests with the given name_filter in their name, file name, or keywords will be run.")
        parser.add_argument('-p', '--pdb', action = 'store_true', default = False, help = "drop into debugger on failures or errors")
        parser.add_argument('--performance-graph', '--performance', action = 'store_true', default = False, help = "output a performance graph in a 'performance_graph.html' file")
        parser.add_argument('--performance-tables', action = 'store_true', default = False, help = "output performance CSV tables")
        parser.add_argument('-v', '--verbose', action = 'store_true', default = False, help = "increase output verbosity")
        parser.add_argument('-o', '--only-variables', nargs = '*', default = None, help = "variables to test. If specified, only test the given variables.")
        parser.add_argument('-i', '--ignore-variables', nargs = '*', default = None, help = "variables to ignore. If specified, do not test the given variables.")
        parser.add_argument('-d', '--distributed', nargs = 1, default = None, help = "run tests distributed. Use `auto` to detect the number of cores.")

# run_test.py

    options = {
        'pdb': args.pdb,
        'performance_graph': args.performance_graph,
        'performance_tables': args.performance_tables,
        'verbose': args.verbose,
        'name_filter': args.name_filter,
        'only_variables': args.only_variables,
        'ignore_variables': args.ignore_variables,
        'distributed': args.distributed,
        }

# test_runner.py

    argv = ["--capture", "no"]

    if options.get('pdb'):
        argv.append('--pdb')

    if options.get("distributed"):
        argv.append(f"--numprocesses={options.get('distributed')[0]}")

Results where not better:

COUNTRY_TEMPLATE_PATH=`python -c "import openfisca_country_template; print(openfisca_country_template.CountryTaxBenefitSystem().get_package_metadata()['location'])"`
openfisca test -d auto $COUNTRY_TEMPLATE_PATH/openfisca_country_template/tests/

=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.7.0, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/hyperion
plugins: xdist-2.3.0, cov-2.11.1, forked-1.3.0
gw0 [0] / gw1 [0] / gw2 [0] / gw3 [0]

========================================================================================== no tests ran in 0.50s ===========================================================================================

benoit-cty · 2021-06-27T10:11:08Z

Thanks for investigating, I think I will open an issue in Xdist to get some help.

bonjourmauko · 2021-06-27T10:17:19Z

Finally from https://github.com/pytest-dev/pytest-xdist/blob/1dc019709c15678faa622834d3bc851abddb426c/OVERVIEW.md#L70

xdist works by spawning one or more workers, which are controlled by the master. Each worker is responsible for performing a full test collection and afterwards running tests as dictated by the master.

The master receives the result of the collection from all nodes. At this point the master performs some sanity check to ensure that all workers collected the same tests (including order), bailing out otherwise. If all is well, it converts the list of test-ids into a list of simple indexes, where each index corresponds to the position of that test in the original collection list. This works because all nodes have the same collection list, and saves bandwidth because the master can now tell one of the workers to just execute test index 3 index of passing the full test id.

Why does each worker do its own collection, as opposed to having the master collect once and distribute from that collection to the workers?

If collection was performed by master then it would have to serialize collected items to send them through the wire, as workers live in another process. The problem is that test items are not easily (impossible?) to serialize, as they contain references to the test functions, fixture managers, config objects, etc. Even if one manages to serialize it, it seems it would be very hard to get it right and easy to break by any small change in pytest.

bonjourmauko · 2021-06-27T13:05:48Z

So it seems pytest-xdist creates a master worker, that interrupts normal test collection —so pytest_collect_item is not called, instantiates a node for each processor, then lets each node do the collection work.

It seems to me that the solution is to hook up OpenFiscaPlugin to each node, however I wasn't able to do so.

Hope it helps!

benoit-cty · 2021-06-28T09:56:29Z

Thanks for all the details, I've openned an XDist Issue: pytest-dev/pytest-xdist#677

bonjourmauko · 2021-07-31T09:34:44Z

pytest-dev/pytest-xdist#681 (comment)

unfortunately manually passed pytest plugins do not pass over to xdist

if your plugin is a straightforward single file thing anyway i strongly recomment to use the -p option to pytest to pass it over as then the workers in xdist will also try to use them

MattiSG · 2021-08-19T14:44:49Z

If I understand that correctly, xdist is a dead-end.

Why doesn't nose suffice, as was the intention in #516?

bonjourmauko · 2021-08-19T16:33:05Z

Hello @MattiSG,

If I understand that correctly, xdist is a dead-end.

As I currently understand it yes, because of the way we use pystest.

Why doesn't nose suffice, as was the intention in #516?

At the time of that discussion, nose was being shelved for deprecation. However, work on nose2 seems quite active today.

bonjourmauko · 2021-08-19T17:14:12Z

By the way, I've got parallel tests probably working here, to be tested with France for example.

MattiSG · 2021-08-24T10:28:24Z

@HAEKADI since you have created a GitHub Actions syntax to try out parallel tests in #1027, could you please investigate if @benoit-cty’s suggestion to use https://github.com/nektos/act would also work for parallelising locally? 😃

HAEKADI · 2021-09-23T14:00:10Z

I investigated the use of act in OpenFisca-France here :)

bonjourmauko self-assigned this Jun 27, 2021

bonjourmauko added the kind:ci Continuous ops, integration & deployment label Jun 27, 2021

benoit-cty mentioned this issue Jun 28, 2021

How to use xdist with a custom test collector ? pytest-dev/pytest-xdist#677

Closed

MattiSG mentioned this issue Jul 27, 2021

Improve tests performance #1027

Closed

MattiSG mentioned this issue Aug 19, 2021

Switch from CircleCI to GitHub Actions #1030

Closed

bonjourmauko mentioned this issue Aug 20, 2021

Run tests in a parallel matrix #1031

Closed

10 tasks

bonjourmauko added this to the Improved testing milestone Sep 29, 2021

bonjourmauko mentioned this issue Dec 6, 2022

Introduce a way to run tests in parallel #1171

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce a way to run test in parallel #1025

Introduce a way to run test in parallel #1025

benoit-cty commented Jun 25, 2021 •

edited

Loading

bonjourmauko commented Jun 27, 2021 •

edited

Loading

bonjourmauko commented Jun 27, 2021

benoit-cty commented Jun 27, 2021

bonjourmauko commented Jun 27, 2021

bonjourmauko commented Jun 27, 2021

benoit-cty commented Jun 28, 2021

bonjourmauko commented Jul 31, 2021

MattiSG commented Aug 19, 2021

bonjourmauko commented Aug 19, 2021 •

edited

Loading

bonjourmauko commented Aug 19, 2021

MattiSG commented Aug 24, 2021

HAEKADI commented Sep 23, 2021

Introduce a way to run test in parallel #1025

Introduce a way to run test in parallel #1025

Comments

benoit-cty commented Jun 25, 2021 • edited Loading

Here is what I did:

Here is what I expected to happen:

Here is what actually happened:

Context

Experimentation I have made

Use GNU parallel

Use of Xdist

Use pytest-parallel

bonjourmauko commented Jun 27, 2021 • edited Loading

bonjourmauko commented Jun 27, 2021

benoit-cty commented Jun 27, 2021

bonjourmauko commented Jun 27, 2021

bonjourmauko commented Jun 27, 2021

benoit-cty commented Jun 28, 2021

bonjourmauko commented Jul 31, 2021

MattiSG commented Aug 19, 2021

bonjourmauko commented Aug 19, 2021 • edited Loading

bonjourmauko commented Aug 19, 2021

MattiSG commented Aug 24, 2021

HAEKADI commented Sep 23, 2021

benoit-cty commented Jun 25, 2021 •

edited

Loading

bonjourmauko commented Jun 27, 2021 •

edited

Loading

bonjourmauko commented Aug 19, 2021 •

edited

Loading