OpenMPI version 3.0 and node oversubscription #182

Closed
jrper opened this issue Mar 14, 2018 · 3 comments

Comments

@jrper
Contributor

jrper commented Mar 14, 2018

I see that OpenMPI version 3.0 has changed the default behaviour of mpirun regarding oversubscription. With this version, a plain call like mpirun -np 8 python -c "print('Hello')" will fail with an MPI error on systems with fewer than 8 slots available. This affects several of the Fluidity tests, which can request anything up to 16 processes.

The "right way" to do things is now to specify mpirun -np 8 -oversubscribe python -c "print('Hello')", on specific calls, or to define an environment variable, OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -np 8 python -c "print('Hello')" to get back to the openmpi 2.0 behaviour. I'm not sure what the right thing to do for us is, possibly to use the last option inside of test harness itself?

@Patol75
Contributor

Patol75 commented Mar 15, 2018

Another GitHub project ran into the same kind of error (dealii/dealii#5123) and fixed it using the environment-variable method (dealii/dealii#5142).
Another idea could be to make the test cases adaptive: they would ask the system for the number of available processors, with nproc --all for example, and adjust the number of requested MPI processes accordingly. But this would mean modifying quite a few test-case XML files, with nprocs probably no longer treated as an attribute (whenever nprocs > 1) but instead folded into the command-line element, such as:

#!/bin/bash
# "$@" stands for the test command and its arguments.
nprocs=16
nprocsLocal=$(nproc --all)
if (( nprocsLocal >= nprocs )); then
  # All fine: enough slots available
  mpirun -np "$nprocs" "$@"
else
  # Either oversubscribe or use the maximum number of procs available:
  mpirun -np "$nprocs" -oversubscribe "$@"
  # alternatively: mpirun -np "$nprocsLocal" "$@"
fi

Or, even better, it could stay an attribute if its value is known at runtime (I have no idea whether that is true). The XML modification could be achieved with a simple enough lxml Python script.
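A minimal sketch of such a script, assuming the process count lives in an nprocs attribute (the file name and element layout here are guesses, not the actual Fluidity test schema):

import os
from lxml import etree

tree = etree.parse("test.xml")  # hypothetical test-case file

available = os.cpu_count()  # plays the role of `nproc --all`
for elem in tree.getroot().iter():
    if "nprocs" in elem.attrib:
        requested = int(elem.attrib["nprocs"])
        # Cap the request at the number of locally available processors.
        elem.attrib["nprocs"] = str(min(requested, available))

tree.write("test.xml")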

@stephankramer
Contributor

It seems to me that using the environment variable, setting it in testharness, would indeed be the simplest solution. As @Patol75's link suggests, we can just always set it, and it shouldn't hurt in cases where we don't use OpenMPI 3.0.

As for the other suggestions: I'm afraid it's a little more complicated:

  • the tests don't run sequentially. Testharness runs multiple tests in parallel, with the maximum number of concurrent tests set by the -threads option (or use make THREADS=8 test). This is essential: many tests are serial, so we need it to get a decent turnaround time on multi-core systems.
  • if the tests themselves are already parallel, testharness does not take that into account - i.e. it does not add up the specified nprocs of the tests so that the total number of cores requested by all concurrently running tests stays within the number of threads given on the testharness command line. Instead it simply oversubscribes. So if you run with -threads=4 and it happens to pick 4 tests that each have nprocs=8, it will be using 32 cores. This could be made cleverer, but would require implementing some kind of scheduling mechanism (there are various ways to go about that; see the sketch after this list). Since the simple strategy of just oversubscribing has worked reasonably well, we haven't bothered doing anything more sophisticated.
  • making the tests such that they can run on an arbitrary number of processors is another can of worms. In principle it can be done, but it would indeed be rather invasive, and you will find a lot of corner cases: very small problems (many tests have trivially small meshes) run on too many cores, tests that specifically exercise flredecomping between different numbers of cores, etc.
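A rough sketch of what such a scheduling mechanism could look like, using a weighted budget so the summed nprocs of concurrently running tests never exceeds the -threads budget (purely illustrative; testharness does not work this way today):

import threading

class CoreBudget:
    """Blocks test launches until enough of the core budget is free."""

    def __init__(self, total):
        self.total = total
        self.free = total
        self.cond = threading.Condition()

    def acquire(self, n):
        # Cap tests wider than the whole budget so they can still run.
        n = min(n, self.total)
        with self.cond:
            while self.free < n:
                self.cond.wait()
            self.free -= n
        return n  # caller passes this back to release()

    def release(self, n):
        with self.cond:
            self.free += n
            self.cond.notify_all()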

jrper added a commit that referenced this issue May 14, 2018
Apply the change proposed in #182 to allow oversubscription of MPI slots in openmpi 3.0 and above.
jrper added a commit that referenced this issue Jun 6, 2018
Apply the change proposed in #182 to allow oversubscription of MPI slots in openmpi 3.0 and above.
tmbgreaves pushed a commit that referenced this issue Jun 8, 2018
Apply the change proposed in #182 to allow oversubscription of MPI slots in openmpi 3.0 and above.
jrper added a commit that referenced this issue Jun 13, 2018
Apply the change proposed in #182 to allow oversubscription of MPI slots in openmpi 3.0 and above.
@jrper jrper closed this as completed Jun 13, 2018
@jrper
Contributor Author

jrper commented Jun 13, 2018

Closed by #199, which adds the environment variable to the testharness.

tmbgreaves added a commit that referenced this issue Apr 7, 2021
Core oversubscription should be switched on for tests through
testharness (see issue #182) but doesn't appear to be working,
so explicitly turning it on in the base images.