OpenMPI version 3.0 and node oversubscription #182
Comments
From another GitHub project, they ran into the same kind of error (dealii/dealii#5123) and fixed it using the environment variable method (dealii/dealii#5142).
Or even better, it could stay an attribute if its value is known at runtime (I have no idea if true or not). The xml modification could be achieved with a simple enough lxml Python script.
Seems to me that using the environment variable, set in testharness, would indeed be the simplest solution. As @Patol75's link suggests, we can just always set it, and it shouldn't hurt the case where we don't use openmpi 3.0. As for the other suggestions, I'm afraid it's a little more complicated:
Apply the change proposed in #182 to allow oversubscription of MPI slots in openmpi 3.0 and above.
Closed by #199, through an added environment variable in the testharness.
Core oversubscription should be switched on for tests through testharness (see issue #182) but doesn't appear to be working, so explicitly turning it on in the base images.
I see that OpenMPI version 3.0 has changed the default behaviour regarding oversubscription and mpirun. On this version, a plain call like
mpirun -np 8 python -c "print('Hello')"
will fail with an MPI error on systems with fewer than 8 slots available. This affects several of the Fluidity tests, which can request up to 16 processes. The "right way" to do things is now to specify
mpirun -np 8 -oversubscribe python -c "print('Hello')"
on specific calls, or to define an environment variable,
OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -np 8 python -c "print('Hello')"
to get back to the openmpi 2.0 behaviour. I'm not sure what the right thing for us to do is; possibly use the last option inside testharness itself?
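For illustration, here is a minimal sketch of how the environment-variable option might look from inside a Python test runner. It is not the actual testharness code: the helper names `oversubscribed_env` and `run_test_command` are hypothetical, and the only thing taken from the discussion above is the variable `OMPI_MCA_rmaps_base_oversubscribe=1`. The demo at the bottom launches a plain Python child process rather than `mpirun`, so it runs on a machine without MPI installed.

```python
import os
import subprocess
import sys


def oversubscribed_env(base_env=None):
    """Return a copy of the environment with Open MPI oversubscription enabled.

    Hypothetical helper: always sets the variable, on the assumption (from the
    thread above) that it is harmless when Open MPI 3.0+ is not in use.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_rmaps_base_oversubscribe"] = "1"
    return env


def run_test_command(command, base_env=None):
    """Run a command (given as a list of arguments) with the modified environment."""
    return subprocess.run(
        command,
        env=oversubscribed_env(base_env),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for something like ["mpirun", "-np", "8", ...]: the child simply
    # reports the value of the variable it inherited.
    child = "import os; print(os.environ['OMPI_MCA_rmaps_base_oversubscribe'])"
    result = run_test_command([sys.executable, "-c", child])
    print(result.stdout.strip())
```

Setting the variable on a copy of the environment, rather than on `os.environ` directly, keeps the change scoped to the processes the runner launches.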