mpi: Patch neighborhood construction #1768
Merged
From the MPI specs (for example at this link):
In master, we're catching `MPI_ERR_ARG` as if that were an indication that the MPI rank is at the boundary. This is the case for many MPI distributions and configurations, but (unsurprisingly) not for all. I have a server with several GPUs and the NVIDIA MPI configured as per our Dockerfile.nvidia, where `Get_cart_rank` does not trigger `MPI_ERR_RANK` for MPI ranks at the boundary -- rather, it returns the MPI rank of the process on the opposite side of the virtual topology.

With this patch, we explicitly use `MPI.PROC_NULL`, when necessary, for the neighborhood of the MPI ranks at the grid boundary.

PS: I don't really know how to add a test here. But this edit simply makes sure we honor the MPI specification, as we should have done in the first place.