osc/rdma: ensure bml add_procs has been called for all local procs#8452

Merged
hppritcha merged 1 commit into open-mpi:v4.0.x from
hjelmn:v4.0.x_call_add_procs_on_all_allocated_procs_if_osc_rdma_is_selected_in_case_btl_ucx_is_in_use_for_two_sided_really_i_need_to_determine_the_best_way_to_ensure_that_vader_works_in_this_case_without_slowing_it_down
Feb 8, 2021

@hjelmn
Member

hjelmn commented Feb 5, 2021

This fixes a bug that occurs when ob1 is not selected as the pml but osc/rdma
is selected for an MPI window. In some cases osc/rdma may then use btl/sm. If
so, we need to ensure btl/sm knows about all the local procs (not just the
ones in the communicator). btl/sm currently requires this to function
correctly.
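
To illustrate the idea only, here is a toy sketch in plain C. It is not the Open MPI code: `toy_proc_t`, `toy_add_local_procs`, and `toy_demo` are hypothetical names made up for this example, and the `registered` flag merely stands in for "the shared-memory transport has been told about this proc". The point is that registration walks every known proc on the local node, not just the ones in the window's communicator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a process descriptor (not ompi_proc_t). */
typedef struct {
    int rank;        /* global rank */
    int node;        /* id of the node the process runs on */
    bool registered; /* has the shared-memory transport seen this proc? */
} toy_proc_t;

/* Register every proc on my_node, whether or not it belongs to the
 * communicator behind the window. Returns the number newly registered. */
static size_t toy_add_local_procs(toy_proc_t *procs, size_t nprocs, int my_node)
{
    assert(procs != NULL);

    size_t added = 0;
    for (size_t i = 0; i < nprocs; ++i) {
        if (procs[i].node == my_node && !procs[i].registered) {
            procs[i].registered = true;
            ++added;
        }
    }
    return added;
}

/* Four procs on node 0, two on node 1. Assume only ranks 0 and 1 (the
 * window's communicator in this made-up scenario) were registered so far. */
static size_t toy_demo(void)
{
    toy_proc_t procs[] = {
        {0, 0, true},  {1, 0, true},  {2, 0, false},
        {3, 0, false}, {4, 1, false}, {5, 1, false},
    };
    size_t nprocs = sizeof(procs) / sizeof(procs[0]);

    /* Picks up ranks 2 and 3; ranks 4 and 5 are on another node. */
    return toy_add_local_procs(procs, nprocs, 0);
}
```

In this scenario `toy_demo()` registers the two local procs that were outside the communicator and leaves the remote ones alone. Skipping procs that are already registered keeps the call cheap to repeat.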

In the future btl/sm should be made more resilient.

Fixes #8434

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 8040d05)

@hjelmn
Member Author

hjelmn commented Feb 5, 2021

Well, it appears git does put a limit on the branch name.

@gpaulsen
Member

gpaulsen commented Feb 8, 2021

@hppritcha #8453 for v4.1.x was merged. We should merge this one as well.
There is some discussion that this only fixes a symptom, and that it is still incorrect for the UCX PML not to call add_procs on local processes inside MPI_Init (it does so lazily later, but apparently that breaks some design).

@hppritcha hppritcha merged commit fd204e3 into open-mpi:v4.0.x Feb 8, 2021
@gpaulsen gpaulsen added the NEWS label Feb 26, 2021