@awlauria awlauria commented Aug 16, 2021

Some misc. fixes that haven't been brought back.

 * To prevent deadlock, force all previously requested locks to complete
   before starting a new lock. Otherwise, the locks may be processed in
   arbitrary order, which can lead to deadlock.

Signed-off-by: Mark Allen <[email protected]>
(cherry picked from commit 20b4618)
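
The ordering rule in this commit can be sketched with a toy model. This is only an illustration: `lock_state_t`, `progress_once`, and `lock_start` are hypothetical names rather than functions from the osc/pt2pt component, and "progress" here simply completes one pending request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of lock requests on a window; not Open MPI's API. */
typedef struct {
    int outstanding;      /* requests sent but not yet granted */
    int granted;          /* requests that have completed */
    int max_outstanding;  /* high-water mark of `outstanding` */
} lock_state_t;

/* Stand-in for a progress call: completes one pending request, if any. */
static void progress_once(lock_state_t *s)
{
    assert(NULL != s);
    if (s->outstanding > 0) {
        s->outstanding--;
        s->granted++;
    }
}

/* Start a new lock only after every earlier request has completed. */
void lock_start(lock_state_t *s)
{
    assert(NULL != s);
    while (s->outstanding > 0) {
        progress_once(s);   /* drain earlier requests first */
    }
    s->outstanding++;
    if (s->outstanding > s->max_outstanding) {
        s->max_outstanding = s->outstanding;
    }
}
```

Because each call drains earlier requests first, at most one request is ever in flight in this model, so the requests cannot be completed out of order.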
 * The `opal_list_remove_first` call is protected by the accumulation lock,
   not the module lock. So make sure to use the same lock to protect
   the `opal_list_append` call to prevent concurrent access, which can
   cause a segv and/or lost entries in the list.

Signed-off-by: Joshua Hursey <[email protected]>
(cherry picked from commit 130d196)
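
A minimal sketch of the rule in this commit, assuming a plain pthread mutex and a hand-rolled singly linked list. The names `pending_list_t`, `acc_lock`, `pending_append`, and `pending_remove_first` are hypothetical stand-ins, not the `opal_list` API or the component's real locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real list and lock; not Open MPI's API. */
typedef struct item {
    struct item *next;
    int value;
} item_t;

typedef struct {
    item_t *head;
    item_t *tail;
    pthread_mutex_t acc_lock;   /* the one lock guarding every list access */
} pending_list_t;

void pending_init(pending_list_t *l)
{
    assert(NULL != l);
    l->head = NULL;
    l->tail = NULL;
    pthread_mutex_init(&l->acc_lock, NULL);
}

/* Append under acc_lock, the same lock that pending_remove_first takes.
 * Returns 0 on success, -1 if allocation fails. */
int pending_append(pending_list_t *l, int value)
{
    assert(NULL != l);
    item_t *it = malloc(sizeof(*it));
    if (NULL == it) {
        return -1;
    }
    it->next = NULL;
    it->value = value;

    pthread_mutex_lock(&l->acc_lock);
    if (NULL != l->tail) {
        l->tail->next = it;
    } else {
        l->head = it;
    }
    l->tail = it;
    pthread_mutex_unlock(&l->acc_lock);
    return 0;
}

/* Remove the first item under acc_lock.
 * Returns 1 and stores the value if an item was removed, 0 if empty. */
int pending_remove_first(pending_list_t *l, int *value)
{
    assert(NULL != l && NULL != value);
    pthread_mutex_lock(&l->acc_lock);
    item_t *it = l->head;
    if (NULL != it) {
        l->head = it->next;
        if (NULL == l->head) {
            l->tail = NULL;
        }
    }
    pthread_mutex_unlock(&l->acc_lock);

    if (NULL == it) {
        return 0;
    }
    *value = it->value;
    free(it);
    return 1;
}
```

If the append took a different mutex than the remove, the two could update `head` and `tail` at the same time, which is the kind of race that loses entries or crashes.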
 * The `ompi_osc_pt2pt_sync_pscw_peer` call needs to be protected when
   called, and the `sync` structure needs to be protected in the `start`
   function.
   - This prevents the `ompi_osc_pt2pt_sync_pscw_peer` function from
     processing while `start` is also in progress on the same window in
     two different threads (e.g., `progress` and `main` thread)
   - This seems to happen when the `main` thread is part way through
     the `start` function and the progress thread starts processing
     the `post` message received from another peer for this window.
     Both functions try to access the `peer_list` portion of the
     structure and a NULL is stepped on in the `ompi_osc_pt2pt_sync_pscw_peer`
     function.
   - This patch locks the `sync` structure for the duration of the `start`
     and ensures that whenever `ompi_osc_pt2pt_sync_pscw_peer` is called
     the caller is holding the `sync->lock`. This provides exclusivity
     to this structure ensuring that the latter function sees a fully
     updated structure.

Signed-off-by: Joshua Hursey <[email protected]>
(cherry picked from commit ab49b8a)
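
The locking discipline described in this commit can be sketched as follows. This is an assumption-laden illustration, not the patch itself: `pscw_sync_t`, `pscw_start`, `pscw_peer_locked`, and `pscw_handle_post` are hypothetical names, and the structure is reduced to a peer list and a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified sync structure; not the real osc/pt2pt type. */
typedef struct {
    pthread_mutex_t lock;
    int *peer_list;        /* NULL until a start call fills it in */
    int num_peers;
    int posts_received;
} pscw_sync_t;

void pscw_sync_init(pscw_sync_t *s)
{
    assert(NULL != s);
    pthread_mutex_init(&s->lock, NULL);
    s->peer_list = NULL;
    s->num_peers = 0;
    s->posts_received = 0;
}

/* Caller must hold s->lock. Returns 1 if rank is in the peer list. */
static int pscw_peer_locked(const pscw_sync_t *s, int rank)
{
    for (int i = 0; i < s->num_peers; i++) {
        if (s->peer_list[i] == rank) {
            return 1;
        }
    }
    return 0;
}

/* "start": holds the lock while the structure is being updated, so no
 * other thread can observe a half-built peer list. */
int pscw_start(pscw_sync_t *s, const int *ranks, int n)
{
    assert(NULL != s && NULL != ranks && n > 0);
    int rc = -1;
    pthread_mutex_lock(&s->lock);
    int *copy = malloc((size_t)n * sizeof(*copy));
    if (NULL != copy) {
        memcpy(copy, ranks, (size_t)n * sizeof(*copy));
        free(s->peer_list);
        s->peer_list = copy;
        s->num_peers = n;
        s->posts_received = 0;
        rc = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Progress-thread side: takes the lock before calling the peer lookup. */
int pscw_handle_post(pscw_sync_t *s, int rank)
{
    assert(NULL != s);
    pthread_mutex_lock(&s->lock);
    int known = pscw_peer_locked(s, rank);
    if (known) {
        s->posts_received++;
    }
    pthread_mutex_unlock(&s->lock);
    return known;
}
```

In this sketch, a `post` that arrives while `pscw_start` is running blocks on `s->lock` until the peer list and its length are consistent.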
@awlauria awlauria added this to the v4.1.2 milestone Aug 16, 2021
@jsquyres

bot:ompi:retest

@jsquyres jsquyres merged commit 0b85b8c into open-mpi:v4.1.x Aug 30, 2021
@awlauria awlauria deleted the osc_pt2pt_changes_v4.1.x branch March 17, 2022 19:11