
@bureddy (Member) commented Dec 3, 2022

Why

Fix a hang during exit when the TCP transport is used.

How

Call opal_progress while waiting for the PMIx fence. Otherwise, the UCX/TCP transport used by the coll/hcoll component does not make progress, which causes a hang.
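The failure mode can be modeled in a small, self-contained C sketch. Everything in it is hypothetical and is not Open MPI code: `sim_worker_progress()` stands in for progressing only the UCX worker, `sim_global_progress()` stands in for `opal_progress()`, and `pending_tcp_ops` stands in for the hcoll traffic that (by assumption) must drain before the fence can complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for the model. */
typedef struct {
    bool fenced;          /* set once the simulated fence completes */
    int  pending_tcp_ops; /* hcoll-like work only the global engine drains */
} sim_state_t;

/* Stand-in for progressing only the UCX worker: it never touches the
 * pending TCP work, so on its own it cannot complete the fence. */
static void sim_worker_progress(sim_state_t *s)
{
    (void)s;
}

/* Stand-in for opal_progress(): drains one pending operation per call
 * and marks the fence complete once nothing is left. */
static void sim_global_progress(sim_state_t *s)
{
    if (s->pending_tcp_ops > 0) {
        s->pending_tcp_ops--;
    }
    if (s->pending_tcp_ops == 0) {
        s->fenced = true;
    }
}

/* Wait loop without the fix. Bounded by max_iters only so the model
 * terminates; an unbounded version would spin forever (the hang). */
static bool wait_worker_only(sim_state_t *s, int max_iters)
{
    assert(max_iters > 0);
    for (int i = 0; i < max_iters && !s->fenced; i++) {
        sim_worker_progress(s);
    }
    return s->fenced;
}

/* Wait loop with the fix: also drives the generic progress engine. */
static bool wait_with_global_progress(sim_state_t *s, int max_iters)
{
    assert(max_iters > 0);
    for (int i = 0; i < max_iters && !s->fenced; i++) {
        sim_worker_progress(s);
        sim_global_progress(s);
    }
    return s->fenced;
}
```

Starting from `sim_state_t s = { false, 3 };`, `wait_worker_only(&s, 1000)` returns false, while `wait_with_global_progress()` on a fresh state returns true after three iterations.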

Signed-off-by: Devendar Bureddy <[email protected]>


github-actions bot commented Dec 3, 2022

Hello! The Git Commit Checker CI bot found a few problems with this PR:

31fb932: common/ucx: call opal_progress when waiting for pm...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

Comment on lines +419 to +422
    MCA_COMMON_UCX_PROGRESS_LOOP(worker) {
        if (fenced) {
            break;
        }
Contributor

It seems the only difference is whether progress is called when fenced == true before line 419?
But if the fence has already completed, why do we still need to call progress?

Member

No, the fix here is that MCA_COMMON_UCX_PROGRESS_LOOP also invokes opal_progress, which progresses hcoll.
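A loop macro with that property might look roughly like the sketch below. It assumes, without the PR confirming it, that MCA_COMMON_UCX_PROGRESS_LOOP is a `for` loop whose increment expression progresses the UCX worker and periodically calls `opal_progress()`; the real definition may differ. `SKETCH_PROGRESS_LOOP`, `SKETCH_GLOBAL_EVERY`, `SKETCH_FENCE_AFTER` and the stub functions are hypothetical. The `if (...) break;` body mirrors the snippet under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters and flag standing in for real progress engines. */
static unsigned worker_calls;
static unsigned global_calls;
static volatile bool g_fenced;

/* Hypothetical: the fence completes after this many global progress calls. */
#define SKETCH_FENCE_AFTER 2u

/* Stand-in for progressing the UCX worker only. */
static void sketch_worker_progress(void *worker)
{
    (void)worker;
    worker_calls++;
}

/* Stand-in for opal_progress(): in this model it is the only call that
 * can complete the fence. */
static void sketch_global_progress(void)
{
    if (++global_calls >= SKETCH_FENCE_AFTER) {
        g_fenced = true;
    }
}

/* Hypothetical interval at which the generic engine is driven. */
#define SKETCH_GLOBAL_EVERY 4u

/* The body runs first; the increment expression then progresses the
 * worker, or the generic engine instead on every SKETCH_GLOBAL_EVERY-th
 * pass. The body must break out explicitly. */
#define SKETCH_PROGRESS_LOOP(_worker)                                  \
    for (unsigned _iter = 0;;                                          \
         (++_iter % SKETCH_GLOBAL_EVERY) ? sketch_worker_progress(_worker) \
                                         : sketch_global_progress())

/* Wait for the fence the same way the reviewed snippet does. */
static void sketch_wait_for_fence(void *worker)
{
    SKETCH_PROGRESS_LOOP(worker) {
        if (g_fenced) {
            break;
        }
    }
}
```

Calling `sketch_wait_for_fence(NULL)` in this model performs six worker-progress calls and two global-progress calls before the flag is seen. Driving the generic engine only every few iterations keeps the common path cheap while still guaranteeing that other components, such as hcoll, eventually make progress.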


github-actions bot commented Dec 4, 2022

Hello! The Git Commit Checker CI bot found a few problems with this PR:

31fb932: common/ucx: call opal_progress when waiting for pm...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@jsquyres added this to the v4.1.5 milestone Dec 4, 2022
Signed-off-by: Devendar Bureddy <[email protected]>
(cherry picked from commit ed18cd3)
@jsquyres merged commit 5980bac into open-mpi:v4.1.x Dec 5, 2022
5 participants