-
Notifications
You must be signed in to change notification settings - Fork 942
Fix usage of PMIx_Fence_nb to conform with PMIx Standard #13628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@janjust This is timing out in some test NVIDIA runs, but I have no idea why and no way to test it myself (my Mac doesn't support UCX). I can try to do it in some kind of container, but would need advice on how. Otherwise, I'd just consider this as a suggestion PR and close it. |
|
@rhc54 let me give it a try |
|
I'm guessing update pmix sha is intended with this PR |
|
I removed the changes to common_ucx.c and now it passes. So there is something odd going on down there, but I don't have the ability to test it. Perhaps what is needed is to set the active flag to "false" in the case of a singleton, or when the fence isn't going to issue a callback? Might need some experimenting. |
|
yeah it now passes, because the initial PR, you didn't set the fenced in case of atomic completion so when it came to tear down it was stuck in the if(!fenced) the _nb code worked fine. |
|
See the other comment thread on #13626 (comment) - there is a necessary change to fix what @hppritcha observed. Your tests don't seem to hit that use-case. I'm just not including it here. I updated PMIx here because it was changed to strictly conform to the Standard - which it wasn't before. Hence, the spml/ucx component will now fail with these changes when running as a singleton. |
|
Please remove the PMIx sha update and include in a separate PR. |
hppritcha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove the pmix sha removal from this pr and open a separate one to do that.
The PMIx_Fence_nb function can return PMIX_OPERATION_SUCCEEDED to indicate that the function was executed atomically and the callback function will therefore not be called. The PMIx Standard lists a few reasons why this can happen, but the point here was to fix usage to properly handle that possibility. Signed-off-by: Ralph Castain <rhc@pmix.org>
The PMIx_Fence_nb function can return PMIX_OPERATION_SUCCEEDED to indicate
that the function was executed atomically and the callback function will
therefore not be called. The PMIx Standard lists a few reasons why this
can happen, but the point here was to fix usage to properly handle
that possibility.