OSHMEM/MCA/ATOMIC/UCX: check UCX version for non-blocking atomics value packing#13665
OSHMEM/MCA/ATOMIC/UCX: check UCX version for non-blocking atomics value packing#13665gleon99 merged 1 commit intoopen-mpi:mainfrom
Conversation
| op, &value, 1, rva, ucx_mkey->rkey, | ||
| &mca_spml_ucp_request_params[size >> 3]); | ||
| res = UCS_PTR_IS_ERR(status_ptr) ? OSHMEM_ERROR : OSHMEM_SUCCESS; | ||
| if (mca_atomic_ucx_nb_support) { |
There was a problem hiding this comment.
Can you please add some documentation re why it is ok to drop the request and not care about the value anymore?
There was a problem hiding this comment.
This is the PR in UCX updating documentation: openucx/ucx#11126
which will be ported back to v1.20.x branch
There was a problem hiding this comment.
On second thought: why do we even need the special handling here? If UCX 1.20 returns no request and caches the value, we can just call opal_common_ucx_wait_request to handle potential errors. The function will never block (because there is no request. So, essential this PR does nothing. We could add documentation that the call to opal_common_ucx_wait_request is just for error handling on UCX >1.20.
|
@roiedanino please squash your commits into one and then it's ready to be merged. |
…ue packing Signed-off-by: Roie Danino <rdanino@nvidia.com> OSHMEM/MCA/ATOMIC/UCX: added doc' explaining why there's no need to wait for completion / free status_ptr Signed-off-by: Roie Danino <rdanino@nvidia.com>
9747295 to
3d25da6
Compare
What
Added a check for UCX version which packs the atomic operation "value" buffer
Why
Otherwise, we might cause stack corruption when UCX tries to access the atomic value pointer after exiting
mca_atomic_ucx_opscopeThis is a fix for changes introduced in #13568, and relevant for those cherry-pick PRs: #13631 and #13608