Skip to content

OSHMEM/MCA/ATOMIC/UCX: check UCX version for non-blocking atomics value packing#13665

Merged
gleon99 merged 1 commit intoopen-mpi:mainfrom
roiedanino:fix-ucx-atomics
Jan 20, 2026
Merged

OSHMEM/MCA/ATOMIC/UCX: check UCX version for non-blocking atomics value packing#13665
gleon99 merged 1 commit intoopen-mpi:mainfrom
roiedanino:fix-ucx-atomics

Conversation

@roiedanino
Copy link
Contributor

What

Added a check for UCX version which packs the atomic operation "value" buffer

Why

Otherwise, we might cause stack corruption when UCX tries to access the atomic value pointer after exiting mca_atomic_ucx_op scope

This is a fix for changes introduced in #13568, and relevant for those cherry-pick PRs: #13631 and #13608

@roiedanino
Copy link
Contributor Author

@devreal

op, &value, 1, rva, ucx_mkey->rkey,
&mca_spml_ucp_request_params[size >> 3]);
res = UCS_PTR_IS_ERR(status_ptr) ? OSHMEM_ERROR : OSHMEM_SUCCESS;
if (mca_atomic_ucx_nb_support) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some documentation re why it is ok to drop the request and not care about the value anymore?

Copy link
Contributor Author

@roiedanino roiedanino Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the PR in UCX updating documentation: openucx/ucx#11126
which will be ported back to v1.20.x branch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought: why do we even need the special handling here? If UCX 1.20 returns no request and caches the value, we can just call opal_common_ucx_wait_request to handle potential errors. The function will never block (because there is no request. So, essential this PR does nothing. We could add documentation that the call to opal_common_ucx_wait_request is just for error handling on UCX >1.20.

@devreal
Copy link
Contributor

devreal commented Jan 20, 2026

@roiedanino please squash your commits into one and then it's ready to be merged.

…ue packing

Signed-off-by: Roie Danino <rdanino@nvidia.com>

OSHMEM/MCA/ATOMIC/UCX: added doc' explaining why there's no need to wait for completion / free status_ptr

Signed-off-by: Roie Danino <rdanino@nvidia.com>
@gleon99 gleon99 merged commit e038451 into open-mpi:main Jan 20, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants