Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 16 additions & 1 deletion oshmem/mca/atomic/ucx/atomic_ucx_module.c
Original file line number Diff line number Diff line change
Expand Up @@ -29,12 +29,19 @@ static ucp_request_param_t mca_spml_ucp_request_params[] = {
};
#endif

static int mca_atomic_ucx_nb_support = 0;

/*
* Initial query function that is invoked during initialization, allowing
* this module to indicate what level of thread support it provides.
*/
int mca_atomic_ucx_startup(bool enable_progress_threads, bool enable_threads)
{
unsigned major, minor, release_number;
ucp_get_version(&major, &minor, &release_number);

mca_atomic_ucx_nb_support = UCX_VERSION(major, minor, release_number) >= UCX_VERSION(1, 20, 0);

return OSHMEM_SUCCESS;
}

Expand Down Expand Up @@ -73,7 +80,15 @@ int mca_atomic_ucx_op(shmem_ctx_t ctx,
status_ptr = ucp_atomic_op_nbx(ucx_ctx->ucp_peers[pe].ucp_conn,
op, &value, 1, rva, ucx_mkey->rkey,
&mca_spml_ucp_request_params[size >> 3]);
res = UCS_PTR_IS_ERR(status_ptr) ? OSHMEM_ERROR : OSHMEM_SUCCESS;
if (mca_atomic_ucx_nb_support) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some documentation re why it is ok to drop the request and not care about the value anymore?

Copy link
Contributor Author

@roiedanino roiedanino Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the PR in UCX updating documentation: openucx/ucx#11126
which will be ported back to v1.20.x branch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought: why do we even need the special handling here? If UCX 1.20 returns no request and caches the value, we can just call opal_common_ucx_wait_request to handle potential errors. The function will never block (because there is no request. So, essential this PR does nothing. We could add documentation that the call to opal_common_ucx_wait_request is just for error handling on UCX >1.20.

/* UCX is packing (copying) the value pointer, so there's no need to wait for completion
(no stack corruption concerns). Additionally, there's no need to free the status pointer
as its already freed by ucp_atomic_op_nbx when a reply buffer is not provided. */
res = UCS_PTR_IS_ERR(status_ptr) ? OSHMEM_ERROR : OSHMEM_SUCCESS;
} else {
res = opal_common_ucx_wait_request(status_ptr, ucx_ctx->ucp_worker[0],
"ucp_atomic_op_nbx");
}
#else
status = ucp_atomic_post(ucx_ctx->ucp_peers[pe].ucp_conn,
op, value, size, rva,
Expand Down