Skip to content

SPML/UCX: removed direct dependency to SPML UCX#8567

Merged
yosefe merged 1 commit intoopen-mpi:masterfrom
hoopoepg:topic/ucx-atomic-dependencies
Mar 11, 2021
Merged

SPML/UCX: removed direct dependency to SPML UCX#8567
yosefe merged 1 commit intoopen-mpi:masterfrom
hoopoepg:topic/ucx-atomic-dependencies

Conversation

@hoopoepg
Copy link
Contributor

@hoopoepg hoopoepg commented Mar 9, 2021

  • added pointer to global object mca_spml_ucx into
    local context datatype to eliminate linker
    dependency

fixes #8542

@hoopoepg hoopoepg force-pushed the topic/ucx-atomic-dependencies branch from b0af83e to 5c76d8a Compare March 9, 2021 10:25
@hoopoepg hoopoepg requested a review from yosefe March 9, 2021 12:01
.ucp_peers = NULL,
.options = 0
.options = 0,
.spml_ucx = &mca_spml_ucx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just move synchronized_quiet to this struct? instead of moving &mca_spml_ucx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case we have to additionally handle initialization of synchronized_quiet and update this value in global object - it makes fix more complicated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it will worth it: better to keep "mca_spml_ucx" private, and mca_spml_ucx_ctx_t "shared"
and not point from "shared" to "private"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then may be pointer to synchronized_quiet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think better just move this bool field. no need to keep it in 2 places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done

@gpaulsen
Copy link
Member

gpaulsen commented Mar 9, 2021

Is this master / v5.0 only? or intended for bringing back to v4.0.x/v4.1.x?

@hoopoepg
Copy link
Contributor Author

hoopoepg commented Mar 9, 2021

as I can see 4.x affected too...
backport it into 4.x branches?

rc = mca_spml_ucx_ctx_create_common(SHMEM_CTX_PRIVATE, &mca_spml_ucx.aux_ctx);
if (OSHMEM_SUCCESS == rc) {
mca_spml_ucx.aux_ctx->spml_ucx = &mca_spml_ucx;
mca_spml_ucx.aux_ctx->synchronized_quiet = mca_spml_ucx_ctx_default.synchronized_quiet;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should move it inside mca_spml_ucx_ctx_create_common

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will not work: there is array of pre-allocated objects which created without mca_spml_ucx_ctx_create_common call

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but mca_spml_ucx_ctx_create_common already sets other fields of *mca_spml_ucx.aux_ctx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, moved

- added synchronise_quiet parameter to local
  context object

Signed-off-by: Sergey Oblomov <sergeyo@nvidia.com>
@hoopoepg hoopoepg force-pushed the topic/ucx-atomic-dependencies branch from 5f6f176 to 01d7164 Compare March 10, 2021 06:10
@hoopoepg hoopoepg added this to the v5.0.0 milestone Mar 10, 2021
@yosefe
Copy link
Contributor

yosefe commented Mar 10, 2021

@jsquyres can you please give this a try, to verify it fixes #8542?

@jsquyres
Copy link
Member

Seems to work for me. 👍

@gpaulsen
Copy link
Member

@hoopoepg could you please merge, and then PR to release branches? Thank you.

@hoopoepg
Copy link
Contributor Author

@hoopoepg could you please merge

I can't - have no permissions to merge into master
@jsquyres could you merge?

@yosefe yosefe merged commit 9f845f0 into open-mpi:master Mar 11, 2021
@yosefe yosefe deleted the topic/ucx-atomic-dependencies branch March 11, 2021 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UCX symbol error

4 participants