-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds fabric handle and memory protection flags to cuda_async_memory_resource #1743
Adds fabric handle and memory protection flags to cuda_async_memory_resource #1743
Conversation
@bdice if you get a chance, I can't set the labels here, so I will need some help with this. |
/ok to test |
2 similar comments
/ok to test |
/ok to test |
I may not be in the right team or have the right permissions. If a reviewer could try triggering the build that would be great. |
…esource Signed-off-by: Alessandro Bellina <[email protected]>
23c651e
to
e91c3ca
Compare
Note, I force pushed to put have a single signed commit. I didn't realize I needed to sign every commit in the PR. |
/ok to test |
hi @vyasr when you get a chance, could you take a look? |
if (access_flag) { | ||
cudaMemAccessDesc desc; | ||
desc.location = pool_props.location; | ||
desc.flags = static_cast<cudaMemAccessFlags>(access_flag.value_or(access_flags::prot_none)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside this block it must be true that access_flag
has a value, so I think you can use
desc.flags = static_cast<cudaMemAccessFlags>(*access_flag);
Thinking more. In the case that access_flag
is nullopt
we never call cudaMemPoolSetAccess
at all. Is this the intention?
I think you probably mean something like (without the need for the conditional)
cudaMemAccessDesc desc = {
.location = pool_props.location,
.flags = static_cast<cudaMemAccessFlags>(access_flag.value_or(access_flags::prot_none))
};
RMM_CUDA_TRY(cudaMemPoolSetAccess(pool_handle(), &desc, 1));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So before this change, we never called cudaMemPoolSetAccess
. I didn't want to add another api call in the default case where the caller doesn't want to set access at all, reserving this for the case where the caller states they want to set it. Would you be OK if I took your:
desc.flags = static_cast<cudaMemAccessFlags>(*access_flag);
suggestion in the block, but didn't call the API in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never called cudaMemPoolSetAccess
I tried to understand from the runtime API docs what the consequence of the current status quo is (and whether it differs from unilaterally calling cudaMemPoolSetAccess(..., PROT_NONE)
), but I could not figure it out. So unless we have good evidence that just doing so is fine, I suspect we should do as you did.
I would encourage the use of the named struct field initialiser though :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The named struct field initializer is a change I was able to make. Pushed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Alessandro Bellina <[email protected]>
8d40966
to
56220a4
Compare
@wence- please take another look, let me know if you still want part of the |
I think it's ok without. |
… check-style Signed-off-by: Alessandro Bellina <[email protected]>
/ok to test |
/ok to test |
I’ll take another review pass today. |
/ok to test |
Hi @bdice let me know if you have further comments. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @abellina! All looks fine to me.
/merge |
Closes #1753 It is a follow up from #1743 I would like for rapidsai/cudf#17553 to merge first, that way I don't break the build. I've learned that I was using `cudaMemPoolSetAccess` incorrectly. This API should only be used from a `peer` device, not from the device that created the pool. This is the reason why calling `cudaMemPoolSetAccess` with none throws an error as documented here #1753. I have tested that I can still export the fabric handles and import them using UCX in a peer device with the default access that pool owner device gets (read+write is the default). Note that this read+write default access cannot be revoked from the owner, as it wouldn't make sense to have memory that nobody has access to, but peers can call `cudaMemPoolSetAccess` to gain read+write access or to stop accessing (none) a peer's pool memory. Authors: - Alessandro Bellina (https://github.com/abellina) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1754
Closes #1741
Description
This PR adds a new
fabric
handle type inallocation_handle_type
. It also adds an optionalaccess_flags
to set the memory access desired when exporting (prot_none
, orprot_read_write
). Pools that are not meant to be shareable should omit these flags.Please note that I can't add a unit test that exports or imports these fabric handles, because it would require system setup that doesn't look to be portable.
Checklist