-
Notifications
You must be signed in to change notification settings - Fork 754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][UR][HIP] Hip adapter multi device ctx #11105
Conversation
Same as #10737 but for HIP. This move to a multi device context makes context interop tricky for reasons outlined here #10975 . I have suggested that we deprecate context interop for CUDA and HIP plugins. Images are still not supported in AMDGPU, so I will most likely need to revisit some of this code relating to images once the tests are working. |
9402e7a
to
97c04eb
Compare
@@ -279,29 +270,16 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemGetInfo(ur_mem_handle_t hMemory, | |||
/// \return UR_RESULT_SUCCESS | |||
UR_APIEXPORT ur_result_t UR_APICALL | |||
urMemGetNativeHandle(ur_mem_handle_t hMem, ur_native_handle_t *phNativeMem) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I have made this entry point UNSUPPORTED for the reasons listed in comment.
97c04eb
to
e32e26e
Compare
b93c0fa
to
d6ea231
Compare
d1c92cb changed ScopedContext to take a device instead of a context thus sematically changing its meaning. This rename makes it clear what the intented usages are.
e707ee2
to
0de2831
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L0 review requested here, but no L0 code.
+1 just for to allow merging.
fb6a43f
to
c2a3cf7
Compare
This reverts commit c2a3cf7.
@@ -121,8 +119,10 @@ urContextRetain(ur_context_handle_t hContext) { | |||
|
|||
UR_APIEXPORT ur_result_t UR_APICALL urContextGetNativeHandle( | |||
ur_context_handle_t hContext, ur_native_handle_t *phNativeContext) { | |||
// FIXME this only returns the native context of the first device in the | |||
// SYCL context. This entry point should be deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linking the PR doing it:
/// aforementioned custom `hipCtx_t`s. | ||
/// The HIP documentation, confirmed with performance analysis, suggest using | ||
/// the Primary context whenever possible. The Primary context is also used by | ||
/// the HIP Runtime API. For UR applications to interop with HIP Runtime API, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually true for HIP as well or just copied over from CUDA? Is there a HIP runtime API and a HIP driver API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah I think this is not relevant to HIP. Will remove
@@ -68,7 +68,7 @@ UR_APIEXPORT ur_result_t UR_APICALL USMFreeImpl(ur_context_handle_t hContext, | |||
void *pMem) { | |||
ur_result_t Result = UR_RESULT_SUCCESS; | |||
try { | |||
ScopedContext Active(hContext->getDevice()); | |||
// Free does not require a scopedDevice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Free does not require a scopedDevice | |
// Free does not require a ScopedDevice |
@@ -151,7 +152,6 @@ urUSMGetMemAllocInfo(ur_context_handle_t hContext, const void *pMem, | |||
UrReturnHelper ReturnValue(propValueSize, pPropValue, pPropValueSizeRet); | |||
|
|||
try { | |||
ScopedContext Active(hContext->getDevice()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this needed for the hipPointerGetAttributes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No apparently not
// First allocation will be made at urMemBufferCreate if context only | ||
// has one device | ||
if (PerformInitialCopy && hContext->NumDevices == 1) { | ||
// Operates on the default stream of the current CUDA context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Operates on the default stream of the current CUDA context. | |
// Operates on the default stream of the current HIP context. |
// to make buffer data available on device before any other UR | ||
// call uses it. | ||
hipStream_t defaultStream = 0; | ||
UR_CHECK_ERROR(hipStreamSynchronize(defaultStream)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't hipMemcpyHtoD
synchronous? Also the comment says cuMemCpyHtoD
if (URMemObj != nullptr) { | ||
// First allocation will be made at urMemBufferCreate if context only | ||
// has one device | ||
if (PerformInitialCopy && hContext->NumDevices == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to do the initial copy even if there's more devices, if COPY_HOST_PTR
is used
} | ||
|
||
hipArray *SrcArray = hImageSrc->Mem.SurfaceMem.getArray(); | ||
hipArray *SrcArray = ImageSrc->getArray(hQueue->getDevice()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be cleaner to collapse allocateMemObjOnDeviceIfNeeded
into the getters? You could have them do the allocation if needed when we try to get a device pointer.
We could even potentially have getters do the migration too, but that might not work in all cases like kernel arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea I think. Let me try to figure that out a bit
// Use the default stream if copying from another device | ||
hipStream_t HIPStream = DeviceToCopyFrom == hQueue->getDevice() | ||
? hQueue->getNextTransferStream() | ||
: hipStream_t{0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work for hipMemcpyDToD
then? does the stream have to be for the origin device and not for the destination device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EnqueueMemBufferRead is only to read a buffer to the host, so a DtoD is never needed, only a DtoH from the most up to date device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So DtoH requires a stream from that particular device?
Allow multi device context in HIP plugin. This makes a default context contain all of the devices in a given platform.