[SYCL][UR][HIP] Hip adapter multi device ctx #11105

Closed · wants to merge 8 commits

Conversation

@hdelan (Contributor) commented Sep 7, 2023:

Allow multi-device contexts in the HIP plugin. This makes the default context contain all of the devices in a given platform.
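For illustration, a minimal SYCL snippet showing what this enables (the selector and queue setup are assumptions for the example, not code from this PR):

#include <sycl/sycl.hpp>

int main() {
  // With this change, one context can span every device in a platform,
  // matching the HIP adapter's new default context behaviour.
  sycl::platform Platform{sycl::gpu_selector_v};
  sycl::context MultiDeviceCtx{Platform.get_devices()};
  // Queues on different devices can now share the same context and its
  // allocations (assumes the platform exposes at least one device).
  sycl::queue Q{MultiDeviceCtx, Platform.get_devices()[0]};
  return 0;
}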

@hdelan hdelan requested review from a team as code owners September 7, 2023 16:25
@hdelan (Contributor Author) commented Sep 7, 2023:

Same as #10737 but for HIP.

This move to a multi-device context makes context interop tricky, for reasons outlined in #10975. I have suggested that we deprecate context interop for the CUDA and HIP plugins.

Images are still not supported on AMDGPU, so I will most likely need to revisit some of this image-related code once the tests are working.

@hdelan hdelan force-pushed the hip-adapter-multi-device-ctx branch 2 times, most recently from 9402e7a to 97c04eb on September 7, 2023 16:33
@@ -279,29 +270,16 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemGetInfo(ur_mem_handle_t hMemory,
/// \return UR_RESULT_SUCCESS
UR_APIEXPORT ur_result_t UR_APICALL
urMemGetNativeHandle(ur_mem_handle_t hMem, ur_native_handle_t *phNativeMem) {
@hdelan (Contributor Author) commented Sep 7, 2023:

Note that I have made this entry point UNSUPPORTED for the reasons listed in the comment.
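A sketch of what the unsupported entry point might look like (the exact error code is an assumption based on standard UR result values):

UR_APIEXPORT ur_result_t UR_APICALL
urMemGetNativeHandle(ur_mem_handle_t, ur_native_handle_t *) {
  // With one allocation per device in a multi-device context, there is no
  // single native handle to return, so the query is rejected.
  return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
}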

@hdelan hdelan force-pushed the hip-adapter-multi-device-ctx branch from 97c04eb to e32e26e on September 7, 2023 16:51
@hdelan hdelan requested a review from a team as a code owner September 7, 2023 16:51
@hdelan hdelan force-pushed the hip-adapter-multi-device-ctx branch 3 times, most recently from b93c0fa to d6ea231 on September 13, 2023 20:43
ldrumm and others added 5 commits September 14, 2023 15:30
d1c92cb changed ScopedContext to take a device instead of a context, thus semantically changing its meaning. This rename makes it clear what the intended usage is.
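A rough sketch of the renamed helper, assuming the usual set-and-restore pattern (the accessor name getIndex is an assumption):

struct ScopedDevice {
  int OldDevice = 0;
  explicit ScopedDevice(ur_device_handle_t Device) {
    // Remember the calling thread's current device, then switch to the
    // requested one; its primary context becomes current implicitly.
    UR_CHECK_ERROR(hipGetDevice(&OldDevice));
    UR_CHECK_ERROR(hipSetDevice(Device->getIndex()));
  }
  ~ScopedDevice() { hipSetDevice(OldDevice); } // restore on scope exit
};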
@hdelan hdelan force-pushed the hip-adapter-multi-device-ctx branch from e707ee2 to 0de2831 on September 14, 2023 14:31
@jandres742 (Contributor) left a comment:

L0 review requested here, but no L0 code.

+1 just to allow merging.

@hdelan hdelan force-pushed the hip-adapter-multi-device-ctx branch from fb6a43f to c2a3cf7 on September 25, 2023 19:01
@@ -121,8 +119,10 @@ urContextRetain(ur_context_handle_t hContext) {

UR_APIEXPORT ur_result_t UR_APICALL urContextGetNativeHandle(
ur_context_handle_t hContext, ur_native_handle_t *phNativeContext) {
// FIXME this only returns the native context of the first device in the
// SYCL context. This entry point should be deprecated.
/// aforementioned custom `hipCtx_t`s.
/// The HIP documentation, confirmed with performance analysis, suggest using
/// the Primary context whenever possible. The Primary context is also used by
/// the HIP Runtime API. For UR applications to interop with HIP Runtime API,
A Contributor commented:

Is this actually true for HIP as well or just copied over from CUDA? Is there a HIP runtime API and a HIP driver API?

@hdelan (Contributor Author) replied:

Hmm, yeah, I think this is not relevant to HIP. Will remove.

@@ -68,7 +68,7 @@ UR_APIEXPORT ur_result_t UR_APICALL USMFreeImpl(ur_context_handle_t hContext,
void *pMem) {
ur_result_t Result = UR_RESULT_SUCCESS;
try {
ScopedContext Active(hContext->getDevice());
// Free does not require a scopedDevice
A Contributor commented:

Suggested change
// Free does not require a scopedDevice
// Free does not require a ScopedDevice

@@ -151,7 +152,6 @@ urUSMGetMemAllocInfo(ur_context_handle_t hContext, const void *pMem,
UrReturnHelper ReturnValue(propValueSize, pPropValue, pPropValueSizeRet);

try {
ScopedContext Active(hContext->getDevice());
A Contributor commented:

Isn't this needed for hipPointerGetAttributes?

@hdelan (Contributor Author) replied:

No, apparently not.
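For context, a small illustrative sketch (not code from this PR) of why no active device is needed: hipPointerGetAttributes resolves the owning device from the pointer itself.

#include <hip/hip_runtime.h>

// Query the owning device of an allocation without making any device
// current first; the runtime derives it from the pointer.
int getOwningDevice(const void *pMem) {
  hipPointerAttribute_t Attrs{};
  if (hipPointerGetAttributes(&Attrs, pMem) != hipSuccess)
    return -1; // not a HIP allocation
  return Attrs.device;
}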

// First allocation will be made at urMemBufferCreate if context only
// has one device
if (PerformInitialCopy && hContext->NumDevices == 1) {
// Operates on the default stream of the current CUDA context.
A Contributor commented:

Suggested change
// Operates on the default stream of the current CUDA context.
// Operates on the default stream of the current HIP context.

// to make buffer data available on device before any other UR
// call uses it.
hipStream_t defaultStream = 0;
UR_CHECK_ERROR(hipStreamSynchronize(defaultStream));
A Contributor commented:

Isn't hipMemcpyHtoD synchronous? Also, the comment says cuMemcpyHtoD.
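For reference, a sketch of the copy-then-sync pattern being questioned (pointer and size names are illustrative). If hipMemcpyHtoD is indeed host-synchronous here, the explicit synchronize on the default stream is redundant:

void initialCopyToDevice(hipDeviceptr_t DevPtr, void *HostPtr, size_t Size) {
  // Blocking host-to-device copy on the default stream...
  UR_CHECK_ERROR(hipMemcpyHtoD(DevPtr, HostPtr, Size));
  // ...followed by the synchronize under discussion.
  hipStream_t DefaultStream = 0;
  UR_CHECK_ERROR(hipStreamSynchronize(DefaultStream));
}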

if (URMemObj != nullptr) {
// First allocation will be made at urMemBufferCreate if context only
// has one device
if (PerformInitialCopy && hContext->NumDevices == 1) {
A Contributor commented:

We might need to do the initial copy even if there are more devices, if COPY_HOST_PTR is used.
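A hedged illustration of that point, using the real UR flag name (the guard shown is an assumption about how the check could look):

// The initial copy is tied to the COPY_HOST_POINTER flag, not to the
// number of devices in the context, so a NumDevices == 1 guard could
// skip a copy that is still required.
bool needsInitialCopy(ur_mem_flags_t flags) {
  return (flags & UR_MEM_FLAG_ALLOC_COPY_HOST_POINTER) != 0;
}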

}

hipArray *SrcArray = hImageSrc->Mem.SurfaceMem.getArray();
hipArray *SrcArray = ImageSrc->getArray(hQueue->getDevice());
A Contributor commented:

Would it be cleaner to collapse allocateMemObjOnDeviceIfNeeded into the getters? You could have them do the allocation if needed when we try to get a device pointer.

We could even potentially have getters do the migration too, but that might not work in all cases like kernel arguments.
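A hypothetical sketch of that suggestion (class, member, and accessor names are assumptions modelled on the surrounding code):

hipArray *SurfaceMem::getArray(ur_device_handle_t Device) {
  // Fold allocateMemObjOnDeviceIfNeeded into the getter: allocate lazily
  // on first use for this device, then hand back its array.
  allocateMemObjOnDeviceIfNeeded(Device);
  return ArrayPerDevice[Device->getIndex()];
}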

@hdelan (Contributor Author) replied:

That's a good idea, I think. Let me try to figure that out a bit.

// Use the default stream if copying from another device
hipStream_t HIPStream = DeviceToCopyFrom == hQueue->getDevice()
? hQueue->getNextTransferStream()
: hipStream_t{0};
A Contributor commented:

How does this work for hipMemcpyDtoD then? Does the stream have to belong to the origin device rather than the destination device?

@hdelan (Contributor Author) replied:

EnqueueMemBufferRead only reads a buffer back to the host, so a DtoD copy is never needed, only a DtoH from the most up-to-date device.
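To illustrate the read-back path (function and parameter names are illustrative, modelled on the quoted code):

// Pick a transfer stream on the queue's device when it already holds the
// newest copy; otherwise fall back to the default stream, then copy DtoH.
void readBackToHost(void *HostPtr, hipDeviceptr_t DevPtr, size_t Size,
                    ur_queue_handle_t hQueue,
                    ur_device_handle_t DeviceToCopyFrom) {
  hipStream_t Stream = DeviceToCopyFrom == hQueue->getDevice()
                           ? hQueue->getNextTransferStream()
                           : hipStream_t{0};
  UR_CHECK_ERROR(hipMemcpyDtoHAsync(HostPtr, DevPtr, Size, Stream));
  UR_CHECK_ERROR(hipStreamSynchronize(Stream));
}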

A Contributor replied:

So DtoH requires a stream from that particular device?
