[SYCL][CUDA][PI][runtime][ABI-break] Add support for multi-device context #6446
Conversation
```cpp
__SYCL_EXPORT pi_result
piMemBufferCreate(pi_context context, pi_device device, pi_mem_flags flags,
                  size_t size, void *host_ptr, pi_mem *ret_mem,
                  const pi_mem_properties *properties = nullptr);
```
Why did you need to add "device"? How is it used and what if it is not given?
We didn't need that for buffer migration in L0 BE.
This is the device the memory will be allocated on. Even if we are putting multiple CUDA devices in a single context, they still have distinct device memories, so for the allocation we need to know which device to allocate on. It needs to be given, and the runtime is changed in this PR so that it is always given. On backends that do not need this information, the device argument can be ignored.
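For illustration, here is a minimal sketch of a caller passing the new device argument; the `createBufferOn` helper is made up for this example, and it assumes the PI headers are in scope:

```cpp
// Minimal sketch, assuming the PI headers (pi.h) are included; the
// createBufferOn helper is hypothetical, not part of the PR.
pi_mem createBufferOn(pi_context Ctx, pi_device Dev, size_t Size) {
  pi_mem Buf = nullptr;
  // The device argument names where the backing memory should live;
  // backends that do not need this information can ignore it.
  pi_result Res = piMemBufferCreate(Ctx, Dev, PI_MEM_FLAGS_ACCESS_RW, Size,
                                    /*host_ptr=*/nullptr, &Buf);
  return Res == PI_SUCCESS ? Buf : nullptr;
}
```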
The buffer would be represented by multiple allocations, one on each device in the context. It would then "migrate" (copy between allocations) depending on where the buffer is being accessed from. So what is that single "device" that you are adding, and why was it not necessary for OpenCL and Level-Zero but is needed for CUDA?
For the CUDA backend we want to be able to put multiple devices in the same context even when they do not share the same memory. So a memory allocation cannot be located in a context (which could contain different devices with distinct memories), but must be located on a specific device.
As far as I understand it, other backends have so far only supported putting multiple devices in the same context when they share memory.
No, both the OpenCL and Level-Zero backends support a context with multiple devices in it even if they have distinct memories. That's why I said there are multiple allocations representing such a buffer in the context, and it is essentially copied from the device where it was last used to the device where it is needed next.
I agree that it is a possible way to go, but I am not sure it justifies the "complexity" of breaking backwards compatibility (btw, I am still not sure why you want to have some "initial" device for a buffer allocation since that can immediately change as the app starts using it from other devices). The buffer "migration" across devices in the context implemented in OpenCL and Level-Zero is not a big deal, in my view, so how do we justify a change of course?
Adding a few folks for extra opinions: @steffenlarsen , @romanovvlad
> (btw, I am still not sure why you want to have some "initial" device for a buffer allocation since that can immediately change as the app starts using it from other devices)
There is no preemptive allocation on some device with this approach. You only have a device allocation when you are first using a buffer on that device. It can only migrate to another device after that allocation is first used on the device it was allocated on.
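A rough sketch of that lazy, per-device scheme (the struct and the `allocateOn`/`copyBetween` helpers are hypothetical stand-ins, not the actual runtime code):

```cpp
#include <cstddef>
#include <unordered_map>

// Assumes the PI headers provide pi_device; these two helpers are
// hypothetical stand-ins for the backend allocation and copy calls.
void *allocateOn(pi_device Dev, size_t Size);
void copyBetween(pi_device From, pi_device To, size_t Size);

// One allocation per device, created only when the buffer is first used
// on that device; data migrates from the device holding the valid copy.
struct BufferRecord {
  std::unordered_map<pi_device, void *> Allocations;
  pi_device HoldsValidCopy = nullptr; // device that last used the buffer

  void *getForDevice(pi_device Dev, size_t Size) {
    auto It = Allocations.find(Dev);
    if (It == Allocations.end()) // no preemptive allocation elsewhere
      It = Allocations.emplace(Dev, allocateOn(Dev, Size)).first;
    if (HoldsValidCopy && HoldsValidCopy != Dev)
      copyBetween(HoldsValidCopy, Dev, Size); // migrate the valid data
    HoldsValidCopy = Dev;
    return It->second;
  }
};
```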
> There is no preemptive allocation on some device with this approach. You only have a device allocation when you are first using a buffer on that device. It can only migrate to another device after that allocation is first used on the device it was allocated on.
So this is the device that first used that buffer, and nothing more? Eventually it will be migrated (by the SYCL RT in your proposal) to whatever other device needs it in this context, right?
Exactly. Except if the backend returns `PLUGIN_MANAGED_OR_SAME` from `piGetMemoryConnection`. In this case the RT will do nothing and assume the backend can handle any migrations it needs. That can be used for any pair of devices in the same context for the OpenCL backend.
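For context, a sketch of how the runtime might act on such a query; the exact `piGetMemoryConnection` signature and enum spelling below are guesses based on this thread, not the PR's actual API:

```cpp
// Hypothetical sketch; the real signature and enum names in this PR may
// differ. migrateAllocation stands in for the RT's copy path.
void migrateAllocation(pi_context Ctx, pi_device From, pi_device To);

void ensureUsableOn(pi_context Ctx, pi_device From, pi_device To) {
  pi_memory_connection Conn;
  piGetMemoryConnection(Ctx, From, To, &Conn);
  if (Conn == PLUGIN_MANAGED_OR_SAME) {
    // The backend (e.g. OpenCL) manages any migration itself; the RT
    // does nothing.
  } else {
    // The RT copies the buffer's allocation from From to To before use.
    migrateAllocation(Ctx, From, To);
  }
}
```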
I added the query.
sycl/plugins/esimd_emulator part OK
What's the status of this, please?
So this is in a fairly good state (I do need to resolve conflicts), and it works well on CUDA platforms, but the CI does flag some issues on other platforms that still need to be investigated. But even once this is resolved I'm not sure we're ready to merge, because it does change the PI API in a way that works well in the SYCL runtime but that may be problematic for other uses, so I've brought these changes up with the Unified Runtime. It might make sense to hold off on merging this until we discuss it further for the Unified Runtime.
/verify with intel/llvm-test-suite#1102
Should these changes be mirrored in unified-runtime, @npmiller?
This pull request is stale because it has been open 180 days with no activity. Remove the stale label or comment, or this will be automatically closed in 30 days.
This pull request was closed because it has been stalled for 30 days with no activity.
Introduces support for having multiple CUDA devices in one context.
To facilitate moving buffer and image memory between devices within the same context, some ABI-breaking changes had to be made to the runtime and PI interface.
This includes expanding the check for whether memory needs to be moved: instead of only checking whether the context is the same, it now also checks whether the device is the same. This creates a performance regression for multiple devices that share memory within the same context, since they will now also get separate copies of memory allocations per device. This will be resolved in a future pull request, when we introduce a memory migration PI API for direct transfers between devices without going through the host.
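Schematically, the reuse check now keys on the device as well as the context (a hypothetical simplification, not the runtime's actual code):

```cpp
// Before this PR only the contexts were compared; the device comparison
// is new, and is what currently forces per-device copies even when two
// devices in one context share memory.
bool canReuseAllocation(pi_context OldCtx, pi_device OldDev,
                        pi_context NewCtx, pi_device NewDev) {
  return OldCtx == NewCtx && OldDev == NewDev;
}
```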
Tests in intel/llvm-test-suite#1102.