[SYCL][CUDA] Decouple CUDA contexts from PI contexts #8197
Conversation
7185416 to b1be579 (Compare)
This patch moves the CUDA context from the PI context to the PI device, and switches to always using the primary context.

CUDA contexts differ from SYCL contexts in that they are tied to a single device and must be active on a thread for most calls to the CUDA driver API. As shown in intel#8124 and intel#7526, the current mapping of CUDA context to PI context causes issues for device-based entry points that still need to call the CUDA APIs; we have workarounds for that, but they're a bit hacky, inefficient, and have a lot of edge-case issues. The peer-to-peer interface proposal in intel#6104 is also device-based, but enabling peer-to-peer access in CUDA is done on CUDA contexts, so the current mapping would make it difficult to implement.

This patch solves most of these issues by decoupling the CUDA context from the SYCL context and simply managing the CUDA contexts in the devices; it also changes the CUDA context management to always use the primary context. This approach has a number of advantages:

* Use of the primary context is recommended by Nvidia.
* Simplifies the CUDA context management in the plugin.
* A CUDA context is available in device-based entry points.
* Likely more efficient in the general case, with fewer opportunities to accidentally cause costly CUDA context switches.
* Easier and likely more efficient interactions with CUDA runtime applications.
* Easier to expose P2P capabilities.
* Easier to support multiple devices in a SYCL context.

It does have a few drawbacks compared to the previous approach:

* Drops support for `make_context` interop, as there is no sensible "native handle" to pass in (`get_native` is still supported fine).
* No opportunity for users to separate their work into different CUDA contexts. It's unclear whether there is any actual use case for this; it seems very uncommon in CUDA codebases to have multiple CUDA contexts for a single CUDA device in the same process.

So overall I believe this should be a net benefit in general, and we could revisit if we run into an edge case that would need more fine-grained CUDA context management.
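For reference, a minimal sketch of the primary-context pattern described above, using the plain CUDA driver API. The `DeviceData` struct and function names are illustrative stand-ins, not the actual PI/plugin code:

```cpp
#include <cuda.h>

// Illustrative per-device state; the real PI device structures differ.
struct DeviceData {
  CUdevice cuDevice;
  CUcontext primaryCtx; // retained primary context for this device
};

void initDevice(DeviceData &dev, int ordinal) {
  cuInit(0);
  cuDeviceGet(&dev.cuDevice, ordinal);
  // Retain the device's primary context instead of creating a new one.
  cuDevicePrimaryCtxRetain(&dev.primaryCtx, dev.cuDevice);
}

void callDriverAPI(DeviceData &dev) {
  // Driver API calls require an active context on the calling thread.
  cuCtxSetCurrent(dev.primaryCtx);
  // ... cuMemAlloc, cuLaunchKernel, etc. ...
}

void releaseDevice(DeviceData &dev) {
  cuDevicePrimaryCtxRelease(dev.cuDevice);
}
```

Because the primary context is shared with the CUDA runtime API, this is also what makes interop with CUDA runtime applications cheaper: both sides end up using the same context per device.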
Older versions of gcc struggle with attributes on namespaces
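For context, the construct in question is a C++17 attribute applied to a namespace, which some older gcc releases reject or warn about. A minimal illustration of the feature (not the actual code touched by this commit):

```cpp
// C++17 allows attributes on namespace definitions; older gcc versions
// may warn about or ignore them, hence the workaround mentioned above.
namespace [[deprecated("use new_api instead")]] old_api {
int helper();
}
```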
cf2bb17 to 8685475 (Compare)
Admittedly I am a little worried about decoupling the contexts fully, though I agree with the benefits of using a common default context (it is also the motivation for sycl_ext_oneapi_default_context). Are we confident there would be no benefit to keeping the relation, just changing the default to use the primary context, and adding an inverse of the deprecated property?
If we keep the existing relation, we would still have issues for device-only entry points, and having a primary context on the device while also allowing regular CUDA contexts in the PI context would just add an extra layer of complexity and possible issues with mismatched contexts. As far as I can tell, the use case of having multiple CUDA contexts for one device within the same process, which the current architecture enables, is very rare; people who need multiple CUDA contexts for the same device are usually more likely to be using, say, MPS and have multiple processes anyway. It's also interesting to note that, as I understand it, hipSYCL uses the CUDA runtime API for its CUDA support, which automatically manages contexts and uses primary contexts, so it already has this decoupling between SYCL contexts and CUDA contexts.

So overall I'm pretty confident that in most cases breaking this relationship between the SYCL and CUDA contexts is fine, but I can't say for 100% certain that it won't break some very specific workflows. However, this is also partly why this patch is a little aggressive: I think we're probably better off taking a chance on breaking some very specific workflow, even if it means we need to revisit later on, than carrying over all this complexity for a hypothetical workflow we don't even know exists. And if it does break something, we'll hopefully have a better idea of what people are trying to do, and we can maybe come up with a solution that works better with how CUDA works; for example, we could do like the CUDA runtime and allow the SYCL runtime to use contexts that users set on the current thread.
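To illustrate that last idea, the CUDA-runtime-style behaviour of picking up whatever context the user has made current on the thread could look roughly like the sketch below. This is purely an illustration of the fallback discussed here, not part of this patch:

```cpp
#include <cuda.h>

// Sketch: prefer a context the user already made current on this thread,
// otherwise fall back to the device's primary context.
CUcontext selectContext(CUdevice device) {
  CUcontext current = nullptr;
  cuCtxGetCurrent(&current);
  if (current != nullptr)
    return current; // respect the user's active context

  CUcontext primary = nullptr;
  cuDevicePrimaryCtxRetain(&primary, device);
  return primary;
}
```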
Patch seems to do what it intends to do. I am okay with giving it a go.
@bader @romanovvlad - Anyone we should check with before going ahead with this?
No objections given the explanation in the description (thanks for that, btw).
Thanks! We need formal approval from the @intel/llvm-reviewers-cuda and @intel/llvm-reviewers-runtime teams to merge the PR.
This is no longer needed with #8197
Since intel/llvm#8197, the SYCL CUDA backend uses the CUDA primary context by default, so individual context setting is no longer required.
…27)

* Ignoring VIM temporary files
* Vector Addition example update: works with the latest DPC++ and SYCL 2020 features; README updated to reflect that the CUDA backend has USM support
* Removing unnecessary CUDA Driver types: since intel/llvm#8197, the SYCL CUDA backend uses the CUDA primary context by default, so individual context setting is no longer required
* Using modern queue construction: SYCL 1.2.1 device selectors have been deprecated in favour of a new simplified form using lambdas
* Format files: ran clang-format on the files, in a separate commit to avoid noise
* Explicitly setting the CUDA context on the host task: because of the changes to the SYCL context, it is now necessary to set the active CUDA context manually inside the host task (note there was some clang-formatting here as well); see the sketch below
* Addressing feedback from Gordon
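A hedged sketch of what that host-task change amounts to: inside the `host_task`, make the device's primary context current before issuing CUDA calls. The `CUdevice` handle (`cuDev`) is assumed to have been obtained elsewhere (e.g. via `get_native`), and the exact interop spelling may differ between DPC++ versions:

```cpp
#include <cuda.h>
#include <sycl/sycl.hpp>

// Sketch only: activate the device's primary context inside a host task
// before making CUDA driver/runtime calls, then release it again.
void submitHostWork(sycl::queue &q, CUdevice cuDev) {
  q.submit([&](sycl::handler &cgh) {
    cgh.host_task([=](sycl::interop_handle) {
      CUcontext primary = nullptr;
      cuDevicePrimaryCtxRetain(&primary, cuDev);
      cuCtxSetCurrent(primary); // make it active on this thread
      // ... CUDA driver or runtime calls ...
      cuDevicePrimaryCtxRelease(cuDev);
    });
  });
}
```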
The primary context has been the default for a while in the CUDA PI/adapter; see #8197. This PR brings the HIP adapter up to speed. It also changes the scoped context to only take a `ur_device_handle_t`, since this is coupled with a native primary context in HIP.
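In spirit, the "scoped context" described here is an RAII guard keyed on the device handle that activates that device's primary context for the duration of a call. A minimal sketch of the pattern using the CUDA driver API, with illustrative names standing in for the adapter's `ur_device_handle_t`-based implementation:

```cpp
#include <cuda.h>

// Illustrative RAII guard: activates the given device's primary context
// for the lifetime of the object, then restores the previous context.
class ScopedContext {
  CUcontext previous_ = nullptr;
  CUdevice device_;

public:
  explicit ScopedContext(CUdevice device) : device_(device) {
    cuCtxGetCurrent(&previous_);
    CUcontext primary = nullptr;
    cuDevicePrimaryCtxRetain(&primary, device_);
    if (primary != previous_)
      cuCtxSetCurrent(primary);
  }

  ~ScopedContext() {
    cuDevicePrimaryCtxRelease(device_);
    cuCtxSetCurrent(previous_); // restore whatever was active before
  }
};
```

Keying the guard on the device rather than on a context object is what makes it line up with the primary-context model: the context to activate is fully determined by the device.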