
gpu: nvidia: amd: Get native context through device #1765

Merged
merged 14 commits into from
Jan 26, 2024

Conversation

hdelan
Contributor

@hdelan hdelan commented Dec 5, 2023

Description

Since UR now supports multi-device contexts in the HIP adapter (oneapi-src/unified-runtime#999), with the same work soon to follow in the CUDA adapter, it is no longer sensible to use the get_native method for contexts.

The mapping of native contexts to a sycl::context is now many-to-one, so when get_native_context is called the plugin no longer knows which native context to return. The multi-device context UR PR makes the get-native query return the native context of the first device in the context (see https://github.com/oneapi-src/unified-runtime/pull/999/files#diff-259fb15eb14976a3bc1939b9bb8197f51d129a111309343bb84677a655758b54R125), but this will break on multi-GPU systems.

This change instead uses the sycl::device to get the native context, since there is a one-to-one mapping of sycl::device to native contexts.

This change means that old versions of oneDNN will be compatible with newer plugins only if a single-GPU context is used. For a multi-GPU system to work with the new plugin, a oneDNN version that includes this patch will be necessary.
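To make the ambiguity concrete, here is a toy model (plain Python, not the actual SYCL/UR/oneDNN API; all class and method names are invented for illustration) of why a get-native query on a multi-device context is ill-defined, while the same query on a device is not:

```python
class NativeContext:
    """Stand-in for a backend-native context (e.g. a CUDA primary context)."""
    def __init__(self, device_id):
        self.device_id = device_id

class Device:
    """Each device owns exactly one native context: a one-to-one mapping."""
    def __init__(self, device_id):
        self.id = device_id
        self._native = NativeContext(device_id)

    def get_native_context(self):
        return self._native  # always well defined

class Context:
    """A SYCL-style context that may span several devices."""
    def __init__(self, devices):
        self.devices = devices

    def get_native_context(self):
        # Many native contexts map to this one context; which should we
        # return? Falling back to the first device (as the UR change does)
        # is wrong whenever work actually runs on another device.
        return self.devices[0].get_native_context()

gpus = [Device(0), Device(1)]
ctx = Context(gpus)

# The context-based query silently returns device 0's native context ...
assert ctx.get_native_context().device_id == 0
# ... even if the caller cares about device 1. The device-based query
# used by this patch is unambiguous:
assert gpus[1].get_native_context().device_id == 1
```

In the toy model, as in the patch, routing the query through the device removes the ambiguity entirely rather than papering over it with a first-device default.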

Test results to follow

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?

No. The following tests fail on the master branch as well as on my PR branch:

On Nvidia A100:

$ ninja test
...
The following tests FAILED:
4 - gpu-cnn-training-bf16-cpp (Failed)
11 - gpu-primitives-batch-normalization-cpp (Failed)
28 - test_batch_normalization_gpu (Failed)
29 - test_batch_normalization_buffer_gpu (Failed)
30 - test_binary_gpu (SEGFAULT)
31 - test_binary_buffer_gpu (SEGFAULT)
40 - test_convolution_eltwise_forward_f32_gpu (Failed)
41 - test_convolution_eltwise_forward_f32_buffer_gpu (Failed)
59 - test_iface_attr_quantization_gpu (Failed)
60 - test_iface_attr_quantization_buffer_gpu (Failed)

All HIP tests pass with ninja test on gfx1031.

@mgouicem mgouicem requested a review from densamoilov December 6, 2023 08:21
@hdelan hdelan changed the title [draft] Get native context through device [HIP][CUDA] Get native context through device Dec 6, 2023
@hdelan hdelan changed the title [HIP][CUDA] Get native context through device gpu: nvidia: amd: Get native context through device Dec 6, 2023
@vpirogov
Member

vpirogov commented Jan 9, 2024

@hdelan, could you please address questions and feedback on this PR?

@hdelan
Contributor Author

hdelan commented Jan 10, 2024

@vpirogov thanks for the ping. Changes made.

@hdelan hdelan force-pushed the deprecated-get-native-context branch from 58a625d to 0c643e9 Compare January 10, 2024 11:52
@hdelan hdelan force-pushed the deprecated-get-native-context branch from a0fa401 to aaced73 Compare January 12, 2024 11:30
@vpirogov
Member

Thanks, @hdelan! @densamoilov is currently out; we'll have this promoted after the long weekend in the US.

@vpirogov vpirogov added this to the v3.4 milestone Jan 12, 2024
A lot of code removed that was necessary for primary contexts
All native contexts in DPC++ are primary contexts for CUDA
and HIP, so we don't need a lot of the checking in oneDNN
any more.
@hdelan hdelan force-pushed the deprecated-get-native-context branch from c0969cd to 460d15f Compare January 22, 2024 17:36
@hdelan hdelan force-pushed the deprecated-get-native-context branch from 460d15f to 824ea18 Compare January 22, 2024 18:22
@densamoilov
Contributor

densamoilov commented Jan 23, 2024

@hdelan, please let me know once the PR is ready for review.

@hdelan
Contributor Author

hdelan commented Jan 23, 2024

@densamoilov I am running tests but am seeing a lot of failures on AMD on the master branch of oneDNN 443600a30fc3427d6ad1522bceefb52100180a17 with DPC++ build e0f74157c87a2a6eb3438936eaad861e501cc418. Tests are failing with PI_ERROR_INVALID_MEM_OBJECT from piextUSMFree, along with other errors.

As a result it is hard to test the correctness of these changes. Can you recommend a particular setup where tests pass on the master branch of oneDNN, i.e. a certain DPC++ commit or release version, so that I can test just the contents of this PR?

@hdelan
Contributor Author

hdelan commented Jan 24, 2024

@densamoilov, using release version 2024.0.2 I see the same failures with this patch as on the master branch.

AMD MI210 fail list (same for the master branch and my patch):

The following tests FAILED:
	 17 - test_concat_gpu (Failed)
	 18 - test_concat_buffer_gpu (Failed)
	 35 - test_cross_engine_reorder_buffer (Failed)
	 38 - test_eltwise_gpu (Failed)
	 39 - test_eltwise_buffer_gpu (Failed)
	 44 - test_iface_attr_quantization_gpu (Failed)
	 45 - test_iface_attr_quantization_buffer_gpu (Failed)
	 50 - test_iface_pd_gpu (Failed)
	 51 - test_iface_pd_buffer_gpu (Failed)
	 52 - test_iface_pd_iter_gpu (Failed)
	 53 - test_iface_pd_iter_buffer_gpu (Failed)
	 56 - test_iface_runtime_dims_gpu (Failed)
	 57 - test_iface_runtime_dims_buffer_gpu (Failed)
	 62 - test_inner_product_backward_data_gpu (Failed)
	 63 - test_inner_product_backward_data_buffer_gpu (Failed)
	 64 - test_inner_product_backward_weights_gpu (Failed)
	 65 - test_inner_product_backward_weights_buffer_gpu (Failed)
	 66 - test_inner_product_forward_gpu (Failed)
	 67 - test_inner_product_forward_buffer_gpu (Failed)
	 68 - test_layer_normalization_gpu (Failed)
	 69 - test_layer_normalization_buffer_gpu (Failed)
	 72 - test_matmul_gpu (Failed)
	 73 - test_matmul_buffer_gpu (Failed)
	 80 - test_prelu_gpu (Failed)
	 81 - test_prelu_buffer_gpu (Failed)
	 82 - test_primitive_cache_mt_gpu (Failed)
	 83 - test_primitive_cache_mt_buffer_gpu (Failed)
	 86 - test_reorder_gpu (Failed)
	 87 - test_reorder_buffer_gpu (Failed)
	 88 - test_resampling_gpu (Failed)
	 89 - test_resampling_buffer_gpu (Failed)
	 92 - test_shuffle_gpu (Failed)
	 93 - test_shuffle_buffer_gpu (Failed)
	 96 - test_sum_gpu (Failed)
	 97 - test_sum_buffer_gpu (Failed)

NVIDIA A100 fail list (same for the master branch and my patch):

	  4 - gpu-cnn-training-bf16-cpp (Failed)
	 11 - gpu-primitives-batch-normalization-cpp (Failed)
	 28 - test_batch_normalization_gpu (Failed)
	 29 - test_batch_normalization_buffer_gpu (Failed)
	 30 - test_binary_gpu (SEGFAULT)
	 31 - test_binary_buffer_gpu (SEGFAULT)
	 40 - test_convolution_eltwise_forward_f32_gpu (Failed)
	 41 - test_convolution_eltwise_forward_f32_buffer_gpu (Failed)
	 59 - test_iface_attr_quantization_gpu (Failed)
	 60 - test_iface_attr_quantization_buffer_gpu (Failed)

So I suppose this PR is ready for review.

@densamoilov densamoilov merged commit ba51695 into oneapi-src:main Jan 26, 2024
10 checks passed