Generic OpenCL kernels are broken #1960

Open
nwnk opened this issue Jun 13, 2024 · 4 comments
Labels
enhancement (A feature or an optimization request), help wanted, platform:intel-gpu

nwnk commented Jun 13, 2024

The build documentation claims that generic OpenCL kernels are always available. I wanted to verify that they worked, and the straightforward way to do that seemed to be this:

commit 221691fab2a936267c0cf352e9b9b64ebf813973 (HEAD)
Author: Adam Jackson <[email protected]>
Date:   Mon Jun 10 22:03:30 2024 -0400

    cmake: Allow building no gen-specific OpenCL kernels

diff --git a/cmake/configuring_primitive_list.cmake b/cmake/configuring_primitive_list.cmake
index 3524f17107..75333fd1e6 100644
--- a/cmake/configuring_primitive_list.cmake
+++ b/cmake/configuring_primitive_list.cmake
@@ -55,6 +55,8 @@ message(STATUS "Enabled primitive CPU ISA: ${DNNL_ENABLE_PRIMITIVE_CPU_ISA}")
 
 if (DNNL_ENABLE_PRIMITIVE_GPU_ISA STREQUAL "ALL")
     set(BUILD_PRIMITIVE_GPU_ISA_ALL TRUE)
+elseif (DNNL_ENABLE_PRIMITIVE_GPU_ISA STREQUAL "NONE")
+    #
 else()
     foreach(isa ${DNNL_ENABLE_PRIMITIVE_GPU_ISA})
         string(TOUPPER ${isa} uisa)

And that builds! And it works more than it doesn't! With the Intel oneAPI 2024.1 DPC++ compiler, I built 3c0e1f1635c81ae9074f2deeff9977a2a8ef149d with the above patch, using the SYCL CPU and GPU backends. (I am not using the OpenCL driver from the oneAPI release; I am using Fedora 40's build of the Intel Compute Runtime, intel-compute-runtime-24.09.28717.17-1.fc40.x86_64. I don't expect that matters much here, but I can try a different version if it helps.)

With the normal build, ctest says:

99% tests passed, 6 tests failed out of 453
        
Total Test time (real) = 6392.06 sec
        
The following tests FAILED:
        406 - test_benchdnn_modeC_concat_ci_gpu (Failed)
        408 - test_benchdnn_modeC_conv_gpu_ci_gpu (Failed)
        410 - test_benchdnn_modeC_deconv_ci_gpu (Failed)
        416 - test_benchdnn_modeC_graph_ci_gpu (Failed)
        432 - test_benchdnn_modeC_reorder_ci_gpu (Failed)
        450 - test_benchdnn_modeC_sum_ci_gpu (Failed)

Then, I rebuilt with DNNL_ENABLE_PRIMITIVE_GPU_ISA set to NONE, and ctest said:

78% tests passed, 99 tests failed out of 453

Total Test time (real) = 4957.57 sec

The following tests FAILED:
	  4 - gpu-cnn-inference-f32-cpp (Failed)
	  6 - gpu-cnn-inference-int8-cpp (Failed)
	  8 - gpu-cnn-training-bf16-cpp (Failed)
	 10 - gpu-cnn-training-f32-cpp (Failed)
	 15 - gpu-graph-sycl-getting-started-cpp (Failed)
	 16 - cpu-graph-sycl-single-op-partition-cpp (Failed)
	 17 - gpu-graph-sycl-single-op-partition-cpp (Failed)
	 19 - gpu-matmul-perf-cpp (Failed)
	 21 - gpu-memory-format-propagation-cpp (Failed)
	 23 - gpu-performance-profiling-cpp (Failed)
	 33 - gpu-primitives-convolution-cpp (Failed)
	 39 - gpu-primitives-inner-product-cpp (Failed)
	 43 - gpu-primitives-lbr-gru-cpp (Failed)
	 47 - gpu-primitives-lstm-cpp (SEGFAULT)
	 49 - gpu-primitives-matmul-cpp (Failed)
	 61 - gpu-primitives-shuffle-cpp (Failed)
	 65 - gpu-primitives-sum-cpp (Failed)
	 67 - gpu-primitives-vanilla-rnn-cpp (Failed)
	 69 - gpu-rnn-training-f32-cpp (Failed)
	 75 - gpu-tutorials-matmul-inference-int8-matmul-cpp (Failed)
	 84 - test_binary_gpu (Failed)
	 86 - test_binary_buffer_gpu (Failed)
	 88 - test_concat_gpu (Failed)
	 90 - test_concat_buffer_gpu (Failed)
	 92 - test_concurrency_gpu (Failed)
	 94 - test_concurrency_buffer_gpu (Failed)
	 96 - test_convolution_backward_data_f32_gpu (Failed)
	 98 - test_convolution_backward_data_f32_buffer_gpu (Failed)
	100 - test_convolution_backward_weights_f32_gpu (Failed)
	102 - test_convolution_backward_weights_f32_buffer_gpu (Failed)
	104 - test_convolution_eltwise_forward_f32_gpu (Failed)
	106 - test_convolution_eltwise_forward_f32_buffer_gpu (Failed)
	108 - test_convolution_eltwise_forward_x8s8f32s32_gpu (Failed)
	110 - test_convolution_eltwise_forward_x8s8f32s32_buffer_gpu (Failed)
	112 - test_convolution_forward_f32_gpu (Failed)
	114 - test_convolution_forward_f32_buffer_gpu (Failed)
	123 - test_cross_engine_reorder_buffer (Failed)
	125 - test_deconvolution_gpu (Failed)
	127 - test_deconvolution_buffer_gpu (Failed)
	177 - test_inner_product_backward_data_gpu (Failed)
	179 - test_inner_product_backward_data_buffer_gpu (Failed)
	181 - test_inner_product_backward_weights_gpu (Failed)
	183 - test_inner_product_backward_weights_buffer_gpu (Failed)
	185 - test_inner_product_forward_gpu (Failed)
	187 - test_inner_product_forward_buffer_gpu (Failed)
	197 - test_matmul_gpu (Failed)
	199 - test_matmul_buffer_gpu (Failed)
	201 - test_persistent_cache_api_gpu (Failed)
	203 - test_persistent_cache_api_buffer_gpu (Failed)
	209 - test_pooling_forward_gpu (Failed)
	211 - test_pooling_forward_buffer_gpu (Failed)
	217 - test_primitive_cache_mt_gpu (Failed)
	219 - test_primitive_cache_mt_buffer_gpu (Failed)
	225 - test_reorder_gpu (Failed)
	227 - test_reorder_buffer_gpu (Failed)
	237 - test_shuffle_gpu (Failed)
	239 - test_shuffle_buffer_gpu (Failed)
	245 - test_sum_gpu (Failed)
	247 - test_sum_buffer_gpu (Failed)
	298 - test_api (Failed)
	299 - test_api_buffer (Failed)
	304 - test_api_sycl (Failed)
	317 - test_graph_c_api_compile_usm_gpu (Failed)
	319 - test_graph_c_api_compile_parametrized_usm_gpu (Failed)
	321 - test_graph_cpp_api_compile_usm_gpu (Failed)
	323 - test_graph_cpp_api_partition_usm_gpu (Failed)
	325 - test_graph_cpp_api_compiled_partition_sycl_usm_gpu (Failed)
	353 - test_graph_unit_dnnl_batch_norm_usm_gpu (Failed)
	355 - test_graph_unit_dnnl_binary_op_usm_gpu (Failed)
	357 - test_graph_unit_dnnl_bmm_usm_gpu (Failed)
	359 - test_graph_unit_dnnl_compiled_partition_usm_gpu (Failed)
	361 - test_graph_unit_dnnl_concat_usm_gpu (Failed)
	363 - test_graph_unit_dnnl_conv_usm_gpu (Failed)
	365 - test_graph_unit_dnnl_convtranspose_usm_gpu (Failed)
	367 - test_graph_unit_dnnl_dequantize_usm_gpu (Failed)
	369 - test_graph_unit_dnnl_eltwise_usm_gpu (Failed)
	373 - test_graph_unit_dnnl_large_partition_usm_gpu (Failed)
	377 - test_graph_unit_dnnl_matmul_usm_gpu (Failed)
	381 - test_graph_unit_dnnl_pool_usm_gpu (Failed)
	385 - test_graph_unit_dnnl_quantize_usm_gpu (Failed)
	387 - test_graph_unit_dnnl_reduce_usm_gpu (Failed)
	389 - test_graph_unit_dnnl_reorder_usm_gpu (Failed)
	393 - test_graph_unit_dnnl_softmax_usm_gpu (Failed)
	406 - test_benchdnn_modeC_concat_ci_gpu (Failed)
	408 - test_benchdnn_modeC_conv_gpu_ci_gpu (Failed)
	410 - test_benchdnn_modeC_deconv_ci_gpu (Failed)
	412 - test_benchdnn_modeC_eltwise_ci_gpu (Failed)
	416 - test_benchdnn_modeC_graph_ci_gpu (Subprocess aborted)
	418 - test_benchdnn_modeC_ip_ci_gpu (Failed)
	424 - test_benchdnn_modeC_matmul_ci_gpu (Failed)
	426 - test_benchdnn_modeC_pool_ci_gpu (Failed)
	432 - test_benchdnn_modeC_reorder_ci_gpu (Failed)
	437 - test_benchdnn_modeC_gru_ci_gpu (SEGFAULT)
	438 - test_benchdnn_modeC_lstm_ci_gpu (SEGFAULT)
	439 - test_benchdnn_modeC_rnn_ci_gpu (SEGFAULT)
	444 - test_benchdnn_modeC_self_ci_gpu (Failed)
	446 - test_benchdnn_modeC_shuffle_ci_gpu (Failed)
	448 - test_benchdnn_modeC_softmax_ci_gpu (Failed)
	450 - test_benchdnn_modeC_sum_ci_gpu (Failed)

So 93 new failures. 107 GPU tests did pass, though, so it seems like this should work. This is on a gen9 GPU, specifically:

% lspci -vnn -s 0:2
00:02.0 Display controller [0380]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:9bc5] (rev 05)

Since GEN9 is the lowest ISA specifically supported, this suggests that some of the generic OpenCL kernels are broken.
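The "93 new failures" figure is the set difference between the two ctest failure lists (99 failures with NONE, minus the 6 that already fail in the normal build). A minimal Python sketch for computing that difference mechanically, assuming each ctest run's summary has been saved as text; the helper names `failed_tests` and `new_failures` are illustrative and not part of oneDNN or CTest:

```python
import re

# Matches ctest summary lines such as
# "406 - test_benchdnn_modeC_concat_ci_gpu (Failed)" or "47 - name (SEGFAULT)".
FAILED_LINE = re.compile(r"^\s*\d+\s+-\s+(\S+)\s+\(([^)]+)\)\s*$")


def failed_tests(ctest_output: str) -> dict[str, str]:
    """Map each failed test name to its ctest status (Failed, SEGFAULT, ...)."""
    result = {}
    for line in ctest_output.splitlines():
        match = FAILED_LINE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


def new_failures(baseline: str, candidate: str) -> list[str]:
    """Names that fail in the candidate run but not in the baseline run."""
    return sorted(failed_tests(candidate).keys() - failed_tests(baseline).keys())


if __name__ == "__main__":
    # Small excerpts of the two runs above; paste the full lists to reproduce the count.
    baseline = """
    406 - test_benchdnn_modeC_concat_ci_gpu (Failed)
    450 - test_benchdnn_modeC_sum_ci_gpu (Failed)
    """
    candidate = """
      4 - gpu-cnn-inference-f32-cpp (Failed)
     47 - gpu-primitives-lstm-cpp (SEGFAULT)
    406 - test_benchdnn_modeC_concat_ci_gpu (Failed)
    450 - test_benchdnn_modeC_sum_ci_gpu (Failed)
    """
    for name in new_failures(baseline, candidate):
        print(name)
```

Since all 6 baseline failures reappear in the 99-entry NONE list, feeding the two full lists should yield 93 names.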

nwnk added the sighting label (Suspicious library behavior. Should be promoted to a bug when confirmed) Jun 13, 2024
nwnk commented Jun 13, 2024

For additional data, with the OMP and OCL backends, the same baseline tests fail without the NONE setting; with it set, the OCL backend seems to be in better shape than SYCL:

76% tests passed, 78 tests failed out of 322

Total Test time (real) = 5526.59 sec

The following tests FAILED:
	  7 - test_binary_gpu (Failed)
	  8 - test_binary_buffer_gpu (Failed)
	 10 - test_concat_gpu (Failed)
	 11 - test_concat_buffer_gpu (Failed)
	 13 - test_concurrency_gpu (Failed)
	 14 - test_concurrency_buffer_gpu (Failed)
	 16 - test_convolution_backward_data_f32_gpu (Failed)
	 17 - test_convolution_backward_data_f32_buffer_gpu (Failed)
	 19 - test_convolution_backward_weights_f32_gpu (Failed)
	 20 - test_convolution_backward_weights_f32_buffer_gpu (Failed)
	 22 - test_convolution_eltwise_forward_f32_gpu (Failed)
	 23 - test_convolution_eltwise_forward_f32_buffer_gpu (Failed)
	 25 - test_convolution_eltwise_forward_x8s8f32s32_gpu (Failed)
	 26 - test_convolution_eltwise_forward_x8s8f32s32_buffer_gpu (Failed)
	 28 - test_convolution_forward_f32_gpu (Failed)
	 29 - test_convolution_forward_f32_buffer_gpu (Failed)
	 36 - test_cross_engine_reorder (Failed)
	 37 - test_cross_engine_reorder_buffer (Failed)
	 39 - test_deconvolution_gpu (Failed)
	 40 - test_deconvolution_buffer_gpu (Failed)
	 78 - test_inner_product_backward_data_gpu (Failed)
	 79 - test_inner_product_backward_data_buffer_gpu (Failed)
	 81 - test_inner_product_backward_weights_gpu (Failed)
	 82 - test_inner_product_backward_weights_buffer_gpu (Failed)
	 84 - test_inner_product_forward_gpu (Failed)
	 85 - test_inner_product_forward_buffer_gpu (Failed)
	 93 - test_matmul_gpu (Failed)
	 94 - test_matmul_buffer_gpu (Failed)
	 96 - test_persistent_cache_api_gpu (Failed)
	 97 - test_persistent_cache_api_buffer_gpu (Failed)
	102 - test_pooling_forward_gpu (Failed)
	103 - test_pooling_forward_buffer_gpu (Failed)
	108 - test_primitive_cache_mt_gpu (Subprocess aborted)
	109 - test_primitive_cache_mt_buffer_gpu (Subprocess aborted)
	114 - test_reorder_gpu (Failed)
	115 - test_reorder_buffer_gpu (Failed)
	123 - test_shuffle_gpu (Failed)
	124 - test_shuffle_buffer_gpu (Failed)
	129 - test_sum_gpu (Failed)
	130 - test_sum_buffer_gpu (Failed)
	170 - test_api (Failed)
	188 - test_graph_c_api_compile_usm_gpu (Failed)
	190 - test_graph_c_api_compile_parametrized_usm_gpu (Failed)
	192 - test_graph_cpp_api_compile_usm_gpu (Failed)
	194 - test_graph_cpp_api_partition_usm_gpu (Failed)
	196 - test_graph_cpp_api_compiled_partition_ocl_gpu (Failed)
	221 - test_graph_unit_dnnl_batch_norm_usm_gpu (Failed)
	223 - test_graph_unit_dnnl_binary_op_usm_gpu (Failed)
	225 - test_graph_unit_dnnl_bmm_usm_gpu (Failed)
	227 - test_graph_unit_dnnl_compiled_partition_usm_gpu (Failed)
	229 - test_graph_unit_dnnl_concat_usm_gpu (Failed)
	231 - test_graph_unit_dnnl_conv_usm_gpu (Failed)
	233 - test_graph_unit_dnnl_convtranspose_usm_gpu (Failed)
	235 - test_graph_unit_dnnl_dequantize_usm_gpu (Failed)
	237 - test_graph_unit_dnnl_eltwise_usm_gpu (Failed)
	241 - test_graph_unit_dnnl_large_partition_usm_gpu (Failed)
	245 - test_graph_unit_dnnl_matmul_usm_gpu (Failed)
	249 - test_graph_unit_dnnl_pool_usm_gpu (Failed)
	253 - test_graph_unit_dnnl_quantize_usm_gpu (Failed)
	255 - test_graph_unit_dnnl_reduce_usm_gpu (Failed)
	257 - test_graph_unit_dnnl_reorder_usm_gpu (Failed)
	261 - test_graph_unit_dnnl_softmax_usm_gpu (Failed)
	274 - test_benchdnn_modeC_concat_ci_gpu (Failed)
	276 - test_benchdnn_modeC_conv_gpu_ci_gpu (Failed)
	278 - test_benchdnn_modeC_deconv_ci_gpu (Failed)
	280 - test_benchdnn_modeC_eltwise_ci_gpu (Failed)
	284 - test_benchdnn_modeC_graph_ci_gpu (Subprocess aborted)
	286 - test_benchdnn_modeC_ip_ci_gpu (Failed)
	292 - test_benchdnn_modeC_matmul_ci_gpu (Failed)
	294 - test_benchdnn_modeC_pool_ci_gpu (Failed)
	300 - test_benchdnn_modeC_reorder_ci_gpu (Failed)
	305 - test_benchdnn_modeC_gru_ci_gpu (SEGFAULT)
	306 - test_benchdnn_modeC_lstm_ci_gpu (SEGFAULT)
	307 - test_benchdnn_modeC_rnn_ci_gpu (SEGFAULT)
	312 - test_benchdnn_modeC_self_ci_gpu (Failed)
	314 - test_benchdnn_modeC_shuffle_ci_gpu (Failed)
	316 - test_benchdnn_modeC_softmax_ci_gpu (Failed)
	318 - test_benchdnn_modeC_sum_ci_gpu (Failed)

88 GPU tests passed, so again, more working than not, but still not really working.

vpirogov commented

Support for Intel(R) UHD Graphics 630 was discontinued, and the last driver update was published at the end of 2022. oneDNN dropped support for GEN9 in the v3.4 release. It looks like we neglected to drop GEN9 from the ISA list, though.

Trying your patch on a newer architecture (Xe-HPC), I see 'could not create a primitive' errors for some tests. It looks like an empty ISA list causes issues with platform detection and/or kernel dispatching. Making DNNL_ENABLE_PRIMITIVE_GPU_ISA=NONE work would likely require additional implementation changes.

vpirogov added the enhancement (A feature or an optimization request) and help wanted labels and removed the sighting label Jun 21, 2024
densamoilov commented Jul 3, 2024

@nwnk,

> The build documentation claims that generic OpenCL kernels are always available.

The documentation doesn't claim that. It says that the ONEDNN_ENABLE_PRIMITIVE_GPU_ISA knob controls the implementations based on just-in-time kernel generation, and that the OpenCL-based kernels and implementations are always available. It doesn't imply that the OpenCL kernels are generic, even though some of them may be.

If there is a need to introduce generic OpenCL kernels, then I believe the best way to do that would be to introduce a generic GPU vendor (ONEDNN_GPU_VENDOR=GENERIC). We plan to do that for the SYCL GPU runtime.

The ONEDNN_ENABLE_PRIMITIVE_GPU_ISA knob should be used to control implementations within a particular vendor if there is such a need.
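To make the distinction concrete: the ISA knob only filters the JIT-based implementations, while the OpenCL-based ones are built regardless. The Python sketch below is a deliberately simplified, hypothetical model of that idea, not oneDNN's actual dispatcher; the `Impl` type, the implementation names, and the `XE_HPC` ISA tag are all made up for illustration. It shows one way an empty ISA list can leave a problem with no applicable implementation, which would surface as the 'could not create a primitive' error mentioned earlier:

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Impl:
    """One candidate implementation in a hypothetical dispatch list."""
    name: str
    # ISA a JIT-based implementation is generated for; None marks an
    # OpenCL-based implementation that is built regardless of the ISA knob.
    jit_isa: Optional[str]
    # Whether this implementation can handle a given problem.
    applies_to: Callable[[str], bool]


def build_impl_list(all_impls: Sequence[Impl], enabled_isas: set[str]) -> list[Impl]:
    """Mimic a build-time ISA filter: keep OpenCL impls plus enabled JIT impls."""
    return [i for i in all_impls if i.jit_isa is None or i.jit_isa in enabled_isas]


def create_primitive(problem: str, impls: Sequence[Impl]) -> str:
    """Return the first implementation that accepts the problem."""
    for impl in impls:
        if impl.applies_to(problem):
            return impl.name
    raise RuntimeError("could not create a primitive")


if __name__ == "__main__":
    catalog = [
        Impl("jit_conv_hypothetical", "XE_HPC", lambda p: p == "conv"),
        Impl("ocl_eltwise_hypothetical", None, lambda p: p == "eltwise"),
    ]
    none_build = build_impl_list(catalog, enabled_isas=set())  # like ISA=NONE
    print(create_primitive("eltwise", none_build))  # ocl_eltwise_hypothetical
    try:
        create_primitive("conv", none_build)
    except RuntimeError as err:
        print(err)  # could not create a primitive
```

In this model, a generic vendor would amount to a separate catalog of vendor-neutral implementations rather than an empty ISA filter over the Intel one.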

vpirogov commented Jul 9, 2024

It's also important to note that there are no "generic OpenCL kernels" in oneDNN. The OpenCL kernels rely on Intel vendor extensions. We are currently working on a SYCL-based cross-platform implementation as part of the UXL Foundation initiative.

vpirogov assigned nwnk and unassigned vpirogov Jul 16, 2024