@@ -123,10 +123,15 @@ should be used.
123123
124124There is experimental support for DPC++ for CUDA devices.
125125
126- To enable support for CUDA devices, the following arguments need to be added to
127- the CMake command when building the DPC++ compiler.
126+ To enable support for CUDA devices, follow the instructions for the Linux
127+ DPC++ toolchain, but replace the cmake command with the following one:
128+
128129
129130```
131+ cmake -DCMAKE_BUILD_TYPE=Release \
132+ -DLLVM_EXTERNAL_PROJECTS="llvm-spirv;sycl" \
133+ -DLLVM_EXTERNAL_SYCL_SOURCE_DIR=$SYCL_HOME/llvm/sycl \
134+ -DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR=$SYCL_HOME/llvm/llvm-spirv \
130135-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda/ \
131136-DLLVM_ENABLE_PROJECTS="clang;llvm-spirv;sycl;libclc" \
132137-DSYCL_BUILD_PI_CUDA=ON \
@@ -145,6 +150,24 @@ above.
145150
146151# Use DPC++ toolchain
147152
153+ ## Using the SYCL toolchain on CUDA platforms
154+
155+ SYCL toolchain support for CUDA platforms is still experimental.
156+ Currently, the SYCL toolchain requires a recent OpenCL implementation
157+ on the system in order to link applications to the SYCL runtime.
158+ The OpenCL implementation is not used at runtime if the application uses
159+ only the CUDA backend, but it must still be installed.
160+
161+ The OpenCL implementation provided by the CUDA SDK is OpenCL 1.2, which is
162+ too old to link with the SYCL runtime because it lacks some required symbols.
163+
164+ We recommend installing the low-level CPU runtime by following the
165+ instructions in the next section.
166+
167+ Alternatively, instead of installing the low-level CPU runtime, you can build
168+ and install the [Khronos ICD loader](https://github.com/KhronosGroup/OpenCL-ICD-Loader),
169+ which contains all the required symbols.
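
As a quick sanity check before building, the sketch below lists the OpenCL loader libraries known to the dynamic linker. It assumes a Linux system where `ldconfig -p` is available; the helper name `list_opencl_loaders` is hypothetical and not part of DPC++. It only shows which `libOpenCL.so` candidates exist. It does not verify their OpenCL version or whether they export the symbols the SYCL runtime needs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DPC++): read `ldconfig -p` style output
# on stdin and keep only the lines that mention an OpenCL loader library.
list_opencl_loaders() {
  # grep exits non-zero when nothing matches; `|| true` keeps the helper
  # usable in scripts that run with `set -e`.
  grep -F 'libOpenCL.so' || true
}
```

A typical invocation would be `ldconfig -p | list_opencl_loaders`. Empty output suggests that no OpenCL loader is registered with the dynamic linker.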
170+
148171## Install low level runtime
149172
150173To run DPC++ applications on OpenCL devices, OpenCL implementation(s) must be
@@ -262,6 +285,9 @@ ninja check-all
262285If no OpenCL GPU/CPU runtimes are available, the corresponding tests are
263286skipped.
264287
288+ If CUDA support has been built, it is tested only if there are CUDA devices
289+ available.
290+
265291### Run Khronos\* SYCL\* conformance test suite (optional)
266292
267293Khronos\* SYCL\* conformance test suite (CTS) is intended to validate
@@ -394,6 +420,19 @@ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
394420This `simple-sycl-app.exe` application doesn't specify a SYCL device for
395421execution, so the SYCL runtime will use the `default_selector` logic to select
396422one of the accelerators available in the system or the SYCL host device.
423+ In this case, the behaviour of the `default_selector` can be altered
424+ using the `SYCL_BE` environment variable: setting it to `PI_CUDA` forces
425+ the use of the CUDA backend (if available), and setting it to `PI_OPENCL`
426+ forces the use of the OpenCL backend.
427+
428+ ``` bash
429+ SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
430+ ```
431+
432+ If `SYCL_BE` is not set, the OpenCL backend is used by default when available.
433+ If there are no OpenCL or CUDA devices available, the SYCL host device is used.
434+ The SYCL host device executes the SYCL application directly on the host,
435+ without using any low-level API.
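
To avoid typos in the backend name, the selection can be wrapped in a small shell function. This is a minimal sketch: the function name `run_with_backend` is hypothetical and not part of DPC++, and it assumes that `PI_CUDA` and `PI_OPENCL` (the two values named above) are the only values worth accepting.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of DPC++): run a command with SYCL_BE set
# to one of the two backend values described in this document.
run_with_backend() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_with_backend <PI_CUDA|PI_OPENCL> <command> [args...]" >&2
    return 2
  fi
  local backend="$1"
  shift
  case "$backend" in
    PI_CUDA|PI_OPENCL) ;;
    *)
      echo "unsupported SYCL_BE value: $backend" >&2
      return 2
      ;;
  esac
  # Set SYCL_BE only for this one command; the calling shell is unchanged.
  SYCL_BE="$backend" "$@"
}
```

For example, `run_with_backend PI_CUDA ./simple-sycl-app-cuda.exe` is equivalent to the explicit `SYCL_BE=PI_CUDA` invocation, while an unrecognised backend name is rejected before the application starts.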
397436
398437Note: ` nvptx64-nvidia-cuda-sycldevice ` is usable with ` -fsycl-targets `
399438if clang was built with the cmake option ` SYCL_BUILD_PI_CUDA=ON ` .
@@ -403,6 +442,7 @@ if clang was built with the cmake option `SYCL_BUILD_PI_CUDA=ON`.
403442./simple-sycl-app.exe
404443The results are correct!
405444```
445+
406446** Note** :
407447Currently, when the application has been built with the CUDA target, the CUDA
408448backend must be selected at runtime using the ` SYCL_BE ` environment variable.
@@ -411,7 +451,7 @@ backend must be selected at runtime using the `SYCL_BE` environment variable.
411451SYCL_BE=PI_CUDA ./simple-sycl-app-cuda.exe
412452```
413453
414- NOTE: DPC++/SYCL developer can specify SYCL device for execution using device
454+ NOTE: DPC++/SYCL developers can specify SYCL device for execution using device
415455selectors (e.g. ` cl::sycl::cpu_selector ` , ` cl::sycl::gpu_selector ` ,
416456[ Intel FPGA selector(s)] ( extensions/IntelFPGA/FPGASelector.md ) ) as
417457explained in following section [ Code the program for a specific