[SYCL][CUDA] Add IPSCCP pass to O0 by default by JackAKirk · Pull Request #5900 · intel/llvm

JackAKirk · 2022-03-28T12:55:29Z

The IPSCCP pass can fold branch conditions to constant integers (ConstantInt) and replace conditional branches with unconditional branches.
This is necessary at O0 in the NVPTX backend when the nvvm_reflect function is used: after the nvvm-reflect pass has run, dead branches can remain that contain unused instructions aimed at a different architecture generation (SM version) from the one being compiled for.
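
To make the effect concrete, here is a minimal toy model in standalone C++. It is not LLVM's IPSCCP implementation: `Block`, `Function`, `foldConstantBranches`, `removeUnreachable` and the block names are hypothetical, introduced only for illustration. It assumes the branch condition (`is_sm80` here) has already been resolved to a constant, as happens after nvvm-reflect, and shows the conditional branch becoming unconditional so that the block for the other SM version is no longer reachable.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy terminator kinds; a real compiler IR has many more.
enum class Term { Ret, Br, CondBr };

// Hypothetical basic block: only the terminator matters for this sketch.
struct Block {
    Term term;
    std::string cond;       // name of the value tested by a CondBr
    std::string target;     // Br target, or CondBr target when cond is true
    std::string elseTarget; // CondBr target when cond is false
};

using Function = std::map<std::string, Block>;

// Rewrite every CondBr whose condition is a known constant into a plain Br.
void foldConstantBranches(Function &f, const std::map<std::string, bool> &known) {
    for (auto &entry : f) {
        Block &b = entry.second;
        if (b.term != Term::CondBr) continue;
        auto it = known.find(b.cond);
        if (it == known.end()) continue;       // condition not constant: keep as is
        if (!it->second) b.target = b.elseTarget;
        b.elseTarget.clear();
        b.cond.clear();
        b.term = Term::Br;
    }
}

// Drop every block that can no longer be reached from the entry block.
void removeUnreachable(Function &f, const std::string &entryName) {
    std::set<std::string> live;
    std::vector<std::string> work{entryName};
    while (!work.empty()) {
        std::string name = work.back();
        work.pop_back();
        if (!live.insert(name).second) continue; // already visited
        const Block &b = f.at(name);
        if (b.term != Term::Ret) work.push_back(b.target);
        if (b.term == Term::CondBr) work.push_back(b.elseTarget);
    }
    for (auto it = f.begin(); it != f.end();) {
        if (live.count(it->first)) ++it;
        else it = f.erase(it);
    }
}

// Example CFG: the entry block picks an SM-specific path or a generic one.
Function makeExample() {
    Function f;
    f["entry"] = Block{Term::CondBr, "is_sm80", "sm80_path", "generic_path"};
    f["sm80_path"] = Block{Term::Br, "", "exit", ""};
    f["generic_path"] = Block{Term::Br, "", "exit", ""};
    f["exit"] = Block{Term::Ret, "", "", ""};
    return f;
}
```

Calling `foldConstantBranches(f, {{"is_sm80", false}})` followed by `removeUnreachable(f, "entry")` on the example leaves `entry`, `generic_path` and `exit`; the `sm80_path` block, which in the real case would hold instructions the target SM version cannot select, is gone.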

A solution targeting only the branches that use the nvvm_reflect function was initially explored by patching the existing nvvm-reflect pass. That approach would have required handling several cases, so it was abandoned in favour of the simpler, comprehensive solution of adding the IPSCCP pass to O0. After discussions it turned out that other backends face a corresponding issue, so it was decided that a simple temporary DPC++ solution is preferable for now and that a permanent general solution will be worked on later in the year.

The new backend flag `use-ipsccp-nvptx-O0` removes the IPSCCP pass from O0 when set to false, at the user's discretion.
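
The flag's semantics (on by default, opt out by setting it to false) can be sketched as a toy pipeline builder. This is an illustrative model only: `buildNvptxO0Passes` and its parameter are hypothetical names, and the real pass setup lives in LLVM's NVPTX target code, which is not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper modelling the O0 pass list for NVPTX.
// useIpsccpAtO0 stands in for the use-ipsccp-nvptx-O0 flag (default: true).
std::vector<std::string> buildNvptxO0Passes(bool useIpsccpAtO0 = true) {
    std::vector<std::string> passes;
    passes.push_back("nvvm-reflect");  // resolves nvvm_reflect calls first
    if (useIpsccpAtO0)
        passes.push_back("ipsccp");    // then folds the now-constant branches
    return passes;
}
```

With the default, the list is `nvvm-reflect` followed by `ipsccp`; passing `false` leaves only `nvvm-reflect`, which corresponds to the opt-out described above.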

* sycl: (3343 commits)
  [SYCL][L0] Disable round-robin submissions to multiple CCSs (intel#5945)
  [SYCL][CUDA] Don't link pi_cuda against libsycl (intel#5908)
  [CI] Disable -Werror by default (intel#5889)
  [BuildBot] Uplift CPU/FPGAEMU RT version to 2022.13.3.0.16 (intel#5883)
  [SYCL][CUDA][libclc] Add support for atomic fp exchange and compare exchange (intel#5937)
  [SYCL] Fix device code outlining for static local variables (intel#5915)
  [SYCL][NFC] Refactor plugin CMakeLists.txt (intel#5799)
  [SPIR-V][Doc] Add JointMatrixWorkItemLengthINTEL instruction to joint matrix extension (intel#5781)
  [SYCL] Expand device_global map and make initialization order agnostic (intel#5902)
  [SYCL][CUDA] Add IPSCCP pass to O0 by default (intel#5900)
  [ESIMD] Disable ABI changes warnings in host compiler. (intel#5931)
  [SYCL] Make properties constructor constexpr (intel#5928)
  [NFC][SYCL] Fix static analysis warning (intel#5933)
  [CODEOWNERS][NFC] Assign code owners for CI scripts (intel#5873)
  [SYCL] Store the kernel object size in the integration header (intel#5862)
  [SYCL][ESIMD] Change esimd-verifier logic for detecting valid SYCL calls (intel#5914)
  [SYCL][CUDA][DOC] GettingStartedGuide.md to recommend cuda 11.6 (intel#5917)
  [SYCL][L0] Move command list cache usage under mutex (intel#5874)
  [SYCL][FPGA] Prepare future implementation of experimental pipe properties (intel#5886)
  [CI] Roll back intel driver to the latest version (intel#5925)
  ...

#5921)

The libclc remangler handles function overloads with e.g. `long long`, `long` and `int`, ensuring consistency with OpenCL C primitives. Previously, this was achieved by creating a `GlobalAlias` for each of the various overloads. However, the NVPTX target does not work with function aliases. Normally, an optimization pass removes these aliases, but the present approach prevents compiling with DPC++ for CUDA with `-O0`.

This PR changes the behaviour of the remangler to emit function clones (a copy of the function with a different name). There is a risk that this bloats the compiled code, but optimization should remove unneeded clones, as it did with unneeded aliases.

There is an additional barrier to `-O0` compilation for NVPTX relating to `nvvm_reflect`, addressed here: #5900

**Note:** this PR is best reviewed as separate commits. The first commit makes the (small) functional change. The second commit simply renames all 'Alias*' variables to 'Clone*'.

Added IPSCCP pass to O0 by default.

ddebb0e

new flag `use-ipsccp-nvptx-O0` can remove the IPSCCP pass from O0 when set false.

JackAKirk requested review from a team as code owners March 28, 2022 12:55

JackAKirk added 2 commits March 28, 2022 13:58

format

02dd4ca

remove IPSCCP from param-load-store test using O0.

68aa738

JackAKirk changed the title ~~[SYCL][CUDA] Added IPSCCP pass to O0 by default.~~ [SYCL][CUDA] Added IPSCCP pass to O0 by default Mar 28, 2022

AlexeySachkov approved these changes Mar 29, 2022

bader approved these changes Mar 29, 2022

joeatodd mentioned this pull request Mar 29, 2022

[SYCL][CUDA][libclc] Clone functions rather than aliasing in remangler #5921

Merged

bader merged commit 537e51b into intel:sycl Mar 31, 2022

bader changed the title ~~[SYCL][CUDA] Added IPSCCP pass to O0 by default~~ [SYCL][CUDA] Add IPSCCP pass to O0 by default Mar 31, 2022

AerialMantis mentioned this pull request Apr 4, 2022

[CUDA] CTS usm_atomic_access_atomic64 is failed in function cuda_piQueueFinish and cuda_piextUSMFree for unspecified launch failure #5210

Closed

pvchupin pushed a commit to pvchupin/llvm that referenced this pull request May 7, 2022

[SYCL][NVPTX] Fix wrong pulldown merge

f6739d1

Return back additional switch for test, that was introduced in intel#5900

t4c1 mentioned this pull request May 11, 2022

[CUDA] Regression/unoptimized_stream.cpp fails in pre-ci-cuda. intel/llvm-test-suite#387

Closed

bader merged 3 commits into intel:sycl from JackAKirk:nvvm_reflect_O0
