[SYCL] Add -fsycl-fp32-prec-sqrt flag#5309
Conversation
This flag enables correctly rounded `sycl::sqrt` (the default precision requirement is 3 ULP). And enables the flag for CUDA and HIP targets.
elizabethandrews
left a comment
There was a problem hiding this comment.
FE changes LGTM. Thanks!
smanna12
left a comment
There was a problem hiding this comment.
FE changes look good to me.
premanandrao
left a comment
There was a problem hiding this comment.
Just a couple minor nits.
Co-authored-by: premanandrao <premanand.m.rao@intel.com>
5cebae7
Co-authored-by: premanandrao <premanand.m.rao@intel.com>
Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
There was a problem hiding this comment.
The implementation/tests LGTM, thanks!
I would prefer to leave the actual @intel/dpcpp-clang-driver-reviewers approval to @mdtoguchi or @hchilama in view of the new option being added.
bader
left a comment
There was a problem hiding this comment.
Regression/kernel_name_class.cpp from llvm-test-suite fails with:
Memory access fault by GPU node-1 (Agent handle: 0x828770) on address 0x7fe65b61e000. Reason: Page not present or supervisor privilege.
I don't think it's directly related to this patch, but it might be an issue in the runtime library or hip plug-in. Any ideas what's going on here?
elizabethandrews
left a comment
There was a problem hiding this comment.
FE changes LGTM!
This is strange, I'm unable to reproduce these failures with this branch and the latest |
|
Based on the history of pre-commit checks in this pull request, the issue seems to be sporadic, but still the log suggests there is a bug somewhere as test program accesses wrong memory location. |
|
@npmiller, I'll restart GitHub Actions jobs. |
Thanks for the suggestion. I will update my example to include double-precision data type. |
Thanks for adding fp64! If we do support fp64, do we need to rename the option a bit, e.g. remove fp32 given the CUDA compiler option does not have "type" in the name. |
The SYCL spec already requires double precision
So, for SYCL, under -O3, do we have (or need to have) same behavior as NVCC for SYCL compiler? |
With the verification result of the updated example, I found that nvcc calls the fast sqrt for double precision. The optimization option is just "-O3". However, the sycl compiler calls the correctly rounded sqrt. Thanks. |
|
@bader looks like all of the testing passed on this after re-running the CI |
|
Great. Let's check that @xtian-github's concerns are resolved. |
|
@xtian-github, ping. |
* upstream/sycl: (3571 commits) [ESIMD] Doxygen update part III - core APIs. (intel#5472) [SYCL][DOC] Move proposed FPGA extensions (intel#5453) [SYCL] Add -fsycl-fp32-prec-sqrt flag (intel#5309) [SYCL] Emit program build logs for warning levels >= 2 (intel#5319) [SYCL] Add clang support for code_location in KernelInfo (intel#5335) [SYCL][Doc] Move FPGA extensions (intel#5470) [ESIMD] Fix public simd and simd_view APIs. (intel#5465) [SYCL] Deprecate sycl::atomics in SYCL 2020 mode (intel#5440) [SYCL] Add unit test for PR 5414 (intel#5450) [XPTI] Allow arbitrary data types in metadata (intel#4998) [SYCL][DOC] Move discard queue events to supported (intel#5452) [Driver][SYCL] Initial support for allowing fat static -lname processing (intel#5413) [SYCL] Fix dead pointer usage if leaf buffer overflows (intel#5417) [SYCL][L0] Fix memory leak in USM prefetch (intel#5461) [SYCL][Doc] Add new free function queries proposal (intel#5106) [SYCL][ESIMD] Update vc-intrinsics deps to the top of the trunk (intel#5460) [SYCL][DOC] Move old spec constant extension spec (intel#5456) [SYCL][DOC] Move deprecated extensions (intel#5458) [SYCL][DOC] Fix links to old SubGroupMask doc (intel#5459) [ESIMD] Doxygen update part II - memory APIs. (intel#5443) ...
It follows the approach from intel#5141 and intel#5309 adding intermediate fcuda-prec-div flag. Signed-off-by: Sidorov, Dmitry <dmitry.sidorov@intel.com>
This flag enables correctly rounded
sycl::sqrt(the default precisionrequirement is 3 ULP).
And enables the flag for CUDA and HIP targets.
This is a follow up from #5141, to have a proper fix for #4041.