Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions cmake/BuildFlags.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@ macro(set_build_flags)
# gcc -shared host.o kernel.o device-code.o -o libxxx.so
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-sycl-unnamed-lambda)
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -sycl-std=2020)
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} -fno-reciprocal-math)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's study the NVCC behaviors first. I think reciprocal-math is a general optimization, and most the compilers enable it by default.

Copy link
Contributor Author

@SanityRemnants SanityRemnants Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EikanWang I believe --prec-div https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=reciprocal#prec-div-true-false-prec-div is the corresponding NVCC flag by default set to True unless --use_fast_math (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html?highlight=reciprocal#use-fast-math-use-fast-math) is set to True. I do not believe pytorch is built with --use_fast_math correct me if I'm wrong.

if(CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
set(SYCL_KERNEL_OPTIONS ${SYCL_KERNEL_OPTIONS} /fp:strict)
# Suppress warnings about dllexport.
Expand Down