
[Issue] Incorrect mapping nvcc flags to hipcc flags #3764

Closed

Qubitium opened this issue Mar 13, 2025 · 16 comments

Labels: CUDA compatibility (CUDA compatibility-related)


Qubitium commented Mar 13, 2025

Problem Description

When an nvcc flag has an obvious 1-to-1 mapping to a hipcc flag, hipify should do the right thing and convert it to the HIP flag. I think hipify does the correct job most of the time, but I have found a case where it doesn't, causing cryptic and strange compile errors and putting the burden on CUDA devs who, for the most part, have no idea what hipify is doing.

nvcc flags in question:

# nvcc
"-U__CUDA_NO_HALF_OPERATORS__", # <-- NVCC
"-U__CUDA_NO_HALF_CONVERSIONS__", # <-- NVCC

Expectation from hipify

# expected correct mapping, 1-to-1 
"-U__HIP_NO_HALF_OPERATORS__", # <-- ROCm/HIP FIX
"-U__HIP_NO_HALF_CONVERSIONS__", # <-- ROCm/HIP FIX

Actual mapping: not performed, causing the following to be injected for hipcc:

"-D__HIP_NO_HALF_OPERATORS__", # <-- ROCm/HIP BUG
"-D__HIP_NO_HALF_CONVERSIONS__", # <-- ROCm/HIP BUG

Full Output

/opt/rocm-6.3.3/bin/hipcc -I/root/miniconda3/lib/python3.12/site-packages/torch/include -I/root/miniconda3/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/lib/python3.12/site-packages/torch/include/THH -I/opt/rocm-6.3.3/include -I/root/miniconda3/lib/python3.12/site-packages/nvidia/cuda_runtime/include -I/root/miniconda3/include/python3.12 -I/root/miniconda3/include/python3.12 -c rock_kernel/q_gemm.hip -o build/temp.linux-x86_64-cpython-312/rock_kernel/q_gemm.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -D_GLIBCXX_USE_CXX11_ABI=1 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=rockthem_kernel -D_GLIBCXX_USE_CXX11_ABI=1 --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx942 --offload-arch=gfx1030 --offload-arch=gfx1100 --offload-arch=gfx1101 --offload-arch=gfx1102 -fno-gpu-rdc

Operating System

Ubuntu 22.04

CPU

AMD ZEN5 EPYC

GPU

MI300X

ROCm Version

6.3.3

ROCm Component

HIP

Steps to Reproduce

I have pushed reproducing code into a repo: https://github.com/ModelCloud/rockthem

git clone https://github.com/ModelCloud/rockthem
cd rockthem
pip install -e . --no-build-isolation -v 

The fix is quite simple: correctly map nvcc flags to HIP flags whenever a 1-to-1 match is found, or wherever else applicable, to alleviate CUDA migration pain.

[Screenshots: the generated hipcc command line, as referenced in the comments below]

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

searlmc1 (Contributor) commented:

Hi @Qubitium - thanks for reporting the issue.
@emankov - can you look into this?

searlmc1 (Contributor) commented:

Using the reproducer, with a few extra setup steps and a tweak to work around a compiler minimum-version check, I am able to produce the hipcc cmd containing ....-D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1.... as depicted in the screenshot above.

emankov changed the title [Issue]: Hipify is incorrectly mapping nvcc flags to hipcc flags → [Issue] Incorrect mapping nvcc flags to hipcc flags Mar 14, 2025
emankov (Contributor) commented Mar 14, 2025

It is not a HIPIFY issue, as HIPIFY tools only transform the CUDA source without considering how the hipified source should be compiled further. And it also doesn't know anything about how to compile ("correctly") the CUDA source.

emankov added the CUDA compatibility (CUDA compatibility-related) and hipcc (hipcc-related) labels Mar 14, 2025
searlmc1 (Contributor) commented:

Hi @Qubitium - can you share the output of clang --version? While I am able to reproduce the hipcc cmd line, I initially hit an error where a compiler version check in py_envs/lib/python3.12/site-packages/torch/utils/cpp_extension.py failed. My clang's version looks roughly like so: AMD clang version 19.0.0git ; the '0git' tripped up the check as it expected an integer value. I assume you did not hit this, hence the question. Thanks,

searlmc1 (Contributor) commented:

-D__HIP_NO_HALF_OPERATORS__=1 is not mapped from the nvcc flag per se. It is explicitly added by https://github.com/pytorch/pytorch/blob/main/torch/utils/cpp_extension.py#L279 ; now, the next question is why is it explicitly added. I will loop in ROCm-Pytorch folks
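
For reference, the HIP defaults injected by torch/utils/cpp_extension.py look roughly like the list below (paraphrased from the PyTorch source; exact names and contents vary by version), which matches the -D flags visible in the full hipcc command above:

# Approximate excerpt of the defaults torch.utils.cpp_extension adds for ROCm builds;
# this may differ slightly between PyTorch versions.
COMMON_HIPCC_FLAGS = [
    '-DCUDA_HAS_FP16=1',
    '-D__HIP_NO_HALF_OPERATORS__=1',
    '-D__HIP_NO_HALF_CONVERSIONS__=1',
]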

jeffdaily pushed a commit to ROCm/pytorch that referenced this issue Mar 14, 2025
Qubitium (Author) commented Mar 14, 2025

Hi @Qubitium - can you share the output of clang --version? While I am able to reproduce the hipcc cmd line, I initially hit an error where a compiler version check in py_envs/lib/python3.12/site-packages/torch/utils/cpp_extension.py failed. My clang's version looks roughly like so: AMD clang version 19.0.0git ; the '0git' tripped up the check as it expected an integer value. I assume you did not hit this, hence the question. Thanks,

@searlmc1 Clang is not installed on my system.

clang --version
Command 'clang' not found, but can be installed with:
apt install clang

OS: Ubuntu 22.04
Torch 2.7 nightly was installed via pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3
ROCm 6.3.3 (rocm/jammy, now 6.3.3.60303-74~22.04 amd64 [installed]) was installed via the AMD docs using the AMD repo.

Qubitium (Author) commented Mar 14, 2025

Oh my. Now it appears I have hit a critical pip bug caused by clang + hipcc? It has completely destroyed my ability to use pip after installing clang and trying to build from my reproducing repo.

(base) root@gpu-xl:~/rockthem# clang --version
Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

After installing clang and doing pip install -e . --no-build-isolation -v (forgot to remove the build folder), my pip env is now unusable. I have no idea if clang is the direct cause or not, and whether hipcc and clang are also partly at fault.

(base) root@gpu-xl:~/rockthem# pip install -U pip
Requirement already satisfied: pip in /root/miniconda3/lib/python3.12/site-packages (25.0.1)
ERROR: Error while checking for conflicts. Please file an issue on pip's issue tracker: https://github.com/pypa/pip/issues/new
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.12/site-packages/pip/_internal/commands/install.py", line 585, in _determine_conflicts
    return check_install_conflicts(to_install)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/pip/_internal/operations/check.py", line 117, in check_install_conflicts
    package_set, _ = create_package_set_from_installed()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/pip/_internal/operations/check.py", line 59, in create_package_set_from_installed
    package_set[name] = PackageDetails(dist.version, dependencies)
                                       ^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/pip/_internal/metadata/importlib/_dists.py", line 175, in version
    return parse_version(self._dist.version)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/pip/_vendor/packaging/version.py", line 56, in parse
    return Version(version)
           ^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/pip/_vendor/packaging/version.py", line 200, in __init__
    match = self._regex.search(version)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'NoneType'
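
(Not from the thread: a minimal sketch of how one might hunt for the distribution with broken metadata; the traceback above suggests some installed dist-info reports no Version. Assumes Python 3.8+ and only the stdlib.)

# Sketch: list installed distributions whose metadata lacks a Name or Version,
# which is what trips pip's conflict checker above. Illustration only.
import importlib.metadata as md

for dist in md.distributions():
    try:
        name = dist.metadata["Name"] if dist.metadata else None
        version = dist.version
    except Exception:
        name, version = None, None
    if not name or not version:
        # _path is an implementation detail, but handy for locating the broken dist-info
        print("suspect dist-info:", getattr(dist, "_path", dist))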

kzhuravl removed the hipcc (hipcc-related) label Mar 14, 2025
Qubitium (Author) commented Mar 14, 2025

It is not a HIPIFY issue, as HIPIFY tools only transform the CUDA source without considering how the hipified source should be compiled further. And it also doesn't know anything about how to compile ("correctly") the CUDA source.

hipify, the actual tool, only does the .cu and .cuh translation, correct? Which tool is in charge of translating the nvcc flags to hipcc flags? It would be nice to know how hipify and hipcc tie together. Thanks.

I think there is confusion about the wording of hipify as a tool versus a process within the ROCm ecosystem. As an external developer, I view everything that goes from CUDA -> HIP as hipify. Even the AMD devs do this:

ROCm/pytorch@cd95095

def _hipify_compile_flags(self, extension):
    # Simple hipify, map CUDA->HIP
    if isinstance(extension.extra_compile_args, dict):
        extension.extra_compile_args['nvcc'] = [
            flag.replace("CUDA", "HIP") for flag in extension.extra_compile_args['nvcc']]

searlmc1 (Contributor) commented Mar 15, 2025

Correct, hipify does src -> src translation; it does not translate build flags. We currently do not provide a tool that translates the nvcc flags; however, we have started discussions around that.

hipify is both a tool and a process. As a tool, it refers to one of hipify-clang, hipify-perl, or hipify-torch, though each is often referred to simply as hipify. As a process, it refers to taking CUDA code and translating - or hipify'ing - it into HIP code.

hipcc is a thin compiler driver; it invokes clang or nvcc depending on the environment, and it may also add include/library options for the target compiler. In the early days, hipcc did more than it does today: clang itself was missing functionality, so it was baked into hipcc. By and large, that functionality has since been lifted out of hipcc.

I understand your confusion. I often see hipcc referred to as a compiler; it is not one. hipcc by itself cannot compile anything; it formulates the clang (or nvcc) invocation and then invokes that compiler. You mentioned that you did not have clang installed; because your environment is ROCm 6.3.3 + MI300X, I assumed you would have clang via that ROCm installation.

If interested
hipify - https://github.com/ROCm/HIPIFY ; https://github.com/ROCm/hipify_torch
hipcc - https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc

Qubitium (Author) commented Mar 15, 2025

-D__HIP_NO_HALF_OPERATORS__=1 is not mapped from the nvcc flag per se. It is explicitly added by https://github.com/pytorch/pytorch/blob/main/torch/utils/cpp_extension.py#L279 ; now, the next question is why is it explicitly added. I will loop in ROCm-Pytorch folks

@searlmc1

For a little context, the kernel I was testing and used in the reproducer is a kernel from the vLLM project and, indirectly, SGLang, since SGLang imports vLLM kernels. I also contribute to these projects.

This context is important because the kernel compiles correctly using vLLM's setup, as they do not use PyTorch's CUDAExtension hook for compilation but CMake instead. This was another headache I had, where I kept asking myself: vLLM compiles correctly for PyTorch, but I can't?

Now the answer is becoming clear. Because I, and most PyTorch projects, use from torch.utils import cpp_extension for CUDA kernel compilation, the build environment PyTorch generates for ROCm is different from the CMake one: the three flags you pointed out in the PyTorch source:

https://github.com/pytorch/pytorch/blob/main/torch/utils/cpp_extension.py#L279

I think it would be great if the default flags in the CMake and PyTorch cpp-extension builds were synced as much as possible across the ROCm ecosystem. That way, any compiler issues can be correctly pushed down to the developers. Otherwise, we devs are like headless chickens, since hunting for compiler flag differences is the last thing on our minds or, frankly, something we are good at.

Qubitium (Author) commented:

Hi @Qubitium - can you share the output of clang --version? While I am able to reproduce the hipcc cmd line, I initially hit an error where a compiler version check in py_envs/lib/python3.12/site-packages/torch/utils/cpp_extension.py failed. My clang's version looks roughly like so: AMD clang version 19.0.0git ; the '0git' tripped up the check as it expected an integer value. I assume you did not hit this, hence the question. Thanks,

So installing the Ubuntu 22.04 clang and doing a ROCm compilation absolutely destroyed my pip env. That is likely a separate bug unrelated to this one.

Found that AMD/ROCm bundles its own clang.

(base) root@gpu-xl:~/rockthem# /opt/rocm-6.3.3/bin/amdclang --version
AMD clang version 18.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.3.3 25012 e5bf7e55c91490b07c49d8960fa7983d864936c4)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.3.3/lib/llvm/bin
Configuration file: /opt/rocm-6.3.3/lib/llvm/bin/clang.cfg

Qubitium (Author) commented:

@searlmc1 I made a PR with some changes I believe would improve the existing PR/fix by @naromero77amd. Please check and compile. I did not compile it myself because I am not sure I can handle the dependency hell it requires, but I did unit test the method; the unit-test code is in the PR.

ROCm/pytorch#1964

searlmc1 (Contributor) commented:

K / thanks much / I'll take a look

Qubitium (Author) commented:

K / thanks much / I'll take a look

Thanks. BTW, the ROCm Jenkins CI is broken. 😅 I was hoping it could compile the PR for me and run through the tests, but... dependency hell gut-punched that idea.

pytorchmergebot pushed a commit to ROCm/pytorch that referenced this issue Mar 17, 2025
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this issue Mar 18, 2025
[ROCm] Fixes and improvements to CUDA->HIP flag conversion for CPP extensions (#149245)

Fixes ROCm/hip#3764.

Fixes and improvements to CUDA->HIP flag conversion for CPP extensions

- Log flag conversion for debugging purposes.
- Fix cases where it should not touch the -I flags or cases where CUDA appears more than once by replacing only the first instance.
- Fix case where nvcc key may not exist
- Fix case where hipify should ignore flag values and only touch the flag itself

Pull Request resolved: #149245
Approved by: https://github.com/jeffdaily

Co-authored-by: Qubitium-ModelCloud <[email protected]>
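
(For readers following along, here is a minimal sketch of a conversion that honors the rules listed above. It is my own illustration under those assumptions, with a hypothetical function name, not the actual PyTorch implementation.)

def hipify_nvcc_flags(extra_compile_args):
    # Sketch only: convert CUDA-named macros in the 'nvcc' flag list to HIP equivalents.
    # Handle the case where the 'nvcc' key may not exist.
    if not isinstance(extra_compile_args, dict) or 'nvcc' not in extra_compile_args:
        return extra_compile_args
    converted = []
    for flag in extra_compile_args['nvcc']:
        # Leave include paths untouched.
        if flag.startswith('-I'):
            converted.append(flag)
            continue
        # Only touch the flag name, not its value, and replace only the first CUDA occurrence.
        name, sep, value = flag.partition('=')
        converted.append(name.replace('CUDA', 'HIP', 1) + sep + value)
    extra_compile_args['nvcc'] = converted
    return extra_compile_args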
pytorchbot pushed a commit to pytorch/pytorch that referenced this issue Mar 18, 2025
[ROCm] Fixes and improvements to CUDA->HIP flag conversion for CPP extensions (#149245)
(cherry picked from commit c0566e0)
jeffdaily (Contributor) commented:

Attempting to get this into release/2.7. pytorch/pytorch#149044 (comment).

jeffdaily (Contributor) commented:

This issue did not close automatically with the PR merged. Closing now.

malfet pushed a commit to pytorch/pytorch that referenced this issue Mar 27, 2025
[ROCm] Fixes and improvements to CUDA->HIP flag conversion for CPP extensions (#149432)
(cherry picked from commit c0566e0)

Co-authored-by: Nichols A. Romero <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>