Add support for ROCm in Studio setup#4390
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c5b8ac5fa
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| GPU_BACKEND="rocm" | ||
| fi | ||
|
|
||
| if [ "$GPU_BACKEND" = "cuda" ]; then |
There was a problem hiding this comment.
Restore the CUDA selection path when nvcc is found
In studio/setup.sh, NVCC_PATH is still populated above, but this branch now keys off GPU_BACKEND == "cuda" even though the CUDA probe never assigns that variable. On any host where nvcc is installed, setup falls through to the CPU-only or ROCm paths instead of adding -DGGML_CUDA=ON, so the commit regresses NVIDIA builds from GPU-enabled to non-CUDA.
Useful? React with 👍 / 👎.
| echo " Could not detect AMD GPU arch -- building for common gfx targets (gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100)" | ||
| CMAKE_ARGS="$CMAKE_ARGS -DAMDGPU_TARGETS=gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100" |
There was a problem hiding this comment.
Expand the ROCm fallback target list to supported GPUs
This fallback AMDGPU_TARGETS list only covers gfx900/gfx906/gfx908/gfx90a/gfx1030/gfx1100, but the repo already treats other ROCm architectures such as gfx940, gfx941, gfx942, and gfx1101 as supported in unsloth/kernels/utils.py:81-96. When hipcc is available but rocminfo is missing or returns nothing, those GPUs will be compiled out of the llama.cpp binary, which typically shows up at runtime as a ROCm “no binary for GPU” failure on otherwise supported hardware.
Useful? React with 👍 / 👎.
|
Thanks for the ROCm support -- this is great to have. I pushed two follow-up commits to fix a couple of issues: 1. The original refactor introduced 2. ROCm probe overwrites CUDA on mixed-toolchain hosts (second commit) On systems with both 3. Removed stale hardcoded fallback target list (second commit) The fallback Also factored the duplicated |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d755fec02
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if command -v rocminfo &>/dev/null; then | ||
| # Extract gfx architectures from all AMD GPUs | ||
| # Look for lines like " Name: gfx90a" or "gfx90a" | ||
| _gfx_list=$(rocminfo 2>/dev/null | grep -oP 'gfx[0-9]{3,4}[a-z]?' | sort -u || true) |
There was a problem hiding this comment.
Accept gfx90a when parsing ROCm targets
The ROCm target parser currently requires 3–4 digits (gfx[0-9]{3,4}[a-z]?), which excludes valid architectures like gfx90a even though the surrounding comments cite gfx90a as an expected match. When rocminfo returns multiple architectures, this can generate AMDGPU_TARGETS without gfx90a, so the resulting llama.cpp binary is built without that code object and fails on those GPUs at runtime (typically as a ROCm "no binary for GPU" error).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d647576744
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ROCM_ROOT="$(cd "$(dirname "$ROCM_PATH")/.." 2>/dev/null && pwd)" | ||
| echo " Building with ROCm support (AMD GPU, hipcc: $ROCM_PATH)..." | ||
| CMAKE_ARGS="$CMAKE_ARGS -DGGML_HIP=ON" | ||
| export ROCM_PATH="$ROCM_ROOT" |
There was a problem hiding this comment.
Resolve
hipcc symlink before exporting ROCM_PATH
When hipcc is discovered via command -v, this code derives the ROCm root from the literal executable path without resolving symlinks first. On systems where hipcc is exposed as /usr/bin/hipcc -> /opt/rocm*/bin/hipcc, ROCM_ROOT becomes /usr, and the exported ROCM_PATH points CMake at the wrong prefix during the subsequent cmake call. In that setup, HIP package discovery can fail or link against the wrong ROCm install, causing ROCm builds to break even though hipcc is available.
Useful? React with 👍 / 👎.
|
This is a fantastic addition, and timely — I've just spent the last few hours getting Unsloth Studio running on an 8x MI300X AMD host, and I ran into many of the issues this PR addresses. The ROCm support in Based on my experience, there are a few more things that will be needed to get this working out-of-the-box for AMD users. I'm happy to open a follow-up PR to address these, but I wanted to share my findings here for context: 1. The
|
Doh, great catches thank you so much! |
pip resolves torch from PyPI during base package installation, pulling CPU-only wheels regardless of the host GPU. AMD ROCm users end up with a venv that cannot use their GPU for training. Add _ensure_rocm_torch() which runs immediately after base packages: - detects ROCm via $ROCM_PATH / /opt/rocm / hipcc - reads the installed version from /opt/rocm/.info/version - maps (major, minor) to the correct PyTorch wheel index via tuple comparison - skips if torch is already GPU-enabled (checks both torch.version.hip and torch.version.cuda to avoid clobbering CUDA torch on mixed hosts) - force-reinstalls torch + torchvision + torchaudio from the matched index URL Tested on 8×AMD MI355X (ROCm 7.1) — version detection, wheel mapping, and no-op behaviour all verified. Fixes the issue raised by andyluo7 in unslothai#4390. Co-authored-by: billishyahao <bill.he@amd.com>
pip resolves torch from PyPI during base package installation, pulling CPU-only wheels regardless of the host GPU. AMD ROCm users end up with a venv that cannot use their GPU for training. Add _ensure_rocm_torch() which runs immediately after base packages: - detects ROCm via $ROCM_PATH / /opt/rocm / hipcc - reads the installed version from /opt/rocm/.info/version - maps (major, minor) to the correct PyTorch wheel index via tuple comparison - skips if torch is already GPU-enabled (checks both torch.version.hip and torch.version.cuda to avoid clobbering CUDA torch on mixed hosts) - force-reinstalls torch + torchvision + torchaudio from the matched index URL Tested on 8×AMD MI355X (ROCm 7.1) — version detection, wheel mapping, and no-op behaviour all verified. Fixes the issue raised by andyluo7 in unslothai#4390. Co-authored-by: billishyahao <bill.he@amd.com>
pip resolves torch from PyPI during base package installation, pulling CPU-only wheels regardless of the host GPU. AMD ROCm users end up with a venv that cannot use their GPU for training. Add _ensure_rocm_torch() which runs immediately after base packages: - detects ROCm via $ROCM_PATH / /opt/rocm / hipcc - reads the installed version from /opt/rocm/.info/version - maps (major, minor) to the correct PyTorch wheel index via tuple comparison - skips if torch is already GPU-enabled (checks both torch.version.hip and torch.version.cuda to avoid clobbering CUDA torch on mixed hosts) - force-reinstalls torch + torchvision + torchaudio from the matched index URL Tested on 8×AMD MI355X (ROCm 7.1) — version detection, wheel mapping, and no-op behaviour all verified. Fixes the issue raised by andyluo7 in unslothai#4390. Co-authored-by: billishyahao <bill.he@amd.com>
danielhanchen
left a comment
There was a problem hiding this comment.
Thank you for the PR! The goal of this PR is to add AMD ROCm GPU support to the Studio setup script so that llama.cpp can be built with HIP acceleration on AMD GPUs. As a summary, this PR refactors the GPU detection in studio/setup.sh from "CUDA or CPU" to "CUDA, ROCm, or CPU" by introducing a GPU_BACKEND variable, adding ROCm/hipcc detection with the same fallback pattern as CUDA, and passing -DGGML_HIP=ON plus optional GPU_TARGETS to cmake.
Testing performed:
- Verified all 6 code paths (CUDA-only, ROCm-only, both, neither, CUDA-driver-only, ROCm-driver-only) produce correct branch selection via isolated mock harness
- Full CUDA build of llama.cpp succeeded on NVIDIA B200 (compute cap 10.0) with the PR's detection logic
- Binary linked correctly against CUDA libs, llama-server + llama-quantize both built
- CUDA path is not broken by this PR -- cmake flags match main exactly
| Reviewers | Severity | Finding |
|---|---|---|
| 8/8 | High | ROCM_PATH derived from unresolved hipcc path; symlinked installs export wrong root |
| 7/8 | High | Forcing hipcc as C/C++ compiler enters llama.cpp legacy path, making GPU_TARGETS detection dead code |
| 2/8 | Medium | Backend selection based on tool presence not actual GPU; mixed-toolchain hosts may pick wrong backend |
The NVIDIA/CUDA and CPU-only paths are confirmed safe. The ROCm-specific issues above should be addressed before merge.
Concrete suggestions for each finding below.
| ROCM_ROOT="$(cd "$(dirname "$ROCM_PATH")/.." 2>/dev/null && pwd)" | ||
| echo " Building with ROCm support (AMD GPU, hipcc: $ROCM_PATH)..." | ||
| CMAKE_ARGS="$CMAKE_ARGS -DGGML_HIP=ON" | ||
| export ROCM_PATH="$ROCM_ROOT" |
There was a problem hiding this comment.
[8/8 reviewers] ROCM_ROOT is derived from the raw command -v hipcc path without resolving symlinks. If hipcc is a symlink or wrapper (common on packaged installs, e.g., /usr/bin/hipcc -> /opt/rocm/bin/hipcc), this computes ROCM_ROOT=/usr instead of the real ROCm prefix, causing find_package(hip) to fail.
Use readlink -f to resolve the real path, and prefer hipconfig -R when available:
| export ROCM_PATH="$ROCM_ROOT" | |
| # Resolve hipcc symlinks and derive the real ROCm root | |
| HIPCC_REALPATH="$(readlink -f "$ROCM_PATH" 2>/dev/null || printf '%s' "$ROCM_PATH")" | |
| ROCM_ROOT="" | |
| if command -v hipconfig &>/dev/null; then | |
| ROCM_ROOT="$(hipconfig -R 2>/dev/null || true)" | |
| fi | |
| if [ -z "$ROCM_ROOT" ]; then | |
| ROCM_ROOT="$(cd "$(dirname "$HIPCC_REALPATH")/.." 2>/dev/null && pwd)" | |
| fi | |
| echo " Building with ROCm support (AMD GPU, hipcc: $HIPCC_REALPATH)..." | |
| CMAKE_ARGS="$CMAKE_ARGS -DGGML_HIP=ON" | |
| export ROCM_PATH="$ROCM_ROOT" | |
| export HIP_PATH="$ROCM_ROOT" |
| [ -n "$_valid_gfx" ] && GPU_TARGETS="$_valid_gfx" | ||
| fi | ||
|
|
||
| CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc" |
There was a problem hiding this comment.
[7/8 reviewers] Forcing -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc has two problems:
-
In upstream
ggml/src/ggml-hip/CMakeLists.txt, whenCMAKE_CXX_COMPILERis hipcc, llama.cpp enters its legacyCXX_IS_HIPCCbranch and skips forwardingGPU_TARGETStoCMAKE_HIP_ARCHITECTURES. This means the rocminfo detection above is dead code -- targets are never actually limited. -
hipcc may fail to compile plain C sources on some ROCm versions, breaking the build.
Upstream llama.cpp docs recommend using HIPCXX/HIP_PATH env vars instead:
| CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc" | |
| # Follow upstream llama.cpp HIP build path (do not force hipcc as C/C++ compiler) | |
| if command -v hipconfig &>/dev/null; then | |
| HIP_CLANG_DIR="$(hipconfig -l 2>/dev/null || true)" | |
| [ -n "$HIP_CLANG_DIR" ] && export HIPCXX="$HIP_CLANG_DIR/clang" | |
| fi |
| else | ||
| echo " Could not detect AMD GPU arch -- building for default targets (cmake will auto-detect)" | ||
| fi | ||
| elif [ -d /usr/local/cuda ] || command -v nvidia-smi &>/dev/null; then |
There was a problem hiding this comment.
[2/8 reviewers] Minor: command -v nvidia-smi instead of bare nvidia-smi is actually a good improvement over main (avoids invoking the binary just to check existence). Nice catch.
The ROCm driver-only fallback message is helpful for users who have the kernel driver but not the userspace tools.
Additional findings from extended review (8 independent reviewers)The inline review above covers the GPU detection changes (the actual PR diff). However, the PR branch is significantly behind Recommendation: Rebase onto current ROCm-specific findings (from the PR diff itself)
Regressions from being behind main (resolved by rebasing)
|
d647576 to
efedbe9
Compare
Currently, the studio setup only checks for Nvidia:
unsloth/studio/setup.sh
Line 309 in e138a3d
Tested on MI300X with ROCm 7.2: