Add support for ROCm in Studio setup #4390

Closed
edamamez wants to merge 0 commits into unslothai:main from edamamez:ez-amd-studio-support
Conversation

@edamamez
Contributor

@edamamez edamamez commented Mar 18, 2026

Currently, the studio setup only checks for Nvidia:

# Detect CUDA: check nvcc on PATH, then common install locations

Tested on MI300X with ROCm 7.2:

# unsloth studio setup
╔══════════════════════════════════════╗
║     Unsloth Studio Setup Script      ║
╚══════════════════════════════════════╝
⚠️  Node/npm not found. Installing via nvm...
Installing nvm...
Installing Node LTS...
✅ Node v24.14.0 | npm 11.9.0
✅ Frontend built to frontend/dist
finished finding best python
✅ Using python3 (3.12.3) — compatible (3.11.x – 3.13.x)
[====================] 11/11  finalizing
✅ Python dependencies installed

   Pre-installing transformers 5.x for newer model support...
✅ Transformers 5.x pre-installed to /root/.unsloth/studio/.venv_t5/

Building llama-server for GGUF inference...
   Building with ROCm support (AMD GPU, hipcc: /opt/rocm/bin/hipcc)...
   AMD GPU architectures: gfx942 -- limiting build to detected targets
✅ llama-server built at /root/.unsloth/llama.cpp/build/bin/llama-server
✅ llama-quantize available for GGUF export

╔══════════════════════════════════════╗
║           Setup Complete!            ║
╠══════════════════════════════════════╣
║ Launch with:                         ║
║                                      ║
║ unsloth studio -H 0.0.0.0 -p 8000    ║
╚══════════════════════════════════════╝

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c5b8ac5fa

Comment thread studio/setup.sh Outdated
GPU_BACKEND="rocm"
fi

if [ "$GPU_BACKEND" = "cuda" ]; then
P1 Badge Restore the CUDA selection path when nvcc is found

In studio/setup.sh, NVCC_PATH is still populated above, but this branch now keys off GPU_BACKEND == "cuda" even though the CUDA probe never assigns that variable. On any host where nvcc is installed, setup falls through to the CPU-only or ROCm paths instead of adding -DGGML_CUDA=ON, so the commit regresses NVIDIA builds from GPU-enabled to non-CUDA.

Comment thread studio/setup.sh Outdated
Comment on lines +404 to +405
echo " Could not detect AMD GPU arch -- building for common gfx targets (gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100)"
CMAKE_ARGS="$CMAKE_ARGS -DAMDGPU_TARGETS=gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100"
P2 Badge Expand the ROCm fallback target list to supported GPUs

This fallback AMDGPU_TARGETS list only covers gfx900/gfx906/gfx908/gfx90a/gfx1030/gfx1100, but the repo already treats other ROCm architectures such as gfx940, gfx941, gfx942, and gfx1101 as supported in unsloth/kernels/utils.py:81-96. When hipcc is available but rocminfo is missing or returns nothing, those GPUs will be compiled out of the llama.cpp binary, which typically shows up at runtime as a ROCm “no binary for GPU” failure on otherwise supported hardware.

@danielhanchen
Member

Thanks for the ROCm support -- this is great to have. I pushed two follow-up commits to fix a couple of issues:

1. GPU_BACKEND was never set to "cuda" when nvcc was found (first commit)

The original refactor introduced GPU_BACKEND but only set it to "rocm", so the CUDA branch (if [ "$GPU_BACKEND" = "cuda" ]) was unreachable. Added GPU_BACKEND="cuda" in all three nvcc detection paths.

2. ROCm probe overwrites CUDA on mixed-toolchain hosts (second commit)

On systems with both nvcc and hipcc installed, the ROCm detection ran unconditionally and overwrote GPU_BACKEND="cuda" with "rocm". Wrapped the ROCm detection in if [ -z "$GPU_BACKEND" ] so CUDA takes priority when both are present.

3. Removed stale hardcoded fallback target list (second commit)

The fallback gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100 was already missing current GPUs like MI300X (gfx942) and RDNA3 variants (gfx1101/gfx1102). Upstream llama.cpp supports omitting GPU_TARGETS entirely and will auto-detect, which is safer than shipping a curated list that goes stale. Changed the fallback to leave targets unset and let cmake handle it.

Also factored the duplicated -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc into a single assignment above the target conditional.
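
The priority order from points 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code in studio/setup.sh: `pick_backend` is a hypothetical helper, the tool paths are passed in as arguments so the logic can be exercised without any toolchain installed, and the `cpu` label for the "neither" case is an assumption (only the `cuda` and `rocm` values appear in this thread).

```shell
# Hypothetical helper: choose a backend from two already-probed tool paths.
# An empty argument means the tool was not found.
pick_backend() {
    local nvcc_path="$1" hipcc_path="$2"
    local backend=""
    # The CUDA probe runs first and claims the backend when nvcc exists.
    if [ -n "$nvcc_path" ]; then
        backend="cuda"
    fi
    # The ROCm probe only runs while no backend has been chosen yet,
    # so it can no longer overwrite "cuda" on mixed-toolchain hosts.
    if [ -z "$backend" ] && [ -n "$hipcc_path" ]; then
        backend="rocm"
    fi
    # "cpu" is an illustrative label for the case where neither tool is found.
    printf '%s\n' "${backend:-cpu}"
}

# Probe the current host; a missing tool yields an empty string.
pick_backend "$(command -v nvcc || true)" "$(command -v hipcc || true)"
```

Guarding the second probe on an empty variable keeps the ordering explicit: whichever probe runs first wins, and later probes cannot silently replace its result.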

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d755fec02

Comment thread studio/setup.sh Outdated
if command -v rocminfo &>/dev/null; then
# Extract gfx architectures from all AMD GPUs
# Look for lines like " Name: gfx90a" or "gfx90a"
_gfx_list=$(rocminfo 2>/dev/null | grep -oP 'gfx[0-9]{3,4}[a-z]?' | sort -u || true)
P2 Badge Accept gfx90a when parsing ROCm targets

The ROCm target parser currently requires 3–4 digits (gfx[0-9]{3,4}[a-z]?), which excludes valid architectures like gfx90a even though the surrounding comments cite gfx90a as an expected match. When rocminfo returns multiple architectures, this can generate AMDGPU_TARGETS without gfx90a, so the resulting llama.cpp binary is built without that code object and fails on those GPUs at runtime (typically as a ROCm "no binary for GPU" error).
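
A small sketch of the mismatch, using made-up sample text in place of real rocminfo output. The relaxed pattern is only one possible adjustment and is an assumption, not the fix that was applied to the PR; `strict_gfx` and `relaxed_gfx` are hypothetical helper names.

```shell
# Sample text standing in for rocminfo output (made up for illustration).
sample='  Name: gfx90a
  Name: gfx942
  Name: gfx1100'

# Pattern from the PR: needs 3-4 digits after "gfx", so "gfx90a"
# (two digits followed by a letter) is never matched.
strict_gfx() {
    printf '%s\n' "$1" | grep -oE 'gfx[0-9]{3,4}[a-z]?' | sort -u
}

# Relaxed pattern (an assumption): 2-4 digits plus an optional trailing letter.
relaxed_gfx() {
    printf '%s\n' "$1" | grep -oE 'gfx[0-9]{2,4}[a-z]?' | sort -u
}

echo "strict:";  strict_gfx "$sample"
echo "relaxed:"; relaxed_gfx "$sample"
```

The sketch uses `grep -oE` rather than `grep -oP` so it does not depend on PCRE support being compiled into grep.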

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d647576744

Comment thread studio/setup.sh Outdated
Comment on lines +380 to +383
ROCM_ROOT="$(cd "$(dirname "$ROCM_PATH")/.." 2>/dev/null && pwd)"
echo " Building with ROCm support (AMD GPU, hipcc: $ROCM_PATH)..."
CMAKE_ARGS="$CMAKE_ARGS -DGGML_HIP=ON"
export ROCM_PATH="$ROCM_ROOT"
P2 Badge Resolve hipcc symlink before exporting ROCM_PATH

When hipcc is discovered via command -v, this code derives the ROCm root from the literal executable path without resolving symlinks first. On systems where hipcc is exposed as /usr/bin/hipcc -> /opt/rocm*/bin/hipcc, ROCM_ROOT becomes /usr, and the exported ROCM_PATH points CMake at the wrong prefix during the subsequent cmake call. In that setup, HIP package discovery can fail or link against the wrong ROCm install, causing ROCm builds to break even though hipcc is available.

@andyluo7
This is a fantastic addition, and timely — I've just spent the last few hours getting Unsloth Studio running on an 8x MI300X AMD host, and I ran into many of the issues this PR addresses. The ROCm support in setup.sh is much needed!

Based on my experience, there are a few more things that will be needed to get this working out-of-the-box for AMD users. I'm happy to open a follow-up PR to address these, but I wanted to share my findings here for context:

1. The rocminfo command fails during Docker builds

The changes to setup.sh that use rocminfo to detect the GPU architecture will fail during a docker build because the build process doesn't have access to the host's GPUs. When I was building a ROCm-enabled Docker image, I had to add a check to see if rocminfo was available and only use it if it was. The fallback to let cmake auto-detect is good, but the script should probably not fail if rocminfo itself isn't working.
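
One way to make the probe tolerant of a missing or failing rocminfo is sketched here. It is only an illustration: `detect_gfx_targets` and the `ROCMINFO_CMD` override are hypothetical names (the override exists so the function can be exercised without a GPU), and the grep pattern is illustrative rather than the one in setup.sh.

```shell
# Hypothetical helper: print detected gfx targets one per line,
# or print nothing (and still succeed) when the probe is missing or fails.
detect_gfx_targets() {
    local probe="${ROCMINFO_CMD:-rocminfo}" out
    # Probe not installed: leave targets empty instead of failing.
    command -v "$probe" >/dev/null 2>&1 || return 0
    # Probe installed but unusable (e.g. no GPU visible during docker build).
    out="$("$probe" 2>/dev/null)" || return 0
    # grep exits non-zero when nothing matches; "|| true" keeps that non-fatal.
    printf '%s\n' "$out" | grep -oE 'gfx[0-9]{2,4}[a-z]?' | sort -u || true
}

targets="$(detect_gfx_targets)"
if [ -n "$targets" ]; then
    echo "   AMD GPU architectures: $(printf '%s' "$targets" | tr '\n' ' ')"
else
    echo "   Could not detect AMD GPU arch -- leaving targets unset"
fi
```

An empty result is treated as "unknown" rather than as an error, which matches the fallback of letting cmake pick the targets.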

2. The studio's venv installs a CPU-only PyTorch

The unsloth studio setup script creates a virtual environment at ~/.unsloth/studio/.venv. During the pip install phase, it installs a CPU-only version of PyTorch, which overwrites any existing ROCm-enabled PyTorch. This causes the studio's backend to report 'CPU (no GPU backend available)' even if the system PyTorch is correctly installed.

The fix is to force-reinstall the ROCm version of PyTorch into the venv after the setup is complete:

source ~/.unsloth/studio/.venv/bin/activate
pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/rocm7.0

This could be added to the end of the setup.sh script in the ROCm path.

3. The frontend path is incorrect when running from the venv

The studio's launch script, studio/backend/run.py, calculates the path to the frontend files relative to its own location. When the studio is launched from the venv, run.py is in .../site-packages/studio/backend/, so it looks for the frontend at .../site-packages/studio/frontend/dist. However, the frontend is built to /app/unsloth/studio/frontend/dist (in a Docker context) or a similar location in a local install.

This results in a 404 error when trying to access the studio's web UI.

The fix is to either hardcode the correct path in run.py or to create a symlink from the expected location to the actual location. I found the symlink to be the most robust solution:

ln -sf /path/to/unsloth/studio/frontend/dist ~/.unsloth/studio/.venv/lib/python3.12/site-packages/studio/frontend/dist

4. The hardware.py script is missing functions

The studio/backend/utils/hardware/hardware.py script is missing several functions that are imported in studio/backend/utils/hardware/__init__.py. This causes the server to crash on startup with an ImportError.

I had to add stubs for the following functions to get it to work:

  • log_gpu_memory
  • get_gpu_summary
  • get_package_versions
  • get_visible_gpu_count
  • safe_num_proc

And also define the following global variables:

  • DEVICE
  • CHAT_ONLY

Again, this is a great PR, and I'm happy to help with a follow-up to address these additional issues. Let me know what you think.

@edamamez
Contributor Author

Doh, great catches thank you so much!

GoldenGrapeGentleman added a commit to GoldenGrapeGentleman/unsloth that referenced this pull request Mar 19, 2026
pip resolves torch from PyPI during base package installation, pulling
CPU-only wheels regardless of the host GPU.  AMD ROCm users end up with
a venv that cannot use their GPU for training.

Add _ensure_rocm_torch() which runs immediately after base packages:
- detects ROCm via $ROCM_PATH / /opt/rocm / hipcc
- reads the installed version from /opt/rocm/.info/version
- maps (major, minor) to the correct PyTorch wheel index via tuple comparison
- skips if torch is already GPU-enabled (checks both torch.version.hip and
  torch.version.cuda to avoid clobbering CUDA torch on mixed hosts)
- force-reinstalls torch + torchvision + torchaudio from the matched index URL

Tested on 8×AMD MI355X (ROCm 7.1) — version detection, wheel mapping,
and no-op behaviour all verified.

Fixes the issue raised by andyluo7 in unslothai#4390.

Co-authored-by: billishyahao <bill.he@amd.com>
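
The version-to-index step in that commit message can be sketched as a tuple comparison. The real mapping table is not shown in this thread, so the sketch below knows only one index, `rocm7.0`, because that is the single URL quoted in the conversation; treating every 7.x release as compatible with it is an assumption. `version_ge` and `rocm_wheel_index` are hypothetical names, and the example version string is made up.

```shell
# Compare (major, minor) tuples: succeeds when $1.$2 >= $3.$4.
version_ge() {
    [ "$1" -gt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -ge "$4" ]; }
}

# Hypothetical mapping from a ROCm version string to a wheel index URL.
# Returns non-zero when the version is malformed or has no known mapping.
rocm_wheel_index() {
    local version="$1" major minor
    case "$version" in *.*) ;; *) return 1 ;; esac
    major="${version%%.*}"          # text before the first dot
    minor="${version#*.}"           # text after the first dot
    minor="${minor%%[!0-9]*}"       # keep only the leading digits
    case "$major" in ''|*[!0-9]*) return 1 ;; esac
    case "$minor" in ''|*[!0-9]*) return 1 ;; esac
    if version_ge "$major" "$minor" 7 0; then
        # Assumption: the only index named in this thread.
        printf '%s\n' "https://download.pytorch.org/whl/rocm7.0"
        return 0
    fi
    return 1   # older releases: mapping not shown in this thread
}

rocm_wheel_index "7.1.0-38"
```

On a real host the argument would come from a file such as /opt/rocm/.info/version, as the commit message describes.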
Member

@danielhanchen danielhanchen left a comment

Thank you for the PR! The goal of this PR is to add AMD ROCm GPU support to the Studio setup script so that llama.cpp can be built with HIP acceleration on AMD GPUs. As a summary, this PR refactors the GPU detection in studio/setup.sh from "CUDA or CPU" to "CUDA, ROCm, or CPU" by introducing a GPU_BACKEND variable, adding ROCm/hipcc detection with the same fallback pattern as CUDA, and passing -DGGML_HIP=ON plus optional GPU_TARGETS to cmake.

Testing performed:

  • Verified all 6 code paths (CUDA-only, ROCm-only, both, neither, CUDA-driver-only, ROCm-driver-only) produce correct branch selection via isolated mock harness
  • Full CUDA build of llama.cpp succeeded on NVIDIA B200 (compute cap 10.0) with the PR's detection logic
  • Binary linked correctly against CUDA libs, llama-server + llama-quantize both built
  • CUDA path is not broken by this PR -- cmake flags match main exactly

| Reviewers | Severity | Finding |
|-----------|----------|---------|
| 8/8 | High | ROCM_PATH derived from unresolved hipcc path; symlinked installs export wrong root |
| 7/8 | High | Forcing hipcc as C/C++ compiler enters llama.cpp legacy path, making GPU_TARGETS detection dead code |
| 2/8 | Medium | Backend selection based on tool presence not actual GPU; mixed-toolchain hosts may pick wrong backend |

The NVIDIA/CUDA and CPU-only paths are confirmed safe. The ROCm-specific issues above should be addressed before merge.

Concrete suggestions for each finding below.

Comment thread studio/setup.sh Outdated
ROCM_ROOT="$(cd "$(dirname "$ROCM_PATH")/.." 2>/dev/null && pwd)"
echo " Building with ROCm support (AMD GPU, hipcc: $ROCM_PATH)..."
CMAKE_ARGS="$CMAKE_ARGS -DGGML_HIP=ON"
export ROCM_PATH="$ROCM_ROOT"
Member

[8/8 reviewers] ROCM_ROOT is derived from the raw command -v hipcc path without resolving symlinks. If hipcc is a symlink or wrapper (common on packaged installs, e.g., /usr/bin/hipcc -> /opt/rocm/bin/hipcc), this computes ROCM_ROOT=/usr instead of the real ROCm prefix, causing find_package(hip) to fail.

Use readlink -f to resolve the real path, and prefer hipconfig -R when available:

Suggested change
export ROCM_PATH="$ROCM_ROOT"
# Resolve hipcc symlinks and derive the real ROCm root
HIPCC_REALPATH="$(readlink -f "$ROCM_PATH" 2>/dev/null || printf '%s' "$ROCM_PATH")"
ROCM_ROOT=""
if command -v hipconfig &>/dev/null; then
ROCM_ROOT="$(hipconfig -R 2>/dev/null || true)"
fi
if [ -z "$ROCM_ROOT" ]; then
ROCM_ROOT="$(cd "$(dirname "$HIPCC_REALPATH")/.." 2>/dev/null && pwd)"
fi
echo " Building with ROCm support (AMD GPU, hipcc: $HIPCC_REALPATH)..."
CMAKE_ARGS="$CMAKE_ARGS -DGGML_HIP=ON"
export ROCM_PATH="$ROCM_ROOT"
export HIP_PATH="$ROCM_ROOT"
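
To see why resolving the symlink matters, the following sketch builds a throwaway directory layout that mimics /usr/bin/hipcc -> /opt/rocm/bin/hipcc and compares the two derivations. `rocm_root_from_hipcc` is a hypothetical helper, the layout lives under a temporary directory, and `readlink -f` is assumed to be available (GNU coreutils and recent macOS provide it).

```shell
# Hypothetical helper mirroring the fallback in the suggestion above:
# resolve symlinks first, then take the directory above the real bin/.
rocm_root_from_hipcc() {
    local real
    real="$(readlink -f "$1" 2>/dev/null || printf '%s' "$1")"
    (cd "$(dirname "$real")/.." 2>/dev/null && pwd -P)
}

# Throwaway layout: usr/bin/hipcc -> opt/rocm/bin/hipcc
demo="$(mktemp -d)"
mkdir -p "$demo/opt/rocm/bin" "$demo/usr/bin"
: > "$demo/opt/rocm/bin/hipcc"
ln -s "$demo/opt/rocm/bin/hipcc" "$demo/usr/bin/hipcc"

# Derivation without symlink resolution lands in .../usr
naive_root="$(cd "$(dirname "$demo/usr/bin/hipcc")/.." && pwd -P)"
# Derivation with symlink resolution lands in .../opt/rocm
real_root="$(rocm_root_from_hipcc "$demo/usr/bin/hipcc")"

echo "unresolved: $naive_root"
echo "resolved:   $real_root"
rm -rf "$demo"
```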

Comment thread studio/setup.sh Outdated
[ -n "$_valid_gfx" ] && GPU_TARGETS="$_valid_gfx"
fi

CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc"
Member

[7/8 reviewers] Forcing -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc has two problems:

  1. In upstream ggml/src/ggml-hip/CMakeLists.txt, when CMAKE_CXX_COMPILER is hipcc, llama.cpp enters its legacy CXX_IS_HIPCC branch and skips forwarding GPU_TARGETS to CMAKE_HIP_ARCHITECTURES. This means the rocminfo detection above is dead code -- targets are never actually limited.

  2. hipcc may fail to compile plain C sources on some ROCm versions, breaking the build.

Upstream llama.cpp docs recommend using HIPCXX/HIP_PATH env vars instead:

Suggested change
CMAKE_ARGS="$CMAKE_ARGS -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc"
# Follow upstream llama.cpp HIP build path (do not force hipcc as C/C++ compiler)
if command -v hipconfig &>/dev/null; then
HIP_CLANG_DIR="$(hipconfig -l 2>/dev/null || true)"
[ -n "$HIP_CLANG_DIR" ] && export HIPCXX="$HIP_CLANG_DIR/clang"
fi

Comment thread studio/setup.sh Outdated
else
echo " Could not detect AMD GPU arch -- building for default targets (cmake will auto-detect)"
fi
elif [ -d /usr/local/cuda ] || command -v nvidia-smi &>/dev/null; then
Member

[2/8 reviewers] Minor: command -v nvidia-smi instead of bare nvidia-smi is actually a good improvement over main (avoids invoking the binary just to check existence). Nice catch.

The ROCm driver-only fallback message is helpful for users who have the kernel driver but not the userspace tools.

@danielhanchen
Member

Additional findings from extended review (8 independent reviewers)

The inline review above covers the GPU detection changes (the actual PR diff). However, the PR branch is significantly behind main and carries regressions from 4 merged PRs (#4413, #4427, #4447, #4489). These regressions are not part of the ROCm changes themselves -- they exist because the PR was branched before those fixes landed. A rebase onto current main would resolve all of them.

Recommendation: Rebase onto current main, then the ROCm changes (lines 309-413) can be cleanly applied with the fixes from the inline review above. The diff against main after rebase would be much smaller -- just the GPU detection block.

ROCm-specific findings (from the PR diff itself)

| Reviewers | Severity | Finding |
|-----------|----------|---------|
| 8/8 | High | ROCM_PATH derived from unresolved hipcc path; symlinked installs (e.g., /usr/bin/hipcc) export wrong root, causing find_package(hip) to fail |
| 7/8 | High | Forcing hipcc as CMAKE_C_COMPILER/CMAKE_CXX_COMPILER enters llama.cpp legacy CXX_IS_HIPCC path, making GPU_TARGETS detection dead code; upstream recommends HIPCXX/HIP_PATH env vars |
| 3/8 | High | rocminfo grep does not filter by Device Type: GPU context -- APU/iGPU gfx entries may be incorrectly included in GPU_TARGETS |
| 2/8 | Medium | Backend selection based on tool presence, not actual GPU; hipcc exists on HIP-for-NVIDIA hosts too |
| 2/8 | Medium | HIP_PATH env var (required by cmake's find_package(HIP)) is never set |
| 1/8 | Low | ccache CMAKE_CUDA_COMPILER_LAUNCHER flag applied to ROCm builds where it is irrelevant |
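
For the rocminfo finding, a context-aware parse could look roughly like the sketch below. It assumes each agent block starts with a line beginning "Agent N" and lists "Name:" before "Device Type:", which is how the made-up sample is laid out; real rocminfo output may differ in detail. `gpu_agent_names` is a hypothetical helper, and this filter alone does not tell an integrated GPU from a discrete one.

```shell
# Made-up sample in the assumed layout of rocminfo agent blocks.
sample_rocminfo='*******
Agent 1
*******
  Name:                    ExampleCPU
  Device Type:             CPU
*******
Agent 2
*******
  Name:                    gfx942
  Marketing Name:          Example GPU
  Device Type:             GPU
      Name:                    amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-'

# Hypothetical helper: print the agent name only for agents whose
# Device Type is GPU, ignoring nested ISA "Name:" lines in the same block.
gpu_agent_names() {
    awk '
        /^Agent [0-9]+/           { name = "" }    # start of a new agent block
        /^ *Name:/ && name == ""  { name = $2 }    # first Name line in the block
        /^ *Device Type:/ && $3 == "GPU" && name ~ /^gfx/ { print name }
    ' | sort -u
}

printf '%s\n' "$sample_rocminfo" | gpu_agent_names
```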

Regressions from being behind main (resolved by rebasing)

| Severity | Missing from main |
|----------|-------------------|
| Critical | run_quiet_no_exit -- PR uses run_quiet which hard-exits on build failure, making \|\| BUILD_OK=false dead code. Any transient build error kills the entire setup instead of gracefully degrading. |
| High | _SKIP_GGUF_BUILD guard -- WSL users who decline sudo or lack it get a hard script abort |
| High | uv / fast_install helper -- all .venv_t5 installs use bare pip install instead of faster uv |
| High | .venv_t5 package versions stale: huggingface_hub==1.3.0 (main has 1.7.1), hf_xet and tiktoken missing |
| High | Colab venv isolation and broken-ensurepip fallback missing |
| Medium | Frontend caching logic reverted to PyPI-path-only check |
| Medium | REQUESTED_PYTHON_VERSION support removed |
| Medium | CSS output validation check removed |
| Medium | oxc-validator npm install moved inside frontend block without directory guard |
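
The difference between a hard-exiting wrapper and a non-exiting one can be illustrated with a short sketch. The actual run_quiet and run_quiet_no_exit implementations are not shown in this thread, so `quiet_no_exit` below is a hypothetical stand-in that only demonstrates the pattern: hide output on success, show it on failure, and return the status so the caller's `|| BUILD_OK=false` can run.

```shell
# Hypothetical stand-in for a non-exiting quiet runner.
# Usage: quiet_no_exit "label" command [args...]
quiet_no_exit() {
    local label="$1"; shift
    local log status=0
    log="$(mktemp)"
    # Capture all output; remember the status instead of exiting.
    "$@" >"$log" 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
        echo "   $label failed (exit $status):" >&2
        cat "$log" >&2
    fi
    rm -f "$log"
    return "$status"
}

# With a wrapper that returns, the fallback assignment is reachable.
BUILD_OK=true
quiet_no_exit "demo build" false || BUILD_OK=false
echo "BUILD_OK=$BUILD_OK"
```

If the wrapper called `exit` on failure instead of `return`, the script would stop inside the function and the `|| BUILD_OK=false` branch would never run.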
