fix(studio): ensure ROCm-enabled torch in venv on AMD hosts#4448
Closed
GoldenGrapeGentleman wants to merge 5 commits into
60fc627 to
2f87648
Compare
pip resolves torch from PyPI during base package installation, pulling CPU-only wheels regardless of the host GPU. AMD ROCm users end up with a venv that cannot use their GPU for training.

Add _ensure_rocm_torch(), which runs immediately after base packages:
- detects ROCm via $ROCM_PATH / /opt/rocm / hipcc
- reads the installed version from /opt/rocm/.info/version
- maps (major, minor) to the correct PyTorch wheel index via tuple comparison
- skips if torch is already GPU-enabled (checks both torch.version.hip and torch.version.cuda to avoid clobbering CUDA torch on mixed hosts)
- force-reinstalls torch + torchvision + torchaudio from the matched index URL

Tested on 8×AMD MI355X (ROCm 7.1): version detection, wheel mapping, and no-op behaviour all verified.

Fixes the issue raised by andyluo7 in unslothai#4390.

Co-authored-by: billishyahao <bill.he@amd.com>
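The detection and version-reading steps described in this commit message could look roughly like the sketch below. It is not the PR's actual code: the helper names (find_rocm_root, parse_rocm_version, rocm_version) are hypothetical, and it assumes the version file starts with a "major.minor" prefix and that hipcc sits in a bin/ directory under the ROCm root.

```python
import os
import re
import shutil
from pathlib import Path
from typing import Optional, Tuple


def find_rocm_root() -> Optional[Path]:
    """Return the ROCm install directory, or None if no ROCm is detected.

    Checks, in order: $ROCM_PATH, /opt/rocm, then hipcc on PATH.
    """
    env_path = os.environ.get("ROCM_PATH")
    if env_path and Path(env_path).is_dir():
        return Path(env_path)
    default = Path("/opt/rocm")
    if default.is_dir():
        return default
    hipcc = shutil.which("hipcc")
    if hipcc:
        # Assumption: hipcc is installed as <root>/bin/hipcc.
        return Path(hipcc).resolve().parent.parent
    return None


def parse_rocm_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a string such as '7.1.0-38'.

    Assumption: the string begins with '<major>.<minor>'.
    """
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def rocm_version(root: Path) -> Optional[Tuple[int, int]]:
    """Read <root>/.info/version and return (major, minor), or None."""
    version_file = root / ".info" / "version"
    try:
        return parse_rocm_version(version_file.read_text())
    except OSError:
        # Missing or unreadable file: treat the version as unknown.
        return None
```

Returning a tuple of ints instead of a string lets later steps compare versions with ordinary tuple comparison, which avoids string-ordering pitfalls such as "6.10" sorting before "6.2".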
e46884c to
e5d58f5
Compare
7 tasks
This was referenced Apr 8, 2026
Closed
Closed
Closed
Problem
During Studio venv setup, pip resolves torch from PyPI and installs CPU-only wheels regardless of the host GPU. AMD ROCm users end up with a venv that cannot use their GPU for training.

First reported by @andyluo7 in #4390.
Fix
Add _ensure_rocm_torch(), called immediately after base packages:
- Detects ROCm via $ROCM_PATH / /opt/rocm / hipcc
- Reads the installed version from /opt/rocm/.info/version
- Maps (major, minor) → PyTorch wheel index (ROCm 6.0–7.x covered)
- Force-reinstalls torch + torchvision + torchaudio from the matched index URL

Testing
Verified on 8×AMD MI355X (ROCm 7.1, 288 GB HBM3e):
- _rocm_version() returns (7, 1) ✅
- (7, 1) → rocm6.3 correct ✅

Co-authored-by: billishyahao <bill.he@amd.com>
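The (7, 1) → rocm6.3 result above is what a first-match tuple comparison against a newest-first table would produce. The following is a minimal sketch of that idea, not the PR's implementation: the table contents and thresholds are illustrative assumptions, and the helper names (wheel_index_url, torch_is_gpu_enabled, reinstall_command) are hypothetical.

```python
import subprocess
from typing import List, Optional, Tuple

# Illustrative assumption: (minimum ROCm version, wheel index tag), newest
# first. The real mapping used by the PR may differ.
_WHEEL_INDEX_TABLE = [
    ((6, 3), "rocm6.3"),
    ((6, 2), "rocm6.2"),
    ((6, 1), "rocm6.1"),
    ((6, 0), "rocm6.0"),
]


def wheel_index_url(version: Tuple[int, int]) -> Optional[str]:
    """Pick the newest wheel index whose minimum version is <= `version`."""
    for minimum, tag in _WHEEL_INDEX_TABLE:
        if version >= minimum:  # tuples compare element by element
            return f"https://download.pytorch.org/whl/{tag}"
    return None  # older than anything in the table


def torch_is_gpu_enabled(python_exe: str) -> bool:
    """Return True if the torch in `python_exe` is a ROCm or CUDA build.

    torch.version.hip and torch.version.cuda are None on CPU-only builds.
    A failed import (torch missing) counts as not GPU-enabled.
    """
    probe = "import torch; print(bool(torch.version.hip or torch.version.cuda))"
    result = subprocess.run(
        [python_exe, "-c", probe], capture_output=True, text=True
    )
    lines = result.stdout.strip().splitlines()
    return result.returncode == 0 and lines[-1:] == ["True"]


def reinstall_command(python_exe: str, index_url: str) -> List[str]:
    """Build the pip command that force-reinstalls the torch packages."""
    return [
        python_exe, "-m", "pip", "install", "--force-reinstall",
        "torch", "torchvision", "torchaudio",
        "--index-url", index_url,
    ]
```

Under these assumed thresholds, any ROCm version at or above 6.3 (including 7.x) falls through to the rocm6.3 index, which matches the tested mapping; a version below 6.0 yields None so the caller can leave the existing torch untouched.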