unsloth support AMD radeon cards#4256
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa005cbb4e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| _BOOTSTRAP_COMPLETED = True | ||
| _perform_rocm_bootstrap(runtime) |
There was a problem hiding this comment.
Mark ROCm bootstrap complete only after success
_ensure_rocm_bootstrap sets _BOOTSTRAP_COMPLETED = True before running _perform_rocm_bootstrap(runtime), so any failure during git/pip install leaves the flag permanently true and all later calls in the same process incorrectly skip retrying bootstrap. This can leave ROCm dependencies missing while subsequent install steps proceed as if bootstrap already succeeded.
Useful? React with 👍 / 👎.
| ) | ||
|
|
||
| with _pushd(Path("bitsandbytes")): | ||
| subprocess.check_call(["git", "checkout", "main"]) |
There was a problem hiding this comment.
Pin bitsandbytes to 0.48.2 instead of
main
This block announces and records a 0.48.2 install, but it actually checks out main, which makes builds non-reproducible and can silently pick commits that are incompatible with the rest of the ROCm bootstrap flow. Checking out the specific 0.48.2 tag/commit is needed to match the declared version and avoid drift.
Useful? React with 👍 / 👎.
| pass | ||
|
|
||
| if not rocm_arch: | ||
| rocm_arch = "gfx1201" |
There was a problem hiding this comment.
Respect BNB_ROCM_ARCH before forcing gfx1201 fallback
When rocminfo is unavailable, the code hardcodes rocm_arch = "gfx1201" and later passes that value to CMake, so non-Radeon ROCm systems (for example gfx90a/gfx942) can compile bitsandbytes for the wrong architecture and fail at runtime. Because the fallback is forced, a user-provided BNB_ROCM_ARCH cannot effectively override this path.
Useful? React with 👍 / 👎.
aa005cb to
c5df65d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c5df65d199
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| class RocmExtraInstallCommand(install): # type: ignore[misc] | ||
| def run(self): |
There was a problem hiding this comment.
Register ROCm bootstrap hooks in setuptools config
This commit defines compute_version_string and RocmExtraInstallCommand, but nothing wires either into the package build/install flow (no cmdclass registration and no version hook references these functions), so the ROCm bootstrap path is never invoked during normal installs. In this commit, these symbols are only defined in _build_support.py, which means the new AMD support logic is effectively dead code.
Useful? React with 👍 / 👎.
| refreshed = detect_runtime() | ||
| if not refreshed.has_torch or not refreshed.has_hip: | ||
| raise RuntimeError( |
There was a problem hiding this comment.
Validate ROCm torch support without forced-runtime flags
The post-install guard in _ensure_rocm_torch trusts refreshed.has_hip, but detect_runtime can set that flag from UNSLOTH_FORCE_RUNTIME=hip even when the installed torch wheel has no HIP backend. In the bootstrap flow where forcing HIP is required when torch is initially missing, an incorrect UNSLOTH_BOOTSTRAP_TORCH_ARGS can install a non-ROCm torch and still pass this check, causing later ROCm-dependent builds to fail.
Useful? React with 👍 / 👎.
Replacement for #3740 due to Studio rebasing
Partial recovery note: this replacement preserves only the final visible PR payload that could be safely recovered after Studio rebasing. It does not fully recreate the broader AMD Radeon support work originally described in #3740, such as the wider training, testing, and script coverage.
Original PR description follows.
Gating operator accuracy and performance testing;
SparseMoe-FFN operator accuracy and performance testing;
6.Attempted FP8 precision, which failed on both NVIDIA and AMD GPUs with consistent error messages. Preliminary investigation indicates it's an issue with Unsloth compatibility.