Merged
184 commits
0539621
fix(studio): set HIP_VISIBLE_DEVICES in apply_gpu_ids for ROCm traini…
LeoBorcherding May 5, 2026
14fccde
test: tighten apply_gpu_ids ROCm fallback assertions
LeoBorcherding May 5, 2026
e87c90f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 6, 2026
74c871d
fix: detect ROCm unified memory (Strix Halo / AMD iGPU) via torch fal…
LeoBorcherding May 6, 2026
0d58e42
Apply unified-memory reconciliation in get_gpu_utilization too
danielhanchen May 6, 2026
cb0edfc
Use 'is not None' and log debug on torch.version.hip probe failures
danielhanchen May 6, 2026
9a83a74
fix(studio): honour HIP_VISIBLE_DEVICES in _get_parent_visible_gpu_sp…
LeoBorcherding May 6, 2026
8332bb7
Merge fix/5180-hip-visible-devices-worker into fix/rocm-strix-halo-un…
LeoBorcherding May 6, 2026
22a0d6b
Merge remote-tracking branch 'origin/fix/rocm-strix-halo-unified-memo…
LeoBorcherding May 6, 2026
4e7a083
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 6, 2026
9bcd0ed
fix(install): harden AMD ROCm GPU detection for multi-GPU and env-fil…
LeoBorcherding May 6, 2026
3241eb3
Fix KFD sysfs awk fallback to read properties file
danielhanchen May 6, 2026
d2da8ce
fix(setup.ps1): detect AMD ROCm GPU on Windows, bring to parity with …
LeoBorcherding May 6, 2026
f84a723
fix(install.ps1): detect AMD ROCm GPU on Windows, bring to parity wit…
LeoBorcherding May 6, 2026
5d5ae56
fix(install.ps1): suppress 'No NVIDIA GPU detected' when AMD GPU is p…
LeoBorcherding May 6, 2026
270b2dd
feat: add Windows AMD ROCm PyTorch wheel installation
LeoBorcherding May 6, 2026
14f6559
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 6, 2026
8299842
fix: also install torchvision and torchaudio from AMD Windows repo
LeoBorcherding May 6, 2026
ec40e9f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 6, 2026
7e29435
feat: add ROCm 7.1.1 Windows wheel mapping
LeoBorcherding May 6, 2026
e53b543
fix: install rocm_sdk_core and rocm_sdk_libraries_custom alongside torch
LeoBorcherding May 6, 2026
a74cb8b
fix: expand ROCm wheel array to scalars for Invoke-InstallCommand
LeoBorcherding May 6, 2026
79670ab
fix: use --no-deps for AMD Windows torch wheel install
LeoBorcherding May 6, 2026
b550948
fix: setup.ps1 and install_python_stack.py now install ROCm torch on …
LeoBorcherding May 7, 2026
6f79213
fix: suppress manual-install warning when ROCm torch already present;…
LeoBorcherding May 7, 2026
b9c6882
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2026
67d8b74
feat: add rocm step display in setup.ps1; fix warning and progress co…
LeoBorcherding May 7, 2026
f036ee0
fix: detect AMD SDK ROCm torch via __version__ when torch.version.hip…
LeoBorcherding May 7, 2026
7d3de8b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2026
77f7ade
perf: drop --no-cache-dir from AMD ROCm torch wheel installs
LeoBorcherding May 7, 2026
9439657
fix: use install-state flag instead of subprocess probe for AMD Windo…
LeoBorcherding May 7, 2026
4b2f7fb
fix: hoist global declaration to top of _ensure_rocm_torch
LeoBorcherding May 7, 2026
7fbdce1
fix: pass AMD torch install status via env var to suppress false warning
LeoBorcherding May 7, 2026
f09a424
fix: register ROCm DLL directory before torch import on Windows
LeoBorcherding May 7, 2026
05f5cda
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2026
0facfbc
fix: remove hardcoded non-standard ROCm paths from DLL directory scan
LeoBorcherding May 7, 2026
cc77737
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2026
efcaccb
fix: prevent torchao overrides step from overwriting AMD ROCm torch
LeoBorcherding May 7, 2026
301d6c0
fix: add rocm_sdk namespace tarball to Windows ROCm wheel installs
LeoBorcherding May 7, 2026
6fe91e7
feat: enable ROCm 7.2 torch install + warn on gfx1151 with ROCm < 7.2
LeoBorcherding May 7, 2026
1680dac
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 7, 2026
550317d
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 7, 2026
5deb230
fix: prefer Python 3.12 for AMD ROCm users when 3.13 is also installed
LeoBorcherding May 8, 2026
bafb3f5
fix: also check uv-managed Python 3.12 for AMD ROCm #5301
LeoBorcherding May 8, 2026
2de2c29
fix: hide amd-smi console popups on Windows, guard torch.distributed.…
LeoBorcherding May 8, 2026
ba6b279
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 8, 2026
1707169
fix: suppress remaining console popups on Windows, patch torch.distri…
LeoBorcherding May 8, 2026
a704722
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 8, 2026
739e5d5
fix: stub all missing torch.distributed attrs for ROCm Windows wheel …
LeoBorcherding May 8, 2026
18690c6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 8, 2026
bfac7c1
fix: inject torch.distributed stub when C backend missing in ROCm Win…
LeoBorcherding May 8, 2026
a288ff2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 8, 2026
4d401e7
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 10, 2026
fe5546d
fix(rocm/windows): pre-stub torch._C._distributed_c10d + raise amd-sm…
LeoBorcherding May 11, 2026
85841b5
fix(rocm): guard c10d stub, fix TorchIndexFamily for 7.1, clean dead …
LeoBorcherding May 11, 2026
f892b65
fix(tests): match windows AMD warning assertion to actual source string
LeoBorcherding May 11, 2026
42c5d98
chore: trim verbose comment blocks across all ROCm-related files
LeoBorcherding May 11, 2026
265a09a
fix: guard reconcile call against None numeric_ids; add torchvision l…
LeoBorcherding May 11, 2026
643a797
fix(install.ps1): recreate venv with Python 3.12 after ROCm switch
LeoBorcherding May 11, 2026
7b38f0a
ux: detect AMD GPU before Python selection to avoid double venv creation
LeoBorcherding May 11, 2026
1ba91d7
fix(rocm/win): auto-stub all _distributed_c10d symbols via PEP-562 __…
LeoBorcherding May 11, 2026
fac005b
chore: trim c10d stub comment
LeoBorcherding May 11, 2026
a3d9bac
fix(rocm/win): auto-stub missing torch.distributed attrs (Store, Proc…
LeoBorcherding May 11, 2026
73ae40c
fix(rocm/win): pre-stub fsdp submodules in sys.modules; fix __getattr…
LeoBorcherding May 11, 2026
ea510b5
feat(rocm/win): arch-aware wheel selector always picks newest ROCm re…
LeoBorcherding May 11, 2026
4d09cbb
fix(rocm/win): stub class metaclass for ProcessGroup.BackendType; amd…
LeoBorcherding May 11, 2026
e64c196
fix: stub __members__ so torchao float8 enum check doesn't crash on R…
LeoBorcherding May 11, 2026
26f073d
fix: stub distributed tensor/functional_collectives to prevent missin…
LeoBorcherding May 11, 2026
b073201
fix: give mod stubs __path__ and pre-stub _tensor to fix 'not a packa…
LeoBorcherding May 11, 2026
ce9098a
fix: stub torch.ops._c10d_functional namespace with hashable op senti…
LeoBorcherding May 11, 2026
e778e0e
fix: stub entire torchao package on ROCm Windows instead of individua…
LeoBorcherding May 11, 2026
3e57133
fix: set __spec__ on mod stubs so importlib.util.find_spec doesn't raise
LeoBorcherding May 11, 2026
cf10215
fix: add meta path finder to auto-stub subpackages of stub modules
LeoBorcherding May 11, 2026
3264319
fix: use _unsloth_stub sentinel instead of loader=None for stub detec…
LeoBorcherding May 11, 2026
d731c5f
refactor(rocm/win): switch to repo.amd.com arch-aware index, remove s…
LeoBorcherding May 14, 2026
9c9d462
fix(rocm/win): restore _distributed_c10d + torchao stubs; fix BNB ins…
LeoBorcherding May 14, 2026
48406ad
worker: remove _distributed_c10d stub; stub only torchao
LeoBorcherding May 14, 2026
f5278de
fix: BNB AMD wheel skipped + torch.compile segfault on Windows ROCm
LeoBorcherding May 14, 2026
a4483df
fix: BNB AMD wheel install fails uv wheel filename check
LeoBorcherding May 14, 2026
a3c94e7
worker: patch _grouped_mm CUDA dispatch on Windows ROCm (gfx1200 null…
LeoBorcherding May 14, 2026
a87077e
worker: fix torchao stub — return stub classes not modules for isinst…
LeoBorcherding May 14, 2026
769790e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 14, 2026
1d30fb5
Merge branch 'main' into fix/rocm-strix-halo-unified-memory
Imagineer99 May 14, 2026
d91fced
tests: add coverage for Windows ROCm install paths and worker patches
LeoBorcherding May 15, 2026
324f56c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2026
4d41efc
tests: fix encoding, IS_WINDOWS patching, and wrong assertion
LeoBorcherding May 15, 2026
5b6adbe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2026
d5c3c7b
fix: pin BNB_ROCM_VERSION=72 for torch==2.11.0+rocm7.13.0 compatibility
LeoBorcherding May 15, 2026
f95cb20
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2026
c55aaa5
fix: detect BNB ROCm DLL suffix dynamically instead of hardcoding '72'
LeoBorcherding May 15, 2026
b313a48
fix: patch torch.distributed stubs in server process for Windows ROCm
LeoBorcherding May 15, 2026
b33a90e
fix: gate _grouped_mm dispatch patch on HIP < 7.13
LeoBorcherding May 15, 2026
75ef599
fix: stub is_torchelastic_launched on torch.distributed for Windows ROCm
LeoBorcherding May 15, 2026
1db8e49
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 15, 2026
370debe
fix: explicit warnings on AMD ROCm arch/version fallbacks + Fast-Inst…
LeoBorcherding May 16, 2026
7a5e93b
fix: robust gfx arch detection for Strix Halo / HIP-runtime-only inst…
LeoBorcherding May 16, 2026
ffa16f0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 16, 2026
2befb57
fix: resolve hipinfo/hipconfig via HIP_PATH/ROCM_PATH when not on PATH
LeoBorcherding May 16, 2026
116fa6e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 16, 2026
ae6d042
feat: print HIP SDK path and full hipconfig version in terminal on AM…
LeoBorcherding May 16, 2026
bbf004c
fix: Strix rocm7.1 segfault bypass + Ubuntu 24.04 HIP gcc-install-dir
LeoBorcherding May 16, 2026
f3ac63f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 16, 2026
cc36e4b
fix: BNB_ROCM_VERSION in server process + torch._C._distributed_c10d …
LeoBorcherding May 16, 2026
f0ec030
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 16, 2026
6831c2a
fix(win32): populate distributed c10d stub with dummy symbols
LeoBorcherding May 16, 2026
39ae2e8
fix(win32): distinguish HIP SDK installed vs GPU not ROCm-accessible
LeoBorcherding May 16, 2026
4e75d42
fix(win32): scope ROCm workarounds to AMD hosts only
LeoBorcherding May 16, 2026
84b8456
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 16, 2026
d89d9b1
fix(linux): route Strix + ROCm 7.1 to AMD arch-specific index
LeoBorcherding May 16, 2026
692e876
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 18, 2026
0c2020d
fix(studio/rocm): gate ROCm-only side-effects on active torch runtime
danielhanchen May 19, 2026
0b0b8df
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2026
76137b2
fix(studio/rocm): worker.py parity + don't roll back ROCm torch on bn…
danielhanchen May 19, 2026
0be9749
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2026
2177321
fix(studio/rocm): robustness pass - rocm tag normalisation, Strix rou…
danielhanchen May 19, 2026
96b9e46
fix(studio/rocm): multi-GPU selection, Strix sibling handling, defens…
danielhanchen May 19, 2026
825cbf5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2026
8c30241
fix(studio/rocm): worker BNB/grouped_mm broad gate, install.sh Strix …
danielhanchen May 19, 2026
e6cc98e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2026
06f28e4
fix(studio/rocm): code review hardening pass
LeoBorcherding May 19, 2026
bb37bf4
Merge remote-tracking branch 'upstream/main' into fix/rocm-strix-halo…
LeoBorcherding May 19, 2026
47fdc85
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2026
d663d12
fix(studio/training): GPU OOM guard to prevent system freeze on VRAM …
LeoBorcherding May 19, 2026
888f91d
Merge branch 'fix/rocm-strix-halo-unified-memory' of github.com:LeoBo…
LeoBorcherding May 19, 2026
e953b90
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 20, 2026
536a54d
fix(studio/rocm): OOM guard ROCm-only + unified memory, multi-GPU arc…
LeoBorcherding May 21, 2026
ec021a0
fix(tests): update ROCm version cap expectations from rocm7.1 to rocm7.2
LeoBorcherding May 21, 2026
90f6cd4
fix(tests): correct MLX smoke test losses_per_step assertion
LeoBorcherding May 21, 2026
792c3a0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 21, 2026
5d84704
fix(studio/worker): detect unified-memory APU by GPU name not VRAM/RA…
LeoBorcherding May 21, 2026
67ab0a6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 21, 2026
3245793
fix(install/setup.ps1): force array on hipinfo gcnArchName parse to f…
LeoBorcherding May 21, 2026
89140c5
Merge remote-tracking branch 'upstream/main' into fix/rocm-strix-halo…
LeoBorcherding May 21, 2026
9c50ee3
fix(studio/rocm): classify unified-memory APU via VRAM/RAM ratio, not…
LeoBorcherding May 21, 2026
9393fff
fix(studio/rocm): revert to gcnArchName for unified-memory APU classi…
LeoBorcherding May 21, 2026
86d8ff0
fix(studio/llama-prebuilt): resolve hipinfo via HIP_PATH/ROCM_PATH on…
LeoBorcherding May 21, 2026
143f6f3
fix(studio/llama-prebuilt): pass --has-rocm from setup.ps1 to skip re…
LeoBorcherding May 21, 2026
d0864e8
fix(studio/llama-prebuilt): add HIP asset to simple-policy Windows path
LeoBorcherding May 21, 2026
a0baf8f
fix(studio/setup.ps1): auto-remove mismatched llama.cpp install kind
LeoBorcherding May 21, 2026
2bca6ee
fix(studio/setup.ps1): show live PyTorch install output in verbose mo…
LeoBorcherding May 21, 2026
c6a90de
fix(rocm/windows): set ROCBLAS_TENSILE_LIBPATH for bundled rocblas.dll
LeoBorcherding May 22, 2026
2712a6d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 22, 2026
eeee665
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 23, 2026
889b33c
fix(install.sh): restore gfx token dedup in Strix multi-GPU awk indexer
LeoBorcherding May 24, 2026
0fda1e2
fix(studio/install): correct _TOTAL progress count on Windows
LeoBorcherding May 24, 2026
688c508
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 24, 2026
284145a
fix(install.ps1): enforce torch>=2.11.0 for gfx120X and Strix on Windows
LeoBorcherding May 24, 2026
8983afd
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 26, 2026
85bbb03
fix(rocm/windows): address Codex nits - deterministic DLL suffix, CUD…
LeoBorcherding May 26, 2026
69b582c
fix(rocm): misleading amd-smi log, BNB spec consistency, torch ceilin…
LeoBorcherding May 26, 2026
5c72e64
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 26, 2026
0763a99
fix(rocm): torch floor in setup.ps1, torchvision pin for Strix, rocms…
LeoBorcherding May 26, 2026
ad9ea00
fix(rocm): warn on OOB HIP_VISIBLE_DEVICES, bail on empty numeric_ids…
LeoBorcherding May 26, 2026
94a7a03
fix(rocm): gate StubSubpackageFinder on win32 ROCm, add gcnArchName f…
LeoBorcherding May 26, 2026
38acd5b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 26, 2026
80dd40e
fix(rocm): pin torchvision/torchaudio in setup.ps1, remove -Unique fr…
LeoBorcherding May 26, 2026
59825be
fix(rocm): add 8060s/8050s to OOM guard device-name fallback, extract…
LeoBorcherding May 26, 2026
4ecf797
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 26, 2026
62e18d8
fix(rocm): pass explicit dtype on bf16-unsupported hardware (RDNA2)
LeoBorcherding May 26, 2026
3244537
fix: reduce log noise for expected non-issues on Windows ROCm
LeoBorcherding May 26, 2026
30eb2d9
Merge remote-tracking branch 'upstream/main' into fix/rocm-strix-halo…
LeoBorcherding May 27, 2026
f5c2e8a
[AMD] FIx installation of bitsandbytes when it's from .dev and skip r…
Erland366 May 27, 2026
927a9a6
Merge Erland/studio-amd-installer-fixes-redone: fix bnb ROCm install …
LeoBorcherding May 27, 2026
2ec5d00
fix: use force_pip for Windows ROCm bitsandbytes prebuilt wheel install
LeoBorcherding May 27, 2026
7be61bb
fix: three small correctness fixes found in PR review
LeoBorcherding May 28, 2026
c074848
Merge remote-tracking branch 'upstream/main' into fix/rocm-strix-halo…
LeoBorcherding May 28, 2026
51d82da
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 28, 2026
bd1162a
fix: stub torchao in export subprocess on Windows ROCm
LeoBorcherding May 28, 2026
b3a8792
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 28, 2026
3a19990
install.sh, setup.sh: add GPU arch step logging to match PS1 scripts
LeoBorcherding May 29, 2026
c8c60ab
Fix BNB_ROCM_VERSION gate, ROCm GPU mask preference, APU unified memo…
shimmyshimmer May 29, 2026
57165b6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 29, 2026
afb343f
fix: guard recompile_limit + fix AMD VRAM monitor fallback
LeoBorcherding May 29, 2026
3c65ef6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 29, 2026
a0a8b02
fix: Windows VRAM monitor via Performance Counter API
LeoBorcherding May 29, 2026
82dad78
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 29, 2026
8519d41
fix: rename to _rocm_windows_perf_counter_vram_gb, scope to IS_ROCM
LeoBorcherding May 29, 2026
0c2f582
fix: AMD VRAM monitor — Linux DRM sysfs + Windows perf counter
LeoBorcherding May 29, 2026
32fd3c4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 29, 2026
b8b230c
Merge branch 'unslothai:main' into fix/rocm-strix-halo-unified-memory
LeoBorcherding May 30, 2026
f5a4e3c
fix: AMD GPU monitor — utilization, temperature, and power for Window…
LeoBorcherding May 30, 2026
da6469c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 30, 2026
2711607
fix: remove ADL ctypes — does not support AMD iGPU (Strix Halo)
LeoBorcherding May 30, 2026
5e99a47
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 30, 2026
264 changes: 259 additions & 5 deletions install.ps1

Large diffs are not rendered by default.

237 changes: 203 additions & 34 deletions install.sh

Large diffs are not rendered by default.

97 changes: 97 additions & 0 deletions studio/backend/core/export/worker.py
@@ -439,6 +439,103 @@ def run_export_process(
'Install for better performance: pip install "triton-windows<3.7"'
)

# ── 1c. Stub torchao on Windows ROCm ──
# torchao (pulled in by transformers.quantizers) imports
# torch.distributed._functional_collectives at module level, which imports
# distributed_c10d.py unconditionally; that file crashes on Windows ROCm
# because torch._C._distributed_c10d (the RCCL backend) is absent.
# Stubbing torchao short-circuits the crash entirely.
# Must run before any import of transformers / unsloth_zoo.
import types as _types
import importlib.machinery as _ilm
import importlib.abc as _ilabc

_STUB_SENTINEL = object()

class _StubTypeMeta(type):
    def __instancecheck__(cls, instance):
        return False

    def __subclasscheck__(cls, subclass):
        return False

    def __getattr__(cls, attr):
        if attr.startswith("__"):
            raise AttributeError(attr)
        child = _StubTypeMeta(attr, (), {})
        setattr(cls, attr, child)
        return child

    def __call__(cls, *args, **kwargs):
        return None

def _make_stub_type(name):
    return _StubTypeMeta(name, (), {})

def _make_mod_stub(mod_name):
    m = _types.ModuleType(mod_name)
    m.__path__ = []
    m.__package__ = mod_name
    m._unsloth_stub = _STUB_SENTINEL
    m.__spec__ = _ilm.ModuleSpec(mod_name, loader = None, is_package = True)

    def _ga(attr, _m = m, _n = mod_name):
        if attr.startswith("__"):
            raise AttributeError(attr)
        child = _make_stub_type(f"{_n}.{attr}")
        setattr(_m, attr, child)
        return child

    m.__getattr__ = _ga
    return m

class _StubSubpackageLoader(_ilabc.Loader):
    def __init__(self, mod_name):
        self._mod_name = mod_name

    def create_module(self, spec):
        return _make_mod_stub(self._mod_name)

    def exec_module(self, module):
        pass

class _StubSubpackageFinder(_ilabc.MetaPathFinder):
    def find_spec(self, fullname, path, target = None):
        if "." not in fullname:
            return None
        parent = sys.modules.get(fullname.rsplit(".", 1)[0])
        if parent is None:
            return None
        if getattr(parent, "_unsloth_stub", None) is not _STUB_SENTINEL:
            return None
        return _ilm.ModuleSpec(
            fullname, _StubSubpackageLoader(fullname), is_package = True
        )

_is_win32_rocm = False
if sys.platform == "win32":
    try:
        import torch as _torch_probe

        _is_win32_rocm = bool(
            getattr(getattr(_torch_probe, "version", None), "hip", None)
            or "rocm" in getattr(_torch_probe, "__version__", "").lower()
        )
        del _torch_probe
    except Exception:
        pass
if _is_win32_rocm:
    sys.meta_path.append(_StubSubpackageFinder())
    for _tao_name in (
        "torchao",
        "torchao.quantization",
        "torchao.dtypes",
        "torchao.float8",
        "torchao.utils",
    ):
        if _tao_name not in sys.modules:
            sys.modules[_tao_name] = _make_mod_stub(_tao_name)

# ── 2. Import ML libraries (fresh in this clean process) ──
try:
    _send_response(
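The stubbing approach in this hunk can be tried in isolation. The following is a minimal sketch, not the PR's code: it assumes a made-up package name `fakepkg_demo`, uses plain placeholder classes instead of the `_StubTypeMeta` metaclass, and the names `make_stub_module`, `StubLoader`, `StubFinder` and `_is_stub` are illustrative only. It shows the two pieces working together: a module-level `__getattr__` (PEP 562) that invents attributes on demand, and a meta path finder that resolves any submodule of a stub to another stub.

```python
import importlib.abc
import importlib.machinery
import sys
import types

_SENTINEL = object()  # marks modules created by this sketch


def make_stub_module(name):
    """Return an empty package-like module whose unknown attributes resolve lazily."""
    mod = types.ModuleType(name)
    mod.__path__ = []  # a __path__ (even empty) makes the import system treat it as a package
    mod.__spec__ = importlib.machinery.ModuleSpec(name, loader=None, is_package=True)
    mod._is_stub = _SENTINEL

    def _getattr(attr):
        # PEP 562: called only when normal attribute lookup on the module fails.
        if attr.startswith("__"):
            raise AttributeError(attr)
        placeholder = type(attr, (), {})
        setattr(mod, attr, placeholder)  # cache so repeated lookups return the same object
        return placeholder

    mod.__getattr__ = _getattr
    return mod


class StubLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return make_stub_module(spec.name)

    def exec_module(self, module):
        pass  # nothing to execute; the stub is already complete


class StubFinder(importlib.abc.MetaPathFinder):
    """Resolve any submodule whose parent is a stub to another stub."""

    def find_spec(self, fullname, path, target=None):
        parent_name = fullname.rpartition(".")[0]
        parent = sys.modules.get(parent_name)
        if getattr(parent, "_is_stub", None) is not _SENTINEL:
            return None  # not ours; let the remaining finders decide
        return importlib.machinery.ModuleSpec(fullname, StubLoader(), is_package=True)


# Appended (not prepended) so real packages are always found first.
sys.meta_path.append(StubFinder())
sys.modules["fakepkg_demo"] = make_stub_module("fakepkg_demo")

import fakepkg_demo.deep.nested as nested
from fakepkg_demo.quant import Int8Config

print(nested.__name__, Int8Config)
```

Checking a sentinel object rather than `loader is None` keeps the finder from claiming real modules that merely lack a loader, which matches the intent of the `_unsloth_stub` marker in the hunk.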
53 changes: 53 additions & 0 deletions studio/backend/core/inference/llama_cpp.py
@@ -1238,6 +1238,33 @@ def _get_gguf_size_bytes(model_path: str) -> int:

return total

@staticmethod
def _amd_apu_wants_unified_memory() -> bool:
    """True only for AMD unified-memory APUs (gfx1150/gfx1151), where
    GGML_CUDA_ENABLE_UNIFIED_MEMORY lets llama.cpp use shared system RAM.
    False for discrete AMD, NVIDIA, CPU and macOS (the env hurts discrete
    GPUs). ROCm reuses torch.cuda.*; the gcnArchName suffix is stripped."""
    try:
        import torch

        if getattr(torch.version, "hip", None) is None:
            return False
        if not (hasattr(torch, "cuda") and torch.cuda.is_available()):
            return False
        for _i in range(torch.cuda.device_count()):
            try:
                _arch = (
                    getattr(torch.cuda.get_device_properties(_i), "gcnArchName", "")
                    or ""
                )
            except Exception:
                continue
            if _arch.split(":")[0].strip().lower() in {"gfx1150", "gfx1151"}:
                return True
    except Exception:
        return False
    return False
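The arch-matching step of this method can be exercised without a GPU. The sketch below is an illustration under assumptions, not part of the PR: `base_gcn_arch` and `any_unified_memory_apu` are hypothetical helper names, and the sample strings only imitate the `name:feature:feature` shape that the hunk's `split(":")[0]` is written to handle.

```python
UNIFIED_MEMORY_ARCHS = frozenset({"gfx1150", "gfx1151"})


def base_gcn_arch(gcn_arch_name):
    """Drop everything after the first ':' and normalise case and whitespace."""
    return (gcn_arch_name or "").split(":")[0].strip().lower()


def any_unified_memory_apu(arch_names):
    """True if at least one reported arch is in the unified-memory set."""
    return any(base_gcn_arch(name) in UNIFIED_MEMORY_ARCHS for name in arch_names)


print(base_gcn_arch("gfx90a:sramecc+:xnack-"))
print(any_unified_memory_apu(["gfx1100", " GFX1151 "]))
```

Matching on the stripped arch token rather than the marketing name keeps the check independent of how the driver labels the device.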

@staticmethod
def _get_gpu_free_memory() -> list[tuple[int, int]]:
"""Query free memory per GPU.
@@ -3158,6 +3185,14 @@ def load_model(
env = child_env_without_native_path_secret()
binary_dir = str(Path(binary).parent)

# AMD unified-memory APUs (gfx1150/gfx1151): let llama.cpp use
# shared system RAM. setdefault so a user value wins.
if self._amd_apu_wants_unified_memory():
    env.setdefault("GGML_CUDA_ENABLE_UNIFIED_MEMORY", "1")
    logger.info(
        "AMD unified-memory APU: set GGML_CUDA_ENABLE_UNIFIED_MEMORY=1"
    )

if sys.platform == "win32":
    # See _build_windows_path_dirs for ordering. #5106.
    path_dirs = self._build_windows_path_dirs(
@@ -3167,6 +3202,24 @@ def load_model(
    )
    existing_path = env.get("PATH", "")
    env["PATH"] = ";".join(path_dirs) + ";" + existing_path

    # ROCm: the llama.cpp prebuilt bundles its own rocblas.dll
    # but NOT the Tensile kernel library files it needs
    # (rocblas/library/TensileLibrary*.dat + *.hsaco). The
    # bundled DLL searches relative to its own location by
    # default (i.e. <binary_dir>/rocblas/library/) which does
    # not exist, causing a silent crash on the first GEMM.
    # ROCBLAS_TENSILE_LIBPATH overrides that search to point at
    # the ROCm installation where the kernel files actually are.
    _hip_path = os.environ.get(
        "HIP_PATH", os.environ.get("ROCM_PATH", "")
    )
    if _hip_path:
        _rocblas_lib = os.path.join(
            _hip_path, "bin", "rocblas", "library"
        )
        if os.path.isdir(_rocblas_lib):
            env.setdefault("ROCBLAS_TENSILE_LIBPATH", _rocblas_lib)
else:
    # Linux: set LD_LIBRARY_PATH for shared libs next to the binary
    # and CUDA runtime libs (libcudart, libcublas, etc.)
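Both environment tweaks in this hunk rely on `setdefault`, so a value the user already exported is never overwritten. A small sketch of that behaviour follows. It makes several assumptions: `rocm_child_env` is a hypothetical helper, the SDK root is passed in explicitly instead of being read from `HIP_PATH` / `ROCM_PATH`, the Windows-only gating from the hunk is left out, and a temporary directory stands in for a real ROCm install.

```python
import os
import tempfile


def rocm_child_env(base_env, unified_apu, hip_root):
    """Return a copy of base_env with the two ROCm-related defaults applied."""
    env = dict(base_env)
    if unified_apu:
        # setdefault keeps any value the user already exported.
        env.setdefault("GGML_CUDA_ENABLE_UNIFIED_MEMORY", "1")
    if hip_root:
        tensile_dir = os.path.join(hip_root, "bin", "rocblas", "library")
        # Only point rocBLAS at the directory when it actually exists.
        if os.path.isdir(tensile_dir):
            env.setdefault("ROCBLAS_TENSILE_LIBPATH", tensile_dir)
    return env


# Demo against a throwaway directory that mimics <HIP_PATH>/bin/rocblas/library.
with tempfile.TemporaryDirectory() as fake_hip_root:
    os.makedirs(os.path.join(fake_hip_root, "bin", "rocblas", "library"))
    demo_env = rocm_child_env(
        {"GGML_CUDA_ENABLE_UNIFIED_MEMORY": "0"},
        unified_apu=True,
        hip_root=fake_hip_root,
    )
    print(demo_env["GGML_CUDA_ENABLE_UNIFIED_MEMORY"])  # the user's "0" is kept
    print(demo_env["ROCBLAS_TENSILE_LIBPATH"])
```

Skipping the variable when the directory is missing means a machine without the SDK's kernel files keeps whatever search behaviour the bundled DLL has by default.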
26 changes: 19 additions & 7 deletions studio/backend/core/training/trainer.py
@@ -42,7 +42,10 @@
get_visible_gpu_count,
)

- torch._dynamo.config.recompile_limit = 64
+ # recompile_limit was removed in some ROCm torch builds (e.g. pytorch.org/whl/rocm6.2).
+ # Guard so training doesn't crash on RDNA2/RDNA3 with older ROCm torch wheels.
+ if hasattr(torch._dynamo.config, "recompile_limit"):
+     torch._dynamo.config.recompile_limit = 64
from unsloth import FastLanguageModel, FastVisionModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template
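The guard in this hunk is an instance of "set the option only if this build exposes it". A minimal sketch of that pattern, assuming stand-in `SimpleNamespace` objects in place of a real config module and a hypothetical helper name `set_if_supported`:

```python
from types import SimpleNamespace


def set_if_supported(config, name, value):
    """Assign config.<name> only when the attribute already exists; report whether it did."""
    if hasattr(config, name):
        setattr(config, name, value)
        return True
    return False


# Stand-ins: one config that exposes the option, one that does not.
with_option = SimpleNamespace(recompile_limit=8)
without_option = SimpleNamespace(some_other_option=8)

print(set_if_supported(with_option, "recompile_limit", 64), with_option.recompile_limit)
print(set_if_supported(without_option, "recompile_limit", 64))
```

Returning a boolean lets the caller log when the option was skipped instead of failing at import time.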

@@ -657,6 +660,15 @@ def load_model(
f"Using device_map='{device_map}' ({get_visible_gpu_count()} GPU(s) visible)"
)

+ # On hardware without native bfloat16 support (e.g. RDNA2 / gfx103x),
+ # passing dtype=None lets unsloth auto-detect and incorrectly choose
+ # bf16, triggering an LLVM error at the first bf16 kernel dispatch.
+ # Explicitly pass float16 as the fallback so unsloth never reaches
+ # that path. Modern NVIDIA (Ampere+) and RDNA3+ return True here so
+ # they are unaffected: dtype stays None and unsloth picks bf16 as
+ # before.
+ _auto_dtype = None if is_bfloat16_supported() else torch.float16

# Branch based on model type
if self._audio_type == "csm":
# CSM: FastModel + auto_model=CsmForConditionalGeneration + load_in_4bit=False
@@ -666,7 +678,7 @@ def load_model(
self.model, self.tokenizer = FastModel.from_pretrained(
model_name = model_name,
max_seq_length = max_seq_length,
- dtype = None,
+ dtype = _auto_dtype,
auto_model = CsmForConditionalGeneration,
load_in_4bit = False,
device_map = device_map,
@@ -683,7 +695,7 @@ def load_model(

self.model, self.tokenizer = FastModel.from_pretrained(
model_name = model_name,
- dtype = None,
+ dtype = _auto_dtype,
load_in_4bit = False,
device_map = device_map,
full_finetuning = full_finetuning,
@@ -705,7 +717,7 @@ def load_model(
self.model, self.tokenizer = FastLanguageModel.from_pretrained(
model_name = model_name,
max_seq_length = max_seq_length,
- dtype = None,
+ dtype = _auto_dtype,
load_in_4bit = load_in_4bit,
device_map = device_map,
full_finetuning = full_finetuning,
@@ -777,7 +789,7 @@ def load_model(
self.model, self.tokenizer = FastModel.from_pretrained(
model_name = model_name,
max_seq_length = max_seq_length,
- dtype = None,
+ dtype = _auto_dtype,
load_in_4bit = load_in_4bit,
device_map = device_map,
full_finetuning = full_finetuning,
@@ -791,7 +803,7 @@ def load_model(
self.model, self.tokenizer = FastVisionModel.from_pretrained(
model_name = model_name,
max_seq_length = max_seq_length,
- dtype = None, # Auto-detect
+ dtype = _auto_dtype,
load_in_4bit = load_in_4bit,
device_map = device_map,
full_finetuning = full_finetuning,
@@ -824,7 +836,7 @@ def load_model(
self.model, self.tokenizer = FastLanguageModel.from_pretrained(
model_name = model_name,
max_seq_length = max_seq_length,
- dtype = None, # Auto-detect
+ dtype = _auto_dtype,
load_in_4bit = load_in_4bit,
device_map = device_map,
full_finetuning = full_finetuning,
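The six `dtype` replacements in this file all reuse one value computed up front. A rough sketch of that idea, kept free of torch so it runs anywhere: `choose_load_dtype` and `build_load_kwargs` are hypothetical names, the string `"float16"` stands in for `torch.float16`, and `"demo/model"` is a placeholder model name.

```python
def choose_load_dtype(bf16_supported, fallback="float16"):
    """Return None (let the loader auto-detect) when bf16 works, else the explicit fallback."""
    return None if bf16_supported else fallback


def build_load_kwargs(model_name, bf16_supported, **extra):
    # Compute the dtype once and reuse it for every loader branch.
    kwargs = {"model_name": model_name, "dtype": choose_load_dtype(bf16_supported)}
    kwargs.update(extra)
    return kwargs


print(build_load_kwargs("demo/model", bf16_supported=False, load_in_4bit=True))
print(build_load_kwargs("demo/model", bf16_supported=True, load_in_4bit=True))
```

Keeping `None` for bf16-capable hardware leaves the existing auto-detection path untouched, so only devices without bf16 see a behaviour change.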