[Studio] Fix GPU detection for AMD/Intel — add Vulkan VRAM fallback#4874
HellBoxyz wants to merge 1 commit into
Code Review
This pull request introduces a fallback mechanism for detecting free GPU memory using vulkaninfo, enabling support for AMD, Intel, and other Vulkan-compatible hardware when nvidia-smi is unavailable. The review feedback identifies a logic error in the parsing of vulkaninfo output, where multiple memory heaps are incorrectly treated as distinct GPUs, and provides a more robust implementation that groups heaps by physical device.
# Split output into per-heap blocks at each " memoryHeaps[N]:"
# marker, then check each block for DEVICE_LOCAL flag and budget.
heap_sections = re.split(r"(?=\tmemoryHeaps\[\d+\]:)", output)
budget_re = re.compile(r"budget\s*=\s*(\d+)")

gpus: list[tuple[int, int]] = []
gpu_idx = 0
for section in heap_sections:
    if not section.strip().startswith("memoryHeaps["):
        continue
    if "MEMORY_HEAP_DEVICE_LOCAL_BIT" not in section:
        continue
    budget_m = budget_re.search(section)
    if not budget_m:
        continue
    budget_bytes = int(budget_m.group(1))
    free_mib = budget_bytes // (1024 * 1024)
    if free_mib > 0:
        gpus.append((gpu_idx, free_mib))
        gpu_idx += 1
The current parsing logic for vulkaninfo output is not robust for all systems. It treats every device-local memory heap as a separate GPU, which is incorrect for multi-GPU systems or single GPUs that expose multiple device-local heaps. This can lead to misreporting the number of GPUs and their available memory, causing issues with GPU selection and model offloading.
A more robust approach is to group memory heaps by physical device and report the largest available memory budget for each. This ensures that each physical GPU is represented as a single entry with its correct available VRAM.
# Split output by physical device. vulkaninfo typically separates devices
# with headers like "GPU0", "GPU1", etc. on their own lines.
# The lookahead (?=...) keeps the delimiter.
device_sections = re.split(r"(?=^GPU\d+\n)", output, flags=re.MULTILINE)
if len(device_sections) > 1:
    # Filter out any non-GPU sections (like the header before GPU0)
    device_sections = [s for s in device_sections if s.strip().startswith("GPU")]
# If no GPUn headers, device_sections contains the whole output as one element.

budget_re = re.compile(r"budget\s*=\s*(\d+)")
gpus: list[tuple[int, int]] = []
for gpu_idx, device_section in enumerate(device_sections):
    # For each physical device, find the largest device-local memory heap budget.
    # A single GPU can have multiple device-local heaps.
    max_free_mib = 0
    heap_sections = re.split(r"(?=\tmemoryHeaps\[\d+\]:)", device_section)
    for section in heap_sections:
        if "MEMORY_HEAP_DEVICE_LOCAL_BIT" in section:
            budget_m = budget_re.search(section)
            if budget_m:
                budget_bytes = int(budget_m.group(1))
                free_mib = budget_bytes // (1024 * 1024)
                if free_mib > max_free_mib:
                    max_free_mib = free_mib
    if max_free_mib > 0:
        gpus.append((gpu_idx, max_free_mib))

_get_gpu_free_memory() relied exclusively on nvidia-smi, returning an empty list on non-NVIDIA systems. This caused the VRAM-aware context auto-reduction logic to be skipped entirely: models launched with full native context (e.g. 128K+), KV caches spilled into system RAM, and inference performance degraded significantly.

Add a vulkaninfo fallback that parses VK_EXT_memory_budget heap data to detect DEVICE_LOCAL VRAM budget on AMD, Intel, and any Vulkan-capable GPU. Handles multi-GPU systems (split by GPU device headers) and GPUs with multiple DEVICE_LOCAL heaps (takes largest budget per device). nvidia-smi retains priority; zero impact on NVIDIA setups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
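The probe ordering described in the commit message (nvidia-smi first, vulkaninfo only when that yields nothing) could be sketched roughly as follows. This is an illustration, not the PR's code: `run_tool`, `parse_nvidia_smi`, and `first_nonempty` are hypothetical helper names, and the nvidia-smi query shown in the usage note is an assumption about how free memory might be requested.

```python
import shutil
import subprocess
from typing import Callable, Optional

GpuList = list[tuple[int, int]]  # (gpu index, free MiB)


def run_tool(cmd: list[str], timeout: float = 10.0) -> Optional[str]:
    """Return the tool's stdout, or None if it is missing, fails, or times out."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


def parse_nvidia_smi(output: str) -> GpuList:
    """Parse 'index, free_mib' CSV rows (assumed query format, see usage note)."""
    gpus: GpuList = []
    for line in output.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            gpus.append((int(parts[0]), int(parts[1])))
    return gpus


def first_nonempty(probes: list[Callable[[], GpuList]]) -> GpuList:
    """Run probes in priority order and return the first non-empty result."""
    for probe in probes:
        gpus = probe()
        if gpus:
            return gpus
    return []
```

A caller might list an nvidia-smi probe first, for example one built on `run_tool(["nvidia-smi", "--query-gpu=index,memory.free", "--format=csv,noheader,nounits"])`, followed by a probe that feeds `run_tool(["vulkaninfo"])` into the heap parser above. Because the chain stops at the first non-empty result, NVIDIA hosts never reach the Vulkan path.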
Force-pushed from 23d2dc0 to a73223d.
This is a duplicate of #4720.
* Studio: probe AMD GPUs in llama-server VRAM detection

  _get_gpu_free_memory in studio/backend/core/inference/llama_cpp.py only queried nvidia-smi. On AMD ROCm hosts that returns nothing, so the GPU list is empty, the auto-fit logic falls into the no-gpus branch, and llama-server gets --fit on with no -ngl to anchor it. The model loads on CPU even though the GPU is detected elsewhere in Studio. Addresses #5106.

  Add a torch-based fallback that runs after nvidia-smi fails or returns empty:

      import torch
      if torch.cuda.is_available() and hasattr(torch.cuda, "mem_get_info"):
          for ordinal in range(torch.cuda.device_count()):
              free, _total = torch.cuda.mem_get_info(ordinal)
              gpus.append((ordinal, free // (1024 * 1024)))

  Works on AMD because the ROCm torch wheels Studio installs reuse the entire torch.cuda.* namespace via HIP. Also rescues NVIDIA hosts where nvidia-smi is missing from PATH (a secondary cause of the bug on Windows). Matches the convention studio/backend/utils/hardware/hardware.py:412 already uses for the same fallback purpose.

  Verified locally: nvidia-smi path returns the expected GPU and free MiB; torch fallback returns valid VRAM when nvidia-smi is forced to fail.

  Note: PR #4874 is a draft taking a different approach (parsing vulkaninfo); the two are complementary.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Address review feedback on PR #5172

  torch.cuda.device_count() enumerates GPUs RELATIVE to the current CUDA_VISIBLE_DEVICES (or HIP_VISIBLE_DEVICES on ROCm). Returning those visible ordinals directly lets _select_gpus rewrite CUDA_VISIBLE_DEVICES with the wrong physical IDs: a process started with CUDA_VISIBLE_DEVICES=2,3 would get its child llama-server relaunched with CUDA_VISIBLE_DEVICES=0,1, targeting the wrong GPUs and violating any scheduler pinning.

  Translate visible ordinals back through the active CVD/HIP/ROCR mask before returning. Falls through to bare ordinal when no mask is set.

  Also drop the redundant int() cast on // -- bytes // 2**20 already returns int.

  Verified: with CUDA_VISIBLE_DEVICES=6 and nvidia-smi forced to fail, the torch fallback now returns (6, free_mib) instead of (0, free_mib).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Studio: fix ROCm visibility precedence + narrow ROCm child env

  Two reviewer-flagged correctness bugs in the AMD GPU probe path.

  1) ROCm visibility precedence was reversed. torch.cuda enumerates GPUs relative to HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES on ROCm builds, but the probe's env-var lookup checked CUDA_VISIBLE_DEVICES first. With CUDA_VISIBLE_DEVICES=0,1 and HIP_VISIBLE_DEVICES=6,7 the probe returned [(0, ...), (1, ...)] when torch's view was actually [(6, ...), (7, ...)]. The wrong physical IDs flowed downstream into CUDA_VISIBLE_DEVICES for the llama-server subprocess, pinning it to GPUs 0,1 instead of 6,7.

     Fix: branch on torch.version.hip. On ROCm, prefer HIP > ROCR > CUDA (matches torch's own ordering). On NVIDIA, use CUDA only -- ignoring any HIP/ROCR vars the parent happens to have set.

  2) Child env narrowing only set CUDA_VISIBLE_DEVICES. On ROCm, llama-server honors HIP/ROCR; if the parent shell exported HIP_VISIBLE_DEVICES=4,5 and the selector picked just GPU 4, the child still saw both because we never narrowed HIP/ROCR. Now we set all three on ROCm so the AMD subprocess actually sees the planned subset.

  Both branches verified via temp/pr_simulation/sim_5172_rocm_precedence.py (7/7 cases pass), including the reviewer's verbatim R5 case (CVD=0,1 + HIP/ROCR=6,7).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Studio: sort GPU probe result + honor explicitly empty ROCm masks

  Two reviewer-flagged correctness nits on top of eff55fb.

  1) Gemini medium: the torch fallback returned an unsorted list when the visibility mask was non-sequential (e.g. CUDA_VISIBLE_DEVICES=5,2,9), diverging from the docstring guarantee and the nvidia-smi path. Now sorted by physical id.

  2) Codex P2: an explicitly empty HIP_VISIBLE_DEVICES="" should mean "no GPUs" per the codebase convention in utils/hardware/hardware.py::_get_parent_visible_gpu_spec. The previous `or` chain treated empty string as falsy and silently fell through to ROCR / CUDA, producing wrong physical IDs. Switch to `is not None` checks to match.

  Verified via sim_5172_rocm_precedence.py (9/9 cases pass) including the two new R8 (sort) and R9 (empty-HIP honored) cases.

* Studio: align nvidia-smi probe with torch fallback (sort + robust CVD)

  Two follow-up Gemini-medium nits on PR #5172.

  1) Fragile CVD parsing on the nvidia-smi path: `cvd.split(",")` would raise ValueError on a trailing comma like "0,1," because the empty trailing token is not skipped. The torch fallback already filters empty tokens via `if x.strip()`; mirror that here.

  2) Missing sort guarantee on the nvidia-smi path: the docstring promises sort-by-id, the torch fallback now sorts, but the nvidia-smi path relied on driver enumeration order. Add an explicit sort.

  Both changes match what shipped in 6b1cccd for the torch fallback, so the two probe paths now have identical CVD parsing + ordering semantics.

* Studio: drop cvd.strip() truthiness so empty CVD filters all GPUs

  Reviewer-flagged correctness bug. The previous `if cvd is not None and cvd.strip():` guard treated `CUDA_VISIBLE_DEVICES=""` as if the variable were unset, leaving `allowed=None` (and `physical_ids=None` on the torch path). On the nvidia-smi path that mattered: nvidia-smi ignores CVD entirely, so the probe's `allowed` filter is the only thing that respects the parent's "no GPUs" intent. Pre-fix the probe returned every physical GPU when the parent had explicitly hidden them.

  Drop the `.strip()` truthiness check on both paths. The downstream `if x.strip()` token filter still keeps trailing-comma masks like "0,1," safe, and an empty mask now produces an empty allowed/physical set as expected (matching utils/hardware/hardware.py convention).

  Verified via sim_5172_rocm_precedence.py R10 + R11 (now 11/11 cases pass): nvidia-smi path with `CUDA_VISIBLE_DEVICES=""` now returns [] instead of leaking the hidden GPUs.

* Studio: log ROCm env-var failures instead of silently swallowing

  Reviewer-flagged defensive logging gap. The bare `except Exception: pass` around the HIP/ROCR env-var assignment would mask anything from a missing torch import to an unexpected version object shape. Log at debug so a failed AMD child-env narrowing is at least traceable. Behavior is unchanged: torch missing or version probe failing still leaves the child with only CUDA_VISIBLE_DEVICES set.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
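The mask-handling rules that these commits converge on (HIP > ROCR > CUDA precedence on ROCm, CUDA only on NVIDIA, an explicitly empty mask meaning "no GPUs", tolerance for trailing commas, and sorting by physical id) can be illustrated with a small sketch. The function names `visible_physical_ids` and `to_physical` are hypothetical and do not come from the Studio codebase, and the sketch assumes masks contain only integer ids (UUID-style entries are not handled).

```python
from typing import Mapping, Optional


def visible_physical_ids(env: Mapping[str, str], is_rocm: bool) -> Optional[list[int]]:
    """Return the physical GPU ids named by the active visibility mask.

    None means no mask is set; [] means a mask was set but is empty ("no GPUs").
    Assumes integer ids only (illustrative simplification).
    """
    if is_rocm:
        names = ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES")
    else:
        names = ("CUDA_VISIBLE_DEVICES",)
    for name in names:
        mask = env.get(name)
        # `is not None` rather than truthiness: "" is an explicit empty mask.
        if mask is not None:
            # Skip empty tokens so a trailing comma like "0,1," is harmless.
            return [int(tok) for tok in mask.split(",") if tok.strip()]
    return None


def to_physical(
    visible: list[tuple[int, int]], physical_ids: Optional[list[int]]
) -> list[tuple[int, int]]:
    """Map (visible ordinal, free MiB) pairs to physical ids, sorted by id."""
    if physical_ids is None:
        return sorted(visible)  # no mask: ordinals already are physical ids
    return sorted(
        (physical_ids[ordinal], free)
        for ordinal, free in visible
        if ordinal < len(physical_ids)
    )
```

In practice a caller would pass `os.environ` as `env` and derive `is_rocm` from something like `torch.version.hip is not None`, which is the branch condition the commit message names.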
Problem
Unsloth Studio doesn't detect GPU on AMD/Intel systems. The VRAM detection (_get_gpu_free_memory()) uses only nvidia-smi, so on non-NVIDIA hardware it returns an empty list. This means:
Fix
Add a vulkaninfo fallback that kicks in when nvidia-smi is not available:
VK_EXT_memory_budget)
Before / After
Before (AMD GPU):
After (AMD GPU):
Tested on
-ngl -1