enable studio for intel GPU by leizhenyuan · Pull Request #4724 · unslothai/unsloth

leizhenyuan · 2026-03-31T11:19:25Z

No description provided.

chatgpt-codex-connector · 2026-03-31T11:19:29Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

gemini-code-assist

Code Review

This pull request implements support for Intel XPU (Intel GPUs) across the backend by replacing hardcoded CUDA logic with hardware-agnostic utility functions. It adds XPU detection, utilization tracking via xpu-smi, and environment variable handling for device visibility. Feedback highlights a logic error in parsing xpu-smi output where memory utilization is incorrectly assigned to temperature, a lack of multi-GPU support in the monitoring command, and an inconsistency in the version reporting format for XPU.

gemini-code-assist · 2026-03-31T11:21:38Z

+def _get_xpu_utilization() -> Dict[str, Any]:
+    """Return a live snapshot of Intel XPU GPU utilization via ``xpu-smi`` or torch.xpu."""
+    try:
+        import subprocess
+
+        result = subprocess.run(
+            ["xpu-smi", "dump", "-d", "0", "-m", "0,1,2,18"],
+            capture_output = True,
+            text = True,
+            timeout = 5,
+        )
+        if result.returncode == 0 and result.stdout.strip():
+            # xpu-smi dump outputs CSV: Timestamp, DeviceId, GPU Utilization (%), ...
+            lines = result.stdout.strip().splitlines()
+            for line in reversed(lines):
+                if line.startswith("Timestamp") or line.startswith("#"):
+                    continue
+                parts = [p.strip() for p in line.split(",")]
+                if len(parts) >= 4:
+                    gpu_util = float(parts[2]) if parts[2] not in ("", "N/A") else None
+                    temp = float(parts[3]) if parts[3] not in ("", "N/A") else None
+                    break
+            else:
+                gpu_util = None
+                temp = None
+        else:
+            gpu_util = None
+            temp = None
+    except Exception:
+        gpu_util = None
+        temp = None
+
+    # Get VRAM from torch.xpu
+    vram_used_gb = None
+    vram_total_gb = None
+    try:
+        import torch
+
+        idx = torch.xpu.current_device()
+        props = torch.xpu.get_device_properties(idx)
+        vram_total_gb = round(props.total_memory / (1024**3), 2)
+        vram_used_gb = round(torch.xpu.memory_allocated(idx) / (1024**3), 2)
+    except Exception:
+        pass
+
+    vram_pct = (
+        round((vram_used_gb / vram_total_gb) * 100, 1)
+        if vram_used_gb is not None and vram_total_gb and vram_total_gb > 0
+        else None
+    )
+
+    has_any = any(v is not None for v in [gpu_util, temp, vram_used_gb])
+    if not has_any:
+        return {"available": False, "backend": "xpu"}
+
+    return {
+        "available": True,
+        "backend": "xpu",
+        "gpu_utilization_pct": gpu_util,
+        "temperature_c": temp,
+        "vram_used_gb": vram_used_gb,
+        "vram_total_gb": vram_total_gb,
+        "vram_utilization_pct": vram_pct,
+        "power_draw_w": None,
+        "power_limit_w": None,


The _get_xpu_utilization function contains a logic error in metric parsing and lacks support for multi-GPU configurations:

Metric Index Bug: With the requested metrics -m 0,1,2,18, the CSV output indices are: 2 (GPU Util), 3 (GPU Mem Util), 4 (GPU Temp), and 5 (GPU Power). The current code incorrectly assigns parts[3] (Memory Util) to temp.

Hardcoded Device: It always queries physical device 0 via xpu-smi -d 0. This should be mapped from the current torch device index, especially when ZE_AFFINITY_MASK is active to restrict visibility.

Power Data: Metric 18 (Power) is requested in the command but not parsed into the return dictionary.

gemini-code-assist · 2026-03-31T11:21:39Z

+        if hasattr(torch, "xpu") and torch.xpu.is_available():
+            versions["xpu"] = True


Setting versions["xpu"] = True is inconsistent with other entries in the versions dictionary, which typically contain version strings (e.g., from importlib.metadata). It would be more useful to provide a string indicating support or the version of the Intel Extension for PyTorch (IPEX) if applicable.

danielhanchen

Thank you for the PR! The goal of this PR is to enable Unsloth Studio on Intel XPU GPUs. As a summary, this PR replaces hardcoded CUDA-only code paths in Studio audio inference, training, and cache clearing with device-agnostic helpers (get_torch_device_str(), clear_gpu_cache()), adds XPU detection/telemetry in hardware.py, and updates error messaging to mention Intel GPUs.

I pushed a commit fixing the issues below. Here is a summary of findings after reviewing:

Reviewers	Severity	Finding
10/10	High	`_get_xpu_utilization()` used `-m 0,1,2,18` which mapped `parts[3]` to temperature incorrectly -- it was GPU Memory Utilization, not temperature. Fixed to use `-m 0,2,3` (Utilization, Power, Core Temperature) with correct column parsing.
10/10	High	`xpu-smi dump` without `-n 1` runs indefinitely until the timeout kills it. Added `-n 1` to request exactly one sample.
10/10	Medium	Hardcoded `-d 0` in xpu-smi call ignores which device is active. Fixed to use `torch.xpu.current_device()`.
10/10	Medium	Dead code in `get_visible_gpu_count()` -- XPU re-check at line 1357 was unreachable because the XPU block above always returns first. Removed the dead branch.
8/10	Medium	`versions["xpu"] = True` was a `bool` in a `Dict[str, Optional[str]]`. Changed to `getattr(torch.version, "xpu", "available")` to return the actual XPU version string.
7/10	Low	`power_draw_w` was always `None` in the returned dict even though power data was being requested from xpu-smi. Now correctly populated.

All fixes are in the latest commit. Concrete suggestions for each finding above.

danielhanchen · 2026-04-03T19:24:00Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for Intel XPU (Intel GPUs) across the inference and training modules by abstracting device-specific logic into utility functions like get_torch_device_str and clear_gpu_cache. Key changes include updated hardware detection, XPU utilization monitoring via xpu-smi, and safety measures to disable multiprocessing on XPU devices to prevent memory context corruption. Feedback is provided regarding the use of broad, silent exception handling in the XPU utilization logic, which should be refined to include explicit error handling and logging.

gemini-code-assist · 2026-04-03T19:30:07Z

+        result = subprocess.run(
+            ["xpu-smi", "dump", "-d", str(dev_idx), "-m", "0,2,3", "-n", "1"],
+            capture_output = True,
+            text = True,
+            timeout = 10,
+        )


The subprocess.run call for xpu-smi might raise a FileNotFoundError. While the broad except Exception block at line 430 catches this, it is better to explicitly handle FileNotFoundError. Additionally, ensure the exception is logged (even at a debug level) rather than silently ignored, as per repository rules to aid in future debugging.

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

danielhanchen · 2026-04-03T19:44:53Z

/gemini review

gemini-code-assist · 2026-04-03T19:44:57Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

savvadesogle · 2026-04-05T18:56:25Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for Intel XPU hardware by abstracting device-specific logic into hardware-agnostic utility functions like get_torch_device_str and clear_gpu_cache. Key updates include enhanced hardware detection, XPU utilization tracking via xpu-smi, and improved handling of device visibility and multiprocessing on Intel GPUs. Review feedback focuses on improving code organization by moving local imports to the module level and enhancing observability by logging exceptions in broad try-except blocks instead of using silent pass statements.

gemini-code-assist · 2026-04-05T19:00:31Z

+            from utils.hardware import get_torch_device_str
+


For better code organization and to avoid repeated imports within a function, please move the import of get_torch_device_str to the top of the file. You can add it to the existing from utils.hardware import ... block at line 21.

gemini-code-assist · 2026-04-05T19:00:31Z

+                from utils.hardware import clear_gpu_cache
+


To improve code organization and adhere to Python's best practices, please move this import to the top of the file. This will make dependencies clearer and avoid re-importing within methods.

This also applies to the local imports of get_torch_device_str on lines 3020 and 3096.

gemini-code-assist · 2026-04-05T19:00:31Z

+        from utils.hardware import get_torch_device_str
+


To improve code organization and avoid redundant imports, please move the import of get_torch_device_str to the top of the file. It can be added to the existing from utils.hardware import ... block at line 36.

Additionally, clear_gpu_cache is already imported at the top, so its local imports (e.g., at lines 1713, 1945, and 2162) can be removed. This change should be applied to all local imports from utils.hardware throughout this file.

References

Avoid redundant inline imports if the module is already imported at the top of the file.

gemini-code-assist · 2026-04-05T19:00:31Z

+    except Exception:
+        pass


This broad except Exception: pass block silently ignores all errors that might occur when running xpu-smi, including timeouts or parsing errors. This can make debugging difficult. Please log the exception to aid in future troubleshooting, even if it's at a debug level. This aligns with the general rule to avoid silent exception handlers.

Suggested change

except Exception:

pass

except Exception as e:

logger.debug("Failed to get XPU utilization from xpu-smi: %s", e)

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

gemini-code-assist · 2026-04-05T19:00:31Z

+    except Exception:
+        pass


Similar to the xpu-smi call, this except Exception: pass block silently ignores errors when querying VRAM from torch.xpu. Logging these errors would be beneficial for debugging potential issues with the Intel GPU tooling or PyTorch's XPU support.

Suggested change

except Exception:

pass

except Exception as e:

logger.debug("Failed to get XPU VRAM info from torch.xpu: %s", e)

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

gemini-code-assist · 2026-04-05T19:00:32Z

+            import torch
+
+            _visible_gpu_count = torch.xpu.device_count()
+        except Exception:


This except Exception: block silently catches any error from torch.xpu.device_count() and proceeds to a fallback. To improve debuggability, please log the exception that was caught. This will help diagnose issues if torch.xpu is available but device_count() fails unexpectedly.

For example:

except Exception as e: logger.debug("torch.xpu.device_count() failed: %s", e) if xpu_visible: # ...

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

Adds first-class Intel XPU support to unsloth studio across hardware detection, GPU selection, visibility reporting, telemetry, cache clearing, OOM messaging, and the DAC autocast path, matching feature parity with the existing CUDA and ROCm backends. This is a rebased linear version of the leizhenyuan/zhenyuan_enable_studio branch with three rounds of review-loop fixes layered in. Originally authored by leizhenyuan. Rebased and review fixes by danielhanchen. Highlights: hardware.py - detect_hardware: promote DeviceType.XPU when torch.xpu is available and CUDA is not, preserving CPU / MLX / ROCm paths. - New helpers: _parse_ze_mask_roots, _resolve_xpu_smi_device_id, _get_xpu_utilization (xpu-smi -m 0,1,3 with cached binary path resolution and robust N/A parsing). - _get_parent_visible_gpu_spec adds an XPU branch that honors ZE_AFFINITY_MASK, including subdevice syntax (0.0,0.1) and wildcard / unparseable masks, and mirrors the CUDA UUID / MIG path by returning numeric_ids=None for subdevice masks so the rest of the stack enumerates torch-visible ordinals and can still shard. - _backend_visible_devices_env routes XPU through ZE_AFFINITY_MASK so /system-info and get_backend_visible_gpu_info report the active mask instead of a stale or None CUDA_VISIBLE_DEVICES. - get_visible_gpu_utilization and get_backend_visible_gpu_info short-circuit on "no devices visible" masks (ZE_AFFINITY_MASK="" or CUDA_VISIBLE_DEVICES="" / "-1") so the torch-ordinal enumeration fallback does not report hidden devices. - get_visible_gpu_count and get_device_map add XPU branches that prefer torch.xpu.device_count() and fall back to ZE mask roots (str.isdecimal() to reject Unicode superscripts). - apply_gpu_ids routes XPU through ZE_AFFINITY_MASK but leaves inherited CUDA_VISIBLE_DEVICES alone so hybrid NVIDIA+Intel hosts that hid CUDA via CUDA_VISIBLE_DEVICES="" stay on XPU. - auto_select_gpu_ids and prepare_gpu_selection allow XPU in the accelerator guard and use "non_accelerator" as the selection mode label for CPU / MLX. - resolve_requested_gpu_ids error message covers "CUDA and Intel XPU devices" and mentions ZE_AFFINITY_MASK for XPU. - clear_gpu_cache XPU branch guards synchronize and empty_cache with hasattr so older torch-xpu builds do not propagate AttributeError. - dataset_map_num_proc returns None only when torch.xpu.is_initialized exists and returns True, so older torch-xpu builds still get pre-init CPU dataset parallelism. - get_package_versions reports torch.version.xpu alongside cuda. utils/utils.py - format_error_message recognises Intel XPU OOM strings including out_of_device_memory, out_of_host_memory, memory allocation failed, and bare MemoryError instances, labelling messages as "Intel GPU" when DeviceType is XPU. core/inference/inference.py - _generate_dac derives autocast device from model.device.type instead of the global backend, clamps to cuda/xpu/mps/cpu, and falls through to nullcontext on cpu/xpu float32 (unsupported by torch.amp.autocast). core/inference/llama_cpp.py - _start_process pins the llama-server subprocess via ZE_AFFINITY_MASK on XPU hosts and CUDA_VISIBLE_DEVICES elsewhere so SYCL builds of llama-server respect the selected GPUs. - unload_model clears the XPU cache via the backend-neutral clear_gpu_cache helper. - init_audio_codec / generate_audio_response retain the pre-PR "cuda if torch.cuda.is_available() else cpu" fallback for the SNAC / BiCodec / DAC codecs because those upstream codecs are not yet validated on Intel XPU. core/training/trainer.py - clear_gpu_cache is hoisted to the module-level import block. - _preprocess_snac_dataset / _preprocess_bicodec_dataset / _preprocess_dac_dataset keep the pre-PR CPU fallback on non-CUDA hosts for Spark-TTS BiCodec, SNAC, and OuteTTS DAC / Whisper preprocessing. tests/test_gpu_selection.py - Replace TestXpuRejection with TestXpuSelection (positive tests for auto_select_gpu_ids and prepare_gpu_selection on XPU). - Update non-accelerator rejection assertion to the new "CUDA and Intel XPU" wording. - Update test_explicit_ids_are_rejected_for_uuid_parent_visibility regex to match the broadened "uses non-numeric or subdevice" error message. tests/test_gpu_selection_sandbox.py - Rename test_non_cuda_returns_none to test_non_accelerator_returns_none and assert selection_mode="non_accelerator".

for more information, see https://pre-commit.ci

…vation, FLAT hierarchy, backend-aware GGUF discovery Six targeted changes following the 10-reviewer pass on this PR: 1. studio/backend/core/inference/llama_cpp.py Remove env.pop("CUDA_VISIBLE_DEVICES", None) in the XPU llama-server spawn branch. The pop contradicted the documented preservation in apply_gpu_ids() for hybrid NVIDIA+Intel hosts where the parent sets CUDA_VISIBLE_DEVICES="" to force Studio onto Intel/SYCL. Added a comment referencing that design note. 2. studio/backend/core/inference/llama_cpp.py Make _get_gpu_free_memory backend-aware. NVIDIA keeps the cheap nvidia-smi fast path; AMD ROCm and Intel XPU fall through to get_visible_gpu_utilization so GGUF offload selection has per-GPU free memory on all backends, not just NVIDIA. Returns an empty list only when no telemetry is available. 3. studio/backend/utils/hardware/hardware.py detect_hardware now honors a non-empty ZE_AFFINITY_MASK as an explicit XPU hint. On a hybrid host with both torch.cuda and torch.xpu available, ZE_AFFINITY_MASK=... picks XPU before CUDA so users can pin Intel without also setting CUDA_VISIBLE_DEVICES="". 4. studio/backend/utils/hardware/hardware.py Add _xpu_hierarchy_is_composite() and gate the Level Zero mask parsing on it. In the FLAT hierarchy (oneAPI 2024+ default) numeric ZE_AFFINITY_MASK entries are tile or device handles, not root GPU IDs. _get_parent_visible_gpu_spec now returns supports_explicit_gpu_ids=False in FLAT so explicit gpu_ids are rejected; _resolve_xpu_smi_device_id returns None so callers skip "xpu-smi -d <wrong_tile>" and fall back to torch.xpu VRAM telemetry. COMPOSITE retains the original root-ID mapping. 5. studio/backend/utils/hardware/hardware.py Conservative XPU used_gb fallback. When torch.xpu lacks mem_get_info, _torch_get_per_device_info now returns used_gb=None instead of torch.xpu.memory_allocated (process-local and misleading for multi-tenant placement). get_visible_gpu_utilization guards against None during percent computation; downstream auto_select_gpu_ids already filters None entries. 6. studio/backend/models/inference.py, studio/backend/models/training.py gpu_ids Field descriptions now mention the Intel XPU ZE_AFFINITY_MASK subdevice and FLAT-hierarchy restrictions alongside the existing CUDA UUID/MIG note. Testing: - 96 PR-authored tests still pass (test_gpu_selection, test_gpu_selection_sandbox, test_utils). - 234 additional simulation tests across two isolated uv venvs cover fix-fix interactions, malformed inputs, cross-platform behavior (Linux, macOS, Windows), real subprocess env propagation, JSON/Pydantic schema roundtrips, concurrency, and fuzz. - 6/6 CUDA scenarios on 8xB200 hardware are byte-identical to the pre-fix baseline. Zero CUDA regression. - ROCm, MLX, and CPU regression tests all pass. AMD HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES propagation is unchanged. Backend support after these changes: NVIDIA CUDA, AMD ROCm, Intel XPU (both COMPOSITE and FLAT hierarchies), CPU, and Apple Silicon MLX.

for more information, see https://pre-commit.ci

…s, FLAT numerics Follow-up to 6c55664 addressing the four findings from the second 10-reviewer pass. Each fix is narrow and additive; the CUDA, ROCm, MLX, and CPU paths remain byte-identical to the pre-round-2 baseline. A. studio/backend/core/inference/llama_cpp.py Replace `dict.get(a) or dict.get(b)` with explicit `is None` fallbacks when reading vram_total_gb / vram_used_gb in the generic free-memory path. An idle GPU with vram_used_gb=0.0 was being treated as missing telemetry and dropped, which pushed GGUF placement into the non-placement fallback whenever the best free card was fully idle. 9/10 reviewers in the round-2 pass flagged this; every suggestion was the same patch. B. studio/backend/utils/hardware/hardware.py Tighten the ZE_AFFINITY_MASK -> XPU detection hint from "any non-empty mask wins" to "non-empty mask plus one of: CUDA_VISIBLE_DEVICES explicitly hidden (empty / -1), or UNSLOTH_FORCE_XPU=1, or CUDA simply not available". Prevents a stray ZE_AFFINITY_MASK=0 inherited from unrelated Intel tooling from silently flipping existing hybrid CUDA deployments onto XPU. Intel-only hosts are unaffected (CUDA unavailable -> hint is honoured). C. studio/backend/core/inference/llama_cpp.py Refuse to use device indices returned from get_visible_gpu_utilization when index_kind != "physical". Relative ordinals (reported for subdevice / wildcard / UUID parent masks) are not safe to round-trip back into ZE_AFFINITY_MASK / CUDA_VISIBLE_DEVICES; returning [] lets the launcher skip placement and inherit the parent's visibility unchanged. D. studio/backend/utils/hardware/hardware.py Relax _get_parent_visible_gpu_spec in FLAT hierarchy. Per Intel's Level Zero device-hierarchy docs, numeric ZE_AFFINITY_MASK entries in FLAT mode ARE stable flat ordinals that torch.xpu honours 1-to-1 with the child process. Previously we blanket-rejected all FLAT numeric masks, which meant Studio could not do XPU auto-selection under the default oneAPI configuration. Now: - FLAT + numeric mask -> accept as numeric_ids - FLAT + unset mask -> expose range(device_count) - FLAT + subdevice "N.M" -> reject (invalid syntax in FLAT) - FLAT + wildcard / non-numeric -> reject - COMPOSITE semantics unchanged (numeric -> root IDs) _resolve_xpu_smi_device_id continues to return None in FLAT because xpu-smi -d addresses root IDs, not flat ordinals; the telemetry path falls back to torch.xpu VRAM. Testing: - 96 PR-authored tests still pass (test_gpu_selection_sandbox, test_gpu_selection, test_utils). - 264 simulation tests pass across two isolated uv venvs. The round-2 tests include explicit coverage for: Fix A: idle GPU preserved, None still skipped, 0.0 total ok Fix B: bare mask on hybrid -> CUDA wins; CVD="" or "-1" hint -> XPU; UNSLOTH_FORCE_XPU=1 -> XPU; CUDA unavailable -> XPU; UNSLOTH_FORCE_XPU=0 (literal) not activated Fix C: relative ordinals not returned; physical ordinals returned; missing index_kind treated as physical Fix D: FLAT numeric accepted; FLAT unset auto-enumerated; FLAT subdevice rejected; FLAT wildcard rejected; xpu-smi resolver still None in FLAT; COMPOSITE unchanged - 6/6 CUDA scenarios on 8xB200 remain byte-identical to baseline. - ROCm, MLX, CPU regression tests all pass.

for more information, see https://pre-commit.ci

danielhanchen · 2026-04-16T08:04:25Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces comprehensive support for Intel XPU (Intel GPUs) across the backend, including inference, training, and hardware utility modules. Key updates include backend-aware GPU memory querying, support for the ZE_AFFINITY_MASK environment variable for device visibility, and integrated telemetry via xpu-smi. Additionally, the changes address XPU-specific constraints in multiprocessing and improve error reporting for device memory issues. Feedback was provided regarding the logic for calculating visible GPU counts when parsing ZE_AFFINITY_MASK, specifically to better distinguish between wildcard and invalid masks.

gemini-code-assist · 2026-04-16T08:18:10Z

+                roots = _parse_ze_mask_roots(xpu_visible)
+                # Non-digit wildcards (e.g. "*") yield an empty roots list;
+                # treat those as "all physical XPUs visible".
+                _visible_gpu_count = (
+                    len(set(roots)) if roots else get_physical_gpu_count()
+                )


The fallback logic for _visible_gpu_count when roots is empty is incorrect. It treats any mask that results in an empty roots list (like invalid masks ",," or non-numeric wildcards *) as a signal to use all physical GPUs. This is only correct for a wildcard like "*". For invalid or empty-like masks, the count should be 0.

This can lead to an incorrect number of visible GPUs being reported, potentially causing issues with resource allocation.

I suggest refining the logic to explicitly handle the wildcard case and treat other non-parsing masks as yielding zero devices.

Suggested change

roots = _parse_ze_mask_roots(xpu_visible)

# Non-digit wildcards (e.g. "*") yield an empty roots list;

# treat those as "all physical XPUs visible".

_visible_gpu_count = (

len(set(roots)) if roots else get_physical_gpu_count()

)

# Non-digit wildcards (e.g. "*") should be treated as "all physical XPUs visible",

# while other invalid or empty masks should result in 0.

if xpu_visible.strip() == "*":

_visible_gpu_count = get_physical_gpu_count()

else:

roots = _parse_ze_mask_roots(xpu_visible)

_visible_gpu_count = len(set(roots))

…rd fallback Follow-up to eebf077 addressing three findings from the third 10-reviewer pass and Gemini review 4119296322. All changes are in studio/backend/utils/hardware/hardware.py; CUDA/ROCm/MLX/CPU paths remain byte-identical to the pre-round-3 baseline. E. _get_parent_visible_gpu_spec in FLAT hierarchy now refuses explicit gpu_ids. Per Intel's "flattening-gpu-tile-hierarchy" doc and the Level Zero spec, numeric ZE_AFFINITY_MASK entries in FLAT mode are tile / device-handle ordinals, not physical GPU IDs -- accepting them as "Physical GPU indices" breaks the API contract documented in models/inference.py and models/training.py. numeric_ids is still populated so display and auto-selection enumerate devices, but supports_explicit_gpu_ids is False in FLAT regardless of mask presence. Users who need explicit tile-level selection can opt in with ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE. COMPOSITE semantics are unchanged -- numeric masks still resolve to root GPU IDs. Flagged by 6/10 reviewers in the third pass. F. detect_hardware restructured so UNSLOTH_FORCE_XPU=1 is a standalone trigger. Previously the outer "if ze_mask:" gate swallowed the force knob unless ZE_AFFINITY_MASK was also set, which meant the documented override did not work on a common hybrid NVIDIA+Intel scenario. Now the preference check is: prefer_xpu = force_xpu or (ze_mask and (cuda_hidden or cuda_unavailable)) The "stray ZE_AFFINITY_MASK flips CUDA to XPU" protection from round 2 is preserved: a bare mask without CUDA-hidden still keeps CUDA. Flagged by 3/10 reviewers in the third pass. G. get_visible_gpu_count torch-fallback path: only "*" wildcard counts as "all physical XPUs visible". Other non-parseable masks (",,,", "GPU-uuid", random garbage) now return 0 visible devices rather than silently exposing the full fleet. The happy path (torch.xpu .device_count() succeeds) is unchanged. Suggested by the Gemini review bot. Testing: - 96 PR-authored tests still pass (test_gpu_selection_sandbox, test_gpu_selection, test_utils). - 284 simulation tests pass across two isolated uv venvs. New test_round3_fixes.py covers each round-3 finding explicitly: Fix E: FLAT rejects gpu_ids via prepare_gpu_selection; FLAT unset also rejects; COMPOSITE still accepts; FLAT auto-select gracefully falls to inherit_parent_visible mode. Fix F: force_xpu alone picks XPU; force_xpu=0 does not; no-xpu-torch falls back to CUDA; force_xpu + mask works; CVD="" + mask path unchanged; bare mask still keeps CUDA. Fix G: "*" -> all_physical; ",,," -> 0; "GPU-abc" -> 0; "0,1,*" -> 2 (partial wildcard); subdevice -> 1 root; no mask unchanged; torch happy path bypasses fallback. - 6/6 CUDA scenarios on 8xB200 byte-identical to pre-fix baseline. - ROCm, MLX, CPU regression tests all pass.

for more information, see https://pre-commit.ci

danielhanchen · 2026-04-16T08:41:41Z

/gemini review

gemini-code-assist

Code Review

This pull request implements comprehensive support for Intel XPU (Intel GPUs) across the inference and training backends. Key updates include backend-aware hardware detection, GPU utilization monitoring via xpu-smi, and the correct application of ZE_AFFINITY_MASK for device pinning. The changes also refine autocast logic for diverse hardware and update error handling to recognize XPU-specific memory issues. The review feedback focuses on improving observability by replacing silent exception handlers with debug logging in the hardware utility module.

gemini-code-assist · 2026-04-16T08:45:15Z

+        except Exception:
+            pass


This broad except Exception: pass could hide unexpected errors during the cache clearing process. It's better to log the exception, even at a debug level, to aid in future debugging. This aligns with the general rule of avoiding silent exception handlers.

Suggested change

except Exception:

pass

except Exception as e:

logger.debug("Failed to clear XPU cache: %s", e)

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

gemini-code-assist · 2026-04-16T08:45:16Z

+    except Exception:
+        pass


Silently passing on all exceptions here can obscure issues with xpu-smi. It would be beneficial to log the exception at a debug level to help with troubleshooting, in line with the general rule to avoid silent exception handlers.

Suggested change

except Exception:

pass

except Exception as e:

logger.debug("xpu-smi query failed: %s", e)

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

gemini-code-assist · 2026-04-16T08:45:16Z

+    except Exception:
+        pass


This except Exception: pass block could hide errors when querying VRAM information from torch.xpu. Logging the exception at a debug level would be helpful for diagnostics, following the general rule to avoid silent exception handlers.

Suggested change

except Exception:

pass

except Exception as e:

logger.debug("torch.xpu VRAM query failed: %s", e)

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

gemini-code-assist · 2026-04-16T08:45:16Z

+        except Exception:
+            if xpu_visible:
+                # Fallback: count unique root device IDs from the mask.
+                # ZE_AFFINITY_MASK can use "device.subdevice" notation,
+                # so "0.0,0.1" is 1 root device, not 2. Without torch we
+                # cannot know which hierarchy mode is active, so fall back
+                # to root-device counting (the more conservative choice).
+                if xpu_visible == "*":
+                    # Documented wildcard: all physical XPUs visible.
+                    _visible_gpu_count = get_physical_gpu_count()
+                else:
+                    roots = _parse_ze_mask_roots(xpu_visible)
+                    # Non-parseable masks (",,,", "GPU-abc", etc.) yield an
+                    # empty roots list and are treated as 0 visible devices,
+                    # not "all visible" -- we have no evidence the user
+                    # intended to expose the whole fleet.
+                    _visible_gpu_count = len(set(roots))
+            else:
+                _visible_gpu_count = get_physical_gpu_count()
+        return _visible_gpu_count


The except Exception block here correctly implements fallback logic, but it doesn't log the exception that occurred. It would be valuable for debugging to log the error from torch.xpu.device_count() at a debug level before proceeding with the fallback.

Suggested change

except Exception:

if xpu_visible:

# Fallback: count unique root device IDs from the mask.

# ZE_AFFINITY_MASK can use "device.subdevice" notation,

# so "0.0,0.1" is 1 root device, not 2. Without torch we

# cannot know which hierarchy mode is active, so fall back

# to root-device counting (the more conservative choice).

if xpu_visible == "*":

# Documented wildcard: all physical XPUs visible.

_visible_gpu_count = get_physical_gpu_count()

else:

roots = _parse_ze_mask_roots(xpu_visible)

# Non-parseable masks (",,,", "GPU-abc", etc.) yield an

# empty roots list and are treated as 0 visible devices,

# not "all visible" -- we have no evidence the user

# intended to expose the whole fleet.

_visible_gpu_count = len(set(roots))

else:

_visible_gpu_count = get_physical_gpu_count()

return _visible_gpu_count

except Exception as e:

logger.debug("torch.xpu.device_count() failed, falling back to mask parsing: %s", e)

if xpu_visible:

# Fallback: count unique root device IDs from the mask.

# ZE_AFFINITY_MASK can use "device.subdevice" notation,

# so "0.0,0.1" is 1 root device, not 2. Without torch we

# cannot know which hierarchy mode is active, so fall back

# to root-device counting (the more conservative choice).

if xpu_visible == "*":

# Documented wildcard: all physical XPUs visible.

_visible_gpu_count = get_physical_gpu_count()

else:

roots = _parse_ze_mask_roots(xpu_visible)

# Non-parseable masks (",,,", "GPU-abc", etc.) yield an

# empty roots list and are treated as 0 visible devices,

# not "all visible" -- we have no evidence the user

# intended to expose the whole fleet.

_visible_gpu_count = len(set(roots))

else:

_visible_gpu_count = get_physical_gpu_count()

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

gemini-code-assist · 2026-04-16T08:45:16Z

    except Exception:
        _visible_gpu_count = get_physical_gpu_count()


Similar to other fallbacks in this file, it would be beneficial to log the exception from torch.cuda.device_count() at a debug level before falling back to get_physical_gpu_count(). This aids in debugging scenarios where the PyTorch call fails unexpectedly.

Suggested change

except Exception:

_visible_gpu_count = get_physical_gpu_count()

except Exception as e:

logger.debug("torch.cuda.device_count() failed, falling back to physical count: %s", e)

_visible_gpu_count = get_physical_gpu_count()

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

gemini-code-assist · 2026-04-16T08:45:16Z

+            except Exception:
+                # Treat a failing probe as "runtime not touched yet" so
+                # pre-init CPU preprocessing can still parallelize.
+                pass


The comment explains the fallback behavior, but silently passing on the exception could hide underlying issues with the torch.xpu.is_initialized() call. Logging the exception at a debug level would provide more insight during troubleshooting, while still allowing the safe fallback to proceed.

Suggested change

except Exception:

# Treat a failing probe as "runtime not touched yet" so

# pre-init CPU preprocessing can still parallelize.

pass

except Exception as e:

# Treat a failing probe as "runtime not touched yet" so

# pre-init CPU preprocessing can still parallelize.

logger.debug("torch.xpu.is_initialized() check failed, assuming not initialized: %s", e)

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

Gemini's fourth review pass (pullrequestreview-4119484559) flagged six bare `except Exception: pass` blocks in hardware.py where silent failures could hide real driver or runtime problems. Each site now logs at DEBUG level with the exception message, so operators have a diagnostic trail when something goes wrong, while the control flow stays unchanged and the same fallback path is taken: 1. clear_gpu_cache() XPU branch: log "Failed to clear XPU cache" 2. _resolve_xpu_smi_device_id(): log "torch.xpu.current_device() probe failed" 3. _get_xpu_utilization() xpu-smi call: log "xpu-smi query failed" 4. _get_xpu_utilization() torch.xpu VRAM query: log "torch.xpu VRAM query failed" 5. get_visible_gpu_count() torch.xpu fallback: log and still count mask roots 6. get_visible_gpu_count() torch.cuda fallback: log and still use physical count 7. dataset_map_num_proc() is_initialized probe: log "torch.xpu.is_initialized() probe failed" These are logging-only edits; every test in the PR's own suite, the two simulation harnesses (284 tests), and the 8x B200 CUDA behavior matrix (6/6 scenarios byte-identical) pass unchanged.

Round-4 review (10 parallel reviewers on PR unslothai#4724) produced a unanimous consensus finding plus two isolated but real regressions. This commit addresses all three. [10/10] studio/backend/utils/hardware/hardware.py _get_parent_visible_gpu_spec() on Intel XPU with ZE_FLAT_DEVICE_HIERARCHY unset (the oneAPI default FLAT hierarchy) and no ZE_AFFINITY_MASK was internally inconsistent: it returned numeric_ids=[0..N) with supports_explicit_gpu_ids=False. Downstream, get_visible_gpu_utilization() treated those ordinals as numeric_ids!=None, labeled them index_kind="physical", and llama.cpp's _get_gpu_free_memory() would round-trip tile/device-handle ordinals back into ZE_AFFINITY_MASK as if they were stable root-GPU IDs. Collapse the FLAT+no-mask case to numeric_ids=None so the telemetry path falls into its relative-ordinal branch and auto-selection uniformly returns inherit_parent_visible. Users who need explicit Intel selection opt in via ZE_FLAT_DEVICE_HIERARCHY =COMPOSITE. [1/10] studio/backend/core/inference/llama_cpp.py LlamaCppBackend._get_gpu_free_memory() always probed nvidia-smi before falling back to the generic telemetry path. On a hybrid NVIDIA+Intel host running Studio in XPU mode, nvidia-smi returned NVIDIA indices and those ordinals were then used to build a ZE_AFFINITY_MASK for llama-server, pinning the server to the wrong device. Gate the nvidia-smi fast path on get_device() == CUDA and IS_ROCM=False so other backends skip straight to the backend-aware telemetry path. [1/10] studio/backend/utils/utils.py format_error_message() dropped the broad "memory"/"cuda" match in favor of specific substrings. That stopped matching common CUDA allocation failures such as "CUDA error: CUBLAS_STATUS_ALLOC_FAILED", so users saw the raw backend exception instead of the friendly "Not enough GPU memory" hint. Re-add the CUBLAS, cuda-error+alloc, and xpu-alloc patterns while keeping the matcher narrow enough that non-memory CUDA errors (e.g. "device-side assert triggered") still pass through unchanged. Verified: PR-authored tests (96 + 4 skipped), sim suite v1 (174), sim suite v2 (128 incl. 14 new round-5 coverage), 8x B200 CUDA matrix (6/6 scenarios byte-identical to pre-fix baseline).

for more information, see https://pre-commit.ci

Round-5 review (10 parallel reviewers on PR unslothai#4724) narrowed to two real correctness issues in the XPU path introduced by round-5's numeric_ids handling. This commit addresses both. [4/10] studio/backend/utils/hardware/hardware.py -- get_device_map() The "numeric_ids is None and visible_count > 1 -> balanced" heuristic was meant for CUDA UUID/MIG masks, but after round 5 collapsed FLAT no-mask XPU into numeric_ids=None, the heuristic started firing on the default Intel FLAT hierarchy and forced balanced sharding across tile handles even though the same module had already classified those ordinals as non-physical. Restrict the heuristic to CUDA so XPU stays on sequential loading unless the caller passes explicit gpu_ids (via ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE or prepare_gpu_selection). [3/10] studio/backend/utils/hardware/hardware.py -- get_visible_gpu_utilization, get_backend_visible_gpu_info FLAT numeric masks like ZE_AFFINITY_MASK="0,1" populate numeric_ids for telemetry display but have supports_explicit_gpu_ids=False because the tokens are tile/device-handle ordinals per Intel's Level Zero docs. Both helpers still labeled those ordinals index_kind="physical", which let LlamaCppBackend._get_gpu_free_memory() round-trip them back into ZE_AFFINITY_MASK as if they were stable root GPU IDs. Gate the "physical" label on parent_visible_spec["supports_explicit_gpu_ids"] and clear parent_visible_gpu_ids in the payload when it's False, so API consumers and llama.cpp skip the placement path. Verified: PR-authored tests (96 + 4 skipped), sim suite v1 (174, one updated), sim suite v2 (141 incl. 13 new round-6 tests), 8x B200 CUDA matrix (6/6 scenarios byte-identical to round-5 baseline).

for more information, see https://pre-commit.ci

Round-7 review (10 parallel reviewers on PR unslothai#4724) produced a strong convergent finding (7/10 independent reviewers on the same underlying issue): the PR is titled "enable Studio for Intel GPU" but on the default Intel oneAPI runtime (ZE_FLAT_DEVICE_HIERARCHY=FLAT, no ZE_AFFINITY_MASK set), Studio still couldn't auto-select or shard across multiple visible XPUs. Users had to manually opt into COMPOSITE hierarchy for the PR's advertised feature to actually work. The reviewers correctly noted that this is fixable without relaxing the "physical IDs only" contract of the public prepare_gpu_selection() API. torch.xpu ordinals are stable within a single worker process's inherited ZE_AFFINITY_MASK scope: Studio can safely generate them internally for auto-selection, narrow the mask via apply_gpu_ids(), and let the child process inherit the same torch.xpu device set. Round-trip stays 1:1 within the process boundary the ordinals were observed in. [7/10] studio/backend/utils/hardware/hardware.py -- auto_select_gpu_ids() When get_device()==XPU and supports_explicit_gpu_ids=False (FLAT hierarchy cases), instead of immediately bailing out with selection_mode="inherit_parent_visible" and selected=None, fall back to list(range(get_visible_gpu_count())) as the candidate set and continue into the VRAM-based selection logic. This path is flagged via a new metadata key xpu_relative_auto_select=True so telemetry stays introspectable. The public prepare_gpu_selection() API is unchanged and still refuses user-supplied explicit gpu_ids on these masks, because end users can't reason about FLAT tile handles; only Studio's own auto-selection (which controls the worker-process env) can use these ordinals safely. [3/10] studio/backend/core/inference/llama_cpp.py -- _get_gpu_free_memory() The index_kind="relative" guard was rejecting XPU ordinals alongside CUDA UUID/MIG fallbacks. CUDA relative indices stay unsafe (the parent has hidden the physical ID mapping, so re-exporting them into CUDA_VISIBLE_DEVICES would re-expose hidden GPUs). XPU relative ordinals are stable for the current worker scope, so accept them and let VRAM-based GGUF placement work on default Intel FLAT hosts. Verified: PR tests (96 + 4 skipped), sim_pr4724 (174), sim_fixes_v2 (153 incl. 11 new round-7 tests, 3 updated to new contract), 8x B200 CUDA matrix (6/6 scenarios byte-identical to round-5 baseline).

for more information, see https://pre-commit.ci

Round-7 added a relative-ordinal fallback so default Intel FLAT hosts could use multiple visible XPUs via auto-selection and VRAM-based llama.cpp placement. Round-8 review (6/10 convergent finding on 10 parallel reviewers of PR unslothai#4724) showed that fallback triggered even when the parent process had already narrowed visibility via ZE_AFFINITY_MASK, which silently retargeted child processes onto different Level Zero handles than the parent exposed. Example: parent sets ZE_AFFINITY_MASK="3,5" to expose tiles 3 and 5. torch.xpu in the parent sees ordinals 0 (=tile 3) and 1 (=tile 5). Round-7 auto_select returned [0, 1]; apply_gpu_ids() rewrote ZE_AFFINITY_MASK="0,1"; the child inherited that and saw tile 0 and tile 1, not the originally-exposed 3 and 5. Same regression on subdevice masks like "0.0,0.1". studio/backend/utils/hardware/hardware.py -- auto_select_gpu_ids() Gate xpu_relative_auto_select on parent_visible_spec["raw"] is None. When a parent mask is already set, defer to inherit_parent_visible so the child inherits the exact visibility the parent intended. studio/backend/core/inference/llama_cpp.py -- _get_gpu_free_memory() Gate the relative-XPU acceptance on os.environ.get("ZE_AFFINITY_MASK") is None. Inherited masks (numeric "3,5", subdevice "0.0,0.1", wildcard "*") are preserved via the existing "return []" fall-through. Verified: PR tests (96 + 4 skipped), sim_pr4724 (174), sim_fixes_v2 (162 incl. 9 new round-8 tests + 2 updated to the new inherited-mask contract), 8x B200 CUDA matrix (6/6 scenarios byte-identical).

danielhanchen · 2026-04-16T10:37:44Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces comprehensive support for Intel XPU (Intel GPUs) across the inference, training, and hardware utility modules. Key changes include backend-aware hardware detection, telemetry integration via xpu-smi and torch.xpu, and the implementation of ZE_AFFINITY_MASK for device visibility management. The PR also refactors GPU cache clearing and error formatting to be device-agnostic and addresses potential multiprocessing issues on XPU by disabling fork-based parallelism once the runtime is initialized. Feedback was provided to improve the robustness of environment variable parsing for GPU selection by filtering out empty tokens to prevent potential ValueErrors.

gemini-code-assist · 2026-04-16T10:40:53Z

+                cvd = os.environ.get("CUDA_VISIBLE_DEVICES")
+                if cvd is not None and cvd.strip():
+                    try:
+                        allowed = set(int(x.strip()) for x in cvd.split(","))


The list comprehension for parsing CUDA_VISIBLE_DEVICES should filter out empty strings to avoid a ValueError when calling int() on a trailing or double comma (e.g., "0,1,"). While the surrounding try...except block prevents a crash, it causes the entire filter to be ignored (allowed becomes None), which might lead to unexpected GPU selection. Filtering empty tokens ensures that numeric masks with trailing commas are still parsed correctly, consistent with the logic used in utils/hardware/hardware.py.

Suggested change

allowed = set(int(x.strip()) for x in cvd.split(","))

allowed = set(int(x.strip()) for x in cvd.split(",") if x.strip())

References

Avoid using broad, silent exception handlers like except Exception: pass. Instead, log the exception, even if at a debug level, to aid in future debugging.

Gemini review 4120214370 caught that LlamaCppBackend._get_gpu_free_memory() built its CUDA_VISIBLE_DEVICES filter with int(x.strip()) for x in cvd.split(","), which raises ValueError on an empty token from a shell- exported value like "0,1," (trailing comma) or "0,,1" (doubled comma). The surrounding try/except swallowed the crash but left ``allowed=None``, which silently disables the filter so every nvidia-smi GPU becomes a placement candidate even though the user clearly narrowed visibility. Skip empty tokens during the parse so trailing/doubled/leading commas still produce a valid filter, matching the "if token.strip()" pattern already used in utils/hardware/hardware.py.

Round-7 added a relative-ordinal fallback so default Intel FLAT hosts could auto-select multiple visible XPUs. Round-9 review (4/10 convergent plus related findings on 10 parallel reviewers of PR unslothai#4724) showed that on multi-tile Intel devices (Data Center GPU Max, 2 tiles per root), the synthesized torch.xpu ordinals enumerate tiles rather than distinct root GPUs. Writing them back via apply_gpu_ids() then rewrote ZE_AFFINITY_MASK with tile handles -- narrowing the worker onto a subset of one card instead of spreading across the parent-visible set. This revert + replacement keeps multi-XPU sharding working on default Intel FLAT without any environment-visible ordinal round-trip: studio/backend/utils/hardware/hardware.py -- auto_select_gpu_ids() Reverted R7-A. When the parent-visible spec cannot expose stable physical GPU IDs (FLAT no-mask, FLAT numeric, wildcard, subdevice), defer to inherit_parent_visible with selected_gpu_ids=None. No more synthesized worker-local ordinals written back into ZE_AFFINITY_MASK. studio/backend/core/inference/llama_cpp.py -- _get_gpu_free_memory() Reverted R7-B. XPU relative telemetry is rejected for placement just like CUDA UUID/MIG. llama-server inherits the parent's mask unchanged and uses its own multi-device machinery. studio/backend/utils/hardware/hardware.py -- get_device_map() New path: when device==XPU and caller did not pass gpu_ids, return "balanced" whenever more than one torch.xpu device is visible (either via unresolved mask or FLAT numeric mask that populates numeric_ids but does not support explicit selection). HF Transformers uses the torch ordinals directly -- scope-local within the worker process, no ZE_AFFINITY_MASK rewrite, so tile-vs-root ambiguity cannot leak out. Explicit gpu_ids=[0] continues to produce sequential so the caller's deliberate single-device choice is respected. Verified: PR tests (96 + 4 skipped), sim_pr4724 (174, one contract update), sim_fixes_v2 (183 incl. 15 new round-10 tests + 10 updated for the reverted contracts), 8x B200 CUDA matrix (6/6 scenarios byte-identical).

danielhanchen · 2026-04-16T11:46:16Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces comprehensive support for Intel XPU (Intel GPUs) across the backend. Key changes include updating hardware detection to recognize XPU devices via ZE_AFFINITY_MASK and UNSLOTH_FORCE_XPU environment variables, implementing backend-aware GPU memory and utilization tracking using xpu-smi and torch.xpu, and ensuring proper device pinning using ZE_AFFINITY_MASK. Additionally, the PR updates audio codec handling to maintain CPU fallbacks for non-validated backends, refactors GPU cache clearing to be device-agnostic, and adjusts multiprocessing logic to prevent XPU runtime corruption on fork. I have no feedback to provide as there are no review comments.

Shorten multi-line design-rationale comments to 1-3 lines each. The surrounding code and commit history carry the full context; inline comments should state the rule, not re-derive the reasoning. Net reduction: ~107 comment lines removed across 2 files. No behavior change.

nobody5050 · 2026-05-07T15:34:32Z

Any likelihood of this getting merged soon?

rolandtannous · 2026-05-09T10:57:58Z

@nobody5050 yes it's being reviewed and tessted with the help of community users before we merge.

leizhenyuan requested a review from rolandtannous as a code owner March 31, 2026 11:19

gemini-code-assist Bot reviewed Mar 31, 2026

View reviewed changes

danielhanchen reviewed Apr 3, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 3, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 5, 2026

View reviewed changes

savvadesogle mentioned this pull request Apr 7, 2026

[Question] Unsloth Studio: When would you add INTEL and AMD support to the new webui? #4452

Open

This was referenced Apr 8, 2026

enable studio for intel GPU danielhanchen/unsloth-staging-2#13

Closed

enable studio for intel GPU unslothai/unsloth-staging-1#43

Closed

danielhanchen added the auto-reviewing Auto-review in progress label Apr 11, 2026

danielhanchen force-pushed the zhenyuan_enable_studio branch from 9ce735d to b759a03 Compare April 11, 2026 12:50

[pre-commit.ci] auto fixes from pre-commit.com hooks

a1c2c4b

for more information, see https://pre-commit.ci

danielhanchen added auto-approved Auto-review approved the PR and removed auto-reviewing Auto-review in progress auto-approved Auto-review approved the PR labels Apr 11, 2026

danielhanchen and others added 4 commits April 16, 2026 07:24

[pre-commit.ci] auto fixes from pre-commit.com hooks

a293d25

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

eebf077

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed Apr 16, 2026

View reviewed changes

danielhanchen and others added 2 commits April 16, 2026 08:34

[pre-commit.ci] auto fixes from pre-commit.com hooks

cc04baa

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed Apr 16, 2026

View reviewed changes

danielhanchen and others added 8 commits April 16, 2026 09:00

[pre-commit.ci] auto fixes from pre-commit.com hooks

24b2a30

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

3fa23d9

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

95ac005

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed Apr 16, 2026

View reviewed changes

danielhanchen added 2 commits April 16, 2026 10:46

gemini-code-assist Bot reviewed Apr 16, 2026

View reviewed changes

danielhanchen and others added 2 commits April 16, 2026 12:06

Merge branch 'main' into zhenyuan_enable_studio

b3844e0

andomeder mentioned this pull request May 4, 2026

fix(install): route Linux Intel GPU hosts away from CPU-only torch #5274

Draft

oobabooga mentioned this pull request May 27, 2026

Studio: add Vulkan llama.cpp support #5819

Open

This was referenced Jun 8, 2026

Intel Arc GPU (XPU) support tracking issue #6105

Open

feat: Intel Arc / XPU GPU support for Studio and Gemma training #6106

Draft

		if hasattr(torch, "xpu") and torch.xpu.is_available():
		versions["xpu"] = True

-    except Exception:
-        pass
+    except Exception as e:
+        logger.debug("Failed to get XPU utilization from xpu-smi: %s", e)

		except Exception:
		_visible_gpu_count = get_physical_gpu_count()

	allowed = set(int(x.strip()) for x in cvd.split(","))
	allowed = set(int(x.strip()) for x in cvd.split(",") if x.strip())

Uh oh!

Conversation

leizhenyuan commented Mar 31, 2026

Uh oh!

chatgpt-codex-connector Bot commented Mar 31, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen left a comment

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 3, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 3, 2026

Uh oh!

gemini-code-assist Bot commented Apr 3, 2026

Uh oh!

savvadesogle commented Apr 5, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented Apr 16, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!