[tests] Review tests for PR #615 #15

Closed
danielhanchen wants to merge 9 commits into main from pr-615-tests
Conversation

@danielhanchen

Collaborator

Automated test files from review process

andomeder and others added 5 commits April 28, 2026 21:31
…ackend

Add device_empty_cache() helper in device_type.py alongside the existing
device_synchronize(), and route every torch.cuda.empty_cache() / .synchronize()
call in saving_utils.py through these helpers so XPU and HIP builds no longer
crash or silently no-op during GGUF export. Concretely, this fixes:

- Unguarded torch.cuda.empty_cache() in the outer shard loop of
  merge_and_overwrite_lora and inside _merge_and_overwrite_lora_mxfp4, both
  of which raise "Torch not compiled with CUDA enabled" on XPU after the
  first shard / mxfp4 tensor is processed.
- Six guarded torch.cuda.empty_cache() / .synchronize() sites inside
  _merge_and_overwrite_lora and _merge_and_overwrite_lora_mxfp4 that
  silently no-op on XPU, leaving XPU VRAM unflushed mid-export.
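
To make the two failure modes above concrete, here is a minimal sketch of a backend-dispatching cache flush. It is not the PR's actual code: the function takes a torch-like module and a device-type string as explicit parameters (an assumption made so the sketch runs without PyTorch), the fake module is a stand-in, and the RuntimeError type is a placeholder for whatever the real call raises.

```python
from types import SimpleNamespace


def device_empty_cache(torch_mod, device_type):
    """Flush the allocator cache of the active backend.

    Returns True if a flush ran. Both parameters are assumptions of this
    sketch; the real helper presumably reads module-level state instead.
    """
    if device_type in ("cuda", "hip"):
        # HIP builds are driven through the torch.cuda namespace.
        if torch_mod.cuda.is_available():
            torch_mod.cuda.empty_cache()
            return True
    elif device_type == "xpu":
        xpu = getattr(torch_mod, "xpu", None)
        if xpu is not None and xpu.is_available():
            xpu.empty_cache()
            return True
    return False


def _cuda_missing():
    # Placeholder for the crash an unguarded torch.cuda call hits on XPU.
    raise RuntimeError("Torch not compiled with CUDA enabled")


# Stand-in for a torch build with XPU support and no CUDA.
flushed = []
fake_xpu_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False, empty_cache=_cuda_missing),
    xpu=SimpleNamespace(
        is_available=lambda: True,
        empty_cache=lambda: flushed.append("xpu"),
    ),
)

# Routed through the helper, the XPU cache is flushed and CUDA is never touched.
ran = device_empty_cache(fake_xpu_torch, "xpu")
```

Calling `fake_xpu_torch.cuda.empty_cache()` directly would raise (the unguarded case), while a call guarded only by `cuda.is_available()` would skip the flush entirely (the silent no-op case).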

Add a private _active_merge_device(W) helper that returns W.device when W is
already on the active backend, otherwise constructs torch.device(
DEVICE_TYPE_TORCH[, index]). Route _merge_lora and the five MoE expert merge
helpers (_merge_moe_gate_expert, _merge_moe_up_expert,
_merge_moe_down_proj_expert, _merge_moe_fused_gate_up_expert,
_merge_moe_fused_down_proj_expert) through it so MoE LoRA merges run on the
active accelerator instead of silently falling back to CPU on XPU.
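
The device-selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `Device` is a stand-in for `torch.device` so the example runs without PyTorch, `DEVICE_TYPE_TORCH` is hard-coded to `"xpu"` for the demonstration, and the optional `index` parameter is hypothetical, since the commit message does not say where the index comes from.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: value on an XPU build. Per the commit message it is "cuda"
# on both CUDA and HIP builds.
DEVICE_TYPE_TORCH = "xpu"


@dataclass(frozen=True)
class Device:
    """Stand-in for torch.device, exposing the same .type and .index fields."""
    type: str
    index: Optional[int] = None


def active_merge_device(w_device, index=None):
    """Pick the device a LoRA merge should run on.

    Keep the weight's own device when it already lives on the active
    backend; otherwise build a device for the active backend.
    """
    if w_device.type == DEVICE_TYPE_TORCH:
        return w_device
    return Device(DEVICE_TYPE_TORCH, index)


already_on_xpu = active_merge_device(Device("xpu", 1))
moved_from_cpu = active_merge_device(Device("cpu"))
moved_with_index = active_merge_device(Device("cpu"), index=0)
```

A weight already on `xpu:1` keeps its device, while a CPU-resident weight is redirected to the active accelerator rather than being merged on the CPU.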

CUDA/HIP behavior is unchanged because DEVICE_TYPE_TORCH equals "cuda" for
both backends and device_empty_cache() preserves the existing
torch.cuda.is_available() guard.

Add device_is_bf16_supported() to device_type.py alongside the existing
device_synchronize() and device_empty_cache() helpers, and route the three
torch.cuda.is_bf16_supported() callsites in llama_cpp.py's convert_to_gguf
mmproj/outtype branches through it. On XPU torch builds these calls would
otherwise raise "Torch not compiled with CUDA enabled" during VLM GGUF
export, mirroring the same crash class fixed in saving_utils.py.
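
A rough sketch of such a bf16 capability check is shown below. The explicit `torch_mod` and `device_type` parameters, the fake modules, and the `False` fallback for unknown backends are all assumptions of this sketch; only the CUDA/HIP branch and the hasattr-style guard on the XPU branch are taken from the commit messages.

```python
from types import SimpleNamespace


def device_is_bf16_supported(torch_mod, device_type):
    """Report bf16 support for the active backend without touching torch.cuda on XPU."""
    if device_type in ("cuda", "hip"):
        # Unchanged path: defer to torch.cuda exactly as before.
        return bool(torch_mod.cuda.is_bf16_supported())
    if device_type == "xpu":
        xpu = getattr(torch_mod, "xpu", None)
        # Defensive check: a partial XPU build may lack this attribute.
        if xpu is not None and hasattr(xpu, "is_bf16_supported"):
            return bool(xpu.is_bf16_supported())
    # Assumption: report no bf16 support when the backend cannot answer.
    return False


def _cuda_missing():
    # Placeholder for the crash described in the commit message.
    raise RuntimeError("Torch not compiled with CUDA enabled")


# Stand-in for an XPU-only torch build.
fake_xpu_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_bf16_supported=_cuda_missing),
    xpu=SimpleNamespace(is_bf16_supported=lambda: True),
)
# Stand-in for a partial XPU build with no bf16 query at all.
fake_partial_torch = SimpleNamespace(xpu=SimpleNamespace())

full_build = device_is_bf16_supported(fake_xpu_torch, "xpu")
partial_build = device_is_bf16_supported(fake_partial_torch, "xpu")
```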

CUDA and HIP behavior is unchanged (DEVICE_TYPE in ("cuda","hip") -> the
helper returns torch.cuda.is_bf16_supported() exactly as before).

Mirror the defensive hasattr pattern from device_is_bf16_supported in
device_empty_cache so that a torch.xpu module that exposes is_available
but not empty_cache (custom or partial XPU build) does not raise
AttributeError when the active backend cache is flushed.

Mirror the defensive hasattr pattern already applied to device_empty_cache
and device_is_bf16_supported so a torch.xpu module that exposes is_available
but not synchronize (custom or partial XPU build) does not raise
AttributeError when device_synchronize is invoked from the GGUF merge path.

Rename test_device_synchronize_partial_build.py to
test_backend_device_helpers.py so the file name reflects the actual
scope (dispatch and partial-build safety across all three backend
helpers: device_synchronize, device_empty_cache, device_is_bf16_supported).
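
The partial-build safety these commits describe could be exercised roughly as follows. This is a hypothetical sketch, not the contents of test_backend_device_helpers.py: the helper is re-implemented locally with explicit parameters, the test function names are invented, and the fake modules stand in for real torch builds.

```python
from types import SimpleNamespace


def device_synchronize(torch_mod, device_type):
    """Synchronize the active backend; return True if a sync actually ran."""
    if device_type in ("cuda", "hip"):
        if torch_mod.cuda.is_available():
            torch_mod.cuda.synchronize()
            return True
    elif device_type == "xpu":
        xpu = getattr(torch_mod, "xpu", None)
        # hasattr guard: is_available may exist while synchronize does not.
        if xpu is not None and hasattr(xpu, "synchronize") and xpu.is_available():
            xpu.synchronize()
            return True
    return False


def test_partial_xpu_build_does_not_raise():
    # torch.xpu exposes is_available but not synchronize.
    partial = SimpleNamespace(xpu=SimpleNamespace(is_available=lambda: True))
    assert device_synchronize(partial, "xpu") is False


def test_full_xpu_build_synchronizes():
    calls = []
    full = SimpleNamespace(
        xpu=SimpleNamespace(
            is_available=lambda: True,
            synchronize=lambda: calls.append("sync"),
        )
    )
    assert device_synchronize(full, "xpu") is True
    assert calls == ["sync"]


test_partial_xpu_build_does_not_raise()
test_full_xpu_build_synchronizes()
```

Without the hasattr guard, the first test would fail with AttributeError, which is the failure the later commits set out to prevent.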
@danielhanchen

Collaborator Author

Fixes pushed to unslothai#615.

2 participants