[tests] Review tests for PR #615 #15
Closed
danielhanchen wants to merge 9 commits into
…ackend

Add device_empty_cache() helper in device_type.py alongside the existing
device_synchronize(), and route every torch.cuda.empty_cache() /
.synchronize() call in saving_utils.py through these helpers so XPU and HIP
builds no longer crash or silently no-op during GGUF export.

Concretely, this fixes:
- Unguarded torch.cuda.empty_cache() in the outer shard loop of
  merge_and_overwrite_lora and inside _merge_and_overwrite_lora_mxfp4, both of
  which raise "Torch not compiled with CUDA enabled" on XPU after the first
  shard / mxfp4 tensor is processed.
- Six guarded torch.cuda.empty_cache() / .synchronize() sites inside
  _merge_and_overwrite_lora and _merge_and_overwrite_lora_mxfp4 that silently
  no-op on XPU, leaving XPU VRAM unflushed mid-export.

Add a private _active_merge_device(W) helper that returns W.device when W is
already on the active backend, otherwise constructs
torch.device(DEVICE_TYPE_TORCH[, index]). Route _merge_lora and the five MoE
expert merge helpers (_merge_moe_gate_expert, _merge_moe_up_expert,
_merge_moe_down_proj_expert, _merge_moe_fused_gate_up_expert,
_merge_moe_fused_down_proj_expert) through it so MoE LoRA merges run on the
active accelerator instead of silently falling back to CPU on XPU.

CUDA/HIP behavior is unchanged because DEVICE_TYPE_TORCH equals "cuda" for
both backends and device_empty_cache() preserves the existing
torch.cuda.is_available() guard.
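The device-selection rule described for _active_merge_device(W) can be sketched without torch as below. This is a minimal illustration under stated assumptions, not the PR's implementation: Device is a hypothetical stand-in for torch.device, and the backend string and index are passed as arguments (rather than read from DEVICE_TYPE_TORCH) so the logic can run anywhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for torch.device (backend type plus optional index)."""
    type: str
    index: Optional[int] = None


def active_merge_device(w_device, device_type_torch, index=None):
    """Pick the device a LoRA merge should run on (illustrative only)."""
    # W already lives on the active backend: keep its exact device.
    if w_device.type == device_type_torch:
        return w_device
    # Otherwise (for example W sits on CPU), target the active accelerator.
    return Device(device_type_torch, index)
```

With the backend string set to "xpu", a weight held on CPU resolves to an XPU device, which matches the stated goal of keeping MoE merges on the active accelerator.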
Add device_is_bf16_supported() to device_type.py alongside the existing
device_synchronize() and device_empty_cache() helpers, and route the three
torch.cuda.is_bf16_supported() callsites in llama_cpp.py's convert_to_gguf
mmproj/outtype branches through it. On XPU torch builds these calls would
otherwise raise "Torch not compiled with CUDA enabled" during VLM GGUF
export, mirroring the same crash class fixed in saving_utils.py.
CUDA and HIP behavior is unchanged (DEVICE_TYPE in ("cuda","hip") -> the
helper returns torch.cuda.is_bf16_supported() exactly as before).
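A rough sketch of the dispatch this commit message describes follows. Only the CUDA/HIP branch is stated in the text above; the XPU branch, the False fallback, and the torch_mod parameter (used here so fake modules can stand in for real torch builds) are assumptions made for illustration.

```python
from types import SimpleNamespace


def device_is_bf16_supported(torch_mod, device_type):
    """Report bf16 support for the active backend without assuming CUDA."""
    if device_type in ("cuda", "hip"):
        # Unchanged path: HIP builds also expose the torch.cuda namespace.
        return torch_mod.cuda.is_bf16_supported()
    if device_type == "xpu":
        probe = getattr(getattr(torch_mod, "xpu", None), "is_bf16_supported", None)
        # Assumption: report False when the XPU build lacks the probe.
        return bool(probe()) if callable(probe) else False
    return False


# Fake modules standing in for a CUDA build and an XPU build of torch.
fake_cuda_torch = SimpleNamespace(cuda=SimpleNamespace(is_bf16_supported=lambda: True))
fake_xpu_torch = SimpleNamespace(xpu=SimpleNamespace(is_bf16_supported=lambda: True))
```

The point of the sketch is that the XPU branch never touches torch.cuda, which is the call that raises on builds compiled without CUDA.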
2a87149 to 895ecb0
Mirror the defensive hasattr pattern from device_is_bf16_supported in device_empty_cache so that a torch.xpu module that exposes is_available but not empty_cache (custom or partial XPU build) does not raise AttributeError when the active backend cache is flushed.
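The defensive hasattr pattern might look roughly like this. It is a sketch, not the code in device_type.py: the torch_mod parameter and the boolean return value are hypothetical additions so the behavior can be checked against a fake partial build.

```python
from types import SimpleNamespace


def device_empty_cache(torch_mod, device_type):
    """Flush the active backend's cache; return True only if a flush ran."""
    if device_type in ("cuda", "hip"):
        backend = getattr(torch_mod, "cuda", None)
    elif device_type == "xpu":
        backend = getattr(torch_mod, "xpu", None)
    else:
        backend = None
    # Guard both attributes so a partial build cannot raise AttributeError.
    if backend is None or not hasattr(backend, "is_available"):
        return False
    if not hasattr(backend, "empty_cache") or not backend.is_available():
        return False
    backend.empty_cache()
    return True


# Partial XPU build: is_available exists, empty_cache does not.
partial_xpu_torch = SimpleNamespace(xpu=SimpleNamespace(is_available=lambda: True))
```

Calling device_empty_cache(partial_xpu_torch, "xpu") returns False instead of raising, which is the failure mode this commit targets.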
895ecb0 to 2fd55a4
Mirror the defensive hasattr pattern already applied to device_empty_cache and device_is_bf16_supported so a torch.xpu module that exposes is_available but not synchronize (custom or partial XPU build) does not raise AttributeError when device_synchronize is invoked from the GGUF merge path.
2fd55a4 to 9123710
Rename test_device_synchronize_partial_build.py to test_backend_device_helpers.py so the file name reflects the actual scope (dispatch and partial-build safety across all three backend helpers: device_synchronize, device_empty_cache, device_is_bf16_supported).
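The two properties named here, dispatch and partial-build safety, could be exercised with fake modules along the following lines. The actual contents of test_backend_device_helpers.py are not shown in this conversation, so everything below is hypothetical: device_synchronize is a local stand-in for the helper under test, and make_fake_torch is an illustrative fixture.

```python
from types import SimpleNamespace


def device_synchronize(torch_mod, device_type):
    """Hypothetical stand-in for the helper under test."""
    name = "cuda" if device_type in ("cuda", "hip") else device_type
    backend = getattr(torch_mod, name, None)
    sync = getattr(backend, "synchronize", None)
    available = getattr(backend, "is_available", lambda: False)
    if callable(sync) and available():
        sync()


def make_fake_torch(backend_name, **attrs):
    """Build a fake torch module exposing only the given backend attributes."""
    return SimpleNamespace(**{backend_name: SimpleNamespace(**attrs)})


def test_dispatches_to_active_backend():
    calls = []
    fake = make_fake_torch("xpu", is_available=lambda: True,
                           synchronize=lambda: calls.append("xpu"))
    device_synchronize(fake, "xpu")
    assert calls == ["xpu"]


def test_partial_build_does_not_raise():
    # Exposes is_available but not synchronize, as in the commit message.
    fake = make_fake_torch("xpu", is_available=lambda: True)
    device_synchronize(fake, "xpu")  # must not raise AttributeError
```

Building the fakes from SimpleNamespace keeps such tests independent of which accelerator, if any, is present on the machine running them.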
ab7b4a8 to b42a801
Fixes pushed to unslothai#615.
Automated test files from review process