Guard torch.distributed submodule imports for ROCm builds #1709

Closed
0xDELUXA wants to merge 1 commit into deepbeepmeep:main from 0xDELUXA:fix/rocm-distributed-imports

@0xDELUXA commented Apr 14, 2026

Problem

On Windows ROCm via TheRock, the torch._C._distributed_c10d C-extension is not shipped. torch.distributed itself is present as a stub, but importing any submodule that touches c10d bindings - torch.distributed.fsdp, torch.distributed._tensor, torch.distributed.tensor.parallel - raises at import time:

ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package

Wan2GP hits this at startup because models/wan/any2video.py unconditionally imports shard_model from models/wan/distributed/fsdp.py, which imports torch.distributed.fsdp at the top of the module. The full traceback ends with:

File "Wan2GP/models/wan/distributed/fsdp.py", line 5, in <module>
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
...
ModuleNotFoundError: No module named 'torch._C._distributed_c10d'

This prevents Wan2GP from starting at all on AMD + TheRock, even for single-GPU inference where torch.distributed is never used.
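A quick way to check whether a particular PyTorch build is affected is to try importing each of the submodules named above and report which ones fail. The sketch below is a hypothetical diagnostic, not part of this PR or of Wan2GP. It relies only on importlib.import_module, and the module list simply mirrors the submodules mentioned in this description.

```python
import importlib

# Submodules named in this PR; the first is expected to import even on TheRock.
MODULES = [
    "torch.distributed",
    "torch.distributed.fsdp",
    "torch.distributed._tensor",
    "torch.distributed.tensor.parallel",
]


def probe(module_names):
    """Hypothetical helper: map each module name to "ok" or the import error it raised."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, status in probe(MODULES).items():
        print(f"{name}: {status}")
```

Based on the report above, an affected build would be expected to print ok for torch.distributed and the torch._C._distributed_c10d error for the three submodules.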

Fix

Wrap the affected top-level imports in try/except (ImportError, ModuleNotFoundError), falling back to None when the submodule cannot be loaded. Behavior on CUDA and on ROCm builds that ship c10d is unchanged - the imports succeed and the bound names are identical.
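A minimal sketch of that pattern follows. It is not the PR's literal diff. FSDP here stands in for the import on line 5 of models/wan/distributed/fsdp.py shown in the traceback, and the relative shard_model import is described only in a comment because it needs the package context. Since ModuleNotFoundError is a subclass of ImportError, catching ImportError alone would cover both. Naming both, as the PR does, is redundant but harmless.

```python
# Sketch of the guarded-import pattern (assumed shape, not the exact diff).
try:
    import torch.distributed as dist
except (ImportError, ModuleNotFoundError):
    dist = None

try:
    # On builds without the c10d C-extension this raises
    # ModuleNotFoundError: No module named 'torch._C._distributed_c10d'
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
except (ImportError, ModuleNotFoundError):
    FSDP = None

# In models/wan/any2video.py the same wrapper surrounds
# "from .distributed.fsdp import shard_model", binding shard_model = None on failure.
```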

Changed files:

  • models/wan/any2video.py - guard import torch.distributed as dist and from .distributed.fsdp import shard_model.
  • models/wan/multitalk/multitalk.py - guard import torch.distributed as dist. This module is imported lazily from any2video.py, so the path is only hit when multitalk features are used, but it fails for the same reason.
  • models/kandinsky5/kandinsky/models/parallelize.py - guard torch.distributed._tensor and torch.distributed.tensor.parallel imports.

Plain import torch.distributed as dist statements elsewhere in the repo (hyvideo, nanovllm, mmaudio, TTS, magi_human, longcat, radial_attention) were not changed - the bare namespace imports succeed on TheRock; only submodule imports that touch c10d fail.
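For code that has to decide at runtime whether distributed features are usable, PyTorch also provides torch.distributed.is_available(), which returns False when the distributed package was not built in. The sketch below combines that check with an import guard. Whether TheRock's Windows stub actually reports False here is an assumption that was not verified in this PR, and distributed_usable is a hypothetical name.

```python
def distributed_usable() -> bool:
    """Hypothetical helper: True only if torch.distributed imports and reports itself available."""
    try:
        import torch.distributed as dist
    except ImportError:
        # torch (or its distributed namespace) is missing entirely.
        return False
    # is_available() is False on builds compiled without distributed support;
    # assumed (not verified here) to cover TheRock builds lacking _distributed_c10d.
    return bool(dist.is_available())


if __name__ == "__main__":
    print("torch.distributed usable:", distributed_usable())
```

Checking is_available() lets a caller skip the submodule imports up front. The try/except guard, by contrast, only reacts after an import has already failed.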

Verification

Reproduced on torch 2.12.0a0+rocm7.13.0a (TheRock), Windows 11, Python 3.12:

  • Before: python wgp.py crashes at import with the traceback above.
  • After: Wan2GP starts successfully. shard_model, dist, and the kandinsky tensor-parallel symbols resolve to None; no code path in single-GPU inference references them.
  • On CUDA PyTorch (verified separately): no behavior change - try-blocks succeed and bindings are identical.
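Because the fallback binds these names to None, any multi-GPU path that does reach them would fail later with a less obvious TypeError ('NoneType' object is not callable). An explicit check at the call site could look like the sketch below. maybe_shard is a hypothetical helper, and calling shard_model(model) with a single argument is an assumption about its signature, which this PR does not show.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in for the guarded import: on TheRock builds the
# "from .distributed.fsdp import shard_model" line falls back to None.
shard_model: Optional[Callable[..., Any]] = None


def maybe_shard(model: Any, use_fsdp: bool) -> Any:
    """Hypothetical helper: only touch shard_model when sharding is requested."""
    if not use_fsdp:
        # Single-GPU inference never needs the distributed symbol.
        return model
    if shard_model is None:
        raise RuntimeError(
            "FSDP sharding requested, but torch.distributed.fsdp could not be "
            "imported (e.g. a ROCm build without torch._C._distributed_c10d)."
        )
    # Assumed signature: shard_model(model) -> sharded model.
    return shard_model(model)
```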

Resolves

cc @deepbeepmeep @Tophness

@deepbeepmeep (Owner)

Thanks, but I have removed all distributed code from the Wan model. Please update and let me know if anything left still crashes on AMD.

@0xDELUXA (Author)

Great! Closing this as resolved by ce961de

@0xDELUXA closed this Apr 18, 2026
@0xDELUXA deleted the fix/rocm-distributed-imports branch April 18, 2026 20:40