Guard `torch.distributed` submodule imports for ROCm builds by 0xDELUXA · Pull Request #1709 · deepbeepmeep/Wan2GP

0xDELUXA · 2026-04-14T11:00:50Z

Problem

On Windows ROCm via TheRock, the torch._C._distributed_c10d C-extension is not shipped. torch.distributed itself is present as a stub, but importing any submodule that touches c10d bindings - torch.distributed.fsdp, torch.distributed._tensor, torch.distributed.tensor.parallel - raises at import time:

ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package

Wan2GP hits this at startup because models/wan/any2video.py unconditionally imports shard_model from models/wan/distributed/fsdp.py, which pulls in torch.distributed.fsdp at module top. Full traceback ends at:

File "Wan2GP/models/wan/distributed/fsdp.py", line 5, in <module>
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
...
ModuleNotFoundError: No module named 'torch._C._distributed_c10d'

This blocks Wan2GP from starting at all on AMD + TheRock, even for single-GPU inference where distributed is never used.

Fix

Wrap the affected top-level imports in try/except (ImportError, ModuleNotFoundError), falling back to None when the submodule cannot be loaded. Behavior on CUDA and on ROCm builds that ship c10d is unchanged - the imports succeed and the bound names are identical.
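A minimal sketch of the guard described above, using the FSDP import from the traceback as the example. This is an illustration of the pattern, not the PR's actual diff; the helper `fsdp_available` is a hypothetical name added for the example.

```python
# Guarded top-level import: bind the name to None when the submodule cannot load.
try:
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
except ImportError:
    # ModuleNotFoundError is a subclass of ImportError, so this one clause
    # also covers "No module named 'torch._C._distributed_c10d'".
    FSDP = None


def fsdp_available() -> bool:
    """Hypothetical helper: report whether the guarded import succeeded."""
    return FSDP is not None
```

Where the import succeeds, `FSDP` is bound to the same class as with an unguarded import, so existing call sites need no changes.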

Changed files:

models/wan/any2video.py - guard import torch.distributed as dist and from .distributed.fsdp import shard_model.
models/wan/multitalk/multitalk.py - guard import torch.distributed as dist. Imported lazily from any2video.py, so this path only triggers when multitalk features are used, but fails for the same reason.
models/kandinsky5/kandinsky/models/parallelize.py - guard torch.distributed._tensor and torch.distributed.tensor.parallel imports.

Plain import torch.distributed as dist statements elsewhere in the repo (hyvideo, nanovllm, mmaudio, TTS, magi_human, longcat, radial_attention) were not changed - the bare namespace imports succeed on TheRock; only submodule imports that touch c10d fail.
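To check on a given build which of these imports succeed and which fail, a small probe along the following lines could be used. The module names come from the description above; `probe_imports` and `CANDIDATES` are hypothetical names introduced for this sketch.

```python
import importlib

# Module names taken from the PR description; adjust for the build under test.
CANDIDATES = [
    "torch.distributed",
    "torch.distributed.fsdp",
    "torch.distributed._tensor",
    "torch.distributed.tensor.parallel",
]


def probe_imports(names):
    """Map each module name to None if it imports, else to the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in probe_imports(CANDIDATES).items():
        print(f"{name}: {'ok' if error is None else error}")
```

Running the script prints one line per module, which makes it easy to report any remaining failing import.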

Verification

Reproduced on torch 2.12.0a0+rocm7.13.0a (TheRock), Windows 11, Python 3.12:

Before: python wgp.py crashes at import with the traceback above.
After: Wan2GP starts successfully. shard_model, dist, and the kandinsky tensor-parallel symbols resolve to None; no code path in single-GPU inference references them.
On CUDA PyTorch (verified separately): no behavior change - try-blocks succeed and bindings are identical.
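Because guarded names can resolve to None, any code path that does use them would fail with an opaque `TypeError` if reached. The sketch below shows one possible call-site check; it is an assumption about how such a path could be hardened, not something the PR states it adds. `require_symbol` and `shard_if_requested` are hypothetical names, and `shard_model = None` stands in for the guarded import result on a build without c10d.

```python
# Stand-in for the guarded import result on a build without c10d (assumption).
shard_model = None


def require_symbol(symbol, name):
    """Return symbol, or raise a readable error if its guarded import fell back to None."""
    if symbol is None:
        raise RuntimeError(
            f"{name} is unavailable: this PyTorch build does not ship the c10d bindings"
        )
    return symbol


def shard_if_requested(model, use_fsdp):
    # Single-GPU inference never touches the guarded name.
    if not use_fsdp:
        return model
    return require_symbol(shard_model, "shard_model")(model)
```

With this shape, single-GPU runs are unaffected, and a request for sharding on an unsupported build fails with a message naming the missing capability.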

Resolves

cc @deepbeepmeep @Tophness

deepbeepmeep · 2026-04-17T23:05:06Z

Thanks, but I have removed all distributed code from the Wan model. Please update and let me know if anything left still crashes on AMD.

0xDELUXA · 2026-04-18T12:04:07Z

Great! Closing this as resolved by ce961de

Guard torch.distributed submodule imports for ROCm builds

458e734

0xDELUXA closed this Apr 18, 2026

0xDELUXA deleted the fix/rocm-distributed-imports branch April 18, 2026 20:40

0xDELUXA mentioned this pull request May 16, 2026

Port gfx12 native attention to ABI3 fork woct0rdho/SageAttention#92

Open

0xDELUXA wants to merge 1 commit into deepbeepmeep:main from 0xDELUXA:fix/rocm-distributed-imports
