[Bugfix] Fix layer-wise offload incompatibility with HSDP by RuixiangMa · Pull Request #2021 · vllm-project/vllm-omni

RuixiangMa · 2026-03-19T17:37:30Z

Purpose

vllm serve Wan-AI/Wan2.2-TI2V-5B-Diffusers --omni --use-hsdp --hsdp-shard-size 4 --enable-layerwise-offload --port 8004

File "/home/ruixiang/omni/vllm_omni/diffusion/offloader/layerwise_backend.py", line 115, in _to_cpu
cpu_tensor[current_offset : current_offset + numel].copy_(param_or_buf.flatten())
RuntimeError: aten.copy_.default got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!

This PR fixes the layer-wise offload incompatibility with HSDP by converting each DTensor to its local tensor before it is copied into the CPU buffer.
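To illustrate the failure in the traceback above, here is a minimal sketch of packing a layer's tensors into one flat CPU buffer. It is not the code from layerwise_backend.py: the helper names as_local and pack_to_cpu are hypothetical, and only cpu_tensor, the offset/numel/shape metadata, and the to_local() call come from the traceback and the diff. The sketch assumes every tensor in a layer shares one dtype.

```python
import torch


def as_local(t):
    """Return the rank-local tensor for DTensor-like objects, else t itself."""
    # Hypothetical helper: anything exposing to_local() is treated as sharded.
    return t.to_local() if hasattr(t, "to_local") else t


def pack_to_cpu(named_tensors):
    """Copy every tensor into one flat CPU buffer and record where it landed."""
    local = {name: as_local(t) for name, t in named_tensors.items()}
    total = sum(t.numel() for t in local.values())
    dtype = next(iter(local.values())).dtype  # assumption: one dtype per layer
    cpu_tensor = torch.empty(total, dtype=dtype, device="cpu")

    metadata, offset = [], 0
    for name, t in local.items():
        numel = t.numel()
        # Both sides are plain tensors here, so copy_ never mixes Tensor and DTensor.
        cpu_tensor[offset : offset + numel].copy_(t.flatten())
        metadata.append({"name": name, "offset": offset, "numel": numel, "shape": t.shape})
        offset += numel
    return cpu_tensor, metadata


weights = {"w": torch.arange(6.0).view(2, 3), "b": torch.ones(3)}
flat, meta = pack_to_cpu(weights)
```

Because numel and shape are taken from the local tensor, the buffer is sized for the rank-local shard rather than the full logical parameter.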

Test Plan

Test Result

(APIServer pid=199924) INFO: 127.0.0.1:52912 - "POST /v1/videos HTTP/1.1" 200 OK
(APIServer pid=199924) INFO 03-20 01:33:30 [serving_video.py:118] Boundary ratio parse: request=0.875 gen_params=0.875
(APIServer pid=199924) INFO 03-20 01:33:30 [serving_video.py:128] Video sampling params: steps=10 guidance=4.0 guidance_2=4.0 seed=42
(APIServer pid=199924) INFO 03-20 01:33:30 [serving_video.py:204] Video generation routing: stage_configs=present, has_stage_list=True, engine_type=AsyncOmni
(APIServer pid=199924) INFO 03-20 01:33:30 [async_omni.py:511] [AsyncOrchestrator] Inline diffusion generate for request video_gen_02d35aafeb1e46399788ce9181d04de8
(APIServer pid=199924) INFO 03-20 01:33:30 [diffusion_engine.py:86] Pre-processing completed in 0.0000 seconds
INFO 03-20 01:33:30 [manager.py:608] Deactivating all adapters: 0 layers
INFO 03-20 01:33:30 [manager.py:608] Deactivating all adapters: 0 layers
INFO 03-20 01:33:30 [manager.py:608] Deactivating all adapters: 0 layers
INFO 03-20 01:33:30 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-20 01:33:30 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-20 01:33:30 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-20 01:33:30 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-20 01:33:30 [kv_transfer_manager.py:381] No connector available for receiving KV cache
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00, 1.56s/it]
(APIServer pid=199924) INFO 03-20 01:33:47 [diffusion_engine.py:94] Generation completed successfully.
(APIServer pid=199924) INFO 03-20 01:33:47 [diffusion_engine.py:116] Post-processing completed in 0.0542 seconds
(APIServer pid=199924) INFO 03-20 01:33:47 [diffusion_engine.py:119] DiffusionEngine.step breakdown: preprocess=0.04 ms, add_req_and_wait=17790.30 ms, postprocess=54.16 ms, total=17845.49 ms
(APIServer pid=199924) INFO 03-20 01:33:47 [omni_diffusion.py:133] OmniDiffusion.generate total: 17846.18 ms
(APIServer pid=199924) INFO 03-20 01:33:48 [serving_video.py:159] Video response encoding (MP4+base64): 327.40 ms
(APIServer pid=199924) INFO 03-20 01:33:48 [api_server.py:1778] Video request video_gen_02d35aafeb1e46399788ce9181d04de8 persisted /tmp/storage/video_gen_02d35aafeb1e46399788ce9181d04de8.mp4 output file.

Signed-off-by: Lancer <maruixiang6688@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2c9d6a9cf

chatgpt-codex-connector · 2026-03-19T17:42:17Z

+                if use_hsdp and hasattr(t, "to_local"):
+                    local_t = t.to_local()
+                else:
+                    local_t = t
+                weights_with_local.append((name, t, local_t))


Don't rematerialize HSDP blocks from to_local() shards

With --use-hsdp --enable-layerwise-offload, _to_cpu() now snapshots each DTensor via to_local() and prefetch_layer() later writes that shard back as a plain Tensor. Because HookRegistry wraps forward() instead of registering a forward-pre-hook (vllm_omni/diffusion/hooks/base.py:165-172), the block's FSDP/HSDP pre-hook runs outside this path; the eagerly prefetched first block in LayerWiseOffloadBackend.enable() therefore enters its first call with only rank-local weights, not a DTensor that FSDP can all-gather. On multi-GPU HSDP, any dim-0-sharded weight will then run with truncated per-rank shapes or fail outright.

hsliuustc0106 · 2026-03-20T00:28:14Z

@yuanheng-zhao PTAL

yuanheng-zhao · 2026-03-20T01:36:49Z

+                if use_hsdp and hasattr(t, "to_local"):
+                    local_t = t.to_local()
+                else:
+                    local_t = t


Does HSDP convert all of the model parameters to DTensor?

Only parameters of sharded modules (those matched by hsdp_shard_conditions) become DTensor; other parameters remain regular tensors.

yuanheng-zhao · 2026-03-20T01:46:36Z

+            weights_with_local = []
+            for name, t in name2weights.items():
+                if use_hsdp and hasattr(t, "to_local"):
+                    local_t = t.to_local()


Is there any chance that DTensor.to_local returns an AsyncCollectiveTensor here?

to_local() returns the local shard directly without triggering communication, so I think it is safe.

yuanheng-zhao · 2026-03-20T01:54:22Z

                )

-                param_or_buf.data = torch.empty((), device=device, dtype=dtype)
+                original_tensor.data = torch.empty((), device=device, dtype=dtype)


Assigning DTensor.data = ... accesses the DTensor's data directly rather than its local tensor. Could you check whether this is consistent and compatible?

Agreed. That would bypass DTensor semantics, so I changed the layerwise backend to keep the DTensor wrapper and update _local_tensor instead.

yuanheng-zhao · 2026-03-20T01:58:46Z

            for metadata in ordered_metadata:
                target_name = metadata["name"]
                target_param_or_buf = (
                    layer_params[target_name] if target_name in layer_params else layer_bufs[target_name]
                )

                target_param_or_buf.data = gpu_weight[metadata["offset"] : metadata["offset"] + metadata["numel"]].view(
                    metadata["shape"]
                )


Same as above. For a DTensor, the assignment and materialization of GPU tensors during prefetch happen directly here (e.g., target_param_or_buf.data = ...).

lishunyang12

Fix addresses the crash correctly. Two notes:

The use_hsdp flag feels redundant — just checking hasattr(t, "to_local") everywhere would be simpler, no config plumbing needed, and works for any future DTensor scenario.
yuanheng-zhao's .data assignment question (lines 142, 192) is important — assigning plain tensors back into DTensor parameters during prefetch_layer may break FSDP bookkeeping. offload_layer() has the same pattern but wasn't updated either.

lishunyang12

Left a couple comments. The _to_cpu fix looks correct, but prefetch_layer and offload_layer still assign plain tensors to .data on DTensor parameters — that seems like it'll break HSDP bookkeeping on the reload path.

lishunyang12 · 2026-03-22T17:26:57Z

+            # When HSDP is enabled, tensors are DTensor and we need to_local() for correct numel/shape
+            weights_with_local = []
+            for name, t in name2weights.items():
+                if use_hsdp and hasattr(t, "to_local"):


The use_hsdp guard is redundant — just check hasattr(t, "to_local") unconditionally. That removes the need to thread a bool through OffloadConfig → LayerwiseOffloadHook → _to_cpu → apply_block_hook, and it'll work for any future DTensor usage, not just HSDP.

Suggested change

-                if use_hsdp and hasattr(t, "to_local"):
+                if hasattr(t, "to_local"):

Removed the local use_hsdp plumbing in this module and switched to capability-based handling that applies only to DTensor tensors.
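The capability-based check discussed in this thread can be sketched without any distributed setup. FakeShardedTensor and local_view are hypothetical stand-ins for illustration only, and plain lists play the role of tensors. Only the hasattr(t, "to_local") test and the weights_with_local triple come from the diff.

```python
class FakeShardedTensor:
    """Hypothetical stand-in for a DTensor: it only exposes to_local()."""

    def __init__(self, local_shard):
        self._local = local_shard

    def to_local(self):
        # Return the rank-local shard; no communication is modeled here.
        return self._local


def local_view(t):
    # Capability check: no use_hsdp flag is needed to decide how to treat t.
    return t.to_local() if hasattr(t, "to_local") else t


# A mixed dict, since only parameters of sharded modules become DTensor.
name2weights = {
    "attn.weight": FakeShardedTensor([1.0, 2.0]),
    "norm.bias": [0.5],
}
weights_with_local = [(name, t, local_view(t)) for name, t in name2weights.items()]
```

Keeping both the original object and its local view in each triple lets later code size the buffer from the local view while still holding a reference to the wrapper.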

lishunyang12 · 2026-03-22T17:26:57Z

                        "offset": current_offset,
                        "numel": numel,
-                        "shape": param_or_buf.shape,
+                        "shape": local_tensor.shape,


prefetch_layer uses metadata["shape"] to .view() the GPU slice and then assigns it to target_param_or_buf.data. With this PR, shape is now the local tensor shape, but target_param_or_buf is still a DTensor. That .data assignment replaces the DTensor internals with a plain tensor — doesn't this break FSDP/HSDP state tracking on the reload path? Same concern for offload_layer which does param.data = torch.empty(...).

Applied the same DTensor-safe storage update in prefetch_layer() as well, so both prefetch and offload follow the same handling.
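As a rough sketch of what a DTensor-safe storage update could look like on the prefetch path, the example below restores a slice of a flat buffer into either a wrapper or a plain tensor. FakeDTensor and restore are hypothetical names, and the stand-in class models only the _local_tensor attribute mentioned earlier in this review. The merged implementation may differ.

```python
import torch


class FakeDTensor:
    """Hypothetical stand-in for a DTensor parameter; models only _local_tensor."""

    def __init__(self, local):
        self._local_tensor = local

    def to_local(self):
        return self._local_tensor


def restore(target, flat, meta):
    """Point target at a view of the flat buffer described by meta."""
    view = flat[meta["offset"] : meta["offset"] + meta["numel"]].view(meta["shape"])
    if hasattr(target, "to_local"):
        # Keep the wrapper object and swap only its local shard.
        target._local_tensor = view
    else:
        # Plain tensors can take the view through .data directly.
        target.data = view
    return target


flat = torch.arange(9.0)  # stands in for the prefetched gpu_weight buffer
sharded = restore(FakeDTensor(torch.empty(())), flat, {"offset": 0, "numel": 6, "shape": (2, 3)})
plain = restore(torch.empty(()), flat, {"offset": 6, "numel": 3, "shape": (3,)})
```

The point of the branch is that the wrapper's identity survives the reload, so code that expects a DTensor-like object still sees one. The plain-tensor path keeps the original .data assignment.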

lishunyang12 · 2026-03-22T17:26:57Z

        enable_layerwise_offload = getattr(od_config, "enable_layerwise_offload", False)
        pin_cpu_memory = getattr(od_config, "pin_cpu_memory", True)

+        parallel_config = getattr(od_config, "parallel_config", None)


If you drop the explicit use_hsdp flag (see other comment about duck-typing to_local), this config plumbing and the field on OffloadConfig can all go away.

RuixiangMa · 2026-03-25T15:26:51Z

Good suggestion. My initial commit used a similar approach, but I changed it to make the HSDP intent clearer. I will modify it.

wtomin · 2026-03-31T03:25:19Z

Please fix the pre-commit errors. This bugfix looks good to me.

RuixiangMa · 2026-03-31T05:16:53Z

fixed

RuixiangMa added 2 commits March 15, 2026 23:49

[Bugfix] fix layer offload and fsdp incompatible

170b396

Signed-off-by: Lancer <maruixiang6688@gmail.com>

upd

b2c9d6a

Signed-off-by: Lancer <maruixiang6688@gmail.com>

RuixiangMa requested a review from hsliuustc0106 as a code owner March 19, 2026 17:37

chatgpt-codex-connector Bot reviewed Mar 19, 2026

RuixiangMa changed the title ~~Fixlayeroffloadhsdp~~ [Buggix] Fix layer-wise offload incompatibility with HSDP Mar 19, 2026

RuixiangMa changed the title ~~[Buggix] Fix layer-wise offload incompatibility with HSDP~~ [Bugfix] Fix layer-wise offload incompatibility with HSDP Mar 19, 2026

yuanheng-zhao reviewed Mar 20, 2026

lishunyang12 reviewed Mar 21, 2026

lishunyang12 reviewed Mar 22, 2026

Merge branch 'main' into fixlayeroffloadhsdp

93ad8c7

upd

2fc035f

Signed-off-by: Lancer <maruixiang6688@gmail.com>

upd

f0a6d1b

Signed-off-by: Lancer <maruixiang6688@gmail.com>

wtomin added the ready label (label to trigger buildkite CI) Mar 31, 2026

wtomin approved these changes Mar 31, 2026

wtomin mentioned this pull request Mar 31, 2026

[RFC]: Continuous Diffusion Model Acceleration Support #1217

Open

1 task

wtomin merged commit dd0b6fd into vllm-project:main Apr 1, 2026
7 of 8 checks passed

wtomin mentioned this pull request Apr 1, 2026

[Feat] support for multi-block layerwise offloading, fix top-level parameters/buffers staying on CPU #1486

Merged

vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026

[Bugfix] Fix layer-wise offload incompatibility with HSDP (vllm-proje…

0ac7891

…ct#2021) Signed-off-by: Lancer <maruixiang6688@gmail.com>

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[Bugfix] Fix layer-wise offload incompatibility with HSDP (vllm-proje…

17d0cf0

…ct#2021) Signed-off-by: Lancer <maruixiang6688@gmail.com>

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Bugfix] Fix layer-wise offload incompatibility with HSDP (vllm-proje…

15336b2

…ct#2021) Signed-off-by: Lancer <maruixiang6688@gmail.com>
