[Feat] Support layerwise CPU offloading for more videogen models by yuanheng-zhao · Pull Request #2018 · vllm-project/vllm-omni

yuanheng-zhao · 2026-03-19T14:20:49Z


Purpose

Part of #1217

Support and test layerwise CPU offloading on more models

LTX-2 - natively supported
DreamID-Omni - modified the modeling code to form fused blocks so that it can support layerwise offloading
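For readers unfamiliar with the idea, here is a minimal, pure-Python sketch of layerwise offloading in general. It is not the vllm-omni implementation: `Block`, `LayerwiseOffloader`, the `window` parameter, and the 1.5 GB block size are all hypothetical names and numbers chosen for illustration, and the "devices" are plain strings rather than real GPU memory. It only shows why a model exposed as a flat sequence of blocks can keep a small, fixed number of blocks resident at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy stand-in for one model block; size_gb is a made-up weight size."""
    name: str
    size_gb: float
    device: str = "cpu"


@dataclass
class LayerwiseOffloader:
    """Keeps at most `window` blocks on the simulated GPU at any time."""
    blocks: list
    window: int = 2  # assumption: current block plus one prefetched block
    peak_gb: float = field(default=0.0, init=False)

    def _resident_gb(self) -> float:
        return sum(b.size_gb for b in self.blocks if b.device == "gpu")

    def run(self) -> list:
        order = []
        for i, block in enumerate(self.blocks):
            # Load the current block and prefetch the following ones.
            for nxt in self.blocks[i : i + self.window]:
                nxt.device = "gpu"
            self.peak_gb = max(self.peak_gb, self._resident_gb())
            order.append(block.name)  # stand-in for running the block's forward pass
            # Evict the block that just finished back to the CPU.
            block.device = "cpu"
        return order


blocks = [Block(f"block_{i}", size_gb=1.5) for i in range(8)]
offloader = LayerwiseOffloader(blocks)
executed = offloader.run()
total_gb = sum(b.size_gb for b in blocks)
print(executed)
print(f"peak resident weights: {offloader.peak_gb} GB of {total_gb} GB")
```

In this toy setup the peak resident weight size depends on the window, not on the number of blocks, which is the property that makes a uniform block structure attractive for this kind of offloading.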

Test Plan

e2e generation and output quality comparison

Lightricks/LTX-2

python examples/offline_inference/text_to_video/text_to_video.py \
  --model "Lightricks/LTX-2" \
  --prompt "A cinematic close-up of ocean waves at golden hour." \
  --negative-prompt "worst quality, inconsistent motion, blurry, jittery, distorted" \
  --height 512 \
  --width 768 \
  --num-frames 121 \
  --num-inference-steps 40 \
  --guidance-scale 4.0 \
  --frame-rate 24 \
  --output ltx2_out.mp4

python examples/offline_inference/text_to_video/text_to_video.py \
  --model "Lightricks/LTX-2" \
  --prompt "A cinematic close-up of ocean waves at golden hour." \
  --negative-prompt "worst quality, inconsistent motion, blurry, jittery, distorted" \
  --height 512 \
  --width 768 \
  --num-frames 121 \
  --num-inference-steps 40 \
  --guidance-scale 4.0 \
  --frame-rate 24 \
  --enable-layerwise-offload \
  --output ltx2_out_layerwise.mp4
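The two LTX-2 runs above are meant to be identical except for the offloading flag and the output file. A small sketch like the following could generate both command lines from one shared argument set so the comparison stays apples to apples. The helper name `build_command` is hypothetical; the script path and flags are copied from the commands above, and nothing is executed.

```python
import shlex

SCRIPT = "examples/offline_inference/text_to_video/text_to_video.py"

# Arguments shared by the baseline and the layerwise-offload run.
BASE_ARGS = {
    "--model": "Lightricks/LTX-2",
    "--prompt": "A cinematic close-up of ocean waves at golden hour.",
    "--negative-prompt": "worst quality, inconsistent motion, blurry, jittery, distorted",
    "--height": "512",
    "--width": "768",
    "--num-frames": "121",
    "--num-inference-steps": "40",
    "--guidance-scale": "4.0",
    "--frame-rate": "24",
}


def build_command(output: str, layerwise_offload: bool) -> list:
    """Return the argv list for one run; only the flag and output differ."""
    cmd = ["python", SCRIPT]
    for flag, value in BASE_ARGS.items():
        cmd += [flag, value]
    if layerwise_offload:
        cmd.append("--enable-layerwise-offload")
    cmd += ["--output", output]
    return cmd


baseline = build_command("ltx2_out.mp4", layerwise_offload=False)
offloaded = build_command("ltx2_out_layerwise.mp4", layerwise_offload=True)

# Arguments present only in the offloaded run.
extra = [arg for arg in offloaded if arg not in baseline]
print(shlex.join(baseline))
print(shlex.join(offloaded))
print(extra)
```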

XuGuo699/DreamID-Omni

python x_to_video_audio.py \
  --model /path/to/vllm-project/vllm-omni/examples/offline_inference/x_to_video_audio/dreamid_omni \
  --prompt "<img1>: In the frame, a woman with black long hair is identified as <sub1>.\n**Overall Environment/Scene**: A lively open-kitchen café at night; stove flames flare, steam rises, and warm pendant lights swing slightly as staff move behind her. The shot is an upper-body close-up.\n**Main Characters/Subjects Appearance**: <sub1> is a young woman with thick dark wavy hair and a side part. She wears a fitted black top under a light apron, a thin gold chain necklace, and small stud earrings.\n**Main Characters/Subjects Actions**: <sub1> tastes the sauce with a spoon, then turns her face toward the camera while still holding the spoon, her expression shifting from focused to conflicted.\n<sub1> maintains eye contact, swallows as if choosing her words, and says, <S>I keep telling myself I’m fine,but some nights it feels like I’m just performing calm.<E>" \
  --image-path 9.png \
  --audio-path 9.wav \
  --video-negative-prompt "jitter, bad hands, blur, distortion" \
  --audio-negative-prompt "robotic, muffled, echo, distorted" \
  --cfg-parallel-size 2 \
  --num-inference-steps 45 \
  --height 704 \
  --width 1280 \
  --enable-layerwise-offload \
  --output out_dreamid_omni_layerwise.mp4

Test Result

Stats:

| Model \ Offloading | Disabled (peak GPU memory, this request) | Enabled (peak GPU memory, this request) |
|---|---|---|
| Lightricks/LTX-2 | 72.35 GB reserved, 69.15 GB allocated, 3.20 GB pool overhead (4.4%) | 38.88 GB reserved, 35.35 GB allocated, 3.53 GB pool overhead (9.1%) |
| XuGuo699/DreamID-Omni (2 devices) | 64.32 GB reserved, 55.31 GB allocated, 9.01 GB pool overhead (14.0%) | 45.26 GB reserved, 35.59 GB allocated, 9.67 GB pool overhead (21.4%) |

*Collected by DiffusionModelRunner._record_peak_memory
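To make the savings easier to read, the reported peaks can be turned into relative reductions. The sketch below only does arithmetic on the numbers in the table above; the names `STATS`, `reduction_pct`, and `summarize` are illustrative and not part of vllm-omni.

```python
# Peak memory in GB, copied from the table above as (reserved, allocated).
STATS = {
    "Lightricks/LTX-2": {"disabled": (72.35, 69.15), "enabled": (38.88, 35.35)},
    "XuGuo699/DreamID-Omni": {"disabled": (64.32, 55.31), "enabled": (45.26, 35.59)},
}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent, one decimal."""
    return round(100.0 * (before - after) / before, 1)


def summarize(stats: dict) -> dict:
    summary = {}
    for model, runs in stats.items():
        (res_off, alloc_off), (res_on, alloc_on) = runs["disabled"], runs["enabled"]
        summary[model] = {
            "reserved_reduction_pct": reduction_pct(res_off, res_on),
            "allocated_reduction_pct": reduction_pct(alloc_off, alloc_on),
        }
    return summary


summary = summarize(STATS)
for model, row in summary.items():
    print(model, row)
```

On these figures, enabling layerwise offloading lowers peak reserved memory by roughly 46% for LTX-2 and roughly 30% for DreamID-Omni.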

Generated Videos:

| Model \ Offloading | Disabled | Enabled |
|---|---|---|
| Lightricks/LTX-2 | https://github.com/user-attachments/assets/f029ebee-a7ea-4d70-a0c9-c6be1d758707 | https://github.com/user-attachments/assets/96caebaf-1854-4feb-910c-513dca6ded67 |
| XuGuo699/DreamID-Omni | https://github.com/user-attachments/assets/89e7c3d4-cd6a-48d8-a991-b06c229aaa31 | https://github.com/user-attachments/assets/7ac9b354-6eca-46ae-a5c9-c74d60210b06 |


chatgpt-codex-connector · 2026-04-15T16:03:31Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

yuanheng-zhao · 2026-04-15T16:03:56Z

Wait for #2809 to be merged first, and then rebase


yuanheng-zhao · 2026-04-16T03:38:34Z

> Wait for #2809 to be merged first, and then rebase

Done

yuanheng-zhao · 2026-04-16T03:40:29Z

Layerwise offloading is now supported on LTX-2 and DreamID-Omni.

cc @wtomin , @gcanlin , @Bounty-hunter

wtomin · 2026-04-17T06:55:09Z

According to #1832, DreamID-Omni has no L4 e2e test yet. Could you create tests/e2e/online_serving/test_dreamid_omni_expansion.py to cover its acceleration features, such as SP and layerwise CPU offloading? @yuanheng-zhao

As for LTX-2, there is an existing PR #2815. I will remind @Songrui625 to cover the layerwise CPU offloading feature when this PR is merged.

yuanheng-zhao · 2026-04-17T07:03:33Z

> According to #1832, DreamID-Omni has no L4 e2e test yet. Could you create tests/e2e/online_serving/test_dreamid_omni_expansion.py to cover its acceleration features, such as SP and layerwise CPU offloading? @yuanheng-zhao
>
> As for LTX-2, there is an existing PR #2815. I will remind @Songrui625 to cover the layerwise CPU offloading feature when this PR is merged.

For now it's still not easy to run DreamID-Omni: it requires an extra download and a hacky installation (examples/offline_inference/x_to_video_audio/download_dreamid_omni.py), and it is inconsistent with our main env (e.g., different versions of transformers/diffusers). While supporting this feature, it took me quite a while to run DreamID-Omni successfully, and I then found that other models failed to run because of DreamID-Omni's updated dependency conflicts. I'd suggest splitting the fixing and adding of the DreamID-Omni e2e test into another issue. @wtomin

I'd like to raise another PR to fix both the modeling and the dependencies for DreamID-Omni so that it is compatible with main vllm-omni usage. This might also depend on upgrading transformers in vllm-omni.
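When comparing environments for this kind of dependency conflict, a quick way to see what is installed is to query package metadata. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the two libraries named above. It reports versions only and does not decide which versions are compatible.

```python
from importlib import metadata


def installed_versions(packages: list) -> dict:
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["transformers", "diffusers"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both the main env and the DreamID-Omni env gives two reports that can be compared side by side.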

wtomin · 2026-04-17T08:06:24Z

> I'd like to raise another PR to fix both the modeling and the dependencies for DreamID-Omni so that it is compatible with main vllm-omni usage. This might also depend on upgrading transformers in vllm-omni.

I am totally fine with it.


yuanheng-zhao · 2026-04-18T11:13:37Z

Output for the updated commit, with layerwise offloading enabled:

out_dreamid_omni_oneip-3.mp4

yuanheng-zhao · 2026-04-19T13:58:25Z

cc @wtomin , @hsliuustc0106

hsliuustc0106

lgtm

lishunyang12 · 2026-04-19T16:43:52Z

LGTM

…m-project#2018) Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com> Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com> Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

yuanheng-zhao force-pushed the feat/layerwise-offload-videogen branch 2 times, most recently from 86acc80 to 53745f6 Compare March 19, 2026 14:29

wtomin mentioned this pull request Mar 30, 2026

[RFC]: Continuous Diffusion Model Acceleration Support #1217

Open

1 task

yuanheng-zhao force-pushed the feat/layerwise-offload-videogen branch from 53745f6 to 14cd581 Compare March 30, 2026 08:24

yuanheng-zhao force-pushed the feat/layerwise-offload-videogen branch 2 times, most recently from ad5e0ad to 02cd7b6 Compare April 15, 2026 15:55

yuanheng-zhao changed the title ~~[WIP][Feat] Support layerwise CPU offloading for more videogen models~~ [Do-Not-Merge][Feat] Support layerwise CPU offloading for more videogen models Apr 15, 2026

yuanheng-zhao marked this pull request as ready for review April 15, 2026 16:03

yuanheng-zhao requested a review from hsliuustc0106 as a code owner April 15, 2026 16:03

yuanheng-zhao added 7 commits April 16, 2026 11:35

enable layerwise offload for LTX-2

c143ff0

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

try to support layerwise offload for DreamID-Omni

7aa5f8c

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

upd

0fb1bc6

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

remap and impl load_weights

c343f32

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

upd fusion model modeling

c1cf882

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

upd gen script

1d10d0a

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

upd diffusion feat doc

78f0b6e

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

yuanheng-zhao force-pushed the feat/layerwise-offload-videogen branch from 35dee95 to 78f0b6e Compare April 16, 2026 03:36

yuanheng-zhao changed the title ~~[Do-Not-Merge][Feat] Support layerwise CPU offloading for more videogen models~~ [Feat] Support layerwise CPU offloading for more videogen models Apr 16, 2026

wtomin self-requested a review April 17, 2026 03:10

Merge branch 'main' into feat/layerwise-offload-videogen

ccb3123

wtomin requested review from Gaohan123, david6666666 and gcanlin April 17, 2026 08:11

wtomin requested a review from lishunyang12 April 17, 2026 08:11

lishunyang12 reviewed Apr 18, 2026


Comment thread vllm_omni/diffusion/models/dreamid_omni/fusion.py Outdated

Comment thread vllm_omni/diffusion/models/dreamid_omni/fusion.py Outdated

upd fusion model

1b7a36b

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

lishunyang12 added the ready label to trigger buildkite CI label Apr 18, 2026

yuanheng-zhao requested a review from lishunyang12 April 19, 2026 01:53

hsliuustc0106 approved these changes Apr 19, 2026


lishunyang12 merged commit 68f28f9 into vllm-project:main Apr 19, 2026
8 checks passed

yuanheng-zhao deleted the feat/layerwise-offload-videogen branch April 19, 2026 17:23

BBuf mentioned this pull request Apr 20, 2026

SGLang Diffusion external influence survey: kernel, feature, and platform adoption BBuf/how-to-optim-algorithm-in-cuda#14

Open

Songrui625 mentioned this pull request Apr 20, 2026

[BugFix] Fix layerwise CPU offloading for LTX2 two-stages pipeline #2935

Closed

5 tasks

lishunyang12 merged 9 commits into vllm-project:main from yuanheng-zhao:feat/layerwise-offload-videogen


4 participants
