[Feat] Support layerwise CPU offloading for more videogen models #2018

Merged
lishunyang12 merged 9 commits into vllm-project:main from yuanheng-zhao:feat/layerwise-offload-videogen
Apr 19, 2026

@yuanheng-zhao yuanheng-zhao commented Mar 19, 2026

Purpose

Part of #1217

Support and test layerwise CPU offloading on more models

  • LTX-2 - natively supported
  • DreamID-Omni - modeling code modified to form fused blocks so that it can support layerwise offloading
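
For readers new to the technique, the memory effect of layerwise offloading can be illustrated with a toy simulation. This is a minimal sketch, not the vllm-omni implementation: the function name `simulate_peak_resident`, the block sizes, and the one-block prefetch window are all illustrative assumptions. It assumes block weights live on the CPU, are copied to the GPU shortly before the block runs, and are released afterwards.

```python
from typing import Sequence


def simulate_peak_resident(block_sizes_gb: Sequence[float], prefetch: int = 1) -> float:
    """Return the peak GB of block weights resident on a simulated GPU.

    Blocks run in order. While block i runs, blocks i+1..i+prefetch are
    assumed to be copied in already, so they also occupy GPU memory.
    Every block before i has been released back to the CPU.
    """
    if prefetch < 0:
        raise ValueError("prefetch must be non-negative")
    peak = 0.0
    for i in range(len(block_sizes_gb)):
        # Weights of the running block plus the prefetched ones.
        resident = sum(block_sizes_gb[i : i + 1 + prefetch])
        peak = max(peak, resident)
    return peak


# Hypothetical model: 48 equally sized blocks of 0.5 GB each.
blocks = [0.5] * 48
no_offload_gb = sum(blocks)                  # every block stays on the GPU
offload_gb = simulate_peak_resident(blocks)  # running block + 1 prefetched
print(f"without offload: {no_offload_gb:.1f} GB, with offload: {offload_gb:.1f} GB")
```

The simulation only counts block weights; activations, non-block modules, and allocator overhead are ignored, so real savings are smaller than this toy ratio suggests. It also assumes the model exposes a flat sequence of blocks, which is the shape the DreamID-Omni change is described as providing.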

Test Plan

e2e generation and output quality comparison

Lightricks/LTX-2

python examples/offline_inference/text_to_video/text_to_video.py \
  --model "Lightricks/LTX-2" \
  --prompt "A cinematic close-up of ocean waves at golden hour." \
  --negative-prompt "worst quality, inconsistent motion, blurry, jittery, distorted" \
  --height 512 \
  --width 768 \
  --num-frames 121 \
  --num-inference-steps 40 \
  --guidance-scale 4.0 \
  --frame-rate 24 \
  --output ltx2_out.mp4

python examples/offline_inference/text_to_video/text_to_video.py \
  --model "Lightricks/LTX-2" \
  --prompt "A cinematic close-up of ocean waves at golden hour." \
  --negative-prompt "worst quality, inconsistent motion, blurry, jittery, distorted" \
  --height 512 \
  --width 768 \
  --num-frames 121 \
  --num-inference-steps 40 \
  --guidance-scale 4.0 \
  --frame-rate 24 \
  --enable-layerwise-offload \
  --output ltx2_out_layerwise.mp4

XuGuo699/DreamID-Omni

python x_to_video_audio.py \
  --model /path/to/vllm-project/vllm-omni/examples/offline_inference/x_to_video_audio/dreamid_omni \
  --prompt "<img1>: In the frame, a woman with black long hair is identified as <sub1>.\n**Overall Environment/Scene**: A lively open-kitchen café at night; stove flames flare, steam rises, and warm pendant lights swing slightly as staff move behind her. The shot is an upper-body close-up.\n**Main Characters/Subjects Appearance**: <sub1> is a young woman with thick dark wavy hair and a side part. She wears a fitted black top under a light apron, a thin gold chain necklace, and small stud earrings.\n**Main Characters/Subjects Actions**: <sub1> tastes the sauce with a spoon, then turns her face toward the camera while still holding the spoon, her expression shifting from focused to conflicted.\n<sub1> maintains eye contact, swallows as if choosing her words, and says, <S>I keep telling myself I’m fine,but some nights it feels like I’m just performing calm.<E>" \
  --image-path 9.png \
  --audio-path 9.wav \
  --video-negative-prompt "jitter, bad hands, blur, distortion" \
  --audio-negative-prompt "robotic, muffled, echo, distorted" \
  --cfg-parallel-size 2 \
  --num-inference-steps 45 \
  --height 704 \
  --width 1280 \
  --enable-layerwise-offload \
  --output out_dreamid_omni_layerwise.mp4

Test Result

Stats:

Peak GPU memory (this request):

| Model | Layerwise offloading | Reserved | Allocated | Pool overhead |
|---|---|---|---|---|
| Lightricks/LTX-2 | Disabled | 72.35 GB | 69.15 GB | 3.20 GB (4.4%) |
| Lightricks/LTX-2 | Enabled | 38.88 GB | 35.35 GB | 3.53 GB (9.1%) |
| XuGuo699/DreamID-Omni (2 devices) | Disabled | 64.32 GB | 55.31 GB | 9.01 GB (14.0%) |
| XuGuo699/DreamID-Omni (2 devices) | Enabled | 45.26 GB | 35.59 GB | 9.67 GB (21.4%) |

*Collected by DiffusionModelRunner._record_peak_memory
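
The relationship between the reported figures can be checked with a few lines of arithmetic. The sketch below assumes that pool overhead is reserved minus allocated memory and that its percentage is taken relative to reserved memory, which is consistent with the numbers above; the helper names `pool_overhead` and `reduction` are illustrative and not part of vllm-omni.

```python
def pool_overhead(reserved_gb: float, allocated_gb: float) -> tuple[float, float]:
    """Return (overhead in GB, overhead as a fraction of reserved memory)."""
    overhead = reserved_gb - allocated_gb
    return overhead, overhead / reserved_gb


def reduction(before_gb: float, after_gb: float) -> float:
    """Return the fractional drop from before_gb to after_gb."""
    return (before_gb - after_gb) / before_gb


# (reserved GB, allocated GB), copied from the stats reported above.
stats = {
    "Lightricks/LTX-2": {"disabled": (72.35, 69.15), "enabled": (38.88, 35.35)},
    "XuGuo699/DreamID-Omni": {"disabled": (64.32, 55.31), "enabled": (45.26, 35.59)},
}

for model, runs in stats.items():
    for mode, (reserved, allocated) in runs.items():
        gb, frac = pool_overhead(reserved, allocated)
        print(f"{model} [{mode}]: overhead {gb:.2f} GB ({frac:.1%})")
    saved = reduction(runs["disabled"][0], runs["enabled"][0])
    print(f"{model}: reserved memory reduced by {saved:.1%}")
```

By this calculation, enabling layerwise offloading lowers peak reserved memory by roughly 46% for LTX-2 and roughly 30% for DreamID-Omni in these runs.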

Generated Videos:

| Model | Offloading disabled | Offloading enabled |
|---|---|---|
| Lightricks/LTX-2 | https://github.com/user-attachments/assets/f029ebee-a7ea-4d70-a0c9-c6be1d758707 | https://github.com/user-attachments/assets/96caebaf-1854-4feb-910c-513dca6ded67 |
| XuGuo699/DreamID-Omni | https://github.com/user-attachments/assets/89e7c3d4-cd6a-48d8-a991-b06c229aaa31 | https://github.com/user-attachments/assets/7ac9b354-6eca-46ae-a5c9-c74d60210b06 |

@yuanheng-zhao yuanheng-zhao force-pushed the feat/layerwise-offload-videogen branch 2 times, most recently from 86acc80 to 53745f6 Compare March 19, 2026 14:29
@yuanheng-zhao yuanheng-zhao force-pushed the feat/layerwise-offload-videogen branch from 53745f6 to 14cd581 Compare March 30, 2026 08:24
@yuanheng-zhao yuanheng-zhao force-pushed the feat/layerwise-offload-videogen branch 2 times, most recently from ad5e0ad to 02cd7b6 Compare April 15, 2026 15:55
@yuanheng-zhao yuanheng-zhao changed the title [WIP][Feat] Support layerwise CPU offloading for more videogen models [Do-Not-Merge][Feat] Support layerwise CPU offloading for more videogen models Apr 15, 2026
@yuanheng-zhao yuanheng-zhao marked this pull request as ready for review April 15, 2026 16:03
@yuanheng-zhao
Contributor Author

Wait for #2809 to be merged first, and then rebase

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
@yuanheng-zhao yuanheng-zhao force-pushed the feat/layerwise-offload-videogen branch from 35dee95 to 78f0b6e Compare April 16, 2026 03:36
@yuanheng-zhao
Contributor Author

Wait for #2809 to be merged first, and then rebase

Done

@yuanheng-zhao yuanheng-zhao changed the title [Do-Not-Merge][Feat] Support layerwise CPU offloading for more videogen models [Feat] Support layerwise CPU offloading for more videogen models Apr 16, 2026
@yuanheng-zhao
Contributor Author

Layerwise offloading is now supported on LTX-2 and DreamID-Omni.

cc @wtomin , @gcanlin , @Bounty-hunter

@wtomin wtomin self-requested a review April 17, 2026 03:10
@wtomin
Collaborator

wtomin commented Apr 17, 2026

According to #1832, DreamID-Omni has no L4 e2e test yet. Could you create tests/e2e/online_serving/test_dreamid_omni_expansion.py to cover its acceleration features, like sp and layerwise CPU offloading? @yuanheng-zhao

As for LTX-2, there is an existing PR #2815. I will remind @Songrui625 to cover the layerwise CPU offloading feature when this PR is merged.

@yuanheng-zhao
Contributor Author

yuanheng-zhao commented Apr 17, 2026

According to #1832, DreamID-Omni has no L4 e2e test yet. Could you create tests/e2e/online_serving/test_dreamid_omni_expansion.py to cover its acceleration features, like sp and layerwise CPU offloading? @yuanheng-zhao

As for LTX-2, there is an existing PR #2815. I will remind @Songrui625 to cover the layerwise CPU offloading feature when this PR is merged.

For now it's still not easy to run DreamID-Omni: it requires an extra download and a hacky installation (examples/offline_inference/x_to_video_audio/download_dreamid_omni.py), and its environment is inconsistent with our main env (e.g., different versions of transformers/diffusers). While supporting this feature, it took me quite a while to run DreamID-Omni successfully, and I then found that other models failed to run because of DreamID-Omni's updated dependency conflicts. I'd suggest splitting the fixing and adding of the DreamID-Omni e2e test into another issue. @wtomin

I'd like to raise another PR to fix both modeling and dependency for DreamID-Omni so that it could be compatible with main vllm-omni usage - this might depend on the upgrading of transformers on vllm-omni as well

@wtomin
Collaborator

wtomin commented Apr 17, 2026

I'd like to raise another PR to fix both modeling and dependency for DreamID-Omni so that it could be compatible with main vllm-omni usage - this might depend on the upgrading of transformers on vllm-omni as well

I am totally fine with it.

@wtomin wtomin requested a review from lishunyang12 April 17, 2026 08:11
Comment thread vllm_omni/diffusion/models/dreamid_omni/fusion.py Outdated
Comment thread vllm_omni/diffusion/models/dreamid_omni/fusion.py Outdated
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
@yuanheng-zhao
Contributor Author

Result for the updated commit, with layerwise offloading enabled:

out_dreamid_omni_oneip-3.mp4

@lishunyang12 lishunyang12 added the ready label to trigger buildkite CI label Apr 18, 2026
@yuanheng-zhao
Contributor Author

cc @wtomin , @hsliuustc0106

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

lgtm

@lishunyang12
Collaborator

LGTM

@lishunyang12 lishunyang12 merged commit 68f28f9 into vllm-project:main Apr 19, 2026
8 checks passed
@yuanheng-zhao yuanheng-zhao deleted the feat/layerwise-offload-videogen branch April 19, 2026 17:23
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
…m-project#2018)

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026
…m-project#2018)

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>