[Refactor] Let diffusion pipelines declare offloadable modules via SupportsModuleOffload (#2427)
Conversation
Force-pushed from 0f85f38 to 8876037
@yuanheng-zhao PTAL
Could you add a column in your table to show the current main branch memory and performance?
yuanheng-zhao left a comment:
Thanks for contributing. It's good to have SupportsModuleOffload as an interface to adapt module-level offloading for new models more flexibly. Left some comments.
Updated the table. On the main branch, OmniGen2 actually crashes with --enable-cpu-offload, since the modules end up on the wrong devices.
lishunyang12 left a comment:
Left a couple of small comments; overall the approach looks good.
Force-pushed from 5409f46 to 1699909
Please add tests, since this introduces a new class. Does it affect the API or UX?
Force-pushed from 1699909 to b65fe3d
Unit test added. This PR alone should not affect UX or the external API, but it does affect model authors. Once all the models are migrated to the explicit path we can drop the fallback and throw an error when offload is enabled on an unsupported model rather than crashing; that is when UX would improve.
Also: I find SupportsModuleOffload not very descriptive. What do you think would be better: SupportsSequentialOffload?
Force-pushed from b65fe3d to f4ffc03
Rebased; there were conflicts with #2339.
Force-pushed from f4ffc03 to 67d39bd
yuanheng-zhao left a comment:
LGTM. This PR will be helpful for other models with uncommon attr names and multiple VAE/encoder components as well.
Force-pushed from 3cc4711 to 3babe17
Force-pushed from 3babe17 to 4970b73
Added support for nested modules, and declared SupportsModuleOffload for Bagel and LTX2.
Force-pushed from 4970b73 to 37354be
I see that there are skills in both .claude/skills of this repo and https://github.com/hsliuustc0106/vllm-omni-skills; which one should be considered the authoritative source?
This repo, please.
Force-pushed from 0e1ecc5 to c8099c1
Done.
Force-pushed from c8099c1 to 61cd06f
Huh, why's librosa back?
Force-pushed from 61cd06f to fef0586
It's due to a Hugging Face rate limit; could anyone restart it?
Weird, they were happening during the CPU unit tests. I checked several recent commits and didn't find the same rate limit issue. Could you merge main to trigger CI again?
Force-pushed from fef0586 to 288db29
The doc failure seems unrelated? Can't tell.
…pportsModuleOffload

ModuleDiscovery previously hardcoded attribute names to find DiT, encoder, and VAE modules for CPU offload. This silently failed for pipelines using non-standard names (e.g. OmniGen2's 'mllm', Bagel's 'vit_model', MammothModa2's 'gen_transformer'/'gen_vae'), leaving multi-GB models idle on GPU during the denoising loop.

Add SupportsModuleOffload protocol to the pipeline interface. Pipelines declare _dit_modules, _encoder_modules, and _vae_modules as class variables, and ModuleDiscovery.discover() reads them directly. Both DiT and encoder lists are needed because the offload hooks use mutual exclusion. Pipelines without the protocol fall back to the existing attribute name scan.

Also update PipelineModules.vae to PipelineModules.vaes (list) to support pipelines with multiple VAEs (e.g. LTX2's audio_vae, DreamIDOmni's vae_model_audio). Both sequential and layerwise offload backends are updated to iterate the list.

Behavioral changes from unifying collection logic into _collect_modules:
- Encoder collection now checks isinstance(nn.Module) (the original did not), preventing non-Module objects from reaching .to(device).
- Encoder collection now deduplicates (the original did not), avoiding double hook registration when two attrs point to the same module.
- Non-Module attributes declared via the protocol trigger a warning (a pipeline authoring bug); in the fallback path they are silently skipped.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
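For context, a pipeline adopting the protocol might declare the class variables like this. This is a minimal sketch: only the three class-variable names come from the commit message; the pipeline class and its stand-in submodules are hypothetical.

```python
import torch.nn as nn

# Minimal sketch of the declaration pattern described above. The stand-in
# submodules are hypothetical; only _dit_modules / _encoder_modules /
# _vae_modules come from the protocol described in this PR.
class MyPipeline:
    _dit_modules = ["transformer"]   # offloaded in mutual exclusion with encoders
    _encoder_modules = ["mllm"]      # e.g. OmniGen2's Qwen2.5-VL text encoder
    _vae_modules = ["vae"]

    def __init__(self) -> None:
        self.transformer = nn.Linear(8, 8)  # stand-in for the DiT
        self.mllm = nn.Linear(8, 8)         # stand-in for the text encoder
        self.vae = nn.Linear(8, 8)          # stand-in for the VAE
```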
Add SupportsModuleOffload to OmniGen2Pipeline so ModuleDiscovery
can find the Qwen2.5-VL text encoder ('mllm', ~6-16 GB) for
sequential CPU offload. Previously, 'mllm' was not in the hardcoded
attribute scan list, so enable_cpu_offload silently left it on GPU
during the entire denoising loop.
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
Allow dotted attribute names (e.g. "pipe.transformer") in _dit_modules, _encoder_modules, and _vae_modules to resolve nested modules via operator.attrgetter. This handles pipelines like LTX2TwoStagesPipeline where the transformer lives under a child pipeline (pipe.transformer), and Bagel where the encoder is at language_model.model. Flat attribute names continue to work unchanged.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
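Roughly, dotted-path resolution via operator.attrgetter looks like the following. This is an illustrative helper, not the actual ModuleDiscovery code; the function name is made up.

```python
import operator
from typing import Optional

import torch.nn as nn

def resolve_module(pipeline: object, path: str) -> Optional[nn.Module]:
    """Resolve a flat or dotted attribute path such as 'pipe.transformer'.

    Hypothetical helper for illustration; the real discovery code may differ.
    """
    try:
        # attrgetter handles both "transformer" and "pipe.transformer".
        obj = operator.attrgetter(path)(pipeline)
    except AttributeError:
        return None  # missing attribute anywhere along the path
    return obj if isinstance(obj, nn.Module) else None
```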
…offload

Add _resident_modules class variable to SupportsModuleOffload for small submodules that must stay on GPU during layer-wise offloading (e.g. embedders, connectors). Defaults to an empty list.

During layerwise offload, pipelines load everything to CPU and the offloader selectively moves dit/encoder/vae groups to GPU. Modules outside these groups stay on CPU, which breaks pipelines like Bagel where time_embedder, vae2llm, vit_model etc. are needed every forward pass but are not children of any discovered group. _resident_modules lets pipelines declare these modules explicitly; the layerwise backend pins them on GPU alongside encoders and VAEs.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
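On the backend side, pinning could look roughly like this. A sketch under the assumption that resident modules are resolved with the same dotted-path mechanism; the function name is hypothetical.

```python
import operator

import torch
import torch.nn as nn

def pin_resident_modules(pipeline: object, device: torch.device) -> None:
    """Illustrative only: keep declared resident modules on the GPU so they
    remain available every forward pass during layerwise offload."""
    for path in getattr(pipeline, "_resident_modules", []):
        module = operator.attrgetter(path)(pipeline)
        if isinstance(module, nn.Module):
            module.to(device)
```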
Add 'To Support a Model' section under model-level offloading showing how to implement the SupportsModuleOffload protocol. Restore the layerwise 'To Support a Model' section under its own parent. Update the Module Discovery section to document both the protocol-based and fallback attribute-scan discovery paths.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
LTX2 two-stage pipelines have a nested module structure where the DiT, encoders, and VAEs live under self.pipe. The fallback attribute scan cannot find them, causing layerwise offloading to skip DiT discovery entirely.

Implement SupportsModuleOffload on LTX2TwoStagesPipeline and LTX2ImageToVideoTwoStagesPipeline using dotted paths to reach nested modules (pipe.transformer, pipe.text_encoder, pipe.vae, pipe.audio_vae).

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
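Based on the paths named in this commit message, the declaration would look something like the sketch below; the grouping of the four paths into the three lists is an assumption.

```python
# Hypothetical sketch of the LTX2TwoStagesPipeline declarations; the dotted
# paths come from the commit message, the list grouping is assumed.
class LTX2TwoStagesPipeline:
    _dit_modules = ["pipe.transformer"]
    _encoder_modules = ["pipe.text_encoder"]
    _vae_modules = ["pipe.vae", "pipe.audio_vae"]  # both iterated via PipelineModules.vaes
```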
BagelPipeline has a non-standard module layout: the DiT lives at language_model.model, and several small modules under self.bagel (time_embedder, vae2llm, llm2vae, latent_pos_embed, vit_model, connector, vit_pos_embed) are needed every forward pass but are not children of the DiT.

Implement SupportsModuleOffload with _resident_modules to pin these small modules on GPU during layerwise offloading. Without this, they stay on CPU (offload pipelines skip self.to(device)) and forward() fails with a device mismatch.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
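A hedged sketch of the resulting declaration, inferring the dotted prefixes from the layout described above; the module names come from the commit message, but the prefixes and the encoder/VAE grouping are guesses.

```python
# Hypothetical sketch of the BagelPipeline declarations; the resident module
# names come from the commit message, while the dotted prefixes and the
# DiT/encoder/VAE grouping are guesses.
class BagelPipeline:
    _dit_modules = ["language_model.model"]  # DiT at a nested path
    _resident_modules = [
        # Small modules needed every forward pass; pinned on GPU during
        # layerwise offload instead of being left on CPU.
        "bagel.time_embedder",
        "bagel.vae2llm",
        "bagel.llm2vae",
        "bagel.latent_pos_embed",
        "bagel.vit_model",
        "bagel.connector",
        "bagel.vit_pos_embed",
    ]
```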
Add Step 11 (CPU Offload Support) covering the SupportsModuleOffload protocol: _dit_modules, _encoder_modules, _vae_modules, _resident_modules, and dotted path support. Add cpu_offload_diffusion.md to the Step 7 required docs list.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
Force-pushed from 288db29 to 7df49c3
…upportsModuleOffload (vllm-project#2427)

Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Purpose
ModuleDiscovery previously hardcoded attribute names to find DiT,
encoder, and VAE modules for CPU offload. This silently failed for
pipelines using non-standard names (e.g. OmniGen2's 'mllm', Bagel's
'vit_model', MammothModa2's 'gen_transformer'/'gen_vae'), leaving
multi-GB models idle on GPU during the denoising loop.
Add SupportsModuleOffload protocol to the pipeline interface.
Pipelines declare _dit_modules, _encoder_modules, and _vae_modules
as class variables, and ModuleDiscovery.discover() reads them
directly. Both DiT and encoder lists are needed because the offload
hooks use mutual exclusion. Pipelines without the protocol fall back
to the existing attribute name scan.
Also update PipelineModules.vae to PipelineModules.vaes (list) to
support pipelines with multiple VAEs (e.g. LTX2's audio_vae,
DreamIDOmni's vae_model_audio). Both sequential and layerwise
offload backends updated to iterate the list.
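A rough sketch of the renamed field and how a backend would iterate it; the dataclass shape and the field names other than vaes are assumptions.

```python
from dataclasses import dataclass, field

import torch.nn as nn

# Rough sketch only; the real PipelineModules has more fields than shown,
# and dits/encoders are assumed names. The point is vaes is now a list.
@dataclass
class PipelineModules:
    dits: list[nn.Module] = field(default_factory=list)
    encoders: list[nn.Module] = field(default_factory=list)
    vaes: list[nn.Module] = field(default_factory=list)  # was a single `vae`

def offload_vaes(modules: PipelineModules) -> None:
    for vae in modules.vaes:  # backends now iterate the list
        vae.to("cpu")
```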
Behavioral changes from unifying collection logic into _collect_modules:
- Encoder collection now checks isinstance(nn.Module) (the original did not), preventing non-Module objects from reaching .to(device).
- Encoder collection now deduplicates (the original did not), avoiding double hook registration when two attrs point to the same module.
- Non-Module attributes declared via the protocol trigger a warning (a pipeline authoring bug); in the fallback path they are silently skipped.
Test Plan
vllm serve --omni --model OmniGen2/OmniGen2 --port 8091
vllm serve --omni --model OmniGen2/OmniGen2 --port 8091 --enable-cpu-offload
# send image generation requests

Test Result
- Before: with --enable-cpu-offload, no encoder modules found; model stays on CPU.
- With this PR: with --enable-cpu-offload, transformer <-> mllm mutual exclusion active.