[BugFix] Fix layerwise CPU offloading for LTX2 two-stages pipeline #2935
Songrui625 wants to merge 3 commits into vllm-project:main
Conversation
Signed-off-by: Songrui625 <songrui625@gmail.com>
@wtomin @hsliuustc0106 @lishunyang12 PTAL. Thanks.
hsliuustc0106 left a comment
BLOCKING:
- Test Coverage — Missing regression test. Please add an automated test that verifies layerwise offloading correctly discovers DiT/transformer modules in nested pipeline structures like LTX2TwoStagesPipeline. The current test plan only provides manual server startup verification.
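A shape like the following would satisfy this (a rough sketch only; the dummy classes are illustrative, and the import for `find_module_with_attr` — the helper touched in this PR's diff — is elided here):

```python
# Sketch only: dummy classes are placeholders, and the import path for
# find_module_with_attr (the helper changed in this PR) is elided.
class DummyTransformer:
    pass


class DummyStagePipeline:
    def __init__(self):
        self.transformer = DummyTransformer()


class DummyTwoStagesPipeline:
    """Mimics LTX2TwoStagesPipeline: the DiT sits one level down, under pipe."""

    def __init__(self):
        self.pipe = DummyStagePipeline()


def test_finds_transformer_in_nested_pipeline():
    pipeline = DummyTwoStagesPipeline()
    # Assumed semantics: the helper searches recursively and returns a
    # non-None hit when any nested submodule exposes the attribute.
    assert find_module_with_attr(pipeline, "transformer") is not None
```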
@yuanheng-zhao PTAL, this is your domain.
Force-pushed from 6ae1a95 to 9f6b04b.
Added a simple test case to cover it. CC @yuanheng-zhao
```python
module = find_module_with_attr(pipeline, attr)
if module is None:
    continue
pipeline = module
```
The reassignment to pipeline here makes the search for the next DiT module descend under the current match, which I think may not be stable, since it discards the root.

It would be better to track the path from the outermost wrapper pipeline down to the transformer module that contains the offloadable layers.
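Roughly what I mean, as a sketch (names are illustrative; cycle handling and nn.Module specifics are omitted):

```python
def find_attr_path(root, attr, prefix=""):
    """Return the dotted path from the outermost `root` to the first
    submodule exposing `attr` (e.g. "pipe.transformer"), or None.
    The root is never reassigned, so the full path is preserved."""
    if hasattr(root, attr):
        return prefix + attr
    for name, child in vars(root).items():
        if not hasattr(child, "__dict__"):
            continue  # skip plain values such as ints and strings
        found = find_attr_path(child, attr, prefix=f"{prefix}{name}.")
        if found is not None:
            return found
    return None
```

The caller can then resolve the path against the original root when wiring up offloading, instead of carrying a mutated pipeline reference.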
```python
self.upsample_pipe = DummyPipeline()


class TestModuleDiscovery:
```
What if a DiT is found on both pipe and upsample_pipe? The current resolution seems to fail in that case.
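To make the ambiguity concrete (a sketch; the class names are made up):

```python
class DummyDiT:
    pass


class DummyStage:
    def __init__(self):
        self.transformer = DummyDiT()


class DummyTwoStages:
    def __init__(self):
        # Both stages carry their own DiT. A search that returns the
        # first match silently picks one and never offloads the other.
        self.pipe = DummyStage()
        self.upsample_pipe = DummyStage()
```

A first-match search here depends on attribute order; collecting all matches (or requiring an explicit choice) seems safer.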
I think we might prefer a user (developer)-specified way to control how the target transformer(s) should be found. Even #2427 does not handle recursively looking for transformers in child modules. @NickCao, might you want to add this handling? For example, enable looking for modules in children: `_dit_modules: ClassVar[list[str]] = ["pipe.transformer"]`
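For what it's worth, resolving such a dotted path is cheap (a sketch; `resolve_dit_module` is a hypothetical helper, not code from #2427):

```python
from functools import reduce


def resolve_dit_module(pipeline, dotted_path: str):
    """Walk a developer-specified path such as "pipe.transformer"
    down from the outermost pipeline object."""
    return reduce(getattr, dotted_path.split("."), pipeline)


# e.g. resolve_dit_module(two_stages_pipeline, "pipe.transformer")
```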
LGTM, totally agree with it. Let's track this case in PR #2427. Happy to help if needed.
It is solved by PR #2427.
Purpose
This PR fixes layerwise CPU offloading for LTX-2 two-stage pipelines, LTX2TwoStagesPipeline and LTX2ImageToVideoTwoStagesPipeline.
I'm working on adding L4 tests for the LTX-2 diffusion model (#2815). After PR #2018 was merged into main, the layerwise CPU offloading tests for the LTX-2 two-stage pipelines (LTX2TwoStagesPipeline and LTX2ImageToVideoTwoStagesPipeline) started failing.
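The failure comes from the nesting: in the two-stage pipelines the DiT is not a direct attribute of the top-level object. Illustratively (the attribute names here are an assumption based on this PR's test doubles, not the exact LTX-2 layout):

```python
from types import SimpleNamespace

# Illustrative shape only; real class and attribute names may differ.
stage = SimpleNamespace(transformer="<DiT>")
upsample = SimpleNamespace(transformer="<DiT>")
two_stages = SimpleNamespace(pipe=stage, upsample_pipe=upsample)

assert not hasattr(two_stages, "transformer")   # flat lookup misses it
assert hasattr(two_stages.pipe, "transformer")  # it lives one level down
```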
The server crashes at startup because no DiT modules are found in the layerwise CPU offloading context, as the warning below shows:
```
WARNING 04-19 22:52:46 [layerwise_backend.py:293] No DiT/transformer modules found, skipping layer-wise offloading
```

Test Plan
`test_module_collector.py` passed.

Test Result
`test_module_collector.py` passed. The omni server starts up successfully.
Essential Elements of an Effective PR Description Checklist
(Optional) Documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.