[BugFix] Fix layerwise CPU offloading for LTX2 two-stages pipeline#2935

Closed
Songrui625 wants to merge 3 commits into vllm-project:main from Songrui625:fix-ltx2-2stages-offload

Conversation

@Songrui625
Contributor

@Songrui625 Songrui625 commented Apr 20, 2026

Purpose

This PR fixes layerwise CPU offloading for LTX-2 two-stage pipelines, LTX2TwoStagesPipeline and LTX2ImageToVideoTwoStagesPipeline.

I'm working on adding L4 tests for the LTX-2 diffusion model (#2815). After PR #2018 was merged into the main branch, the layerwise CPU offloading tests for the LTX-2 two-stage pipelines (LTX2TwoStagesPipeline and LTX2ImageToVideoTwoStagesPipeline) started failing.

The server crashes at startup because the DiT modules are not found by the layerwise CPU offloading context, as hinted by the warning in the log below: WARNING 04-19 22:52:46 [layerwise_backend.py:293] No DiT/transformer modules found, skipping layer-wise offloading

$ /app/.venv/bin/python -m vllm_omni.entrypoints.cli.main serve /data00/models/LTX-2-19b-distilled --host 127.0.0.1 --port 58713 --omni --enable-layerwise-offload --stage-init-timeout 600 --init-timeout 900 --model-class-name LTX2TwoStagesPipelin
...
INFO 04-19 22:52:45 [diffusers_loader.py:324] Loading weights took 2.66 seconds
INFO 04-19 22:52:46 [diffusion_model_runner.py:142] Model loading took 28.8962 GiB and 12.764463 seconds
INFO 04-19 22:52:46 [diffusion_model_runner.py:147] Model runner: Model loaded successfully.
INFO 04-19 22:52:46 [diffusion_model_runner.py:159]  Enabling offloader backend: LayerWiseOffloadBackend
WARNING 04-19 22:52:46 [layerwise_backend.py:293] No DiT/transformer modules found, skipping layer-wise offloading # <======== Here is the point!

INFO 04-19 22:52:46 [diffusion_model_runner.py:188] Model runner: Initialization complete.
INFO 04-19 22:52:46 [diffusion_worker.py:175] Worker 0: Process-scoped GPU memory after model loading: 0.00 GiB.
INFO 04-19 22:52:46 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
INFO 04-19 22:52:46 [diffusion_worker.py:91] Worker 0: Initialization complete.
INFO 04-19 22:52:46 [diffusion_worker.py:555] Worker 0: Scheduler loop started.
INFO 04-19 22:52:46 [diffusion_worker.py:478] Worker 0 ready to receive requests via shared memory
(APIServer pid=833315) INFO 04-19 22:52:46 [diffusion_engine.py:443] dummy run to warm up the model
WARNING 04-19 22:52:46 [kv_transfer_manager.py:985] No connector available for receiving KV cache
  0%|                                                                                                                                                                                         | 0/8 [00:00<?, ?it/s]
ERROR 04-19 22:52:47 [diffusion_worker.py:765] Error executing method 'execute_model'. This might cause issues in distributed execution.
ERROR 04-19 22:52:47 [diffusion_worker.py:765] Traceback (most recent call last):
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/vllm-omni/vllm_omni/diffusion/worker/diffusion_worker.py", line 761, in execute_method
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return func(*args, **kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/vllm-omni/vllm_omni/diffusion/worker/diffusion_worker.py", line 236, in execute_model
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     output = self.model_runner.execute_model(req)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/vllm-omni/vllm_omni/diffusion/worker/diffusion_model_runner.py", line 276, in execute_model
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     output = self.pipeline.forward(req)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]              ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/vllm-omni/vllm_omni/diffusion/models/ltx2/pipeline_ltx2.py", line 1228, in forward
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     video_latent, audio_latent = self.pipe(
ERROR 04-19 22:52:47 [diffusion_worker.py:765]                                  ^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return self._call_impl(*args, **kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return forward_call(*args, **kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return func(*args, **kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/vllm-omni/vllm_omni/diffusion/models/ltx2/pipeline_ltx2.py", line 1069, in forward
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     noise_pred_video, noise_pred_audio = self.predict_noise_maybe_with_cfg(
ERROR 04-19 22:52:47 [diffusion_worker.py:765]                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/vllm-omni/vllm_omni/diffusion/distributed/cfg_parallel.py", line 133, in predict_noise_maybe_with_cfg
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     positive_noise_pred = _wrap(self.predict_noise(**positive_kwargs))
ERROR 04-19 22:52:47 [diffusion_worker.py:765]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/vllm-omni/vllm_omni/diffusion/models/ltx2/pipeline_ltx2.py", line 717, in predict_noise
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     noise_pred_video, noise_pred_audio = self.transformer(**kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return self._call_impl(*args, **kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return forward_call(*args, **kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/vllm-omni/vllm_omni/diffusion/models/ltx2/ltx2_transformer.py", line 1655, in forward
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     hidden_states = self.proj_in(hidden_states)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return self._call_impl(*args, **kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return forward_call(*args, **kwargs)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765]   File "/app/.venv/lib/python3.12/site-packages/torch/nn/modules/linear.py", line 134, in forward
ERROR 04-19 22:52:47 [diffusion_worker.py:765]     return F.linear(input, self.weight, self.bias)
ERROR 04-19 22:52:47 [diffusion_worker.py:765]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-19 22:52:47 [diffusion_worker.py:765] RuntimeError: Expected all tensors to be on the same device, but got mat2 is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA_mm)
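The root cause is easy to reproduce in isolation: the two-stage wrapper holds the inner pipeline under an attribute, so a collector that only inspects top-level attributes never sees the transformer. A minimal sketch with stand-in classes (these are illustrative names, not the actual vllm-omni implementations):

```python
# Illustrative stand-ins only, not the real vllm-omni pipeline classes.
class DiT:
    """Stands in for the transformer (e.g. LTX2VideoTransformer3DModel)."""

class InnerPipeline:
    """Stands in for LTX2Pipeline, which owns the transformer directly."""
    def __init__(self):
        self.transformer = DiT()

class TwoStagesPipeline:
    """Stands in for LTX2TwoStagesPipeline: wraps the inner pipeline."""
    def __init__(self):
        self.pipe = InnerPipeline()

pipeline = TwoStagesPipeline()
# A collector that only checks the top-level pipeline misses the DiT ...
print(hasattr(pipeline, "transformer"))       # False
# ... because it lives one level down, under the wrapper's attribute.
print(hasattr(pipeline.pipe, "transformer"))  # True
```

With no transformer found, no offload hooks are installed, so the DiT weights stay on CPU while the inputs are on cuda:0, producing the device-mismatch RuntimeError above.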

Test Plan

  • The newly added test test_module_collector.py passes.
pytest -v tests/diffusion/offloader/test_module_collector.py
  • The omni server starts for pipeline LTX2TwoStagesPipeline without crashing.
/app/.venv/bin/python -m vllm_omni.entrypoints.cli.main serve /data00/models/LTX-2-19b-distilled --host 127.0.0.1 --port 58713 --omni --enable-layerwise-offload --stage-init-timeout 600 --init-timeout 900 --model-class-name LTX2TwoStagesPipeline

Test Result

  • Both test cases from test_module_collector.py passed
(app) root@iv-ye1ye80vlsxjd1txczgc:/app/vllm-omni/tests/diffusion/offloader# pytest -v test_module_collector.py
=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.12.12, pytest-9.0.3, pluggy-1.6.0 -- /app/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /app/vllm-omni
configfile: pyproject.toml
plugins: mock-3.15.1, anyio-4.12.1
collected 2 items

test_module_collector.py::TestModuleDiscovery::test_discover_basic PASSED                                                                                                                                    [ 50%]
test_module_collector.py::TestModuleDiscovery::test_discover_nested PASSED                                                                                                                                   [100%]

================================================================================================= warnings summary =================================================================================================
../../../vllm_omni/version.py:55
  /app/vllm-omni/vllm_omni/version.py:55: RuntimeWarning: vLLM and vLLM-Omni appear to have mismatched major/minor versions:
   --> vLLM-Omni version 0.1.dev1338+gf0756914d.d20260420
   --> vLLM version 0.19.0
  This will likely cause compatibility issues.
    warn_if_misaligned_vllm_version()

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../../../.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: 14 warnings
  /app/.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

../../../../.venv/lib/python3.12/site-packages/_pytest/config/__init__.py:1434
  /app/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py:1434: PytestConfigWarning: Unknown config option: asyncio_mode

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================== 2 passed, 18 warnings in 0.10s ==========================================================================================
  • The omni server starts up successfully:
:/app/vllm-omni# /app/.venv/bin/python -m vllm_omni.entrypoints.cli.main serve /data00/models/LTX-2-19b-distilled --host 127.0.0.1 --port 58713 --omni --enable-layerwise-offload --stage-init-timeout 600 --init-timeout 900 --model-class-name LTX2TwoStagesPipeline
/app/vllm-omni/vllm_omni/version.py:55: RuntimeWarning: vLLM and vLLM-Omni appear to have mismatched major/minor versions:
 --> vLLM-Omni version 0.1.dev1338+gf0756914d.d20260420
 --> vLLM version 0.19.0
This will likely cause compatibility issues.
  warn_if_misaligned_vllm_version()
/app/.venv/lib/python3.12/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
INFO 04-19 23:14:31 [serve.py:116] Detected diffusion model: /data00/models/LTX-2-19b-distilled
INFO 04-19 23:14:31 [logo.py:45]        █     █     █▄   ▄█       ▄▀▀▀▀▄ █▄   ▄█ █▄    █ ▀█▀
INFO 04-19 23:14:31 [logo.py:45]  ▄▄ ▄█ █     █     █ ▀▄▀ █  ▄▄▄  █    █ █ ▀▄▀ █ █ ▀▄  █  █
INFO 04-19 23:14:31 [logo.py:45]   █▄█▀ █     █     █     █       █    █ █     █ █   ▀▄█  █
INFO 04-19 23:14:31 [logo.py:45]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀        ▀▀▀▀  ▀     ▀ ▀     ▀ ▀▀▀
INFO 04-19 23:14:31 [logo.py:45]
(APIServer pid=836403) INFO 04-19 23:14:31 [utils.py:299] vLLM server version 0.19.0, serving model /data00/models/LTX-2-19b-distilled
(APIServer pid=836403) INFO 04-19 23:14:31 [utils.py:233] non-default args: {'model_tag': '/data00/models/LTX-2-19b-distilled', 'host': '127.0.0.1', 'port': 58713, 'model': '/data00/models/LTX-2-19b-distilled'}
(APIServer pid=836403) INFO 04-19 23:14:31 [omni_base.py:139] [AsyncOmni] Initializing with model /data00/models/LTX-2-19b-distilled
(APIServer pid=836403) INFO 04-19 23:14:31 [async_omni_engine.py:272] [AsyncOmniEngine] Initializing with model /data00/models/LTX-2-19b-distilled
(APIServer pid=836403) WARNING 04-19 23:14:31 [utils.py:177] Filtered out 1 callable object(s) from base_engine_args that are not compatible with OmegaConf: ['dispatch_function'].
(APIServer pid=836403) INFO 04-19 23:14:31 [async_omni_engine.py:329] [AsyncOmniEngine] Launching Orchestrator thread with 1 stages
(APIServer pid=836403) INFO 04-19 23:14:31 [async_omni_engine.py:748] [AsyncOmniEngine] Initializing stage 0
(APIServer pid=836403) INFO 04-19 23:14:31 [stage_init_utils.py:384] [stage_init] Stage-0 set runtime devices: 0
(APIServer pid=836403) INFO 04-19 23:14:32 [multiproc_executor.py:105] Starting server...
/app/vllm-omni/vllm_omni/version.py:55: RuntimeWarning: vLLM and vLLM-Omni appear to have mismatched major/minor versions:
 --> vLLM-Omni version 0.1.dev1338+gf0756914d.d20260420
 --> vLLM version 0.19.0
This will likely cause compatibility issues.
  warn_if_misaligned_vllm_version()
/app/.venv/lib/python3.12/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
INFO 04-19 23:14:40 [diffusion_worker.py:417] Worker 0 created result MessageQueue
INFO 04-19 23:14:40 [scheduler.py:238] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 04-19 23:14:40 [vllm.py:790] Asynchronous scheduling is enabled.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
INFO 04-19 23:14:40 [diffusion_worker.py:127] Worker 0: Initialized device and distributed environment.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
INFO 04-19 23:14:40 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=1, ulysses=1, ring=1, use_ulysses_low=True).
INFO 04-19 23:14:40 [parallel_state.py:630] SP group details for rank 0: sp_group=[0], ulysses_group=[0], ring_group=[0]
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:02<00:00,  5.44it/s]
INFO 04-19 23:14:49 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
INFO 04-19 23:14:50 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
Multi-thread loading shards:   0% Completed | 0/8 [00:00<?, ?it/s]
Multi-thread loading shards:  12% Completed | 1/8 [00:00<00:02,  2.40it/s]
Multi-thread loading shards:  25% Completed | 2/8 [00:00<00:02,  2.61it/s]
Multi-thread loading shards:  38% Completed | 3/8 [00:00<00:01,  3.48it/s]
Multi-thread loading shards:  50% Completed | 4/8 [00:01<00:01,  3.27it/s]
Multi-thread loading shards:  62% Completed | 5/8 [00:01<00:00,  3.13it/s]
Multi-thread loading shards:  75% Completed | 6/8 [00:01<00:00,  3.00it/s]
Multi-thread loading shards:  88% Completed | 7/8 [00:02<00:00,  2.93it/s]
Multi-thread loading shards: 100% Completed | 8/8 [00:02<00:00,  3.26it/s]
Multi-thread loading shards: 100% Completed | 8/8 [00:02<00:00,  3.10it/s]

INFO 04-19 23:14:53 [diffusers_loader.py:324] Loading weights took 2.62 seconds
INFO 04-19 23:14:53 [diffusion_model_runner.py:142] Model loading took 28.8962 GiB and 12.712799 seconds
INFO 04-19 23:14:53 [diffusion_model_runner.py:147] Model runner: Model loaded successfully.
INFO 04-19 23:14:53 [diffusion_model_runner.py:159]  Enabling offloader backend: LayerWiseOffloadBackend
INFO 04-19 23:14:53 [layerwise_backend.py:307] Applying layer-wise offloading on ['transformer', 'language_model', 'model']
INFO 04-19 23:14:53 [layerwise_backend.py:313] Applying hooks on transformer (LTX2VideoTransformer3DModel)
INFO 04-19 23:15:09 [layerwise_backend.py:385] Layer-wise offloading enabled on 48 layers (blocks)
INFO 04-19 23:15:09 [layerwise_backend.py:313] Applying hooks on language_model (Gemma3TextModel)
WARNING 04-19 23:15:09 [layerwise_backend.py:443] No _layerwise_offload_blocks_attrs defined for Gemma3TextModel, skipping layerwise offloading
WARNING 04-19 23:15:09 [layerwise_backend.py:318] Target layers (blocks) not found. Skipping offloading on language_model (Gemma3TextModel)
INFO 04-19 23:15:09 [layerwise_backend.py:313] Applying hooks on model (Gemma3Model)
WARNING 04-19 23:15:09 [layerwise_backend.py:443] No _layerwise_offload_blocks_attrs defined for Gemma3Model, skipping layerwise offloading
WARNING 04-19 23:15:09 [layerwise_backend.py:318] Target layers (blocks) not found. Skipping offloading on model (Gemma3Model)
INFO 04-19 23:15:09 [diffusion_model_runner.py:188] Model runner: Initialization complete.
INFO 04-19 23:15:09 [diffusion_worker.py:175] Worker 0: Process-scoped GPU memory after model loading: 0.00 GiB.
INFO 04-19 23:15:09 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
INFO 04-19 23:15:09 [diffusion_worker.py:91] Worker 0: Initialization complete.
INFO 04-19 23:15:09 [diffusion_worker.py:555] Worker 0: Scheduler loop started.
INFO 04-19 23:15:09 [diffusion_worker.py:478] Worker 0 ready to receive requests via shared memory
(APIServer pid=836403) INFO 04-19 23:15:09 [diffusion_engine.py:443] dummy run to warm up the model
WARNING 04-19 23:15:09 [kv_transfer_manager.py:985] No connector available for receiving KV cache
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:11<00:00,  1.41s/it]
INFO 04-19 23:15:21 [pipeline_ltx2.py:922] Got latents of shape [batch_size, latent_dim, latent_frames, latent_height, latent_width], `latent_num_frames`, `latent_height`, `latent_width` will be inferred.
INFO 04-19 23:15:21 [pipeline_ltx2.py:958] Got audio_latents of shape [batch_size, num_channels, audio_length, mel_bins], `audio_num_frames` will be inferred.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00,  1.42it/s]
INFO 04-19 23:15:23 [diffusion_model_runner.py:213] Peak GPU memory (this request): 35.98 GB reserved, 33.86 GB allocated, 2.12 GB pool overhead (5.9%)
(APIServer pid=836403) INFO 04-19 23:15:23 [inline_stage_diffusion_client.py:63] [InlineStageDiffusionClient] Stage-0 initialized inline (batch_size=1)
(APIServer pid=836403) INFO 04-19 23:15:23 [async_omni_engine.py:803] [AsyncOmniEngine] Stage 0 initialized (diffusion, batch_size=1)
(APIServer pid=836403) INFO 04-19 23:15:23 [orchestrator.py:185] [Orchestrator] Starting event loop
(APIServer pid=836403) INFO 04-19 23:15:23 [async_omni_engine.py:371] [AsyncOmniEngine] Orchestrator ready with 1 stages
(APIServer pid=836403) INFO 04-19 23:15:23 [omni_base.py:152] [AsyncOmni] AsyncOmniEngine initialized in 51.98 seconds
(APIServer pid=836403) INFO 04-19 23:15:23 [omni_base.py:167] [AsyncOmni] Initialized with 1 stages for model /data00/models/LTX-2-19b-distilled
(APIServer pid=836403) INFO 04-19 23:15:24 [api_server.py:477] Detected pure diffusion mode (single diffusion stage)
(APIServer pid=836403) INFO 04-19 23:15:24 [api_server.py:528] Pure diffusion API server initialized for model: /data00/models/LTX-2-19b-distilled
(APIServer pid=836403) INFO 04-19 23:15:24 [api_server.py:323] Starting vLLM API server (pure diffusion mode) on http://127.0.0.1:58713
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:37] Available routes are:
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /docs, Methods: HEAD, GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /redoc, Methods: HEAD, GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/chat/completions/batch, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/messages/count_tokens, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/audio/speech, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/audio/speech/batch, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/audio/voices, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/audio/voices, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/audio/voices/{name}, Methods: DELETE
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/images/generations, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/images/edits, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/videos, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/videos/sync, Methods: POST
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/videos, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/videos/{video_id}, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/videos/{video_id}, Methods: DELETE
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:46] Route: /v1/videos/{video_id}/content, Methods: GET
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:57] Route: /v1/audio/speech/stream, Endpoint: streaming_speech
(APIServer pid=836403) INFO 04-19 23:15:24 [launcher.py:57] Route: /v1/realtime, Endpoint: realtime_websocket
(APIServer pid=836403) INFO:     Started server process [836403]
(APIServer pid=836403) INFO:     Waiting for application startup.
(APIServer pid=836403) INFO:     Application startup complete.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Signed-off-by: Songrui625 <songrui625@gmail.com>

@Songrui625
Contributor Author

@wtomin @hsliuustc0106 @lishunyang12 PTAL. Thanks.

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


BLOCKING:

  • Test Coverage — Missing regression test. Please add an automated test that verifies layerwise offloading correctly discovers DiT/transformer modules in nested pipeline structures like LTX2TwoStagesPipeline. The current test plan only provides manual server startup verification.

@hsliuustc0106
Collaborator

@yuanheng-zhao PTAL, this is your domain

@Songrui625 Songrui625 force-pushed the fix-ltx2-2stages-offload branch from 6ae1a95 to 9f6b04b on April 20, 2026 at 11:39
@Songrui625
Contributor Author

BLOCKING:

  • Test Coverage — Missing regression test. Please add an automated test that verifies layerwise offloading correctly discovers DiT/transformer modules in nested pipeline structures like LTX2TwoStagesPipeline. The current test plan only provides manual server startup verification.

Added a simple test case to cover it. CC @yuanheng-zhao

Contributor

@yuanheng-zhao yuanheng-zhao left a comment


This can be considered a temporary fix for the LTX2 two-stage pipelines: LTX2TwoStagesPipeline wraps an LTX2Pipeline instance, so the transformer module collector fails to find it.

Related PR #2427 cc @NickCao

module = find_module_with_attr(pipeline, attr)
if module is None:
    continue
pipeline = module
Contributor


The reassignment to pipeline here means subsequent DiT module lookups descend under the current module, which I think might not be stable, as it discards the root.

Contributor


It would be better to track the path from the outermost wrapper pipeline down to the transformer module that contains the offloadable layers.
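One possible shape for that tracking, as a rough sketch (the function and attribute names here are hypothetical, not the actual vllm-omni code): record the full dotted path from the root while descending, instead of reassigning the root.

```python
# Hypothetical sketch: collect (dotted_path, module) pairs from the root
# pipeline, so the root is never discarded during descent.
def discover_transformers(root, wrapper_attrs=("pipe", "upsample_pipe"),
                          target_attr="transformer"):
    found = []

    def walk(obj, path):
        target = getattr(obj, target_attr, None)
        if target is not None:
            found.append((f"{path}.{target_attr}" if path else target_attr,
                          target))
        for name in wrapper_attrs:
            child = getattr(obj, name, None)
            if child is not None:
                # Descend while keeping the full path from the root.
                walk(child, f"{path}.{name}" if path else name)

    walk(root, "")
    return found

# Tiny demo with stand-in classes:
class DiT: ...
class Inner:
    def __init__(self): self.transformer = DiT()
class Outer:
    def __init__(self):
        self.pipe = Inner()
        self.upsample_pipe = Inner()

paths = [p for p, _ in discover_transformers(Outer())]
print(paths)  # ['pipe.transformer', 'upsample_pipe.transformer']
```

Because every hit carries its path from the root, this also disambiguates the case where both pipe and upsample_pipe own a DiT.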

self.upsample_pipe = DummyPipeline()


class TestModuleDiscovery:
Contributor


What if a DiT is found on both pipe and upsample_pipe? The current resolution seems to fail in that case.

@yuanheng-zhao
Contributor

I think we might prefer a user (developer)-specified way to control how the target transformer(s) should be found. Even #2427 does not handle recursively looking for transformers in child modules.

@NickCao Might you want to add this handling? For example, enable looking for modules in nested children such as A.B.transformer:

_dit_modules: ClassVar[list[str]] = ["pipe.transformer"]
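Resolving such a user-specified dotted path could be as simple as folding getattr over the path segments. A sketch under that assumption (not the actual #2427 implementation):

```python
from functools import reduce

def resolve_dotted_path(root, dotted: str):
    """Resolve e.g. "pipe.transformer" to root.pipe.transformer, or None."""
    try:
        return reduce(getattr, dotted.split("."), root)
    except AttributeError:
        return None

# Stand-in classes for demonstration:
class DiT: ...
class Inner:
    def __init__(self): self.transformer = DiT()
class Outer:
    def __init__(self): self.pipe = Inner()

mod = resolve_dotted_path(Outer(), "pipe.transformer")
print(type(mod).__name__)                        # DiT
print(resolve_dotted_path(Outer(), "pipe.oops"))  # None
```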

@Songrui625
Contributor Author

I think we might prefer a user (developer)-specified way to control how the target transformer(s) should be found. Even #2427 does not handle recursively looking for transformers in child modules.

@NickCao Might you want to add this handling? For example, enable looking for modules in nested children such as A.B.transformer:

_dit_modules: ClassVar[list[str]] = ["pipe.transformer"]

LGTM, totally agree. Let's track this case in PR #2427. Happy to help if needed.

@Songrui625
Contributor Author

This is solved by PR #2427.
