
[Test] Add L4 diffusion feature test for LongCat-Image #1970

Merged
Gaohan123 merged 12 commits into vllm-project:main from lcukyfuture:test/longcat-image-l4-diffusion on Mar 25, 2026

Conversation

lcukyfuture (Contributor) commented Mar 18, 2026

Purpose

Following the recent establishment of the multi-level testing system, this PR adds an L4 test for LongCat-Image.

Test Plan

The most recent list of diffusion features is at #1217. The test covers the supported LongCat-Image diffusion feature matrix in online serving mode, including:

  • Module-wise CPU offloading
  • Cache-DiT
  • SP (Ulysses & Ring)

Settings

  • (1 GPU) Module-wise CPU offloading only
  • (2 GPUs) Cache-DiT + Ulysses (degree 2)
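
As a reference for the shape these settings take in the test file, here is a minimal pytest sketch. The server flags (`--enable-cpu-offload`, `--cache-backend cache_dit`, `--ulysses-degree 2`) and the case IDs come from this PR's diff and collection output; the test body and parametrization layout are hypothetical placeholders, not the actual implementation.

```python
import pytest

# Hypothetical sketch; case IDs mirror the collected tests below.
CASES = [
    pytest.param(
        ["--enable-cpu-offload"],  # (1 GPU) model-level sequential CPU offload
        id="single_card_001",
    ),
    pytest.param(
        ["--cache-backend", "cache_dit", "--ulysses-degree", "2"],  # (2 GPUs)
        id="parallel_001",
    ),
]

@pytest.mark.diffusion
@pytest.mark.parametrize("server_args", CASES)
def test_longcat_image(server_args):
    # Placeholder body: the real test launches an online-serving instance with
    # `server_args` and asserts a valid 768x1344 image is returned.
    assert isinstance(server_args, list)
```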

Test Result

Collection check:

pytest tests/e2e/online_serving/test_longcat_image_expansion.py --collect-only -m diffusion
platform linux -- Python 3.12.13, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/vllm-omni
configfile: pyproject.toml
plugins: asyncio-1.3.0, cov-7.0.0, mock-3.15.1, anyio-4.12.1
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                

<Dir vllm-omni>
  <Package tests>
    <Package e2e>
      <Package online_serving>
        <Module test_longcat_image_expansion.py>
          <Function test_longcat_image[single_card_001]>
          <Function test_longcat_image[parallel_001]>

Additional Test Results:

pytest -s -v tests/e2e/online_serving/test_longcat_image_edit_expansion.py \
  -m "diffusion and advanced_model and H100" \
  --collect-only
platform linux -- Python 3.12.13, pytest-9.0.2, pluggy-1.6.0 -- /home/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/vllm-omni
configfile: pyproject.toml
plugins: asyncio-1.3.0, cov-7.0.0, mock-3.15.1, anyio-4.12.1
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                                                                                

<Dir vllm-omni>
  <Package tests>
    <Package e2e>
      <Package online_serving>
        <Module test_longcat_image_expansion.py>
          Recommended tests of diffusion features that are available in online serving mode
          and are supported by the following model:
          - LongCat-Image: text-to-image with single prompt input
          Coverage:
          - CPU offloading (model-level sequential offload via --enable-cpu-offload)
          - Cache-DiT
          - SP (Ulysses)
          
          This validates:
           - Successful image generation at the expected 768x1344 resolution with recommended feature combinations
          <Function test_longcat_image[single_card_001]>
            Test the recommended feature combinations for LongCat-Image.
          <Function test_longcat_image[parallel_001]>
            Test the recommended feature combinations for LongCat-Image.

CI Test Result [Passed]

CI passed for both tests:

  • test_longcat_image[single_card_001]
  • test_longcat_image[parallel_001]
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.



@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60e291117e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +47 to +49
server_args=[
"--cache-backend",
"cache_dit",


P2: Assert Cache-DiT is enabled for the cache_dit cases

These parametrizations only verify that image generation still works, but assert_diffusion_response in tests/conftest.py:1333-1367 checks response validity/shape only. Because Cache-DiT is an acceleration path, LongCat will still return a correct image if the cache backend is skipped or regresses, so this new L4 test can stay green while Cache-DiT support is broken. Please add a positive assertion (log/metric/backend state) that the Cache-DiT hooks were actually enabled.

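A possible shape for such a positive assertion, sketched as a log check: the success line it looks for ("Cache-dit enabled successfully on LongCatImagePipeline") appears in the server log later in this thread, while the helper name and the `server_log_path` argument are hypothetical.

```python
def assert_cache_dit_enabled(server_log_path: str) -> None:
    """Hypothetical helper: fail unless the Cache-DiT backend reported success."""
    with open(server_log_path, encoding="utf-8") as f:
        log = f.read()
    # Matches e.g. "INFO ... [cache_dit_backend.py:1139] Cache-dit enabled
    # successfully on LongCatImagePipeline" from the server log in this thread.
    assert "Cache-dit enabled successfully" in log, (
        "cache_dit was requested but the backend never reported it as enabled"
    )
```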

Comment on lines +50 to +52
"--ulysses-degree",
"2",
],

P2: Verify Ulysses/Ring actually run instead of falling back

The sequence-parallel cases currently pass as long as a request returns an image, but LongCat can warn and continue without SP hooks (vllm_omni/diffusion/registry.py:283-289), and the attention factory explicitly falls back to NoParallelAttention when SP groups are unavailable (vllm_omni/diffusion/attention/parallel/factory.py:47-52). In that scenario the Ulysses/Ring feature is broken but this test still passes, so it does not reliably cover the advertised SP matrix.

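In the same spirit, a hedged sketch of a positive SP assertion; the "Applying sequence parallelism" line is taken from the server log below, while the helper and its arguments are hypothetical.

```python
def assert_sp_applied(server_log_path: str, sp_size: int) -> None:
    """Hypothetical helper: fail if the run silently skipped the SP hooks."""
    with open(server_log_path, encoding="utf-8") as f:
        log = f.read()
    # Matches e.g. "Applying sequence parallelism to
    # LongCatImageTransformer2DModel (transformer) (sp_size=2, mode=ulysses, ...)"
    assert "Applying sequence parallelism" in log and f"sp_size={sp_size}" in log, (
        f"SP with sp_size={sp_size} was requested but no SP hooks were applied"
    )
```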

@hsliuustc0106 (Collaborator) left a comment


I found one blocking issue in the new LongCat L4 test.

The test claims to validate Ulysses and Ring sequence parallelism (parallel_001 / parallel_002), but LongCat does not appear to have any _sp_plan hooks, and the runtime explicitly no-ops SP when no _sp_plan is found. In vllm_omni/diffusion/registry.py, _apply_sequence_parallel_if_enabled() logs a warning and continues when applied_count == 0, so these cases can still return a valid image without ever exercising the SP path. I also could not find _sp_plan, ulysses_degree, or ring_degree in vllm_omni/diffusion/models/longcat_image/pipeline_longcat_image.py.

That means the PR currently overstates its feature coverage. Please either add actual LongCat SP support plus a test that proves the SP path was used, or remove the Ulysses/Ring cases from the claimed matrix for this model.
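To make the failure mode concrete, here is an illustrative sketch of the no-op behavior described above. It is not the actual vllm_omni source: the function name, the `_sp_plan` attribute, and the warn-and-continue behavior come from this comment; everything else is assumed.

```python
import logging

logger = logging.getLogger(__name__)

def _apply_sequence_parallel_if_enabled(pipeline, apply_sp_hooks) -> int:
    """Illustrative sketch, not the real registry code."""
    applied_count = 0
    for name, module in vars(pipeline).items():
        sp_plan = getattr(module, "_sp_plan", None)
        if sp_plan is not None:
            apply_sp_hooks(module, sp_plan)  # install Ulysses/Ring attention hooks
            applied_count += 1
    if applied_count == 0:
        # The runtime keeps serving: a valid image can still be produced with no
        # SP hooks installed, which is why a shape-only assertion stays green.
        logger.warning("No _sp_plan found on any submodule; SP is a no-op.")
    return applied_count
```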

@lcukyfuture (Contributor, Author)

I found one blocking issue in the new LongCat L4 test. […] Please either add actual LongCat SP support plus a test that proves the SP path was used, or remove the Ulysses/Ring cases from the claimed matrix for this model.

@hsliuustc0106 Thanks for your comment. It looks like LongCat does have SP support: the _sp_plan is on LongCatImageTransformer2DModel in vllm_omni/diffusion/models/longcat_image/longcat_image_transformer.py. The pipeline attaches that transformer as self.transformer, and the registry applies SP hooks to it. I also verified this from the runtime logs, which include:

INFO 03-18 16:20:17 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=2, ulysses=2, ring=1, use_ulysses_low=True).
INFO 03-18 16:20:17 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=2, ulysses=2, ring=1, use_ulysses_low=True).
INFO 03-18 16:20:17 [parallel_state.py:630] SP group details for rank 0: sp_group=[0, 1], ulysses_group=[0, 1], ring_group=[0]
INFO 03-18 16:20:17 [parallel_state.py:630] SP group details for rank 1: sp_group=[0, 1], ulysses_group=[0, 1], ring_group=[1]
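
A quicker check than reading the full logs would be asserting the hook point directly; a hypothetical snippet, assuming `_sp_plan` is exposed as a class-level attribute as the paths above suggest:

```python
# Module path and class name are from this comment; the attribute check is an assumption.
from vllm_omni.diffusion.models.longcat_image.longcat_image_transformer import (
    LongCatImageTransformer2DModel,
)

assert hasattr(LongCatImageTransformer2DModel, "_sp_plan"), (
    "No _sp_plan on the LongCat transformer; SP would silently no-op"
)
```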

For completeness, I am including the full server log below.

Full server log for the Ulysses case
/home/24037978r/vllm-omni/.venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:647: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
warnings.warn(
INFO 03-18 16:20:05 [serve.py:91] Detected diffusion model: meituan-longcat/LongCat-Image
INFO 03-18 16:20:05 [logo.py:45]        █     █     █▄   ▄█       ▄▀▀▀▀▄ █▄   ▄█ █▄    █ ▀█▀ 
INFO 03-18 16:20:05 [logo.py:45]  ▄▄ ▄█ █     █     █ ▀▄▀ █  ▄▄▄  █    █ █ ▀▄▀ █ █ ▀▄  █  █  
INFO 03-18 16:20:05 [logo.py:45]   █▄█▀ █     █     █     █       █    █ █     █ █   ▀▄█  █  
INFO 03-18 16:20:05 [logo.py:45]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀        ▀▀▀▀  ▀     ▀ ▀     ▀ ▀▀▀ 
INFO 03-18 16:20:05 [logo.py:45] 
(APIServer pid=1065189) INFO 03-18 16:20:05 [utils.py:302] vLLM server version 0.17.0, serving model meituan-longcat/LongCat-Image
(APIServer pid=1065189) INFO 03-18 16:20:05 [utils.py:238] non-default args: {'model_tag': 'meituan-longcat/LongCat-Image', 'host': '127.0.0.1', 'port': 33333, 'model': 'meituan-longcat/LongCat-Image'}
(APIServer pid=1065189) INFO 03-18 16:20:05 [weight_utils.py:50] Using model weights format ['*']
(APIServer pid=1065189) INFO 03-18 16:20:06 [omni.py:195] Initializing stages for model: meituan-longcat/LongCat-Image
(APIServer pid=1065189) INFO 03-18 16:20:06 [omni.py:322] No omni_master_address provided, defaulting to localhost (127.0.0.1)
(APIServer pid=1065189) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1065189) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1065189) INFO 03-18 16:20:07 [initialization.py:35] No OmniTransferConfig provided
(APIServer pid=1065189) INFO 03-18 16:20:07 [omni.py:356] [AsyncOrchestrator] Loaded 1 stages
(APIServer pid=1065189) INFO 03-18 16:20:08 [weight_utils.py:50] Using model weights format ['*']
(APIServer pid=1065189) INFO 03-18 16:20:08 [multiproc_executor.py:74] Starting server...
/home/24037978r/vllm-omni/.venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:647: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
warnings.warn(
/home/24037978r/vllm-omni/.venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:647: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
warnings.warn(
INFO 03-18 16:20:17 [diffusion_worker.py:355] Worker 0 created result MessageQueue
INFO 03-18 16:20:17 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 03-18 16:20:17 [vllm.py:747] Asynchronous scheduling is enabled.
INFO 03-18 16:20:17 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 03-18 16:20:17 [vllm.py:747] Asynchronous scheduling is enabled.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 03-18 16:20:17 [diffusion_worker.py:118] Worker 1: Initialized device and distributed environment.
INFO 03-18 16:20:17 [diffusion_worker.py:118] Worker 0: Initialized device and distributed environment.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
INFO 03-18 16:20:17 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=2, ulysses=2, ring=1, use_ulysses_low=True).
INFO 03-18 16:20:17 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=2, ulysses=2, ring=1, use_ulysses_low=True).
INFO 03-18 16:20:17 [parallel_state.py:630] SP group details for rank 0: sp_group=[0, 1], ulysses_group=[0, 1], ring_group=[0]
INFO 03-18 16:20:17 [parallel_state.py:630] SP group details for rank 1: sp_group=[0, 1], ulysses_group=[0, 1], ring_group=[1]
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:  20%|██        | 1/5 [00:00<00:03,  1.26it/s]
Loading checkpoint shards:  20%|██        | 1/5 [00:00<00:03,  1.12it/s]
Loading checkpoint shards:  40%|████      | 2/5 [00:01<00:02,  1.35it/s]
Loading checkpoint shards:  40%|████      | 2/5 [00:01<00:02,  1.26it/s]
Loading checkpoint shards:  60%|██████    | 3/5 [00:03<00:02,  1.09s/it]
Loading checkpoint shards:  60%|██████    | 3/5 [00:03<00:02,  1.23s/it]
Loading checkpoint shards:  80%|████████  | 4/5 [00:04<00:01,  1.39s/it]
Loading checkpoint shards:  80%|████████  | 4/5 [00:05<00:01,  1.49s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:05<00:00,  1.05s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:05<00:00,  1.06s/it]

Loading checkpoint shards: 100%|██████████| 5/5 [00:05<00:00,  1.03s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:05<00:00,  1.09s/it]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
INFO 03-18 16:20:30 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
INFO 03-18 16:20:30 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
INFO 03-18 16:20:31 [registry.py:271] Applying sequence parallelism to LongCatImageTransformer2DModel (transformer) (sp_size=2, mode=ulysses, ulysses=2, ring=1)
INFO 03-18 16:20:32 [registry.py:271] Applying sequence parallelism to LongCatImageTransformer2DModel (transformer) (sp_size=2, mode=ulysses, ulysses=2, ring=1)
INFO 03-18 16:20:32 [weight_utils.py:601] No transformer/diffusion_pytorch_model.safetensors.index.json found in remote.
INFO 03-18 16:20:33 [weight_utils.py:601] No transformer/diffusion_pytorch_model.safetensors.index.json found in remote.

Multi-thread loading shards:   0% Completed | 0/1 [00:00<?, ?it/s]
INFO 03-18 16:20:35 [diffusers_loader.py:321] Loading weights took 2.33 seconds

Multi-thread loading shards: 100% Completed | 1/1 [00:01<00:00,  1.99s/it]

Multi-thread loading shards: 100% Completed | 1/1 [00:01<00:00,  1.99s/it]

INFO 03-18 16:20:35 [diffusers_loader.py:321] Loading weights took 2.36 seconds
INFO 03-18 16:20:36 [diffusion_model_runner.py:134] Model loading took 27.3097 GiB and 18.290028 seconds
INFO 03-18 16:20:36 [diffusion_model_runner.py:139] Model runner: Model loaded successfully.
INFO 03-18 16:20:36 [diffusion_model_runner.py:78] Model runner: transformer compiled with torch.compile.
INFO 03-18 16:20:36 [cache_dit_backend.py:1132] Using custom cache-dit enabler for model: LongCatImagePipeline
INFO 03-18 16:20:36 [cache_dit_backend.py:226] Enabling cache-dit on LongCatImage transformer with BlockAdapter: Fn=1, Bn=0, W=4, 
INFO 03-18 16:20:36 [cache_dit_backend.py:1139] Cache-dit enabled successfully on LongCatImagePipeline
INFO 03-18 16:20:36 [diffusion_model_runner.py:173] Model runner: Initialization complete.
INFO 03-18 16:20:36 [diffusion_worker.py:148] Worker 1: Process-scoped GPU memory after model loading: 27.84 GiB.
INFO 03-18 16:20:36 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:1, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
INFO 03-18 16:20:36 [diffusion_worker.py:87] Worker 1: Initialization complete.
INFO 03-18 16:20:36 [diffusion_worker.py:493] Worker 1: Scheduler loop started.
INFO 03-18 16:20:36 [diffusion_worker.py:416] Worker 1 ready to receive requests via shared memory
INFO 03-18 16:20:36 [diffusion_model_runner.py:134] Model loading took 27.3097 GiB and 18.877100 seconds
INFO 03-18 16:20:36 [diffusion_model_runner.py:139] Model runner: Model loaded successfully.
INFO 03-18 16:20:36 [diffusion_model_runner.py:78] Model runner: transformer compiled with torch.compile.
INFO 03-18 16:20:36 [cache_dit_backend.py:1132] Using custom cache-dit enabler for model: LongCatImagePipeline
INFO 03-18 16:20:36 [cache_dit_backend.py:226] Enabling cache-dit on LongCatImage transformer with BlockAdapter: Fn=1, Bn=0, W=4, 
[03-18 16:20:36] [Cache-DiT] pipe is None, use FakeDiffusionPipeline instead.
[03-18 16:20:36] [Cache-DiT] Auto fill blocks_name: ['transformer_blocks', 'single_transformer_blocks'].
[03-18 16:20:36] [Cache-DiT] Found transformer NOT from diffusers: vllm_omni.diffusion.models.longcat_image.longcat_image_transformer disable check_forward_pattern by default.
[03-18 16:20:36] [Cache-DiT] Adapting Cache Acceleration using custom BlockAdapter!
[03-18 16:20:36] [Cache-DiT] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
[03-18 16:20:36] [Cache-DiT] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
[03-18 16:20:36] [Cache-DiT] Use default 'enable_separate_cfg' from block adapter register: False, Pipeline: FakeDiffusionPipeline.
[03-18 16:20:36] [Cache-DiT] Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_CFG0, Calibrator Config: None
[03-18 16:20:36] [Cache-DiT] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
[03-18 16:20:36] [Cache-DiT] Match Blocks: CachedBlocks_Pattern_0_1_2, for transformer_blocks, cache_context: transformer_blocks_123425887428080, context_manager: FakeDiffusionPipeline_123425351541984.
[03-18 16:20:36] [Cache-DiT] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
[03-18 16:20:36] [Cache-DiT] Match Blocks: CachedBlocks_Pattern_0_1_2, for single_transformer_blocks, cache_context: single_transformer_blocks_123425351546736, context_manager: FakeDiffusionPipeline_123425351541984.
INFO 03-18 16:20:36 [cache_dit_backend.py:1139] Cache-dit enabled successfully on LongCatImagePipeline
INFO 03-18 16:20:36 [diffusion_model_runner.py:173] Model runner: Initialization complete.
INFO 03-18 16:20:37 [diffusion_worker.py:148] Worker 0: Process-scoped GPU memory after model loading: 27.84 GiB.
INFO 03-18 16:20:37 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
INFO 03-18 16:20:37 [diffusion_worker.py:87] Worker 0: Initialization complete.
INFO 03-18 16:20:37 [diffusion_worker.py:493] Worker 0: Scheduler loop started.
INFO 03-18 16:20:37 [diffusion_worker.py:416] Worker 0 ready to receive requests via shared memory
(APIServer pid=1065189) INFO 03-18 16:20:37 [scheduler.py:42] SyncScheduler initialized result MessageQueue
(APIServer pid=1065189) INFO 03-18 16:20:37 [diffusion_engine.py:409] dummy run to warm up the model
INFO 03-18 16:20:37 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-18 16:20:37 [kv_transfer_manager.py:479] Request has no ID, cannot receive KV cache
INFO 03-18 16:20:37 [cache_dit_backend.py:1160] Refreshing cache context for transformer with num_inference_steps: 1
INFO 03-18 16:20:37 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-18 16:20:37 [kv_transfer_manager.py:479] Request has no ID, cannot receive KV cache
[03-18 16:20:37] [Cache-DiT] ✅ Refreshed cache context: transformer_blocks_123425887428080, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N1_CFG0, Calibrator Config: None
INFO 03-18 16:20:37 [cache_dit_backend.py:1160] Refreshing cache context for transformer with num_inference_steps: 1
[03-18 16:20:37] [Cache-DiT] ✅ Refreshed cache context: single_transformer_blocks_123425351546736, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N1_CFG0, Calibrator Config: None
You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
WARNING 03-18 16:20:48 [diffusion_worker.py:385] SHM pack failed, falling back to raw enqueue: Got unsupported ScalarType BFloat16
(APIServer pid=1065189) INFO 03-18 16:20:49 [omni.py:437] [AsyncOrchestrator] Inline diffusion mode active – stage worker subprocess bypassed
(APIServer pid=1065189) INFO 03-18 16:20:49 [api_server.py:436] Detected pure diffusion mode (single diffusion stage)
(APIServer pid=1065189) INFO 03-18 16:20:49 [api_server.py:480] Pure diffusion API server initialized for model: meituan-longcat/LongCat-Image
(APIServer pid=1065189) INFO 03-18 16:20:49 [api_server.py:284] Starting vLLM API server (pure diffusion mode) on http://127.0.0.1:33333
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:38] Available routes are:
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /docs, Methods: GET, HEAD
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /tokenize, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /detokenize, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /load, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /version, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /health, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /metrics, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /ping, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /ping, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /invocations, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/audio/speech, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/audio/voices, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/audio/voices, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/audio/voices/{name}, Methods: DELETE
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /health, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/models, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/images/generations, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/images/edits, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos/{video_id}, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos/{video_id}, Methods: DELETE
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos/{video_id}/content, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:58] Route: /v1/audio/speech/stream, Endpoint: streaming_speech
(APIServer pid=1065189) INFO:     Started server process [1065189]
(APIServer pid=1065189) INFO:     Waiting for application startup.
(APIServer pid=1065189) INFO:     Application startup complete.
(APIServer pid=1065189) WARNING 03-18 16:20:50 [protocol.py:51] The following fields were present in the request but ignored: {'negative_prompt', 'num_inference_steps', 'height', 'width', 'guidance_scale'}
(APIServer pid=1065189) INFO 03-18 16:20:50 [serving_chat.py:2061] Diffusion chat request chatcmpl-b6ac3e3836d942e3: prompt='A cinematic illustration of a cat typing on a silv...', ref_images=0, params={}
(APIServer pid=1065189) INFO 03-18 16:20:50 [async_omni.py:512] [AsyncOrchestrator] Inline diffusion generate for request chatcmpl-b6ac3e3836d942e3
INFO 03-18 16:20:50 [manager.py:608] Deactivating all adapters: 0 layers
INFO 03-18 16:20:50 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-18 16:20:50 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-18 16:20:50 [kv_transfer_manager.py:381] No connector available for receiving KV cache
INFO 03-18 16:20:50 [cache_dit_backend.py:1160] Refreshing cache context for transformer with num_inference_steps: 50
INFO 03-18 16:20:50 [cache_dit_backend.py:1160] Refreshing cache context for transformer with num_inference_steps: 50
[03-18 16:20:50] [Cache-DiT] ✅ Refreshed cache context: transformer_blocks_123425887428080, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N50_CFG0, Calibrator Config: None
[03-18 16:20:50] [Cache-DiT] ✅ Refreshed cache context: single_transformer_blocks_123425351546736, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N50_CFG0, Calibrator Config: None
WARNING 03-18 16:20:55 [diffusion_worker.py:385] SHM pack failed, falling back to raw enqueue: Got unsupported ScalarType BFloat16
(APIServer pid=1065189) INFO 03-18 16:20:55 [diffusion_engine.py:94] Generation completed successfully.
(APIServer pid=1065189) INFO 03-18 16:20:55 [diffusion_engine.py:116] Post-processing completed in 0.4051 seconds
(APIServer pid=1065189) INFO 03-18 16:20:55 [diffusion_engine.py:119] DiffusionEngine.step breakdown: preprocess=0.00 ms, add_req_and_wait=5239.26 ms, postprocess=405.12 ms, total=5644.76 ms
(APIServer pid=1065189) INFO 03-18 16:20:55 [omni_diffusion.py:133] OmniDiffusion.generate total: 5644.91 ms
(APIServer pid=1065189) INFO 03-18 16:20:56 [serving_chat.py:2233] Diffusion chat completed for request chatcmpl-b6ac3e3836d942e3: 1 images
(APIServer pid=1065189) INFO:     127.0.0.1:32838 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1065189) INFO 03-18 16:21:06 [launcher.py:122] Shutting down FastAPI HTTP server.
(APIServer pid=1065189) /home/24037978r/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/resource_tracker.py:147: UserWarning: resource_tracker: process died unexpectedly, relaunching.  Some resources might leak.
(APIServer pid=1065189)   warnings.warn('resource_tracker: process died unexpectedly, '
(APIServer pid=1065189) INFO:     Shutting down
Traceback (most recent call last):
File "/home/24037978r/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/resource_tracker.py", line 264, in main
  cache[rtype].remove(name)
KeyError: '/psm_0b79aef2'
(APIServer pid=1065189) INFO:     Waiting for application shutdown.
(APIServer pid=1065189) INFO:     Application shutdown complete.


@Gaohan123 (Collaborator)

@yenuo26 @congw729 PTAL

yenuo26 (Collaborator) commented Mar 19, 2026

1. Please use pytest -s -v tests/e2e/online_serving/test_*_expansion.py -m "advanced_model and diffusion and H100" --run-level "advanced_model" --collect-only to check whether the test cases can be collected.
2. Maybe you can temporarily modify pipeline.yml to run it in CI, and attach the result.

fhfuih (Contributor) commented Mar 19, 2026

Maybe you can temporarily modify pipeline.yml to run it in CI, and attach the result.

Please see #1938 (321c634) and #1682 (aca42cb) for examples.

lcukyfuture force-pushed the test/longcat-image-l4-diffusion branch from 344e33c to 5926435 on March 19, 2026
@alex-jw-brooks (Contributor)

Yeah, LongCat does support SP, but it was using the invasive approach instead of the SP plan until this PR. FYI @hsliuustc0106, maybe that's the point of confusion for the hook-related logs.

Comment thread on .buildkite/test-nightly.yml (outdated):
- buildkite-agent artifact upload "tests/*.xlsx"
agents:
queue: "cpu_queue_premerge"
# - label: ":full_moon: Qwen3-TTS Non-Async-Chunk E2E Test"
Collaborator:

Why comment out so many lines? @lcukyfuture @linyueqian PTAL

Contributor:

To only run the relevant test group when temporarily running the test on CI

@hsliuustc0106 (Collaborator)

@yenuo26 do we have guidance on how many tests we should add for models with different priorities?

@fhfuih (Contributor) left a comment

Looks good to me for now. Please resolve the conflict and wait for CI.

fhfuih (Contributor) commented Mar 20, 2026

@yenuo26 do we have a guidance for how many tests we should add for models with different priorities?

Following this comment: since we just decided that not all models get full test coverage of diffusion features, we may need extra discussion on how to handle these PRs (lest they crush our CI machine 😁).

@lcukyfuture (Contributor, Author)

The conflict has been resolved. You can also assign LongCat-Image-Edit to me if needed.

NumberWan (Contributor) commented Mar 20, 2026

@fhfuih Should we keep the "Module-wise CPU offloading" item listed for the test? "Module-wise CPU offloading" is not listed in #1217, and removing it would also help reduce the total number of test cases under limited resources.

@NumberWan (Contributor)

The conflict has been resolved. You can also assign LongCat-Image-Edit to me if needed.

Thank you for your assistance. I will proceed with LongCat-Image-Edit; I expect the two files to be quite similar. I will request your review once the PR is ready. Thanks again!

fhfuih (Contributor) commented Mar 20, 2026

@fhfuih Should we keep the "Module-wise CPU offloading" item listed for the test? […]

I think we only fall back the test to model-wise offloading if the model does not support layer-wise offloading, so the number of tests doesn't change. But please also see my following comment.

fhfuih (Contributor) commented Mar 20, 2026

@lcukyfuture Apologies for any confusion, but after some internal discussion we just decided to reduce the number of test cases for non-high-priority models. Could you help settle on a recommended feature combination for this model, and edit the test script to include only that combination? If you need help finding a good combination of diffusion features, see whether this AI skill (hsliuustc0106/vllm-omni-skills#19) can help, or search this repo for PRs that introduce this model or the relevant features (for example code snippets).

And @NumberWan you can also coordinate on this matter

@lcukyfuture (Contributor, Author)

ok, I will check this. 👍

@NumberWan (Contributor)

@lcukyfuture After my local testing on the server and reviewing the merged PR, I think these two feature combinations are the best options for LongCat-Image:

  • (1 GPU) Model-level (sequential) CPU offloading only (--enable-cpu-offload)
  • (2 GPUs) Cache-DiT + Ulysses-SP (ulysses-degree=2)

These two cases cover both the 1-GPU and 2-GPU configurations, and in my runs Ulysses-SP (degree 2) showed the shortest generation latency.

So the current PR can reduce the number of tests to two.

@NumberWan (Contributor)

@lcukyfuture Just a heads-up: the related PR for LongCat-Image-Edit has been created: #2035.

@NumberWan (Contributor)

#2035
@lcukyfuture Please use this PR as a reference so we can align the LongCat-Image and LongCat-Image-Edit model test cases. The main difference between the two PRs should be only the target resolution. If you have any opinions, please let me know so we can discuss.

fhfuih (Contributor) commented Mar 24, 2026

#2035 @lcukyfuture Please use this PR as a reference so we can align the LongCat-Image and LongCat-Image-Edit model test cases. […]

To add: we also plan to reduce the number of test cases for non-high-priority models, so you two can also settle on the feature combinations to test within 1-2 test cases.

lcukyfuture force-pushed the test/longcat-image-l4-diffusion branch from 33d30a7 to f8850b8 on March 24, 2026
@lcukyfuture (Contributor, Author)

@NumberWan I have modified the test cases based on your PR, and they pass in my local tests.

Comment thread on .buildkite/pipeline.yml (outdated):
if: build.branch != "main"
commands:
- buildkite-agent pipeline upload .buildkite/test-ready.yml
- buildkite-agent pipeline upload .buildkite/test-nightly.yml
Collaborator:

Please remember to change it back before merging

@NumberWan (Contributor)

@hsliuustc0106 @Gaohan123 Please add a ready label for the CI test

Gaohan123 added the ready (label to trigger buildkite CI) label on Mar 24, 2026
fhfuih (Contributor) commented Mar 24, 2026

@lcukyfuture Please comment out the if: build.env("NIGHTLY") == "1" line in the Diffusion Test YAML so that it is not skipped. Also, there are two if: build.env("NIGHTLY") == "1" lines in that YAML; could you help remove one?

Your latest temporary nightly CI did not run anything 😂 https://buildkite.com/vllm/vllm-omni/builds/4978/steps/canvas

Commit: Remove duplicate `if: build.env("NIGHTLY") == "1"` and commands block that was introduced by a bad conflict resolution on GitHub. Keep the commented-out `if` so the Diffusion Test runs on non-nightly builds.
lcukyfuture force-pushed the test/longcat-image-l4-diffusion branch from 839864a to bfe01d6 on March 24, 2026
@lcukyfuture (Contributor, Author)

Sorry, it seems I made a mistake while handling the conflict. I'll fix it. 😢

lcukyfuture force-pushed the test/longcat-image-l4-diffusion branch from 392ca19 to 18e2902 on March 24, 2026
fhfuih (Contributor) previously requested changes on Mar 25, 2026:

Thanks. I can see that CI passes: https://buildkite.com/vllm/vllm-omni/builds/4996/steps/canvas

Please change the pipeline YAMLs back, and it will be ready to merge.

@NumberWan (Contributor) left a comment

LGTM

@Gaohan123 (Collaborator) left a comment

LGTM. Thanks

Gaohan123 merged commit 0849c5e into vllm-project:main on Mar 25, 2026; 7 of 8 checks passed.
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request on Mar 26, 2026
…1970)

Signed-off-by: lcukyfuture <zlf994478451@outlook.com>
Signed-off-by: Lingfeng Zhang <48312954+lcukyfuture@users.noreply.github.com>