
[Test] Add L4 diffusion feature test for LongCat-Image #1970

Merged
Gaohan123 merged 12 commits into vllm-project:main from lcukyfuture:test/longcat-image-l4-diffusion on Mar 25, 2026

Conversation

lcukyfuture (Contributor) commented Mar 18, 2026

Purpose

Following the recent establishment of the multi-level testing system, this PR adds an L4 test for LongCat-Image.

Test Plan

The most recent list of diffusion features is at #1217. The test covers the supported LongCat-Image diffusion feature matrix in online serving mode, including:

  • Module-wise CPU offloading
  • Cache-DiT
  • SP (Ulysses & Ring)

Settings

  • (1 GPU) Module-wise CPU offloading only
  • (2 GPUs) Cache-DiT + Ulysses (degree 2)
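
As a reference for the shape these settings take in the test file, here is a minimal pytest sketch. The server flags (`--enable-cpu-offload`, `--cache-backend cache_dit`, `--ulysses-degree 2`) and the case IDs come from this PR's diff and collection output; the test body and parametrization layout are hypothetical placeholders, not the actual implementation.

```python
import pytest

# Hypothetical sketch; case IDs mirror the collected tests below.
CASES = [
    pytest.param(
        ["--enable-cpu-offload"],  # (1 GPU) model-level sequential CPU offload
        id="single_card_001",
    ),
    pytest.param(
        ["--cache-backend", "cache_dit", "--ulysses-degree", "2"],  # (2 GPUs)
        id="parallel_001",
    ),
]

@pytest.mark.diffusion
@pytest.mark.parametrize("server_args", CASES)
def test_longcat_image(server_args):
    # Placeholder body: the real test launches an online-serving instance with
    # `server_args` and asserts a valid 768x1344 image is returned.
    assert isinstance(server_args, list)
```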

Test Result

Collection check:

pytest tests/e2e/online_serving/test_longcat_image_expansion.py --collect-only -m diffusion
platform linux -- Python 3.12.13, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/vllm-omni
configfile: pyproject.toml
plugins: asyncio-1.3.0, cov-7.0.0, mock-3.15.1, anyio-4.12.1
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                

<Dir vllm-omni>
  <Package tests>
    <Package e2e>
      <Package online_serving>
        <Module test_longcat_image_expansion.py>
          <Function test_longcat_image[single_card_001]>
          <Function test_longcat_image[parallel_001]>

Additional Test Results:

pytest -s -v tests/e2e/online_serving/test_longcat_image_edit_expansion.py \
  -m "diffusion and advanced_model and H100" \
  --collect-only
platform linux -- Python 3.12.13, pytest-9.0.2, pluggy-1.6.0 -- /home/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/vllm-omni
configfile: pyproject.toml
plugins: asyncio-1.3.0, cov-7.0.0, mock-3.15.1, anyio-4.12.1
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                                                                                

<Dir vllm-omni>
  <Package tests>
    <Package e2e>
      <Package online_serving>
        <Module test_longcat_image_expansion.py>
          Recommended tests of diffusion features that are available in online serving mode
          and are supported by the following model:
          - LongCat-Image: text-to-image with single prompt input
          Coverage:
          - CPU offloading (model-level sequential offload via --enable-cpu-offload)
          - Cache-DiT
          - SP (Ulysses)
          
          This validates:
           - Successful image generation at the expected 768x1344 resolution with recommended feature combinations
          <Function test_longcat_image[single_card_001]>
            Test the recommended feature combinations for LongCat-Image.
          <Function test_longcat_image[parallel_001]>
            Test the recommended feature combinations for LongCat-Image.

CI Test Result [Passed]

CI passed for both tests:

  • test_longcat_image[single_card_001]
  • test_longcat_image[parallel_001]
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.



@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60e291117e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +47 to +49
server_args=[
"--cache-backend",
"cache_dit",


P2: Assert Cache-DiT is enabled for the cache_dit cases

These parametrizations only verify that image generation still works, but assert_diffusion_response in tests/conftest.py:1333-1367 checks response validity/shape only. Because Cache-DiT is an acceleration path, LongCat will still return a correct image if the cache backend is skipped or regresses, so this new L4 test can stay green while Cache-DiT support is broken. Please add a positive assertion (log/metric/backend state) that the Cache-DiT hooks were actually enabled.

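A possible shape for such a positive assertion, sketched as a log check: the success line it looks for ("Cache-dit enabled successfully on LongCatImagePipeline") appears in the server log later in this thread, while the helper name and the `server_log_path` argument are hypothetical.

```python
def assert_cache_dit_enabled(server_log_path: str) -> None:
    """Hypothetical helper: fail unless the Cache-DiT backend reported success."""
    with open(server_log_path, encoding="utf-8") as f:
        log = f.read()
    # Matches e.g. "INFO ... [cache_dit_backend.py:1139] Cache-dit enabled
    # successfully on LongCatImagePipeline" from the server log in this thread.
    assert "Cache-dit enabled successfully" in log, (
        "cache_dit was requested but the backend never reported it as enabled"
    )
```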

Comment on lines +50 to +52
"--ulysses-degree",
"2",
],

P2: Verify Ulysses/Ring actually run instead of falling back

The sequence-parallel cases currently pass as long as a request returns an image, but LongCat can warn and continue without SP hooks (vllm_omni/diffusion/registry.py:283-289), and the attention factory explicitly falls back to NoParallelAttention when SP groups are unavailable (vllm_omni/diffusion/attention/parallel/factory.py:47-52). In that scenario the Ulysses/Ring feature is broken but this test still passes, so it does not reliably cover the advertised SP matrix.

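In the same spirit, a hedged sketch of a positive SP assertion; the "Applying sequence parallelism" line is taken from the server log below, while the helper and its arguments are hypothetical.

```python
def assert_sp_applied(server_log_path: str, sp_size: int) -> None:
    """Hypothetical helper: fail if the run silently skipped the SP hooks."""
    with open(server_log_path, encoding="utf-8") as f:
        log = f.read()
    # Matches e.g. "Applying sequence parallelism to
    # LongCatImageTransformer2DModel (transformer) (sp_size=2, mode=ulysses, ...)"
    assert "Applying sequence parallelism" in log and f"sp_size={sp_size}" in log, (
        f"SP with sp_size={sp_size} was requested but no SP hooks were applied"
    )
```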

@hsliuustc0106 (Collaborator) left a comment


I found one blocking issue in the new LongCat L4 test.

The test claims to validate Ulysses and Ring sequence parallelism (parallel_001 / parallel_002), but LongCat does not appear to have any _sp_plan hooks, and the runtime explicitly no-ops SP when no _sp_plan is found. In vllm_omni/diffusion/registry.py, _apply_sequence_parallel_if_enabled() logs a warning and continues when applied_count == 0, so these cases can still return a valid image without ever exercising the SP path. I also could not find _sp_plan, ulysses_degree, or ring_degree in vllm_omni/diffusion/models/longcat_image/pipeline_longcat_image.py.

That means the PR currently overstates its feature coverage. Please either add actual LongCat SP support plus a test that proves the SP path was used, or remove the Ulysses/Ring cases from the claimed matrix for this model.
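To make the failure mode concrete, here is an illustrative sketch of the no-op behavior described above. It is not the actual vllm_omni source: the function name, the `_sp_plan` attribute, and the warn-and-continue behavior come from this comment; everything else is assumed.

```python
import logging

logger = logging.getLogger(__name__)

def _apply_sequence_parallel_if_enabled(pipeline, apply_sp_hooks) -> int:
    """Illustrative sketch, not the real registry code."""
    applied_count = 0
    for name, module in vars(pipeline).items():
        sp_plan = getattr(module, "_sp_plan", None)
        if sp_plan is not None:
            apply_sp_hooks(module, sp_plan)  # install Ulysses/Ring attention hooks
            applied_count += 1
    if applied_count == 0:
        # The runtime keeps serving: a valid image can still be produced with no
        # SP hooks installed, which is why a shape-only assertion stays green.
        logger.warning("No _sp_plan found on any submodule; SP is a no-op.")
    return applied_count
```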

@lcukyfuture (Contributor, Author)

I found one blocking issue in the new LongCat L4 test. […] Please either add actual LongCat SP support plus a test that proves the SP path was used, or remove the Ulysses/Ring cases from the claimed matrix for this model.

@hsliuustc0106 Thanks for your comment. It looks like LongCat does have SP support: the _sp_plan is on LongCatImageTransformer2DModel in vllm_omni/diffusion/models/longcat_image/longcat_image_transformer.py. The pipeline attaches that transformer as self.transformer, and the registry applies SP hooks to it. I also verified this from the runtime logs, which include:

INFO 03-18 16:20:17 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=2, ulysses=2, ring=1, use_ulysses_low=True).
INFO 03-18 16:20:17 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=2, ulysses=2, ring=1, use_ulysses_low=True).
INFO 03-18 16:20:17 [parallel_state.py:630] SP group details for rank 0: sp_group=[0, 1], ulysses_group=[0, 1], ring_group=[0]
INFO 03-18 16:20:17 [parallel_state.py:630] SP group details for rank 1: sp_group=[0, 1], ulysses_group=[0, 1], ring_group=[1]
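
A quicker check than reading the full logs would be asserting the hook point directly; a hypothetical snippet, assuming `_sp_plan` is exposed as a class-level attribute as the paths above suggest:

```python
# Module path and class name are from this comment; the attribute check is an assumption.
from vllm_omni.diffusion.models.longcat_image.longcat_image_transformer import (
    LongCatImageTransformer2DModel,
)

assert hasattr(LongCatImageTransformer2DModel, "_sp_plan"), (
    "No _sp_plan on the LongCat transformer; SP would silently no-op"
)
```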

For completeness, I am including the full server log below.

Full server log for the Ulysses case
/home/24037978r/vllm-omni/.venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:647: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
warnings.warn(
INFO 03-18 16:20:05 [serve.py:91] Detected diffusion model: meituan-longcat/LongCat-Image
INFO 03-18 16:20:05 [logo.py:45]        █     █     █▄   ▄█       ▄▀▀▀▀▄ █▄   ▄█ █▄    █ ▀█▀ 
INFO 03-18 16:20:05 [logo.py:45]  ▄▄ ▄█ █     █     █ ▀▄▀ █  ▄▄▄  █    █ █ ▀▄▀ █ █ ▀▄  █  █  
INFO 03-18 16:20:05 [logo.py:45]   █▄█▀ █     █     █     █       █    █ █     █ █   ▀▄█  █  
INFO 03-18 16:20:05 [logo.py:45]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀        ▀▀▀▀  ▀     ▀ ▀     ▀ ▀▀▀ 
INFO 03-18 16:20:05 [logo.py:45] 
(APIServer pid=1065189) INFO 03-18 16:20:05 [utils.py:302] vLLM server version 0.17.0, serving model meituan-longcat/LongCat-Image
(APIServer pid=1065189) INFO 03-18 16:20:05 [utils.py:238] non-default args: {'model_tag': 'meituan-longcat/LongCat-Image', 'host': '127.0.0.1', 'port': 33333, 'model': 'meituan-longcat/LongCat-Image'}
(APIServer pid=1065189) INFO 03-18 16:20:05 [weight_utils.py:50] Using model weights format ['*']
(APIServer pid=1065189) INFO 03-18 16:20:06 [omni.py:195] Initializing stages for model: meituan-longcat/LongCat-Image
(APIServer pid=1065189) INFO 03-18 16:20:06 [omni.py:322] No omni_master_address provided, defaulting to localhost (127.0.0.1)
(APIServer pid=1065189) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1065189) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1065189) INFO 03-18 16:20:07 [initialization.py:35] No OmniTransferConfig provided
(APIServer pid=1065189) INFO 03-18 16:20:07 [omni.py:356] [AsyncOrchestrator] Loaded 1 stages
(APIServer pid=1065189) INFO 03-18 16:20:08 [weight_utils.py:50] Using model weights format ['*']
(APIServer pid=1065189) INFO 03-18 16:20:08 [multiproc_executor.py:74] Starting server...
/home/24037978r/vllm-omni/.venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:647: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
warnings.warn(
/home/24037978r/vllm-omni/.venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:647: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
warnings.warn(
INFO 03-18 16:20:17 [diffusion_worker.py:355] Worker 0 created result MessageQueue
INFO 03-18 16:20:17 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 03-18 16:20:17 [vllm.py:747] Asynchronous scheduling is enabled.
INFO 03-18 16:20:17 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 03-18 16:20:17 [vllm.py:747] Asynchronous scheduling is enabled.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
INFO 03-18 16:20:17 [diffusion_worker.py:118] Worker 1: Initialized device and distributed environment.
INFO 03-18 16:20:17 [diffusion_worker.py:118] Worker 0: Initialized device and distributed environment.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
INFO 03-18 16:20:17 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=2, ulysses=2, ring=1, use_ulysses_low=True).
INFO 03-18 16:20:17 [parallel_state.py:588] Building SP subgroups from explicit sp_group_ranks (sp_size=2, ulysses=2, ring=1, use_ulysses_low=True).
INFO 03-18 16:20:17 [parallel_state.py:630] SP group details for rank 0: sp_group=[0, 1], ulysses_group=[0, 1], ring_group=[0]
INFO 03-18 16:20:17 [parallel_state.py:630] SP group details for rank 1: sp_group=[0, 1], ulysses_group=[0, 1], ring_group=[1]
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:  20%|██        | 1/5 [00:00<00:03,  1.26it/s]
Loading checkpoint shards:  20%|██        | 1/5 [00:00<00:03,  1.12it/s]
Loading checkpoint shards:  40%|████      | 2/5 [00:01<00:02,  1.35it/s]
Loading checkpoint shards:  40%|████      | 2/5 [00:01<00:02,  1.26it/s]
Loading checkpoint shards:  60%|██████    | 3/5 [00:03<00:02,  1.09s/it]
Loading checkpoint shards:  60%|██████    | 3/5 [00:03<00:02,  1.23s/it]
Loading checkpoint shards:  80%|████████  | 4/5 [00:04<00:01,  1.39s/it]
Loading checkpoint shards:  80%|████████  | 4/5 [00:05<00:01,  1.49s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:05<00:00,  1.05s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:05<00:00,  1.06s/it]

Loading checkpoint shards: 100%|██████████| 5/5 [00:05<00:00,  1.03s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:05<00:00,  1.09s/it]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
INFO 03-18 16:20:30 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
INFO 03-18 16:20:30 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
INFO 03-18 16:20:31 [registry.py:271] Applying sequence parallelism to LongCatImageTransformer2DModel (transformer) (sp_size=2, mode=ulysses, ulysses=2, ring=1)
INFO 03-18 16:20:32 [registry.py:271] Applying sequence parallelism to LongCatImageTransformer2DModel (transformer) (sp_size=2, mode=ulysses, ulysses=2, ring=1)
INFO 03-18 16:20:32 [weight_utils.py:601] No transformer/diffusion_pytorch_model.safetensors.index.json found in remote.
INFO 03-18 16:20:33 [weight_utils.py:601] No transformer/diffusion_pytorch_model.safetensors.index.json found in remote.

Multi-thread loading shards:   0% Completed | 0/1 [00:00<?, ?it/s]
INFO 03-18 16:20:35 [diffusers_loader.py:321] Loading weights took 2.33 seconds

Multi-thread loading shards: 100% Completed | 1/1 [00:01<00:00,  1.99s/it]

Multi-thread loading shards: 100% Completed | 1/1 [00:01<00:00,  1.99s/it]

INFO 03-18 16:20:35 [diffusers_loader.py:321] Loading weights took 2.36 seconds
INFO 03-18 16:20:36 [diffusion_model_runner.py:134] Model loading took 27.3097 GiB and 18.290028 seconds
INFO 03-18 16:20:36 [diffusion_model_runner.py:139] Model runner: Model loaded successfully.
INFO 03-18 16:20:36 [diffusion_model_runner.py:78] Model runner: transformer compiled with torch.compile.
INFO 03-18 16:20:36 [cache_dit_backend.py:1132] Using custom cache-dit enabler for model: LongCatImagePipeline
INFO 03-18 16:20:36 [cache_dit_backend.py:226] Enabling cache-dit on LongCatImage transformer with BlockAdapter: Fn=1, Bn=0, W=4, 
INFO 03-18 16:20:36 [cache_dit_backend.py:1139] Cache-dit enabled successfully on LongCatImagePipeline
INFO 03-18 16:20:36 [diffusion_model_runner.py:173] Model runner: Initialization complete.
INFO 03-18 16:20:36 [diffusion_worker.py:148] Worker 1: Process-scoped GPU memory after model loading: 27.84 GiB.
INFO 03-18 16:20:36 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:1, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
INFO 03-18 16:20:36 [diffusion_worker.py:87] Worker 1: Initialization complete.
INFO 03-18 16:20:36 [diffusion_worker.py:493] Worker 1: Scheduler loop started.
INFO 03-18 16:20:36 [diffusion_worker.py:416] Worker 1 ready to receive requests via shared memory
INFO 03-18 16:20:36 [diffusion_model_runner.py:134] Model loading took 27.3097 GiB and 18.877100 seconds
INFO 03-18 16:20:36 [diffusion_model_runner.py:139] Model runner: Model loaded successfully.
INFO 03-18 16:20:36 [diffusion_model_runner.py:78] Model runner: transformer compiled with torch.compile.
INFO 03-18 16:20:36 [cache_dit_backend.py:1132] Using custom cache-dit enabler for model: LongCatImagePipeline
INFO 03-18 16:20:36 [cache_dit_backend.py:226] Enabling cache-dit on LongCatImage transformer with BlockAdapter: Fn=1, Bn=0, W=4, 
[03-18 16:20:36] [Cache-DiT] pipe is None, use FakeDiffusionPipeline instead.
[03-18 16:20:36] [Cache-DiT] Auto fill blocks_name: ['transformer_blocks', 'single_transformer_blocks'].
[03-18 16:20:36] [Cache-DiT] Found transformer NOT from diffusers: vllm_omni.diffusion.models.longcat_image.longcat_image_transformer disable check_forward_pattern by default.
[03-18 16:20:36] [Cache-DiT] Adapting Cache Acceleration using custom BlockAdapter!
[03-18 16:20:36] [Cache-DiT] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
[03-18 16:20:36] [Cache-DiT] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
[03-18 16:20:36] [Cache-DiT] Use default 'enable_separate_cfg' from block adapter register: False, Pipeline: FakeDiffusionPipeline.
[03-18 16:20:36] [Cache-DiT] Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_CFG0, Calibrator Config: None
[03-18 16:20:36] [Cache-DiT] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
[03-18 16:20:36] [Cache-DiT] Match Blocks: CachedBlocks_Pattern_0_1_2, for transformer_blocks, cache_context: transformer_blocks_123425887428080, context_manager: FakeDiffusionPipeline_123425351541984.
[03-18 16:20:36] [Cache-DiT] Skipped Forward Pattern Check: ForwardPattern.Pattern_1
[03-18 16:20:36] [Cache-DiT] Match Blocks: CachedBlocks_Pattern_0_1_2, for single_transformer_blocks, cache_context: single_transformer_blocks_123425351546736, context_manager: FakeDiffusionPipeline_123425351541984.
INFO 03-18 16:20:36 [cache_dit_backend.py:1139] Cache-dit enabled successfully on LongCatImagePipeline
INFO 03-18 16:20:36 [diffusion_model_runner.py:173] Model runner: Initialization complete.
INFO 03-18 16:20:37 [diffusion_worker.py:148] Worker 0: Process-scoped GPU memory after model loading: 27.84 GiB.
INFO 03-18 16:20:37 [manager.py:96] Initializing DiffusionLoRAManager: device=cuda:0, dtype=torch.bfloat16, max_cached_adapters=1, static_lora_path=None
INFO 03-18 16:20:37 [diffusion_worker.py:87] Worker 0: Initialization complete.
INFO 03-18 16:20:37 [diffusion_worker.py:493] Worker 0: Scheduler loop started.
INFO 03-18 16:20:37 [diffusion_worker.py:416] Worker 0 ready to receive requests via shared memory
(APIServer pid=1065189) INFO 03-18 16:20:37 [scheduler.py:42] SyncScheduler initialized result MessageQueue
(APIServer pid=1065189) INFO 03-18 16:20:37 [diffusion_engine.py:409] dummy run to warm up the model
INFO 03-18 16:20:37 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-18 16:20:37 [kv_transfer_manager.py:479] Request has no ID, cannot receive KV cache
INFO 03-18 16:20:37 [cache_dit_backend.py:1160] Refreshing cache context for transformer with num_inference_steps: 1
INFO 03-18 16:20:37 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-18 16:20:37 [kv_transfer_manager.py:479] Request has no ID, cannot receive KV cache
[03-18 16:20:37] [Cache-DiT] ✅ Refreshed cache context: transformer_blocks_123425887428080, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N1_CFG0, Calibrator Config: None
INFO 03-18 16:20:37 [cache_dit_backend.py:1160] Refreshing cache context for transformer with num_inference_steps: 1
[03-18 16:20:37] [Cache-DiT] ✅ Refreshed cache context: single_transformer_blocks_123425351546736, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N1_CFG0, Calibrator Config: None
You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a Qwen2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
WARNING 03-18 16:20:48 [diffusion_worker.py:385] SHM pack failed, falling back to raw enqueue: Got unsupported ScalarType BFloat16
(APIServer pid=1065189) INFO 03-18 16:20:49 [omni.py:437] [AsyncOrchestrator] Inline diffusion mode active – stage worker subprocess bypassed
(APIServer pid=1065189) INFO 03-18 16:20:49 [api_server.py:436] Detected pure diffusion mode (single diffusion stage)
(APIServer pid=1065189) INFO 03-18 16:20:49 [api_server.py:480] Pure diffusion API server initialized for model: meituan-longcat/LongCat-Image
(APIServer pid=1065189) INFO 03-18 16:20:49 [api_server.py:284] Starting vLLM API server (pure diffusion mode) on http://127.0.0.1:33333
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:38] Available routes are:
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /docs, Methods: GET, HEAD
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /tokenize, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /detokenize, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /load, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /version, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /health, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /metrics, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /ping, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /ping, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /invocations, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/audio/speech, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/audio/voices, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/audio/voices, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/audio/voices/{name}, Methods: DELETE
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /health, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/models, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/images/generations, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/images/edits, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos, Methods: POST
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos/{video_id}, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos/{video_id}, Methods: DELETE
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:47] Route: /v1/videos/{video_id}/content, Methods: GET
(APIServer pid=1065189) INFO 03-18 16:20:49 [launcher.py:58] Route: /v1/audio/speech/stream, Endpoint: streaming_speech
(APIServer pid=1065189) INFO:     Started server process [1065189]
(APIServer pid=1065189) INFO:     Waiting for application startup.
(APIServer pid=1065189) INFO:     Application startup complete.
(APIServer pid=1065189) WARNING 03-18 16:20:50 [protocol.py:51] The following fields were present in the request but ignored: {'negative_prompt', 'num_inference_steps', 'height', 'width', 'guidance_scale'}
(APIServer pid=1065189) INFO 03-18 16:20:50 [serving_chat.py:2061] Diffusion chat request chatcmpl-b6ac3e3836d942e3: prompt='A cinematic illustration of a cat typing on a silv...', ref_images=0, params={}
(APIServer pid=1065189) INFO 03-18 16:20:50 [async_omni.py:512] [AsyncOrchestrator] Inline diffusion generate for request chatcmpl-b6ac3e3836d942e3
INFO 03-18 16:20:50 [manager.py:608] Deactivating all adapters: 0 layers
INFO 03-18 16:20:50 [manager.py:608] Deactivating all adapters: 0 layers
WARNING 03-18 16:20:50 [kv_transfer_manager.py:381] No connector available for receiving KV cache
WARNING 03-18 16:20:50 [kv_transfer_manager.py:381] No connector available for receiving KV cache
INFO 03-18 16:20:50 [cache_dit_backend.py:1160] Refreshing cache context for transformer with num_inference_steps: 50
INFO 03-18 16:20:50 [cache_dit_backend.py:1160] Refreshing cache context for transformer with num_inference_steps: 50
[03-18 16:20:50] [Cache-DiT] ✅ Refreshed cache context: transformer_blocks_123425887428080, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N50_CFG0, Calibrator Config: None
[03-18 16:20:50] [Cache-DiT] ✅ Refreshed cache context: single_transformer_blocks_123425351546736, Collected Context Config: DBCache_F1B0_W4I1M0MC3_R0.24_N50_CFG0, Calibrator Config: None
WARNING 03-18 16:20:55 [diffusion_worker.py:385] SHM pack failed, falling back to raw enqueue: Got unsupported ScalarType BFloat16
(APIServer pid=1065189) INFO 03-18 16:20:55 [diffusion_engine.py:94] Generation completed successfully.
(APIServer pid=1065189) INFO 03-18 16:20:55 [diffusion_engine.py:116] Post-processing completed in 0.4051 seconds
(APIServer pid=1065189) INFO 03-18 16:20:55 [diffusion_engine.py:119] DiffusionEngine.step breakdown: preprocess=0.00 ms, add_req_and_wait=5239.26 ms, postprocess=405.12 ms, total=5644.76 ms
(APIServer pid=1065189) INFO 03-18 16:20:55 [omni_diffusion.py:133] OmniDiffusion.generate total: 5644.91 ms
(APIServer pid=1065189) INFO 03-18 16:20:56 [serving_chat.py:2233] Diffusion chat completed for request chatcmpl-b6ac3e3836d942e3: 1 images
(APIServer pid=1065189) INFO:     127.0.0.1:32838 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1065189) INFO 03-18 16:21:06 [launcher.py:122] Shutting down FastAPI HTTP server.
(APIServer pid=1065189) /home/24037978r/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/resource_tracker.py:147: UserWarning: resource_tracker: process died unexpectedly, relaunching.  Some resources might leak.
(APIServer pid=1065189)   warnings.warn('resource_tracker: process died unexpectedly, '
(APIServer pid=1065189) INFO:     Shutting down
Traceback (most recent call last):
File "/home/24037978r/.local/share/uv/python/cpython-3.12.13-linux-x86_64-gnu/lib/python3.12/multiprocessing/resource_tracker.py", line 264, in main
  cache[rtype].remove(name)
KeyError: '/psm_0b79aef2'
(APIServer pid=1065189) INFO:     Waiting for application shutdown.
(APIServer pid=1065189) INFO:     Application shutdown complete.


@Gaohan123 (Collaborator)

@yenuo26 @congw729 PTAL

yenuo26 (Collaborator) commented Mar 19, 2026

1. Please use pytest -s -v tests/e2e/online_serving/test_*_expansion.py -m "advanced_model and diffusion and H100" --run-level "advanced_model" --collect-only to check whether the test cases can be collected.
2. Maybe you can temporarily modify pipeline.yml to run it in CI, and attach the result.

fhfuih (Contributor) commented Mar 19, 2026

Maybe you can temporarily modify pipeline.yml to run it in CI, and attach the result.

Please see #1938 (321c634) and #1682 (aca42cb) for examples.

lcukyfuture force-pushed the test/longcat-image-l4-diffusion branch from 344e33c to 5926435 on March 19, 2026
@alex-jw-brooks (Contributor)

Yeah, LongCat does support SP, but it was using the invasive approach instead of the SP plan until this PR. FYI @hsliuustc0106, maybe that's the point of confusion for the hook-related logs.

Comment thread on .buildkite/test-nightly.yml (outdated):
- buildkite-agent artifact upload "tests/*.xlsx"
agents:
queue: "cpu_queue_premerge"
# - label: ":full_moon: Qwen3-TTS Non-Async-Chunk E2E Test"
Collaborator:

Why comment out so many lines? @lcukyfuture @linyueqian PTAL

Contributor:

To only run the relevant test group when temporarily running the test on CI

@hsliuustc0106 (Collaborator)

@yenuo26 do we have guidance on how many tests we should add for models with different priorities?

@fhfuih (Contributor) left a comment

Looks good to me for now. Please resolve the conflict and wait for CI.

fhfuih (Contributor) commented Mar 20, 2026

@yenuo26 do we have a guidance for how many tests we should add for models with different priorities?

Following this comment: since we just decided that not all models get full test coverage of diffusion features, we may need extra discussion on how to handle these PRs (lest they crush our CI machine 😁).

@lcukyfuture (Contributor, Author)

The conflict has been resolved. You can also assign LongCat-Image-Edit to me if needed.

NumberWan (Contributor) commented Mar 20, 2026

@fhfuih Should we keep the "Module-wise CPU offloading" item listed for the test? "Module-wise CPU offloading" is not listed in #1217, and removing it would also help reduce the total number of test cases under limited resources.

@NumberWan (Contributor)

The conflict has been resolved. You can also assign LongCat-Image-Edit to me if needed.

Thank you for your assistance. I will proceed with LongCat-Image-Edit; I expect the two files to be quite similar. I will request your review once the PR is ready. Thanks again!

fhfuih (Contributor) commented Mar 20, 2026

@fhfuih Should we keep the "Module-wise CPU offloading" item listed for the test? […]

I think we only fall back the test to model-wise offloading if the model does not support layer-wise offloading, so the number of tests doesn't change. But please also see my following comment.

fhfuih (Contributor) commented Mar 20, 2026

@lcukyfuture Apologies for any confusion, but after some internal discussion we just decided to reduce the number of test cases for non-high-priority models. Could you help settle on a recommended feature combination for this model, and edit the test script to include only that combination? If you need help finding a good combination of diffusion features, see whether this AI skill (hsliuustc0106/vllm-omni-skills#19) can help, or search this repo for PRs that introduce this model or the relevant features (for example code snippets).

And @NumberWan you can also coordinate on this matter

@lcukyfuture (Contributor, Author)

ok, I will check this. 👍

@NumberWan (Contributor)

@lcukyfuture After my local testing on the server and reviewing the merged PR, I think these two feature combinations are the best options for LongCat-Image:

  • (1 GPU) Model-level (sequential) CPU offloading only (--enable-cpu-offload)
  • (2 GPUs) Cache-DiT + Ulysses-SP (ulysses-degree=2)

These two cases cover both the 1-GPU and 2-GPU configurations, and in my runs Ulysses-SP (degree 2) showed the shortest generation latency.

So the current PR can reduce the number of tests to two.

@NumberWan (Contributor)

@lcukyfuture Just a heads-up: the related PR for LongCat-Image-Edit has been created: #2035.

@NumberWan (Contributor)

#2035
@lcukyfuture Please use this PR as a reference so we can align the LongCat-Image and LongCat-Image-Edit model test cases. The main difference between the two PRs should be only the target resolution. If you have any opinions, please let me know so we can discuss.

fhfuih (Contributor) commented Mar 24, 2026

#2035 @lcukyfuture Please use this PR as a reference so we can align the LongCat-Image and LongCat-Image-Edit model test cases. […]

To add: we also plan to reduce the number of test cases for non-high-priority models, so you two can also settle on the feature combinations to test within 1-2 test cases.

lcukyfuture force-pushed the test/longcat-image-l4-diffusion branch from 33d30a7 to f8850b8 on March 24, 2026
@lcukyfuture (Contributor, Author)

@NumberWan I have modified the test cases based on your PR, and they pass in my local tests.

Comment thread on .buildkite/pipeline.yml (outdated):
if: build.branch != "main"
commands:
- buildkite-agent pipeline upload .buildkite/test-ready.yml
- buildkite-agent pipeline upload .buildkite/test-nightly.yml
Collaborator:

Please remember to change it back before merging

@NumberWan (Contributor)

@hsliuustc0106 @Gaohan123 Please add a ready label for the CI test

Gaohan123 added the ready (label to trigger buildkite CI) label on Mar 24, 2026
fhfuih (Contributor) commented Mar 24, 2026

@lcukyfuture Please comment out the if: build.env("NIGHTLY") == "1" line in the Diffusion Test YAML so that it is not skipped. Also, there are two if: build.env("NIGHTLY") == "1" lines in that YAML; could you help remove one?

Your latest temporary nightly CI did not run anything 😂 https://buildkite.com/vllm/vllm-omni/builds/4978/steps/canvas

Commit: Remove duplicate `if: build.env("NIGHTLY") == "1"` and commands block that was introduced by a bad conflict resolution on GitHub. Keep the commented-out `if` so the Diffusion Test runs on non-nightly builds.
lcukyfuture force-pushed the test/longcat-image-l4-diffusion branch from 839864a to bfe01d6 on March 24, 2026
@lcukyfuture (Contributor, Author)

Sorry, it seems I made a mistake while handling the conflict. I'll fix it. 😢

lcukyfuture force-pushed the test/longcat-image-l4-diffusion branch from 392ca19 to 18e2902 on March 24, 2026
fhfuih (Contributor) previously requested changes on Mar 25, 2026:

Thanks. I can see that CI passes: https://buildkite.com/vllm/vllm-omni/builds/4996/steps/canvas

Please change the pipeline YAMLs back, and it will be ready to merge.

@NumberWan (Contributor) left a comment

LGTM

@Gaohan123 (Collaborator) left a comment

LGTM. Thanks

Gaohan123 merged commit 0849c5e into vllm-project:main on Mar 25, 2026; 7 of 8 checks passed.
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request on Mar 26, 2026
…1970)

Signed-off-by: lcukyfuture <zlf994478451@outlook.com>
Signed-off-by: Lingfeng Zhang <48312954+lcukyfuture@users.noreply.github.com>