
[Test] L4 complete diffusion feature test for Wan2.2 models#2087

Merged
david6666666 merged 9 commits into vllm-project:main from bjf-frz:wan22_l4_test on Mar 26, 2026

Conversation

@bjf-frz
Contributor

@bjf-frz bjf-frz commented Mar 23, 2026


Purpose

This PR adds an L4 test suite for the Wan2.2 models, covering Wan2.2-T2V-A14B-Diffusers, Wan2.2-I2V-A14B-Diffusers, and Wan2.2-TI2V-5B-Diffusers.

Tested features:

  • Cache-DiT
  • CFG-Parallel
  • Ulysses-SP
  • Tensor-Parallel
  • VAE-Patch-Parallel
  • HSDP
  • Ring-Attn

Test Plan

pytest -v -s ./tests/e2e/online_serving/test_wan22_expansion.py -m advanced_model --run-level=advanced_model
All 18 tests passed

Test Result

================================ warnings summary ================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: 14 warnings
  /xxx/.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

../../.venv/lib/python3.12/site-packages/_pytest/config/__init__.py:1428
  /xxx/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py:1428: PytestConfigWarning: Unknown config option: asyncio_mode
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================== 18 passed, 17 warnings in 3242.63s (0:54:02) ====================



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26acffd86b


Comment thread tests/conftest.py Outdated
extra_body=extra_body,
modalities=modalities,
)
if extra_body.get("num_frames", None): # videos

P1 Badge Handle missing extra_body before calling .get()

send_diffusion_request now unconditionally evaluates extra_body.get("num_frames", None), but several existing diffusion tests call this helper without extra_body (for example the Bagel online tests that only send model/messages). In those cases this raises AttributeError before any request is sent, which is a regression from the previous behavior where image requests without extra_body worked.

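A minimal sketch of the guard the review is asking for (hypothetical helper name, not the actual conftest code): treat a missing extra_body as a non-video request instead of calling .get() on None.

```python
def is_video_request(extra_body):
    # extra_body may be None when callers (e.g. image-only Bagel tests that
    # only send model/messages) omit it; guard before .get() to avoid the
    # AttributeError described above.
    return bool(extra_body and extra_body.get("num_frames"))

print(is_video_request(None))                # False: image request
print(is_video_request({"num_frames": 81}))  # True: video request
```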

Contributor

@fhfuih fhfuih Mar 23, 2026


This AI comment may be OK to dismiss, because this payload is internal to the tests, and extra_body should always be present with a video generation request.

BUT, since the video generation API logic is quite different, I think maybe it is clearer to define a dedicated send_video_request. This also prevents less elegant branching, like using if extra_body.get("num_frames", None) to determine if it is a video request: what if another future test needs to call this function with the default num_frames value?

Contributor Author


It is indeed not ideal to determine within this helper whether the video API should be used. As the video API differs significantly from the OpenAI API, I have refactored the logic, separating send_video_diffusion_request from the original send_diffusion_request.

Comment thread tests/conftest.py Outdated
Comment on lines +1792 to +1794
except Exception as e:
result.success = False
result.error_message = f"Diffusion response processing error: {str(e)}"

P1 Badge Re-raise video request failures instead of swallowing them

The video path catches all exceptions (including failed HTTP calls and assertion failures), sets result.success = False, and then returns without raising; callers of send_diffusion_request in this suite do not inspect the returned objects, so these failures become false-positive test passes. This makes the new Wan/Hunyuan video tests unable to fail when generation or validation breaks.


Contributor Author


👍

Comment thread tests/conftest.py Outdated

# Validate against expectations
if num_frames is not None:
expected_num_frames = ((num_frames + 3) // 4) * 4 + 1

P2 Badge Correct frame-count normalization in video assertions

The expected frame calculation ((num_frames + 3) // 4) * 4 + 1 rounds up to the next 4k+1, but Wan/Hunyuan frame normalization keeps values already equal to 4k+1 and otherwise rounds down to num_frames // 4 * 4 + 1. With the current formula, valid outputs for inputs like num_frames=5 or 9 are incorrectly rejected (expected 9 or 13 instead of 5 or 9).

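To illustrate, here is a sketch of the normalization the review describes (hypothetical helper; it assumes the reviewer's stated rule that 4k+1 inputs are kept as-is and other values map to num_frames // 4 * 4 + 1):

```python
def normalize_num_frames(num_frames: int) -> int:
    # Keep values that are already of the form 4k + 1 ...
    if (num_frames - 1) % 4 == 0:
        return num_frames
    # ... otherwise apply num_frames // 4 * 4 + 1, per the review comment.
    return num_frames // 4 * 4 + 1
```

Under this rule, inputs of 5 and 9 stay at 5 and 9, whereas the original ((num_frames + 3) // 4) * 4 + 1 formula would expect 9 and 13 and so reject valid outputs.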

Contributor Author


👍

Signed-off-by: bjf-frz <frz123db@gmail.com>
@yenuo26
Collaborator

yenuo26 commented Mar 23, 2026

Please attach the test result in CI before merging.

@david6666666
Collaborator

@yenuo26 @SamitHuang PTAL, thx

@bjf-frz
Contributor Author

bjf-frz commented Mar 23, 2026

Please attach the test result in CI before merging.

done

pytest.param(
OmniServerParams(
model=model_path,
server_args=["--cache-backend", "cache_dit"],
Contributor


WAN also supports layerwise offloading. This feature can be turned on as well (combined into a single test case) to ensure it runs fine.

Contributor Author


done, already added like this:

server_args=["--cache-backend", "cache_dit", "--enable-layerwise-offload"],

("cfg_parallel", ["--cfg-parallel-size", "2"]),
("ulysses_sp", ["--usp", "2"]),
("tensor_parallel", ["--tensor-parallel-size", "2"]),
("vae_patch", ["--vae-patch-parallel-size", "2"]),
Contributor


Combine VAE patch parallel with tensor parallel to reduce test cases (see my example test case design in #1217 )

Contributor Author


done, added like this:

    ("tp_vae_patch", ["--tensor-parallel-size", "2", "--vae-patch-parallel-size", "2"]),


PARALLEL_CONFIGS = [
("cfg_parallel", ["--cfg-parallel-size", "2"]),
("ulysses_sp", ["--usp", "2"]),
Contributor


Please also test ring attention.

Contributor Author


added

Comment thread tests/conftest.py
if stream:
raise NotImplementedError("Streaming is not currently implemented for diffusion model e2e test")

if request_num == 1:
Contributor

@fhfuih fhfuih Mar 23, 2026


Did you accidentally remove this if clause that determines single vs. concurrent requests? It seems image generation requests would now be routed to the else branch, always sending concurrent requests even when there is only one request. Sorry, I see that you do preserve the original logic below.

Comment thread tests/conftest.py Outdated
modalities=modalities,
)
if extra_body.get("num_frames", None): # videos
sys_prompt, user_prompt, vids, imgs, auds = extract_params_from_messages(messages)
Contributor

@fhfuih fhfuih Mar 23, 2026


request_config (the function param) is just a custom dict structure that maximizes the convenience of test execution. (It never has to resemble the OpenAI API protocol anyway.) So, if your request config is so complicated that you need to extract params from messages, then you don't need to construct an OpenAI-compliant message in the first place.

Plus, several variables in this function's return value are not used at all: sys_prompt, vids, auds.

Try simplifying this request_config payload part. For example, what about a "form_data": {...} directly inside request_config? That way, you can get rid of extract_params_from_messages and all the dynamic form_data construction below (the parts about boundary ratio and flow_shift).

Contributor Author


I have refactored request_config to include form_data directly, eliminating unnecessary indirection during request processing. Additionally, the asset validation logic has been streamlined: assert_diffusion_response now dispatches to type-specific handlers (assert_video_diffusion_response and assert_image_diffusion_response), which in turn call dedicated validators (assert_video_valid and assert_image_valid). This separation of concerns makes the structure clearer and more maintainable.
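The dispatch shape described above can be sketched roughly as follows (illustrative only; the real validators and result fields in conftest.py may differ):

```python
def assert_video_valid(result):
    # Placeholder video check; the real validator inspects frames, fps, etc.
    assert result["num_frames"] >= 1

def assert_image_valid(result):
    # Placeholder image check; the real validator inspects decoded pixels.
    assert result["width"] > 0 and result["height"] > 0

def assert_diffusion_response(result):
    # Dispatch on payload type instead of branching inline at every call site.
    if "num_frames" in result:
        assert_video_valid(result)
    else:
        assert_image_valid(result)
```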

Comment thread tests/conftest.py Outdated
futures.append(future)
try:
# create_and_poll includes (POST /v1/videos) and poll req (GET /v1/videos/{video_id})
create_url = f"{self.base_url}//v1/videos"
Contributor


Suggested change
create_url = f"{self.base_url}//v1/videos"
create_url = f"{self.base_url}/v1/videos"

Contributor Author


Thanks for the careful review. What's surprising, though, is that it runs successfully without throwing any errors.

Contributor


Yes, it is valid URL syntax. Just not that "elegant" 😏
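For reference, a one-liner that avoids the doubled slash regardless of whether the base URL carries a trailing slash (the base_url value here is hypothetical):

```python
base_url = "http://localhost:8000/"  # hypothetical; may or may not end in "/"

# Strip any trailing slash before joining, so the path never contains "//".
create_url = f"{base_url.rstrip('/')}/v1/videos"
```

A "//" in the path is legal URL syntax, which is why the original requests still succeeded; normalizing it just keeps logs and comparisons clean.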

Contributor

@fhfuih fhfuih left a comment


Thanks for the contribution. Just left some comments about the code design and feature coverage. Wan 2.2 & video generation are indeed quite different in terms of API protocols. I think there are better ways to maintain clarity of the code.

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


BLOCKER scan:

  • Correctness: ISSUES: the MoE-only Wan2.2 branch sets guidance_scale_high, but the /v1/videos API and VideoGenerationRequest consume guidance_scale_2, so this test does not exercise the intended high-noise CFG path.
  • Reliability/Safety: PASS
  • Breaking Changes: PASS
  • Test Coverage: PASS (PR body includes command + 18 passing runs), but the MoE-specific coverage is incomplete because of the request-field mismatch above.
  • Documentation: PASS (test-only PR)
  • Security: PASS

OVERALL: 1 BLOCKER FOUND

VERDICT: REQUEST_CHANGES

I validated the new /v1/videos-based test harness, the request schema in the video API, and the Wan2.2 test matrix. The main remaining issue is that the I2V-A14B MoE case currently sends the wrong form field name, so the new test can pass without covering the intended MoE-specific guidance behavior.

if is_moe_model:
form_data.update(
{
"guidance_scale_high": 1.0,
Collaborator


/v1/videos expects the Wan2.2 high-noise guidance field as guidance_scale_2 (VideoGenerationRequest + create_video), but this test sends guidance_scale_high. That means the MoE-only branch here never actually exercises the intended high-noise CFG path even though the test still passes. Could you switch this key to guidance_scale_2 so the I2V-A14B case validates the behavior this PR is trying to cover?
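In other words, the MoE branch should set the field the API actually reads (field names per the review; the payload values here are hypothetical):

```python
form_data = {"prompt": "a cat", "guidance_scale": 3.5}  # hypothetical payload
is_moe_model = True

if is_moe_model:
    # Per the review, /v1/videos (VideoGenerationRequest) reads the Wan2.2
    # high-noise CFG value from "guidance_scale_2"; an unrecognized
    # "guidance_scale_high" key would be silently ignored.
    form_data["guidance_scale_2"] = 1.0
```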

Contributor Author


fixed

Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
Comment thread tests/conftest.py
if image_url and image_url.startswith("data:image"):
b64_data = image_url.split(",", 1)[1]
img = decode_b64_image(b64_data)
images.append(img)
Contributor Author


@fhfuih the type of item here is a dict, not an obj, like

{'type': 'image_url', 'image_url': {'url': 'data:image/png;base64,iVBxxx='}, 'stage_durations': {}}

I've added a dict-type check to bypass this issue; otherwise the subsequent assert would not actually be executed.

Contributor


Oops thanks very much

Contributor

@fhfuih fhfuih left a comment


LGTM

@david6666666
Collaborator

@yenuo26 PTAL thx

@david6666666 david6666666 added the ready label to trigger buildkite CI label Mar 24, 2026
@david6666666
Collaborator

LGTM now

Signed-off-by: bjf-frz <frz123db@gmail.com>
@bjf-frz
Contributor Author

bjf-frz commented Mar 25, 2026

Per our offline discussion with @Gaohan123 & @congw729, could you try to redesign the feature & modality combinations and reduce the number of test cases to save some time? Wan 2.2 indeed covers many modalities, making the test design more complex.

Case assignment has been configured to be randomized, with a random seed generated each time based on the current system timestamp.
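A sketch of timestamp-seeded randomized case selection like the one described (names and subset size are illustrative, not the actual test code):

```python
import random
import time

# Fresh seed per run, derived from the current system timestamp.
seed = int(time.time())
rng = random.Random(seed)

FEATURE_CASES = ["cache_dit", "cfg_parallel", "ulysses_sp",
                 "tensor_parallel", "vae_patch", "hsdp", "ring_attn"]

# Each run exercises a random subset of feature cases.
picked = rng.sample(FEATURE_CASES, k=3)
```

Logging the seed alongside any failure would make a given run reproducible.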

@david6666666 david6666666 added nightly-test label to trigger buildkite nightly test CI and removed ready label to trigger buildkite CI labels Mar 26, 2026
@david6666666 david6666666 added ready label to trigger buildkite CI nightly-test label to trigger buildkite nightly test CI and removed nightly-test label to trigger buildkite nightly test CI ready label to trigger buildkite CI labels Mar 26, 2026
Signed-off-by: bjf-frz <frz123db@gmail.com>
@david6666666 david6666666 added nightly-test label to trigger buildkite nightly test CI and removed nightly-test label to trigger buildkite nightly test CI labels Mar 26, 2026
@david6666666 david6666666 added ready label to trigger buildkite CI and removed nightly-test label to trigger buildkite nightly test CI labels Mar 26, 2026
@david6666666
Collaborator

LGTM and test passed.

@david6666666 david6666666 enabled auto-merge (squash) March 26, 2026 08:24
if: build.env("NIGHTLY") == "1" || build.pull_request.labels includes "nightly-test"
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -s -v tests/e2e/online_serving/test_wan22_expansion.py -m "advanced_model" --run-level "advanced_model"
Collaborator

@yenuo26 yenuo26 Mar 26, 2026


I think just modifying here may cause both "Diffusion Model Wan22 completed Test with H100" and "Diffusion Model Test with H100" to run this test case.

Contributor Author


I removed the diffusion mark in the wan22 test script, so it won't run in Diffusion Model Test with H100.

Collaborator

@yenuo26 yenuo26 Mar 26, 2026


Maybe you can switch to using pytest --ignore in a subsequent PR? I think not adding the diffusion tag may cause some statistical issues later.

Contributor Author


ok

Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM. Thanks

@david6666666 david6666666 merged commit 2dde219 into vllm-project:main Mar 26, 2026
8 checks passed
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026