[Test] L4 complete diffusion feature test for Wan2.2 models #2087
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26acffd86b
```python
    extra_body=extra_body,
    modalities=modalities,
)
if extra_body.get("num_frames", None):  # videos
```
**Handle missing `extra_body` before calling `.get()`**

`send_diffusion_request` now unconditionally evaluates `extra_body.get("num_frames", None)`, but several existing diffusion tests call this helper without `extra_body` (for example, the Bagel online tests that only send model/messages). In those cases this raises `AttributeError` before any request is sent, which is a regression from the previous behavior, where image requests without `extra_body` worked.
This AI comment may be OK to dismiss, because this payload is internal to the tests, and `extra_body` should always be present in a video generation request.

BUT, since the video generation API logic is quite different, I think it may be clearer to define a dedicated `send_video_request`. This also avoids inelegant branching like using `if extra_body.get("num_frames", None)` to determine whether it is a video request: what if another future test needs to call this function with a default `num_frames` value?
Agreed that branching on `num_frames` is not an appropriate way to decide whether the video API should be used here. As the video API differs significantly from the OpenAI API, I have refactored the logic, separating `send_video_diffusion_request` from the original `send_diffusion_request`.
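A hypothetical sketch of that split (names, signatures, and payload shapes assumed from the discussion, not the actual test-suite helpers):

```python
import requests

def send_diffusion_request(base_url, model, messages, extra_body=None):
    # Image path: extra_body stays optional, so existing Bagel-style tests
    # that only send model/messages keep working.
    payload = {"model": model, "messages": messages, **(extra_body or {})}
    return requests.post(f"{base_url}/v1/chat/completions", json=payload)

def send_video_diffusion_request(base_url, model, prompt, form_data):
    # Video path: /v1/videos differs enough from the OpenAI-style API that
    # it gets its own helper instead of branching on num_frames.
    payload = {"model": model, "prompt": prompt, **form_data}
    return requests.post(f"{base_url}/v1/videos", data=payload)
```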
```python
except Exception as e:
    result.success = False
    result.error_message = f"Diffusion response processing error: {str(e)}"
```
**Re-raise video request failures instead of swallowing them**

The video path catches all exceptions (including failed HTTP calls and assertion failures), sets `result.success = False`, and then returns without raising; callers of `send_diffusion_request` in this suite do not inspect the returned objects, so these failures become false-positive test passes. This makes the new Wan/Hunyuan video tests unable to fail when generation or validation breaks.
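A minimal sketch of the fix the review asks for, with hypothetical helper names (`create_and_poll_video`, `validate_video`) standing in for the suite's actual calls:

```python
try:
    video_bytes = create_and_poll_video(form_data)  # hypothetical helper
    validate_video(video_bytes)                     # hypothetical validator
except Exception as e:
    result.success = False
    result.error_message = f"Diffusion response processing error: {str(e)}"
    raise  # re-raise so pytest can fail; callers never inspect `result`
```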
```python
# Validate against expectations
if num_frames is not None:
    expected_num_frames = ((num_frames + 3) // 4) * 4 + 1
```
**Correct frame-count normalization in video assertions**

The expected frame calculation `((num_frames + 3) // 4) * 4 + 1` rounds up to the next `4k + 1`, but Wan/Hunyuan frame normalization keeps values already equal to `4k + 1` and otherwise rounds down to `num_frames // 4 * 4 + 1`. With the current formula, valid outputs for inputs like `num_frames=5` or `9` are incorrectly rejected (expected 9 or 13 instead of 5 or 9).
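A small sketch contrasting the two rules (illustrative only, not copied from the test suite):

```python
def expected_frames_current(num_frames: int) -> int:
    # Formula in the diff: always rounds up to the next 4k + 1.
    return ((num_frames + 3) // 4) * 4 + 1

def expected_frames_normalized(num_frames: int) -> int:
    # Rule described above: values already of the form 4k + 1 are kept
    # unchanged (5 -> 5, 9 -> 9); others map to num_frames // 4 * 4 + 1.
    return (num_frames // 4) * 4 + 1

assert expected_frames_current(5) == 9     # valid output 5 would be rejected
assert expected_frames_normalized(5) == 5
assert expected_frames_current(9) == 13    # valid output 9 would be rejected
assert expected_frames_normalized(9) == 9
```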
Please attach the test result in CI before merging.

@yenuo26 @SamitHuang PTAL, thx

Done.
```python
pytest.param(
    OmniServerParams(
        model=model_path,
        server_args=["--cache-backend", "cache_dit"],
```
WAN also supports layerwise offloading. Can you turn on this feature as well (combined into a single test case) to ensure it runs fine?
Done, already added like this:

```python
server_args=["--cache-backend", "cache_dit", "--enable-layerwise-offload"],
```
| ("cfg_parallel", ["--cfg-parallel-size", "2"]), | ||
| ("ulysses_sp", ["--usp", "2"]), | ||
| ("tensor_parallel", ["--tensor-parallel-size", "2"]), | ||
| ("vae_patch", ["--vae-patch-parallel-size", "2"]), |
Combine VAE patch parallel with tensor parallel to reduce the number of test cases (see my example test case design in #1217).
Done, added like this:

```python
("tp_vae_patch", ["--tensor-parallel-size", "2", "--vae-patch-parallel-size", "2"]),
```
```python
PARALLEL_CONFIGS = [
    ("cfg_parallel", ["--cfg-parallel-size", "2"]),
    ("ulysses_sp", ["--usp", "2"]),
```
```python
if stream:
    raise NotImplementedError("Streaming is not currently implemented for diffusion model e2e test")

if request_num == 1:
```
~~Did you accidentally remove this `if` clause to determine single request or concurrent requests? It seems image generation requests will now be routed to the `else` part, always sending concurrent requests even if there is only one request.~~

Sorry, I see that you do preserve the original logic below.
```python
    modalities=modalities,
)
if extra_body.get("num_frames", None):  # videos
    sys_prompt, user_prompt, vids, imgs, auds = extract_params_from_messages(messages)
```
`request_config` (the function param) is just a custom dict structure that maximizes the convenience of test execution. (It never has to resemble the OpenAI API protocol anyway.) So, if your request config is so complicated that you need to extract params from messages, then you don't need to construct an OpenAI-compliant message in the first place.

Plus, many variables in this function's return value are not used at all: `sys_prompt`, `vids`, `auds`.

Try simplifying this `request_config` payload part. For example, what about a `"form_data": {...}` directly inside `request_config`? This way, you can get rid of `extract_params_from_messages` and all the dynamic `form_data` construction below (those about boundary ratio and flow_shift).
I have refactored `request_config` to include `form_data` directly, eliminating unnecessary indirection during request processing. Additionally, the asset validation logic has been streamlined: `assert_diffusion_response` now dispatches to type-specific handlers (`assert_video_diffusion_response` and `assert_image_diffusion_response`), which in turn call dedicated validators (`assert_video_valid` and `assert_image_valid`). This separation of concerns makes the structure clearer and more maintainable.
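A hypothetical sketch of the resulting `request_config` shape (field values illustrative, not taken verbatim from the tests):

```python
request_config = {
    "model": "Wan-AI/Wan2.2-T2V-A14B-Diffusers",
    "prompt": "A cat surfing a wave at sunset",
    # Video-specific fields go straight into the form payload, so no
    # extract_params_from_messages indirection is needed.
    "form_data": {
        "num_frames": 81,       # illustrative value
        "boundary_ratio": 0.875,  # illustrative value
        "flow_shift": 5.0,        # illustrative value
    },
}
```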
```python
        futures.append(future)
    try:
        # create_and_poll includes the create request (POST /v1/videos) and poll request (GET /v1/videos/{video_id})
        create_url = f"{self.base_url}//v1/videos"
```
```diff
- create_url = f"{self.base_url}//v1/videos"
+ create_url = f"{self.base_url}/v1/videos"
```
Thanks for the careful review. What's surprising, though, is that it runs successfully without throwing any errors.
Yes, it is valid URL syntax. Just not that "elegant" 😏
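For reference, one minimal way to avoid the doubled slash, assuming `base_url` may carry a trailing slash:

```python
create_url = f"{self.base_url.rstrip('/')}/v1/videos"
```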
**fhfuih** left a comment:

Thanks for the contribution. Just left some comments about the code design and feature coverage. Wan 2.2 & video generation is indeed quite different in terms of API protocols. I think there are better ways to maintain clarity of the code.
**hsliuustc0106** left a comment:

BLOCKER scan:
- Correctness: ISSUES: the MoE-only Wan2.2 branch sets `guidance_scale_high`, but the `/v1/videos` API and `VideoGenerationRequest` consume `guidance_scale_2`, so this test does not exercise the intended high-noise CFG path.
- Reliability/Safety: PASS
- Breaking Changes: PASS
- Test Coverage: PASS (PR body includes command + 18 passing runs), but the MoE-specific coverage is incomplete because of the request-field mismatch above.
- Documentation: PASS (test-only PR)
- Security: PASS

OVERALL: 1 BLOCKER FOUND
VERDICT: REQUEST_CHANGES

I validated the new `/v1/videos`-based test harness, the request schema in the video API, and the Wan2.2 test matrix. The main remaining issue is that the I2V-A14B MoE case currently sends the wrong form field name, so the new test can pass without covering the intended MoE-specific guidance behavior.
```python
if is_moe_model:
    form_data.update(
        {
            "guidance_scale_high": 1.0,
```
`/v1/videos` expects the Wan2.2 high-noise guidance field as `guidance_scale_2` (`VideoGenerationRequest` + `create_video`), but this test sends `guidance_scale_high`. That means the MoE-only branch here never actually exercises the intended high-noise CFG path even though the test still passes. Could you switch this key to `guidance_scale_2` so the I2V-A14B case validates the behavior this PR is trying to cover?
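A sketch of the corrected branch (surrounding structure assumed from the diff excerpt above):

```python
if is_moe_model:
    form_data.update(
        {
            # Field name expected by /v1/videos for the high-noise CFG path
            "guidance_scale_2": 1.0,
        }
    )
```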
```python
if image_url and image_url.startswith("data:image"):
    b64_data = image_url.split(",", 1)[1]
    img = decode_b64_image(b64_data)
    images.append(img)
```
@fhfuih the type of `item` here is a dict, not an object, e.g. `{'type': 'image_url', 'image_url': {'url': 'data:image/png;base64,iVBxxx='}, 'stage_durations': {}}`.

I've added a dict-type check to handle this case; otherwise the subsequent assert would not actually be executed.
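A minimal sketch of such a dict-type check (names assumed from the excerpt, not the exact test-suite code):

```python
if isinstance(item, dict):
    image_url = item.get("image_url", {}).get("url")
    if image_url and image_url.startswith("data:image"):
        b64_data = image_url.split(",", 1)[1]
        images.append(decode_b64_image(b64_data))
```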
@yenuo26 PTAL thx
LGTM now
Case assignment has been configured to be randomized, with a random seed generated each time based on the current system timestamp.
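A hypothetical sketch of that seeding scheme (the actual implementation may differ):

```python
import random
import time

seed = int(time.time())  # seed derived from the current system timestamp
random.seed(seed)
print(f"Randomized case assignment with seed {seed}")  # logged for reproducibility
```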
LGTM and test passed.
```yaml
if: build.env("NIGHTLY") == "1" || build.pull_request.labels includes "nightly-test"
commands:
  - export VLLM_WORKER_MULTIPROC_METHOD=spawn
  - pytest -s -v tests/e2e/online_serving/test_wan22_expansion.py -m "advanced_model" --run-level "advanced_model"
```
I think just modifying this here may cause both "Diffusion Model Wan22 completed Test with H100" and "Diffusion Model Test with H100" to run this test case.
I removed the diffusion mark in the wan22 test script, so it won't run in "Diffusion Model Test with H100".
Maybe you can change to use `pytest --ignore` in a subsequent PR? I think not adding the diffusion tag may cause some statistical issues later.
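A hypothetical sketch of that approach, assuming the broader diffusion suite is selected with a `diffusion` mark (the actual CI invocation may differ):

```yaml
commands:
  - pytest -s -v tests/e2e/online_serving -m "diffusion" --ignore=tests/e2e/online_serving/test_wan22_expansion.py
```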
Purpose
This PR adds L4 tests for Wan2.2 models, including Wan2.2-T2V-A14B-Diffusers, Wan2.2-I2V-A14B-Diffusers, and Wan2.2-TI2V-5B-Diffusers.

Test features:
Test Plan
Test Result