[Perf] Optimize Wan2.2 device free on image preprocess #2852
gcanlin merged 4 commits into vllm-project:main
Conversation
Signed-off-by: fan2956 <zhoufan53@huawei.com>
Signed-off-by: fan2956 <zhoufan53@huawei.com>
```python
video_processor = VideoProcessor(vae_scale_factor=self.vae_scale_factor_spatial)
```

```python
if isinstance(image, PIL.Image.Image):
    image = TF.to_tensor(image).to(device)
```
Can't we just move tensor to device instead of introducing TF?
If we do not use TF.to_tensor, we need to manually implement a function to achieve the same functionality.
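For reference, the core of what `TF.to_tensor` does for a uint8 image is small: scale pixel values to [0, 1] floats and reorder the layout channels-first. A rough numpy sketch of such a manual replacement (the helper name is hypothetical; the real torchvision function additionally handles PIL modes, 2D images, and tensor inputs):

```python
import numpy as np

def to_tensor_like(img_hwc):
    """Rough stand-in for torchvision's TF.to_tensor on a uint8 HWC array:
    scale pixel values to [0, 1] floats and reorder HWC -> CHW."""
    arr = img_hwc.astype(np.float32) / 255.0
    return np.transpose(arr, (2, 0, 1))

# a 1x1 RGB "image" with one low, one mid, one max channel value
img = np.array([[[0, 128, 255]]], dtype=np.uint8)
t = to_tensor_like(img)
# t has shape (3, 1, 1) with values scaled into [0, 1]
```

This is the functionality that would otherwise have to be reimplemented, which is why reusing `TF.to_tensor` is the simpler choice.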
gcanlin
left a comment
I don't have other concerns. Just confirm one question.
```python
# Handle last_image if provided
if last_image is not None:
    if isinstance(last_image, PIL.Image.Image):
        image = TF.to_tensor(image).to(device)
```
This line is redundant with the conversion above: `image` here is already a tensor on the device.
done
```python
# Handle last_image if provided
if last_image is not None:
    if isinstance(last_image, PIL.Image.Image):
        image = TF.to_tensor(image).to(device)
```
This should convert last_image, not image. At this point the first branch may already have reassigned image = TF.to_tensor(image), so when both image and last_image are PIL inputs this line will try to run to_tensor() on an existing tensor and fail before preprocessing the last frame. Can you switch this to last_image = TF.to_tensor(last_image).to(device)?
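The failure mode described here can be reproduced with a tiny stand-in (all names below are hypothetical toys, not the pipeline's code): once the first branch has reassigned `image` to a tensor, passing it to a `to_tensor`-style function that only accepts image inputs raises a `TypeError`, while converting `last_image` works.

```python
class FakePIL:
    """Stands in for PIL.Image.Image."""

def to_tensor(pic):
    # Like torchvision's TF.to_tensor (simplified): reject non-image inputs.
    if not isinstance(pic, FakePIL):
        raise TypeError("pic should be a PIL Image")
    return [0.0]  # pretend tensor

image, last_image = FakePIL(), FakePIL()
image = to_tensor(image)  # first branch: `image` is now a tensor

# Buggy second branch reuses `image`, which is no longer a PIL image:
try:
    image = to_tensor(image)
    bug_raised = False
except TypeError:
    bug_raised = True  # fails exactly as the review predicts

# The proposed fix converts `last_image` instead, which is still a PIL image:
last_image = to_tensor(last_image)
```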
lishunyang12
left a comment
Review: [Perf] Optimize Wan2.2 device free on image preprocess
Summary
The mask_lat_size optimization (change 3) is correct and clean — creating the tensor directly on device and using native slicing instead of list(range(...)) is a straightforward improvement.
However, there is a copy-paste bug in the image preprocessing changes that needs to be fixed before merge.
Bug: Wrong variable in last_image branch (line ~491)
```python
if last_image is not None:
    if isinstance(last_image, PIL.Image.Image):
        image = TF.to_tensor(image).to(device)  # BUG: should be `last_image`
        last_image_tensor = video_processor.preprocess(last_image, height=height, width=width)
```

This line converts `image` (not `last_image`) to a tensor. At this point in the code, `image` has already been processed in the block above, so this:
- Does not achieve the intended optimization: `last_image` is still a PIL Image when passed to `video_processor.preprocess`, so it will still be preprocessed on CPU.
- Overwrites `image` with a raw `TF.to_tensor` result (a [C,H,W] tensor with no resize/normalize), clobbering the value that was set earlier. While `image` may not be used after this point, it is still incorrect and fragile.
The fix should be:
```python
last_image = TF.to_tensor(last_image).to(device)
last_image_tensor = video_processor.preprocess(last_image, height=height, width=width)
```

Minor concern: double conversion
For the first image path, TF.to_tensor(image) converts the PIL Image to a float [0,1] tensor. Then video_processor.preprocess(image, ...) receives this tensor. Diffusers' VideoProcessor.preprocess has its own PIL-to-tensor path and normalization logic. Please verify that passing a pre-converted tensor doesn't cause unexpected behavior (e.g., double normalization from [0,1] to [-1,1] twice, or shape mismatches since TF.to_tensor produces [C,H,W] while VideoProcessor may expect [B,C,H,W] or PIL input).
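The double-normalization worry can be checked with plain arithmetic, assuming the preprocessor maps [0, 1] to [-1, 1] via `x * 2 - 1` (a common diffusers convention; verify against the actual `VideoProcessor` configuration):

```python
def normalize(x):
    # assumed preprocessing step: maps [0, 1] -> [-1, 1]
    return x * 2.0 - 1.0

once = [normalize(x) for x in (0.0, 0.5, 1.0)]  # in range [-1, 1]
twice = [normalize(v) for v in once]            # out of range: reaches -3.0
```

If normalization were applied both by `TF.to_tensor`'s consumer and again inside `preprocess`, the pixel range would silently drift out of what the VAE expects, which is why the interaction is worth verifying.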
If the goal is simply to avoid CPU-side work in VideoProcessor.preprocess, an alternative would be to just do image_tensor = video_processor.preprocess(image, ...).to(device) and move the result to device immediately, rather than introducing a separate TF.to_tensor call before it.
mask_lat_size changes (LGTM)
The changes in prepare_latents are correct:
- Adding `device=latent_condition.device` avoids creating on CPU then moving.
- Replacing `list(range(1, num_frames))` with `1:` is more idiomatic and avoids materializing a Python list.
Both are good micro-optimizations consistent with the ~3% e2e improvement reported.
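The equivalence is easy to sanity-check with a numpy stand-in (shapes here are hypothetical, not the pipeline's): `[1:]` selects the same frames as the index-list form, while basic slicing returns a view instead of building a Python list and triggering an advanced-indexing copy (torch behaves analogously):

```python
import numpy as np

num_frames = 8
mask_lat_size = np.zeros((num_frames, 4))  # stand-in for the latent mask tensor

sliced = mask_lat_size[1:]                         # view; no index list materialized
fancy = mask_lat_size[list(range(1, num_frames))]  # advanced indexing; builds a list and copies

same = np.array_equal(sliced, fancy)
```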
Verdict
Please fix the copy-paste bug on the last_image branch and verify the TF.to_tensor + VideoProcessor.preprocess interaction doesn't cause double normalization.
BLOCKER scan:

OVERALL: 1 BLOCKER FOUND
VERDICT: REQUEST_CHANGES

There is a copy-paste error on line 491:

```python
if isinstance(last_image, PIL.Image.Image):
    image = TF.to_tensor(image).to(device)  # BUG: should be `last_image`, not `image`
    last_image_tensor = video_processor.preprocess(last_image, ...)
```

Please fix line 491 to:

```python
last_image = TF.to_tensor(last_image).to(device)
```

The rest of the changes look good: using tensor slicing instead of `list(range(...))` is the right approach to avoid CPU fallback.
Signed-off-by: fan2956 <zhoufan53@huawei.com>
done
…2852) Signed-off-by: fan2956 <zhoufan53@huawei.com>
Purpose
This PR fixes two critical issues in pipeline_wan2_2_i2v.py to prevent CPU fallback and device mismatches:
No functional changes are made to the original logic. The changes improve inference stability across hardware platforms.
Test Plan
Test Result
before:
e2e_total_ms = 9,837.386
after:
e2e_total_ms = 9,512.703
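These numbers are consistent with the ~3% end-to-end improvement mentioned in the review:

```python
# Relative improvement from the reported e2e timings
before_ms, after_ms = 9837.386, 9512.703
speedup_pct = (before_ms - after_ms) / before_ms * 100
# roughly a 3.3% end-to-end reduction
```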
Essential Elements of an Effective PR Description Checklist

- Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)