[Feature] Adding VAE decode patch parallel support for LTX-2 #2135
erfgss wants to merge 17 commits into
Conversation
Signed-off-by: Chen Yang <2082464740@qq.com>
add unit test and output video comparison
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c98eb84932
    timestep: torch.Tensor | None = None,
    return_dict: bool = True,
    *args: Any,
Preserve base decode argument order for causal flag
AutoencoderKLLTX2Video.decode takes (z, temb=None, causal=None, return_dict=True), but this override changes the positional order to (z, timestep=None, return_dict=True, ...). Any caller that passes causal positionally (for example decode(z, temb, False)) will now set return_dict=False instead, silently changing the return type and leaving causal unset. This is a behavioral regression in the public method contract and can break wrappers that rely on the original positional API.
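The positional-order hazard can be shown with a minimal sketch. The class, base signature, and `causal` semantics below are simplified stand-ins for the real `AutoencoderKLLTX2Video` API; the point is that new parameters in an override should be keyword-only so the base positional contract survives:

```python
from __future__ import annotations
import torch

class BaseVAE:
    # Base contract: decode(z, temb=None, causal=None, return_dict=True)
    def decode(self, z, temb=None, causal=None, return_dict=True):
        return {"sample": z} if return_dict else (z,)

class PatchedVAE(BaseVAE):
    # Safe override: keep the base positional order unchanged and accept
    # the new parameter keyword-only, so decode(z, temb, False) still
    # sets causal=False instead of silently flipping return_dict.
    def decode(self, z, temb=None, causal=None, return_dict=True, *,
               timestep: torch.Tensor | None = None, **kwargs):
        # timestep would be consumed by the patch-parallel path here
        # (hypothetical); the base contract is then honored as-is.
        return super().decode(z, temb=temb, causal=causal,
                              return_dict=return_dict)

# Third positional argument is still `causal`, as callers expect.
out = PatchedVAE().decode(torch.zeros(1), None, False)
```

With this shape, existing wrappers that pass `causal` positionally keep working, and `return_dict=False` still controls the return type only when explicitly requested.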
Please update LTX-2 image-to-video as well, and update vllm-omni/docs/user_guide/diffusion_acceleration.md.
I will update these.
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
@Bounty-hunter PTAL, thanks.
) -> torch.Tensor:
    """Decode a single latent tile into video space."""
    tile = task.tensor
    if hasattr(self, "clear_cache"):
    dec = torch.clamp(dec, min=-1.0, max=1.0)
    return dec
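A self-contained sketch of the per-tile decode step shown in the hunk above. The `TileTask` container, the `clear_cache` hook, and the decoder body are assumptions standing in for the real pipeline; they illustrate why the cache is cleared before each tile and the output clamped before blending:

```python
import torch
from dataclasses import dataclass

@dataclass
class TileTask:
    tensor: torch.Tensor  # one latent tile

class TileDecoder:
    def clear_cache(self):
        # e.g. reset causal conv caches between tiles (assumed hook)
        pass

    def _decode(self, tile: torch.Tensor) -> torch.Tensor:
        # stand-in for the real VAE decoder forward pass
        return torch.tanh(tile)

    def decode_tile(self, task: TileTask) -> torch.Tensor:
        tile = task.tensor
        # Stale temporal caches from a previous tile would corrupt this one.
        if hasattr(self, "clear_cache"):
            self.clear_cache()
        dec = self._decode(tile)
        # Clamp to the VAE output range before tiles are blended back together.
        dec = torch.clamp(dec, min=-1.0, max=1.0)
        return dec

out = TileDecoder().decode_tile(TileTask(torch.randn(1, 3, 4, 8, 8)))
```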
def patch_split(self, z: torch.Tensor) -> tuple[list[TileTask], GridSpec]:
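A hedged sketch of what a spatial patch split could look like: the latent's H/W is cut into a grid with a small overlap border for later blending. The grid shape, overlap value, and returned `(tiles, spec)` metadata are illustrative assumptions, not the PR's actual `TileTask`/`GridSpec` implementation:

```python
import torch

def patch_split(z: torch.Tensor, grid=(2, 2), overlap=2):
    """Split z (B, C, T, H, W) into grid[0] x grid[1] overlapping patches."""
    _, _, _, H, W = z.shape
    rows, cols = grid
    h_step, w_step = H // rows, W // cols
    tiles, spec = [], []
    for r in range(rows):
        for c in range(cols):
            # Extend each patch by `overlap` on every interior edge so
            # neighboring decodes can be cross-faded at the seams.
            h0 = max(r * h_step - overlap, 0)
            h1 = min((r + 1) * h_step + overlap, H)
            w0 = max(c * w_step - overlap, 0)
            w1 = min((c + 1) * w_step + overlap, W)
            tiles.append(z[:, :, :, h0:h1, w0:w1])
            spec.append((h0, h1, w0, w1))
    return tiles, spec

z = torch.randn(1, 8, 4, 16, 16)
tiles, spec = patch_split(z)
```

Each of the four patches can then be decoded on a different rank and blended back using the recorded coordinates.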
Have you evaluated the performance gain from patch splitting? When the height and width are small, the split size (+blend) is almost equal to the total size, so the performance improvement may be limited. In this scenario, using temporal tiled decode parallel might be a better choice? https://github.com/huggingface/diffusers/blob/f2be8bd6b3dc4035bd989dc467f15d86bf3c9c12/src/diffusers/models/autoencoders/autoencoder_kl_ltx2.py#L1497
When only 24 frames of video are generated, temporal tiled decoding does not bring obvious gains and instead adds overhead.
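The reviewer's overhead concern can be made concrete with a quick calculation (latent sizes, grid, and overlap are illustrative assumptions): with a 2x2 split, every patch carries an overlap border, so for small H/W the total decoded area can exceed the unsplit area, capping the per-GPU speedup:

```python
def split_overhead(H, W, grid=(2, 2), overlap=2):
    """Ratio of total patch area (with overlap borders) to the unsplit area."""
    rows, cols = grid
    total = 0
    for r in range(rows):
        for c in range(cols):
            h = (min((r + 1) * (H // rows) + overlap, H)
                 - max(r * (H // rows) - overlap, 0))
            w = (min((c + 1) * (W // cols) + overlap, W)
                 - max(c * (W // cols) - overlap, 0))
            total += h * w
    return total / (H * W)

# Small latent: overlap dominates, total work grows ~56% over unsplit.
small = split_overhead(16, 16)    # 4 * (10 * 10) / (16 * 16) = 1.5625
# Larger latent: overlap is amortized, so patch parallelism pays off.
large = split_overhead(128, 128)  # ~1.06
```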
A recent PR changed the diffusion features docs structure. Please take a look at #1928.
wtomin left a comment
LGTM. Please resolve the conflicts.
A follow-up PR can refer to #2368.
Signed-off-by: erfgss <97771661+erfgss@users.noreply.github.com>
Purpose
This PR adds support for VAE patch parallelism in the LTX-2 text-to-video pipeline.
By enabling distributed VAE decoding when `--vae-patch-parallel-size > 1`, this change improves multi-GPU utilization and reduces VAE decode latency for video generation workloads.
Test Plan
- `tensor-parallel-size=2`
- `--vae-use-tiling`
- `--vae-patch-parallel-size=1`
- `--vae-patch-parallel-size=2`
Test Result
text-to-video VAE Patch Parallel Size=1
ltx2_t2v_diffvae1.mp4
text-to-video VAE Patch Parallel Size=2
ltx2_t2v_diffvae2.mp4
image-to-video VAE Patch Parallel Size=1
ltx2_i2v_vae1.mp4
image-to-video VAE Patch Parallel Size=2
ltx2_i2v_vae2.mp4
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.