[Model] Add HSDP support for LTX-2 by fywc · Pull Request #2899 · vllm-project/vllm-omni

fywc · 2026-04-18T08:00:21Z

Purpose

Add LTX-2 HSDP support.

This PR resolves the LTX-2 gap in HSDP support from RFC #1217 by:

defining shard conditions for LTX2VideoTransformer3DModel so HSDP can shard repeated transformer blocks
integrating LTX-2 into the existing FSDP2-based HSDP flow without changing HSDP core logic
updating tests and documentation to reflect LTX-2 HSDP support
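
As a rough illustration of what a shard condition for repeated transformer blocks might look like, the sketch below selects modules by name. The function name `is_ltx2_shard_target` and the `transformer_blocks.<index>` naming scheme are assumptions made for illustration only; the actual conditions are defined in the PR diff, which is not reproduced here. The test log reports 912 sharded modules, so the real conditions are likely finer-grained than this.

```python
import re

# Hypothetical naming scheme: repeated blocks are assumed to be registered as
# "transformer_blocks.<index>". The real module names may differ.
_BLOCK_PATTERN = re.compile(r"transformer_blocks\.\d+")


def is_ltx2_shard_target(module_name: str) -> bool:
    """Return True only for a repeated block itself, not for its submodules."""
    return _BLOCK_PATTERN.fullmatch(module_name) is not None


names = [
    "proj_in",
    "transformer_blocks.0",
    "transformer_blocks.0.attn1",
    "transformer_blocks.47",
    "proj_out",
]
targets = [name for name in names if is_ltx2_shard_target(name)]
print(targets)
```

Matching on the full name keeps submodules such as `transformer_blocks.0.attn1` out of the result, so each block would be wrapped as one unit in this sketch.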

Test Plan

Test Result

INFO 04-18 07:58:31 [diffusers_loader.py:324] Loading weights took 2.77 seconds
INFO 04-18 07:58:31 [hsdp.py:128] HSDP Inference: replicate_size=1, shard_size=2, world_size=2, rank=0, fs_world_size=2, fs_rank=0
INFO 04-18 07:58:34 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
INFO 04-18 07:58:37 [hsdp.py:128] HSDP Inference: replicate_size=1, shard_size=2, world_size=2, rank=1, fs_world_size=2, fs_rank=1
INFO 04-18 07:58:39 [hsdp.py:202] Sharded 912 modules + root
INFO 04-18 07:58:39 [hsdp.py:173] HSDP applied to model: FSDPLTX2VideoTransformer3DModel
INFO 04-18 07:58:40 [diffusion_model_runner.py:142] Model loading took 45.5504 GiB and 23.878847 seconds
INFO 04-18 07:58:40 [diffusion_model_runner.py:147] Model runner: Model loaded successfully.
INFO 04-18 07:58:40 [diffusion_model_runner.py:188] Model runner: Initialization complete.
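
The sizes in the `hsdp.py:128` log lines satisfy `world_size = replicate_size * shard_size`. The sketch below shows one way a global rank could map to a (replicate, shard) coordinate, assuming a row-major mesh layout. The helper name `mesh_coords` is hypothetical and the layout is an assumption, although it agrees with the `fs_rank` values shown in the log.

```python
def mesh_coords(rank: int, replicate_size: int, shard_size: int) -> tuple[int, int]:
    """Map a global rank to (replicate_index, shard_index) in a row-major mesh."""
    world_size = replicate_size * shard_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world_size {world_size}")
    # Ranks in the same row form one shard group; rows are replicas of each other.
    return divmod(rank, shard_size)


# Configuration from the log: replicate_size=1, shard_size=2, world_size=2.
for rank in range(2):
    print(rank, mesh_coords(rank, replicate_size=1, shard_size=2))
```

With `replicate_size=1` there is a single shard group, so in this configuration HSDP reduces to plain sharding across the two GPUs.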

Signed-off-by: hanzheli <hanzheli@kuaishou.com>

chatgpt-codex-connector · 2026-04-18T08:00:27Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

hsliuustc0106 · 2026-04-18T09:00:44Z

Can you profile with HSDP and check whether there are idle bubbles in the first few blocks with FSDP on? We noticed similar issues with WAN 2.2.

gcanlin

Only one issue needs to be fixed. Otherwise LGTM.

hsliuustc0106 · 2026-04-18T09:22:37Z

Well-structured PR adding HSDP support to LTX-2. Good test coverage and documentation.

fywc · 2026-04-18T11:19:30Z

I'll profile LTX-2 with HSDP enabled as soon as possible. Are there any related issues already?

fywc · 2026-04-18T15:20:40Z

Hi, I did a profiling analysis on LTX-2 to investigate this. I'm not sure whether it is helpful or matches what you observed with WAN 2.2.

Test environment:

2 × NVIDIA H200 GPU
HSDP shard size = 2

Test flow:

Initialize the LTX-2 model with and without HSDP.
Send one request to warm up the model.
Profile the next request and analyze the data.
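
The warm-up-then-measure flow above can be sketched with a plain wall-clock timer. The comment does not say which profiler was used, so this is only a minimal stand-in: `measure_after_warmup` and `fake_request` are hypothetical names, and real GPU work would also need device synchronization before reading the clock.

```python
import time
from typing import Callable


def measure_after_warmup(run_request: Callable[[], object], warmup: int = 1) -> float:
    """Run `warmup` untimed requests, then return the wall time of the next one."""
    for _ in range(warmup):
        run_request()  # warm-up: one-time initialization costs land here
    start = time.perf_counter()
    run_request()  # the request that is actually measured
    return time.perf_counter() - start


calls = []


def fake_request() -> None:
    """Stand-in for one inference request (hypothetical)."""
    calls.append(1)
    time.sleep(0.01)


elapsed = measure_after_warmup(fake_request)
print(f"measured request took {elapsed * 1e3:.1f} ms after {len(calls) - 1} warm-up call(s)")
```

Running the same harness once with HSDP enabled and once without gives the pair of measurements being compared.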

Key findings:

Enabling HSDP introduces a noticeable latency increase at the block level:

Per-block latency increases from ~10 ms to ~27 ms (~2.5–3x).
The overhead is consistent across all transformer blocks, not a one-off effect.

Breaking down the block:

Both Attention and FFN stages are slower (~2–3x).
The main overhead comes from RowParallelLinear / ColumnParallel ops, each increasing from ~100 µs to ~1.2–1.4 ms.
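
Working the reported figures through makes the scale of the two slowdowns explicit. The inputs below are the approximate values quoted above, so the ratios are approximate too; `slowdown` is just an illustrative helper.

```python
def slowdown(before: float, after: float) -> float:
    """Ratio of the latency with HSDP to the latency without it."""
    return after / before


block = slowdown(10.0, 27.0)        # per-block latency, in ms
op_low = slowdown(100.0, 1200.0)    # per-op latency, in µs (lower bound)
op_high = slowdown(100.0, 1400.0)   # per-op latency, in µs (upper bound)

print(f"block: {block:.1f}x, per-op: {op_low:.0f}x to {op_high:.0f}x")
# prints "block: 2.7x, per-op: 12x to 14x"
```

On these numbers the individual linear ops slow down far more (roughly 12–14x) than the block as a whole (roughly 2.7x).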

According to the stream timeline:

HSDP introduces additional NCCL all_gather streams.
However, these communication streams are not effectively overlapped with compute, leading to visible pipeline bubbles.

As a result, the idle bubbles are probably due mainly to the lack of communication/compute overlap rather than to compute inefficiency itself.
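
A toy timing model can illustrate why missing overlap shows up as bubbles. This is a sketch under stated assumptions, not a description of how HSDP schedules work: it compares a fully serial schedule with a one-block-lookahead prefetch, and the per-block compute and all-gather times are made-up illustrative values rather than measurements.

```python
def serial_time(compute_ms: list[int], gather_ms: list[int]) -> int:
    """No overlap: every block waits for its own all-gather, then computes."""
    return sum(g + c for g, c in zip(gather_ms, compute_ms))


def prefetch_time(compute_ms: list[int], gather_ms: list[int]) -> int:
    """One-block lookahead: the gather for block i+1 runs during compute of block i."""
    total = gather_ms[0]  # the first gather has no compute to hide behind
    for i, compute in enumerate(compute_ms):
        next_gather = gather_ms[i + 1] if i + 1 < len(gather_ms) else 0
        total += max(compute, next_gather)  # the slower of the two sets the pace
    return total


# Illustrative values only (ms per block), not taken from the profile.
gather = [17] * 4
light = [10] * 4   # communication-bound: gather is longer than compute
heavy = [40] * 4   # compute-bound: gather fits inside compute

print(serial_time(light, gather), prefetch_time(light, gather))  # 108 78
print(serial_time(heavy, gather), prefetch_time(heavy, gather))  # 228 177
```

In this model, overlap removes most of the communication cost only when per-block compute is longer than the all-gather, which is one reason heavier workloads could see less relative overhead.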

PS: Two snapshots showing the timeline for one block, without HSDP and with HSDP (images omitted).

hsliuustc0106 · 2026-04-19T13:34:16Z

Do you mean HSDP has a negative latency impact on LTX-2?

hsliuustc0106 · 2026-04-19T13:35:26Z

How about the peak VRAM consumption?

fywc · 2026-04-19T15:40:12Z

Peak VRAM remains the same with HSDP enabled; no noticeable difference was observed.

fywc · 2026-04-19T15:45:48Z

> Do you mean HSDP has a negative latency impact on LTX-2?

I think so. Similar behavior may appear on other models as well; I can follow up with additional tests and analysis.

Signed-off-by: fywc <hanzheli@kuaishou.com>

fywc · 2026-04-20T09:51:23Z

cc @hsliuustc0106 @gcanlin

hsliuustc0106 · 2026-04-20T10:50:51Z

Thanks. I hope you can profile higher-resolution inputs, which may benefit more from HSDP.

Signed-off-by: hanzheli <hanzheli@kuaishou.com> Signed-off-by: fywc <hanzheli@kuaishou.com> Signed-off-by: nainiu258 <cperfect02@163.com>

Signed-off-by: hanzheli <hanzheli@kuaishou.com> Signed-off-by: fywc <hanzheli@kuaishou.com>

fywc and others added 2 commits April 18, 2026 07:57

[Feat] support HSDP for LTX-2

3f2999c

Signed-off-by: hanzheli <hanzheli@kuaishou.com>

Merge branch 'vllm-project:main' into support-ltx2-hsdp

59688d7

fywc requested a review from hsliuustc0106 as a code owner April 18, 2026 08:00

gcanlin reviewed Apr 18, 2026

Comment thread tests/diffusion/models/ltx2/test_ltx2_hsdp.py

gcanlin approved these changes Apr 18, 2026

Add pytest markers for LTX2 model tests

a134eeb

Signed-off-by: hanzheli <hanzheli@kuaishou.com>

fywc and others added 2 commits April 18, 2026 19:19

Merge branch 'main' into support-ltx2-hsdp

d95ff5e

fix: apply ruff format

ac8c06c

Signed-off-by: hanzheli <hanzheli@kuaishou.com>

gcanlin added the ready label to trigger buildkite CI label Apr 19, 2026

Merge branch 'main' into support-ltx2-hsdp

3f2b2af

fywc added 2 commits April 20, 2026 09:14

Merge branch 'main' into support-ltx2-hsdp

1e4660a

Signed-off-by: fywc <hanzheli@kuaishou.com>

Merge branch 'main' into support-ltx2-hsdp

d31ac98

hsliuustc0106 merged commit d613864 into vllm-project:main Apr 20, 2026
8 checks passed

qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026

[Model] Add HSDP support for LTX-2 (vllm-project#2899)

ad424e4

Signed-off-by: hanzheli <hanzheli@kuaishou.com> Signed-off-by: fywc <hanzheli@kuaishou.com>

Songrui625 mentioned this pull request Apr 23, 2026

[Bug]: Failed to run LTX-2 two-stage pipelines when HSDP is enabled #3062

Open

1 task

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[Model] Add HSDP support for LTX-2 (vllm-project#2899)

51a2b6a

Signed-off-by: hanzheli <hanzheli@kuaishou.com> Signed-off-by: fywc <hanzheli@kuaishou.com>

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Model] Add HSDP support for LTX-2 (vllm-project#2899)

2fb08ff

Signed-off-by: hanzheli <hanzheli@kuaishou.com> Signed-off-by: fywc <hanzheli@kuaishou.com>

daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request May 28, 2026

[Model] Add HSDP support for LTX-2 (vllm-project#2899)

8683000

Signed-off-by: hanzheli <hanzheli@kuaishou.com> Signed-off-by: fywc <hanzheli@kuaishou.com>

quyifei23 pushed a commit to quyifei23/vllm-omni that referenced this pull request Jun 6, 2026

[Model] Add HSDP support for LTX-2 (vllm-project#2899)

c736576

Signed-off-by: hanzheli <hanzheli@kuaishou.com> Signed-off-by: fywc <hanzheli@kuaishou.com>

fywc mentioned this pull request Jun 10, 2026

[RFC]: Continuous Diffusion Model Acceleration Support #1217

Open

1 task
