[diffusion] chore: align LTX-2 with official #24313
Conversation
Code Review
This pull request focuses on precision alignment for the LTX-2.3 model, specifically addressing consistency issues in the LoRA warmup and HQ pipelines. Key improvements include ensuring that bias and reduction ordering in RowParallelLinear layers matches the base implementation when LoRA is disabled, and adopting NumPy-based double-precision RoPE frequency generation to align with official implementations. The res2s SDE logic was also refined to maintain scheduler dtypes during main-step updates. Review feedback identifies critical risks associated with in-place tensor modifications that could corrupt global sigma schedules or cause side effects for callers, and suggests optimizing Grouped Query Attention (GQA) by using native SDPA support instead of manual tensor expansion.
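As background on the RoPE precision point above, here is a minimal, illustrative sketch of generating RoPE inverse frequencies in NumPy double precision versus native float32; the function names and head dimension are assumptions for the example, not the actual LTX-2 code.

import numpy as np
import torch

def rope_inv_freq_fp64(head_dim: int, theta: float = 10000.0) -> torch.Tensor:
    # Compute RoPE inverse frequencies in NumPy float64 and cast once at the
    # end, instead of accumulating float32 rounding error in torch.
    exponents = np.arange(0, head_dim, 2, dtype=np.float64) / head_dim
    inv_freq = 1.0 / (theta ** exponents)   # float64 throughout
    return torch.from_numpy(inv_freq)       # torch.float64 tensor

def rope_inv_freq_fp32(head_dim: int, theta: float = 10000.0) -> torch.Tensor:
    # Same math done natively in float32; the per-element error is later
    # amplified when frequencies are multiplied by large position indices.
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (theta ** exponents)

if __name__ == "__main__":
    fp64 = rope_inv_freq_fp64(128)
    fp32 = rope_inv_freq_fp32(128)
    print((fp64 - fp32.double()).abs().max())  # non-zero: the precision gap
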
I am having trouble creating individual review comments. See my feedback below.
python/sglang/multimodal_gen/runtime/pipelines_core/stages/ltx_2_denoising.py (378-379)
In-place modification of sigma_down can lead to critical bugs. Specifically, if sigma_down was None initially, it is assigned a reference to sigma_next at line 372. Modifying it in-place at line 379 will then modify sigma_next. If sigma_next is a view into the scheduler's sigmas tensor, this will corrupt the global sigma schedule. Using torch.where provides a safe, out-of-place alternative.
sigma_down = torch.where(torch.isnan(sigma_down), sigma_next.to(sigma_down.dtype), sigma_down)
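To make the aliasing hazard concrete, here is a minimal, hypothetical sketch; the variable names mirror the comment, and the scheduler tensor is a stand-in, not the actual stage code.

import torch

# Stand-in for the scheduler's global sigma schedule.
sigmas = torch.linspace(1.0, 0.0, 5)
sigma_next = sigmas[2]        # a view into the scheduler's tensor
sigma_down = None

if sigma_down is None:
    sigma_down = sigma_next   # alias, not a copy

# An in-place NaN fixup such as
#   sigma_down[torch.isnan(sigma_down)] = 0.0
# would write through the alias into `sigmas`, corrupting the schedule.

# Out-of-place alternative: the schedule stays untouched.
sigma_down = torch.where(
    torch.isnan(sigma_down), sigma_next.to(sigma_down.dtype), sigma_down
)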
python/sglang/multimodal_gen/runtime/models/encoders/gemma_3.py (319-338)
Manual expansion of key and value tensors for Grouped Query Attention (GQA) is inefficient and increases peak memory usage. torch.nn.functional.scaled_dot_product_attention natively supports GQA when the head counts are compatible (i.e., query heads are a multiple of KV heads). Unless this manual expansion is strictly required for bit-exact alignment with the official implementation, it should be removed to leverage SDPA's optimized kernels.
attn_output = torch.nn.functional.scaled_dot_product_attention(
    query,
    key,
    value,
    attn_mask=attn_mask,
    dropout_p=0.0,
    is_causal=False,
    scale=self.scaling,
    enable_gqa=True,
)
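For reference, assuming PyTorch 2.5 or newer, SDPA's enable_gqa flag makes the manual key/value expansion unnecessary. The sketch below is illustrative only; the shapes and head counts are invented and do not come from the Gemma-3 config.

import torch
import torch.nn.functional as F

# Illustrative shapes only (not the actual Gemma-3 configuration).
bsz, seq, n_q_heads, n_kv_heads, head_dim = 2, 16, 8, 2, 64
query = torch.randn(bsz, n_q_heads, seq, head_dim)
key = torch.randn(bsz, n_kv_heads, seq, head_dim)
value = torch.randn(bsz, n_kv_heads, seq, head_dim)

# Manual GQA: expand K/V so every query head has a matching KV head.
n_rep = n_q_heads // n_kv_heads
key_exp = key.repeat_interleave(n_rep, dim=1)
value_exp = value.repeat_interleave(n_rep, dim=1)
out_manual = F.scaled_dot_product_attention(query, key_exp, value_exp)

# Native GQA (PyTorch >= 2.5): let SDPA broadcast the KV heads itself.
out_native = F.scaled_dot_product_attention(query, key, value, enable_gqa=True)

print(torch.allclose(out_manual, out_native, atol=1e-6))  # expected: True
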
python/sglang/multimodal_gen/runtime/pipelines_core/stages/ltx_2_denoising.py (364)
In-place modification of the sigma_up argument using clamp_ is a dangerous side effect. If the caller intends to reuse the tensor passed as sigma_up, its values will be unexpectedly modified. It is safer to use an out-of-place operation like torch.minimum.
sigma_up = torch.minimum(sigma_up, sigma_next * 0.9999)
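A small, hypothetical illustration of the difference; the function names are invented for the example.

import torch

def eta_step_inplace(sigma_up, sigma_next):
    # In-place clamp mutates the tensor the caller passed in.
    sigma_up.clamp_(max=(sigma_next * 0.9999).item())
    return sigma_up

def eta_step_safe(sigma_up, sigma_next):
    # Out-of-place: the caller's tensor is left untouched.
    return torch.minimum(sigma_up, sigma_next * 0.9999)

caller_sigma_up = torch.tensor([0.8])
eta_step_inplace(caller_sigma_up, torch.tensor([0.5]))
print(caller_sigma_up)   # ~0.5: silently changed in place

caller_sigma_up = torch.tensor([0.8])
eta_step_safe(caller_sigma_up, torch.tensor([0.5]))
print(caller_sigma_up)   # 0.8: unchanged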
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

/tag-and-rerun-ci
This reverts commit c5f2381.
* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#   python/sglang/srt/managers/scheduler.py
#   python/sglang/srt/model_executor/model_runner.py
Motivation
Align native LTX text-encoder attention behavior with the official implementation while preserving high-performance attention backends outside the text encoder path. Keep CI consistency gates honest by using official GT only for cases whose request semantics are currently comparable.
Modifications
- Merged origin/main, including the component attention backend override support from "[diffusion] cli: support component attention backend overrides" (#24320).
- enable_gqa=True.
- Scope torch_sdpa to the text_encoder component for all LTX2 native configs, including LTX-2.0 and LTX-2.3, via component_attention_backends instead of forcing a global attention_backend=torch_sdpa.
- 42 to match refreshed official GT generation.
- Pinned ci-data to 6e7b99e16b857c98285277fe3b4ffef30559bde9.
- Use official_generated GT for the currently comparable LTX official-aligned cases: ltx_2.3_one_stage_ti2v, ltx_2.3_two_stage_t2v_2gpus.
- Keep sglang_generated GT instead of hiding gaps behind very loose official thresholds for: ltx_2_two_stage_t2v, ltx_2_3_hq_pipeline, ltx_2_3_two_stage_ti2v_2gpus.
- Relax the ltx_2_two_stage_t2v SSIM threshold to 0.89; keep its CLIP/PSNR/MAD video defaults.
- Avoid depending on raw.githubusercontent.com HEAD, which is flaky.
Notes
- ltx_2_two_stage_t2v against official_generated at the pinned ci-data revision is only clip=0.6489, ssim=0.1460, psnr=5.8351, mean_abs_diff=112.2637. It uses refreshed native sglang_generated GT plus a modest SSIM relaxation instead of hiding the gap behind extremely loose official thresholds.
- ltx_2_3_two_stage_ti2v_2gpus still has mid/last PSNR around 11.1 even with transformer=torch_sdpa, so the DIT attention backend is not the root cause.
- num_frames=24; actual output extraction still differs from the official GT contract.