Add HunyuanVideo ModelOpt FP8 diffusion support by BBuf · Pull Request #23199 · sgl-project/sglang

BBuf · 2026-04-20T02:56:17Z

Summary

Add HunyuanVideo ModelOpt FP8 diffusion support and publish the SGLang-native transformer override under the lmsys Hugging Face org.

add HunyuanVideo ModelOpt FP8 runtime support
document the HunyuanVideo ModelOpt FP8 checkpoint flow in the diffusion quantization docs
update the diffusion ModelOpt quant skill with the HunyuanVideo FP8 path
add the HunyuanVideo ModelOpt FP8 case to the B200 diffusion CI set

Published FP8 weights

HunyuanVideo: https://huggingface.co/lmsys/hunyuanvideo-modelopt-fp8-sglang-transformer

The repo is intentionally clean: README.md, config.json, and .safetensors shards only.

H100 Validation

The updated H100 validation uses the same command shape as the HunyuanVideo preset in sglang-diffusion-benchmark-profile and supersedes the earlier short-video validation.

Run setup:

Host/GPU: H100 rank0, CUDA_VISIBLE_DEVICES=0
Backend: --backend=sglang; the logs show `Using pipeline from model_index.json: HunyuanVideoPipeline`, and no diffusers fallback markers were observed
Model: hunyuanvideo-community/HunyuanVideo
Prompt: A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window.
Skill preset args preserved: --text-encoder-cpu-offload --pin-cpu-memory --num-frames=65 --width=848 --height=480 --num-inference-steps=30 --save-output --warmup --enable-torch-compile --seed=42
5s adjustment: added --fps=13, so the output is exactly 65 frames / 13 fps = 5.000s
FP8 delta: same command plus --transformer-path lmsys/hunyuanvideo-modelopt-fp8-sglang-transformer
Profiler delta: same generation settings, replacing save/perf output with --profile --num-profiled-timesteps=5 --no-save-output and setting SGLANG_DIFFUSION_TORCH_PROFILER_DIR
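The three runs differ only in a few flags. The sketch below builds each argument list and checks the 5-second clip length. It assumes the preset flags are passed as separate argv tokens. The CLI entry point and the model and prompt flags are not shown in this description, so they are omitted. The profiler directory default is a placeholder path, and the unnamed "perf output" flag is not modeled.

```python
import os

# Flags copied from the run setup above; entry point, model and prompt flags are omitted.
PRESET_ARGS = [
    "--backend=sglang",
    "--text-encoder-cpu-offload",
    "--pin-cpu-memory",
    "--num-frames=65",
    "--width=848",
    "--height=480",
    "--num-inference-steps=30",
    "--save-output",
    "--warmup",
    "--enable-torch-compile",
    "--seed=42",
    "--fps=13",  # the 5 s adjustment
]

FP8_TRANSFORMER = "lmsys/hunyuanvideo-modelopt-fp8-sglang-transformer"


def bf16_args():
    return list(PRESET_ARGS)


def fp8_args():
    # FP8 delta: identical flags plus the ModelOpt FP8 transformer override.
    return bf16_args() + ["--transformer-path", FP8_TRANSFORMER]


def profile_args(base, profiler_dir="./torch_profiles"):
    # Profiler delta: same generation settings, profiling instead of saving output.
    # profiler_dir is a placeholder; the unnamed perf-output flag is not modeled here.
    args = [a for a in base if a != "--save-output"]
    env = dict(os.environ, SGLANG_DIFFUSION_TORCH_PROFILER_DIR=profiler_dir)
    return args + ["--profile", "--num-profiled-timesteps=5", "--no-save-output"], env


def clip_seconds(args):
    # Duration = num_frames / fps, read from the --key=value flags.
    kv = dict(a[2:].split("=", 1) for a in args if a.startswith("--") and "=" in a)
    return int(kv["num-frames"]) / int(kv["fps"])
```

Keeping the FP8 list as "BF16 list plus one override" makes it easy to confirm that any latency difference comes from the transformer weights alone.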

Benchmark results (warmup excluded):

| Metric | BF16 | FP8 | Delta | Speedup |
| --- | --- | --- | --- | --- |
| E2E latency | 59.546 s | 54.748 s | -4.798 s (-8.1%) | 1.09x |
| Denoising stage | 42.542 s | 37.980 s | -4.562 s (-10.7%) | 1.12x |
| Avg denoise step | 1.4180 s | 1.2659 s | -0.1521 s | 1.12x |
| Decoding stage | 16.692 s | 16.458 s | -0.233 s (-1.4%) | 1.01x |
| Text encoding | 0.308 s | 0.306 s | -0.002 s (-0.7%) | 1.01x |
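The Delta and Speedup columns follow from the BF16 and FP8 timings: delta is FP8 minus BF16, and speedup is BF16 divided by FP8. The sketch below recomputes them from the rounded table values. It also checks that the denoising stage is roughly 30 steps times the average step time.

```python
# (BF16 seconds, FP8 seconds) copied from the benchmark table above.
RESULTS = {
    "E2E latency": (59.546, 54.748),
    "Denoising stage": (42.542, 37.980),
    "Avg denoise step": (1.4180, 1.2659),
    "Decoding stage": (16.692, 16.458),
    "Text encoding": (0.308, 0.306),
}
NUM_STEPS = 30  # --num-inference-steps=30


def compare(bf16, fp8):
    delta = fp8 - bf16                # negative means FP8 is faster
    delta_pct = 100.0 * delta / bf16  # relative to the BF16 baseline
    speedup = bf16 / fp8              # > 1.0 means FP8 is faster
    return delta, delta_pct, speedup


for name, (bf16, fp8) in RESULTS.items():
    d, pct, s = compare(bf16, fp8)
    print(f"{name:18s} {d:+.4f} s ({pct:+.1f}%)  {s:.2f}x")

# Denoising stage should be close to NUM_STEPS * average step time.
for i, label in enumerate(("BF16", "FP8")):
    step = RESULTS["Avg denoise step"][i]
    stage = RESULTS["Denoising stage"][i]
    print(f"{label}: {NUM_STEPS} x {step} = {NUM_STEPS * step:.3f} s vs stage {stage} s")
```

Recomputing from rounded values can differ in the last digit. For example, 16.458 - 16.692 gives -0.234 s for the decoding stage, while the table reports -0.233 s.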

Profiler kernel shares over 5 profiled denoising timesteps. Profiler timings include profiling overhead and are not used as benchmark latency numbers.

| Precision | Total CUDA op time | Top CUDA op/kernel shares |
| --- | --- | --- |
| BF16 | 17.055 s | `cudaMemcpyAsync` 41.54%; FlashAttention 31.99%; BF16 GEMM `nvjet_tst_192x208_64x4_2x1_v_bz_coopB_bias_TNT` 9.77%; BF16 GEMM `nvjet_tst_192x208_64x4_1x2_h_bz_coopB_bias_TNT` 8.16%; BF16 GEMM `nvjet_tst_256x152_64x4_1x2_h_bz_coopA_bias_TNT` 2.11% |
| FP8 | 15.324 s | `cudaMemcpyAsync` 40.62%; FlashAttention 36.80%; FP8 CUTLASS GEMM 12.83%; `triton_poi_fused_cat_gelu_view_0` 1.93%; `_static_quant_fp8` 1.37% |
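Percentage shares are easier to compare as absolute time, because the two totals differ. The sketch below converts the listed shares into seconds. The BF16 "GEMM" entry sums only the three nvjet kernels shown in the table, so it may undercount total BF16 GEMM time. All values include profiler overhead. On these numbers, the listed GEMM time is about 3.42 s for BF16 versus about 1.97 s for FP8. FlashAttention stays in the same range, at about 5.46 s versus 5.64 s.

```python
# Total CUDA op time (s) and top shares (%) copied from the profiler table above.
BF16_TOTAL, FP8_TOTAL = 17.055, 15.324

BF16_SHARES = {
    "cudaMemcpyAsync": 41.54,
    "FlashAttention": 31.99,
    "GEMM": 9.77 + 8.16 + 2.11,  # only the three listed BF16 nvjet GEMM kernels
}
FP8_SHARES = {
    "cudaMemcpyAsync": 40.62,
    "FlashAttention": 36.80,
    "GEMM": 12.83,              # FP8 CUTLASS GEMM
    "_static_quant_fp8": 1.37,  # activation quantization kernel
}


def to_seconds(total_s, share_pct):
    """Convert a percentage share of total CUDA op time to seconds."""
    return total_s * share_pct / 100.0


for label, total, shares in (("BF16", BF16_TOTAL, BF16_SHARES), ("FP8", FP8_TOTAL, FP8_SHARES)):
    for name, pct in shares.items():
        print(f"{label:4s} {name:18s} {pct:6.2f}%  {to_seconds(total, pct):.3f} s")
```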

Local Validation After B200 CI Update

git diff --check -> passed
python3 -m py_compile python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed
python3 -m black --check python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed
python3 -m ruff check --select=F401,F821 python/sglang/multimodal_gen/test/server/testcase_configs.py python/sglang/multimodal_gen/test/server/gpu_cases.py -> passed
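The four local checks can be rerun together from the repository root. The sketch below is a small runner around `subprocess.run`. It assumes `git` and `python3` are on `PATH`, and that `black` and `ruff` are installed in that interpreter. It is an illustration, not a script shipped with this PR.

```python
import shlex
import subprocess

FILES = [
    "python/sglang/multimodal_gen/test/server/testcase_configs.py",
    "python/sglang/multimodal_gen/test/server/gpu_cases.py",
]

# Same commands as the local validation list above.
CHECKS = [
    ["git", "diff", "--check"],
    ["python3", "-m", "py_compile", *FILES],
    ["python3", "-m", "black", "--check", *FILES],
    ["python3", "-m", "ruff", "check", "--select=F401,F821", *FILES],
]


def run_checks(checks=CHECKS):
    """Run each command from the repo root; return {command string: passed}."""
    results = {}
    for cmd in checks:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[shlex.join(cmd)] = proc.returncode == 0
    return results
```

Calling `run_checks()` returns one boolean per command, which can be printed as the `-> passed` lines shown above.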

B200 CI

Added to ONE_GPU_MODELOPT_CASES for multimodal-gen-test-1-b200:

hunyuanvideo_modelopt_fp8_t2v

One caveat from the FP8 log: the CLI passes the same offload flags as the skill preset, but the ModelOpt FP8 runtime currently forces `dit_cpu_offload` off, while preserving layerwise offload behavior for the restored FP8 tensor strides.
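The caveat describes a resolution step in which user flags are accepted as-is and one option is then overridden. The sketch below illustrates that behavior only. `OffloadConfig`, `resolve_offload`, and the `dit_layerwise_offload` field are hypothetical names, not SGLang's actual code. Only `dit_cpu_offload` appears in the PR text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OffloadConfig:
    # Hypothetical container; only dit_cpu_offload is named in the PR text.
    text_encoder_cpu_offload: bool = False
    pin_cpu_memory: bool = False
    dit_cpu_offload: bool = False
    dit_layerwise_offload: bool = False  # hypothetical field name


def resolve_offload(cfg: OffloadConfig, modelopt_fp8: bool) -> OffloadConfig:
    """Hypothetical: force DiT CPU offload off for ModelOpt FP8 transformers.

    All other flags, including layerwise offload, pass through unchanged.
    """
    if modelopt_fp8 and cfg.dit_cpu_offload:
        return replace(cfg, dit_cpu_offload=False)
    return cfg
```

Returning a new frozen config, rather than mutating the user's flags, keeps the requested and effective settings separately inspectable. That makes it easier to log when an override like this one is applied.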

gemini-code-assist · 2026-04-20T02:56:21Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

BBuf · 2026-04-25T09:25:08Z

/tag-and-rerun-ci

BBuf · 2026-04-25T09:39:28Z

/tag-and-rerun-ci

mickqian · 2026-04-27T01:45:13Z


 import diffusers
 import numpy as np
 import torch


duplicated contents?


BBuf · 2026-04-28T08:20:19Z

Updated this PR to use the new clean lmsys ModelOpt diffusion repo.

HunyuanVideo: https://huggingface.co/lmsys/hunyuanvideo-modelopt-fp8-sglang-transformer
Added hunyuanvideo_modelopt_fp8_t2v to ONE_GPU_MODELOPT_CASES for multimodal-gen-test-1-b200.
The shared lmsys collection is https://huggingface.co/collections/lmsys/diffusion-modelopt-69f06a1740c02269e36bf285


BBuf · 2026-05-02T02:17:24Z

/tag-and-rerun-ci


BBuf · 2026-05-03T08:58:04Z

/tag-and-rerun-ci

# Conflicts:
# docs/diffusion/quantization.md
# docs_new/docs/sglang-diffusion/quantization.mdx
# python/sglang/multimodal_gen/.claude/skills/sglang-diffusion-modelopt-quant/SKILL.md
# python/sglang/multimodal_gen/test/server/gpu_cases.py
# python/sglang/multimodal_gen/test/server/testcase_configs.py
# python/sglang/multimodal_gen/tools/build_modelopt_fp8_transformer.py

BBuf · 2026-05-03T16:55:35Z

/tag-and-rerun-ci

BBuf · 2026-05-04T01:12:17Z

/tag-and-rerun-ci

BBuf · 2026-05-05T11:27:25Z

https://github.com/sgl-project/sglang/actions/runs/25367205803/job/74399478580?pr=23199

BBuf requested review from mickqian, ping1jing2, yhyang201 and yingluosanqian as code owners April 20, 2026 02:56

github-actions Bot added the documentation, quant, and diffusion labels Apr 20, 2026

BBuf added 2 commits April 20, 2026 10:56

Add Qwen Image ModelOpt FP8 diffusion support

e586a78

Add HunyuanVideo ModelOpt FP8 diffusion support

8f424f4

BBuf force-pushed the codex/hunyuanvideo-modelopt-fp8 branch from 79095de to 8f424f4 April 20, 2026 02:56

Merge remote-tracking branch 'upstream/main' into update-pr-23199

746359c

BBuf requested a review from wisclmy0611 as a code owner April 25, 2026 09:24

github-actions Bot added the run-ci label Apr 25, 2026

Merge remote-tracking branch 'origin/main' into HEAD

44a2c91

mickqian approved these changes Apr 27, 2026


BBuf added 3 commits April 27, 2026 09:57

Format HunyuanVideo FP8 mod chunk

050bbf0

Revert "Add Qwen Image ModelOpt FP8 diffusion support"

27b84fc

This reverts commit e586a78.

Use lmsys HunyuanVideo ModelOpt checkpoint

ae41d37

BBuf requested a review from JustinTong0323 as a code owner April 28, 2026 08:15

BBuf mentioned this pull request Apr 29, 2026

SGLang AI Agent Performance Optimization PRs (2026-01-29 to 2026-04-29) BBuf/AI-Infra-Auto-Driven-SKILLS#46

Open

BBuf added 2 commits April 30, 2026 10:33

Merge remote-tracking branch 'upstream/main' into codex/hunyuanvideo-modelopt-fp8

6fb6680

Merge remote-tracking branch 'upstream/main' into codex/hunyuanvideo-modelopt-fp8

3b8ae31

BBuf added 2 commits May 2, 2026 21:12

Merge remote-tracking branch 'origin/main' into HEAD

a985cf5

# Conflicts:
# docs/diffusion/quantization.md
# docs_new/docs/sglang-diffusion/quantization.mdx
# python/sglang/multimodal_gen/test/server/testcase_configs.py

Merge remote-tracking branch 'origin/main' into update-pr-23199

ef9f412

Merge remote-tracking branch 'origin/main' into update-pr-23199

421d467

BBuf and others added 2 commits May 5, 2026 16:43

Merge remote-tracking branch 'origin/main' into update-pr-23199

a061e39

Merge branch 'main' into codex/hunyuanvideo-modelopt-fp8

c6208d6

BBuf merged commit 8c703f2 into sgl-project:main May 5, 2026
71 of 78 checks passed

LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026

Add HunyuanVideo ModelOpt FP8 diffusion support (sgl-project#23199)

bc52281

zijiexia mentioned this pull request May 25, 2026

fix(ci): enforce legacy docs/ gate in Lint workflow #26322

Merged


BBuf deleted the codex/hunyuanvideo-modelopt-fp8 branch June 2, 2026 12:11

BBuf merged 15 commits into sgl-project:main from BBuf:codex/hunyuanvideo-modelopt-fp8