
[Doc] Add diffusion attention backend docs #3011

Merged
hsliuustc0106 merged 1 commit into vllm-project:main from david6666666:codex/diffusion-attention-backends-docs
Apr 23, 2026

Conversation

@david6666666 (Collaborator)
Summary

This PR adds user-facing documentation for diffusion attention backend selection in vLLM-Omni.

What Changed

  • Add docs/user_guide/diffusion/attention_backends.md
  • Document DIFFUSION_ATTENTION_BACKEND selection and the available backend options (see the selection sketch after this list)
  • Document SageAttention source installation and usage examples
  • Add a link from the text-to-video offline inference guide to the new backend guide
  • Add the new guide to the MkDocs navigation
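
As a quick orientation for the selection mechanism the guide documents: the backend is chosen through the DIFFUSION_ATTENTION_BACKEND environment variable before any model is constructed. A minimal sketch, assuming only the variable and backend names that appear in this PR (the validation logic below is illustrative, not the library's own):

```python
import os

# Select the diffusion attention backend before constructing any model.
# Backend names covered by this PR: FLASH_ATTN (default) and SAGE_ATTN.
os.environ.setdefault("DIFFUSION_ATTENTION_BACKEND", "SAGE_ATTN")

backend = os.environ["DIFFUSION_ATTENTION_BACKEND"]
if backend not in {"FLASH_ATTN", "SAGE_ATTN"}:
    raise ValueError(f"unknown diffusion attention backend: {backend}")
print(f"diffusion attention backend: {backend}")
```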

Why

Diffusion users need one place to understand how to switch attention backends, how to install SageAttention, and what to validate when comparing SAGE_ATTN against the default FlashAttention path.

Validation

  • pre-commit run --all-files
  • pytest -q tests/diffusion/attention/test_flash_attn.py
  • local offline validation on H20 GPUs (A/B run sketched after this list) for:
    • HunyuanVideo-1.5
    • Wan2.2-TI2V-5B-Diffusers
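
The offline validation runs each model twice with an identical prompt and seed, once per backend. A minimal sketch of how such an A/B run can be scripted; the script name and its flags are hypothetical placeholders, and only the environment variable comes from this PR:

```python
import os
import subprocess

# Generate the same clip under both backends for a direct comparison.
# "offline_t2v.py" and its flags are placeholders standing in for the
# real offline inference entry point documented in the new guide.
for backend in ("FLASH_ATTN", "SAGE_ATTN"):
    subprocess.run(
        [
            "python", "offline_t2v.py",
            "--model", "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
            "--seed", "42",
            "--output", f"out_{backend.lower()}.mp4",
        ],
        env={**os.environ, "DIFFUSION_ATTENTION_BACKEND": backend},
        check=True,
    )
```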

Signed-off-by: david6666666 <530634352@qq.com>
@david6666666 (Collaborator, Author) commented Apr 22, 2026

Validation

I ran local offline validation on H20 GPUs after rebuilding SageAttention from upstream main.

HunyuanVideo-1.5

Config:

  • model: hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v
  • 33 frames, 480x832, 8 steps, TP=1, same prompt / seed

Results:

  • FLASH_ATTN: forward ~= 28.36s, total ~= 28.84s
  • SAGE_ATTN: forward ~= 24.86s, total ~= 25.33s
  • output diff vs FLASH_ATTN: PSNR ~= 16.96 dB, MAE ~= 22.96 (metric computation sketched below)
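
For reference, PSNR and MAE here compare decoded frames between the two backends at the same prompt and seed. A minimal numpy sketch of how these metrics can be computed over two frame stacks (frame loading is assumed; the synthetic data below just demonstrates the call):

```python
import numpy as np

def psnr_and_mae(ref: np.ndarray, test: np.ndarray) -> tuple[float, float]:
    """PSNR (dB) and MAE between two uint8 frame stacks of identical
    shape, e.g. (frames, height, width, channels)."""
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    mse = float(np.mean((ref - test) ** 2))
    mae = float(np.mean(np.abs(ref - test)))
    psnr = float(10.0 * np.log10(255.0**2 / mse)) if mse > 0 else float("inf")
    return psnr, mae

# Synthetic stand-ins for decoded video frames from the two runs.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
noise = rng.integers(-3, 4, size=ref.shape)
test = np.clip(ref.astype(np.int16) + noise, 0, 255).astype(np.uint8)
print(psnr_and_mae(ref, test))
```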

FLASH_ATTN: hv15_retest_fa3_steps8.mp4 (video attachment)

SAGE_ATTN: hv15_retest_sage_steps8.mp4 (video attachment)

Wan2.2 TI2V 5B

Config:

  • model: Wan-AI/Wan2.2-TI2V-5B-Diffusers
  • 49 frames, 704x1280, 30 steps, TP=2, same prompt / seed

Results:

  • FLASH_ATTN: diffuse ~= 36.20s, forward ~= 44.30s, total ~= 45.06s
  • SAGE_ATTN: diffuse ~= 32.89s, forward ~= 41.09s, total ~= 41.83s
  • output diff vs FLASH_ATTN: PSNR ~= 27.96 dB, MAE ~= 3.51

FLASH_ATTN: wan22_fa3.mp4 (video attachment)

SAGE_ATTN: wan22_sage.mp4 (video attachment)

Notes

  • pre-commit run --all-files passed locally before publishing the docs changes.
  • For Wan2.2, SageAttention provided a real speedup with a relatively small output difference.
  • For HunyuanVideo, SageAttention was also faster but showed noticeably larger output drift relative to the FlashAttention baseline.

@david6666666 force-pushed the codex/diffusion-attention-backends-docs branch from 6e448a3 to 30e8977 on April 22, 2026 04:24
@david6666666 linked an issue on Apr 22, 2026 that may be closed by this pull request
@david6666666 force-pushed the codex/diffusion-attention-backends-docs branch from 30e8977 to 739787d on April 22, 2026 04:27
@david6666666 marked this pull request as ready for review on April 22, 2026 04:30
@david6666666 (Collaborator, Author)

@ZJY0516 @lishunyang12 PTAL, thanks!

@david6666666 changed the title from "[codex] Add diffusion attention backend docs" to "[Doc] Add diffusion attention backend docs" on Apr 22, 2026
@hsliuustc0106 (Collaborator) left a comment:

Good documentation. Covers all diffusion attention backends with clear installation and usage examples.

@david6666666 added the ready label (triggers Buildkite CI) on Apr 23, 2026
@hsliuustc0106 merged commit d4cbdff into vllm-project:main on Apr 23, 2026
6 checks passed
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
Signed-off-by: david6666666 <530634352@qq.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: david6666666 <530634352@qq.com>



Development

Successfully merging this pull request may close these issues.

[Bug]: Use SageAttention backend Wan2.2 and Hunyuan-Video Quality Crash

3 participants