
[Doc] Add diffusion attention backend docs #3011

Merged
hsliuustc0106 merged 1 commit into vllm-project:main from david6666666:codex/diffusion-attention-backends-docs
Apr 23, 2026

Conversation

@david6666666 (Collaborator)
Summary

This PR adds user-facing documentation for diffusion attention backend selection in vLLM-Omni.

What Changed

  • Add docs/user_guide/diffusion/attention_backends.md
  • Document DIFFUSION_ATTENTION_BACKEND selection and the available backend options (see the selection sketch after this list)
  • Document SageAttention source installation and usage examples
  • Add a link from the text-to-video offline inference guide to the new backend guide
  • Add the new guide to the MkDocs navigation
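
As a quick orientation for the selection mechanism the guide documents: the backend is chosen through the DIFFUSION_ATTENTION_BACKEND environment variable before any model is constructed. A minimal sketch, assuming only the variable and backend names that appear in this PR (the validation logic below is illustrative, not the library's own):

```python
import os

# Select the diffusion attention backend before constructing any model.
# Backend names covered by this PR: FLASH_ATTN (default) and SAGE_ATTN.
os.environ.setdefault("DIFFUSION_ATTENTION_BACKEND", "SAGE_ATTN")

backend = os.environ["DIFFUSION_ATTENTION_BACKEND"]
if backend not in {"FLASH_ATTN", "SAGE_ATTN"}:
    raise ValueError(f"unknown diffusion attention backend: {backend}")
print(f"diffusion attention backend: {backend}")
```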

Why

Diffusion users need one place to understand how to switch attention backends, how to install SageAttention, and what to validate when comparing SAGE_ATTN against the default FlashAttention path.

Validation

  • pre-commit run --all-files
  • pytest -q tests/diffusion/attention/test_flash_attn.py
  • local offline validation on H20 GPUs (A/B run sketched after this list) for:
    • HunyuanVideo-1.5
    • Wan2.2-TI2V-5B-Diffusers
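
The offline validation runs each model twice with an identical prompt and seed, once per backend. A minimal sketch of how such an A/B run can be scripted; the script name and its flags are hypothetical placeholders, and only the environment variable comes from this PR:

```python
import os
import subprocess

# Generate the same clip under both backends for a direct comparison.
# "offline_t2v.py" and its flags are placeholders standing in for the
# real offline inference entry point documented in the new guide.
for backend in ("FLASH_ATTN", "SAGE_ATTN"):
    subprocess.run(
        [
            "python", "offline_t2v.py",
            "--model", "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
            "--seed", "42",
            "--output", f"out_{backend.lower()}.mp4",
        ],
        env={**os.environ, "DIFFUSION_ATTENTION_BACKEND": backend},
        check=True,
    )
```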

Signed-off-by: david6666666 <530634352@qq.com>
@david6666666 (Collaborator, Author) commented Apr 22, 2026

Validation

I ran local offline validation on H20 GPUs after rebuilding SageAttention from upstream main.

HunyuanVideo-1.5

Config:

  • model: hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v
  • 33 frames, 480x832, 8 steps, TP=1, same prompt / seed

Results:

  • FLASH_ATTN: forward ~= 28.36s, total ~= 28.84s
  • SAGE_ATTN: forward ~= 24.86s, total ~= 25.33s
  • output diff vs FLASH_ATTN: PSNR ~= 16.96 dB, MAE ~= 22.96 (metric computation sketched below)
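
For reference, PSNR and MAE here compare decoded frames between the two backends at the same prompt and seed. A minimal numpy sketch of how these metrics can be computed over two frame stacks (frame loading is assumed; the synthetic data below just demonstrates the call):

```python
import numpy as np

def psnr_and_mae(ref: np.ndarray, test: np.ndarray) -> tuple[float, float]:
    """PSNR (dB) and MAE between two uint8 frame stacks of identical
    shape, e.g. (frames, height, width, channels)."""
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    mse = float(np.mean((ref - test) ** 2))
    mae = float(np.mean(np.abs(ref - test)))
    psnr = float(10.0 * np.log10(255.0**2 / mse)) if mse > 0 else float("inf")
    return psnr, mae

# Synthetic stand-ins for decoded video frames from the two runs.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
noise = rng.integers(-3, 4, size=ref.shape)
test = np.clip(ref.astype(np.int16) + noise, 0, 255).astype(np.uint8)
print(psnr_and_mae(ref, test))
```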

FLASH_ATTN: hv15_retest_fa3_steps8.mp4 (video attachment)

SAGE_ATTN: hv15_retest_sage_steps8.mp4 (video attachment)

Wan2.2 TI2V 5B

Config:

  • model: Wan-AI/Wan2.2-TI2V-5B-Diffusers
  • 49 frames, 704x1280, 30 steps, TP=2, same prompt / seed

Results:

  • FLASH_ATTN: diffuse ~= 36.20s, forward ~= 44.30s, total ~= 45.06s
  • SAGE_ATTN: diffuse ~= 32.89s, forward ~= 41.09s, total ~= 41.83s
  • output diff vs FLASH_ATTN: PSNR ~= 27.96 dB, MAE ~= 3.51

FLASH_ATTN: wan22_fa3.mp4 (video attachment)

SAGE_ATTN: wan22_sage.mp4 (video attachment)

Notes

  • pre-commit run --all-files passed locally before publishing the docs changes.
  • For Wan2.2, SageAttention provided a real speedup with a relatively small output difference.
  • For HunyuanVideo, SageAttention was also faster but showed noticeably larger output drift relative to the FlashAttention baseline.

@david6666666 force-pushed the codex/diffusion-attention-backends-docs branch from 6e448a3 to 30e8977 on April 22, 2026 04:24
@david6666666 linked an issue on Apr 22, 2026 that may be closed by this pull request
@david6666666 force-pushed the codex/diffusion-attention-backends-docs branch from 30e8977 to 739787d on April 22, 2026 04:27
@david6666666 marked this pull request as ready for review on April 22, 2026 04:30
@david6666666 (Collaborator, Author)

@ZJY0516 @lishunyang12 PTAL, thanks!

@david6666666 changed the title from "[codex] Add diffusion attention backend docs" to "[Doc] Add diffusion attention backend docs" on Apr 22, 2026
@hsliuustc0106 (Collaborator) left a comment:

Good documentation. Covers all diffusion attention backends with clear installation and usage examples.

@david6666666 added the ready label (triggers Buildkite CI) on Apr 23, 2026
@hsliuustc0106 merged commit d4cbdff into vllm-project:main on Apr 23, 2026
6 checks passed
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
Signed-off-by: david6666666 <530634352@qq.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: david6666666 <530634352@qq.com>



Development

Successfully merging this pull request may close these issues.

[Bug]: Use SageAttention backend Wan2.2 and Hunyuan-Video Quality Crash

3 participants