[BugFix] Fix whisper FA2 + full cudagraphs #33360

Merged
DarkLight1337 merged 4 commits into vllm-project:main from neuralmagic:lwikinson/fix-whisper-fa2
Jan 31, 2026

@LucasWilkinson LucasWilkinson commented Jan 29, 2026

Fix: #33091

CrossAttentionBuilder.build() overrides max_seq_len with the maximum encoder sequence length (new_metadata.max_seq_len = max(encoder_seq_lens_cpu)). This leads to a CUDA graph (CG) capture with max_seq_len == 0 and therefore an incorrect graph.
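
The failure mode described above can be illustrated with a small toy model. This is only a sketch, not vLLM's code: `ToyAttnMetadata`, `build_cross_attn`, and the `capture_encoder_len` parameter are hypothetical names, the values 448 and 1500 are illustrative, and it assumes the capture-time batch carries encoder lengths of zero.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyAttnMetadata:
    # Hypothetical stand-in for the real attention metadata object.
    max_seq_len: int


def build_cross_attn(base, encoder_seq_lens_cpu, capture_encoder_len=None):
    """Toy version of the override quoted in the description.

    capture_encoder_len is a hypothetical knob: when set, it replaces the
    encoder lengths so that graph capture sees a non-zero max_seq_len.
    """
    if capture_encoder_len is not None:
        encoder_seq_lens_cpu = [capture_encoder_len] * len(encoder_seq_lens_cpu)
    # Same override as in the description: max_seq_len comes from the encoder lengths.
    return replace(base, max_seq_len=max(encoder_seq_lens_cpu))


decoder_meta = ToyAttnMetadata(max_seq_len=448)

# Assumed capture-time batch: no real encoder inputs, so every length is 0.
captured = build_cross_attn(decoder_meta, [0, 0])

# Sketch of the fix: use a realistic, non-zero encoder length during capture.
patched = build_cross_attn(decoder_meta, [0, 0], capture_encoder_len=1500)

print(captured.max_seq_len, patched.max_seq_len)
```

In this toy, a graph captured from `captured` would be specialized for a maximum length of zero, while `patched` keeps a non-zero length, which is the property the fix aims to restore.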

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@LucasWilkinson LucasWilkinson changed the title from "[BugFix] Fix whisper fa2. + full cudagraphs" to "[BugFix] Fix whisper FA2 + full cudagraphs" Jan 29, 2026
@mergify mergify bot added the nvidia, v1, and bug (Something isn't working) labels Jan 29, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes an issue with CUDA graph capture for encoder-decoder models using FlashAttention 2 by ensuring a non-zero max_seq_len is used during graph creation. The approach of setting a realistic encoder length during capture is correct. I've identified one area for improvement in the fallback logic for determining this length to make the fix more robust.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>

@NickLucche NickLucche left a comment

Thank you! If CI is green this should be good as we have FA2 machines :)

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Jan 29, 2026

mergify bot commented Jan 29, 2026

Hi @LucasWilkinson, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@mgoin mgoin added this to the v0.15.1 Hotfix milestone Jan 29, 2026

@mgoin mgoin left a comment

Makes sense to me, thanks!

@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jan 29, 2026
@DarkLight1337 DarkLight1337 merged commit 0a3c71e into vllm-project:main Jan 31, 2026
47 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 31, 2026
khluu pushed a commit that referenced this pull request Feb 2, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
(cherry picked from commit 0a3c71e)
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Pai <416932041@qq.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

Labels

bug (Something isn't working), nvidia, ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Whisper accuracy issue with FA2+CG

4 participants