
[Bugfix] Disable CG for Whisper+FA2#33164

Merged
DarkLight1337 merged 1 commit into vllm-project:main from NickLucche:disable-fa2-cg
Jan 27, 2026

Conversation

@NickLucche (Collaborator) commented Jan 27, 2026

Temporary fix to address #33091 for a tentative v0.15.0 release. Since the issue affects only accuracy and does not cause a crash, I believe it is better to patch it until properly fixed, so as not to mislead users deploying Whisper.

This PR disables CUDA graphs (CG) for encoder-decoder models when FA2 is detected.
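The guard described above can be sketched roughly as follows. All names here (`ModelConfig`, `resolve_cudagraph_mode`, the string modes) are hypothetical for illustration and do not match vLLM's actual internals; they only show the shape of the check: encoder-decoder model plus FlashAttention 2 forces eager execution.

```python
# Hypothetical sketch of the guard this PR adds (names are illustrative,
# not vLLM's real API): if the model is encoder-decoder and FlashAttention 2
# is the attention backend, fall back from CUDA graphs to eager execution.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    is_encoder_decoder: bool
    # None models the case where FlashAttention is not available at all.
    flash_attn_version: Optional[int]


def resolve_cudagraph_mode(cfg: ModelConfig, requested_mode: str) -> str:
    """Return the CUDA graph mode to use, disabling CG for encoder-decoder
    models running on FA2 (accuracy workaround for the Whisper issue)."""
    if cfg.is_encoder_decoder and cfg.flash_attn_version == 2:
        # Known accuracy bug with CG + FA2 on encoder-decoder models:
        # force eager until the underlying issue is fixed.
        return "NONE"
    return requested_mode
```

Note that the check is a no-op when FlashAttention is unavailable or a different FA version is in use, matching the edge cases the review below calls out.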

Test with

python -m pytest -v -s -x tests/entrypoints/openai/test_transcription_validation_whisper.py

# On FA2, with the following patch applied:
--- a/tests/entrypoints/openai/test_transcription_validation_whisper.py
+++ b/tests/entrypoints/openai/test_transcription_validation_whisper.py
@@ -16,7 +16,7 @@ import soundfile as sf
 from ...utils import RemoteOpenAIServer

 MODEL_NAME = "openai/whisper-large-v3-turbo"
-SERVER_ARGS = ["--enforce-eager"]
+SERVER_ARGS = ["--attention-config.flash_attn_version=2"]

Or run the manual script from the issue (failing on master):

from vllm.config import AttentionConfig
from vllm import LLM, SamplingParams
from vllm.config.compilation import CompilationConfig, CompilationMode, CUDAGraphMode
from vllm.assets.audio import AudioAsset

def main():
    model_name = "openai/whisper-large-v3-turbo"

    llm = LLM(
        model=model_name,
        tensor_parallel_size=1,
        enforce_eager=False,
        attention_config=AttentionConfig(
            flash_attn_version=2,
        ),
    )
    params = SamplingParams(temperature=0.0, max_tokens=3)
    outputs = llm.generate(
        [
            {
                "prompt": "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>",
                "multi_modal_data": {
                    "audio": AudioAsset("mary_had_lamb").audio_and_sample_rate,
                },
            },
        ],
        sampling_params=params,
    )
    for o in outputs:
        generated_text = o.outputs[0].text
        print("output:", generated_text)
        assert "The first" in generated_text

if __name__ == "__main__":
    main()

Signed-off-by: NickLucche <nlucches@redhat.com>
mergify bot added the v1 and bug (Something isn't working) labels on Jan 27, 2026
Contributor

gemini-code-assist bot left a comment

Code Review

This pull request introduces a temporary but effective fix for an accuracy issue observed with FlashAttention2 (FA2) when used with encoder-decoder models, specifically Whisper, as detailed in issue #33091. The change correctly disables CUDA graphs for this specific combination, mitigating the reported accuracy problems. The implementation is clear, uses logger.warning_once for informative logging, and appears to handle edge cases appropriately, such as when FlashAttention is not available or when the global CUDA graph mode is already set. This is a well-targeted solution to a critical accuracy bug.
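The review mentions that the fix logs via `logger.warning_once`, so the CG-disabled notice is emitted a single time rather than on every request. A minimal sketch of that warn-once pattern (illustrative only, not vLLM's actual implementation) could look like:

```python
# Illustrative warn-once pattern (not vLLM's real implementation):
# emit a given warning message only the first time it is seen, so a
# one-time configuration notice is not repeated per request.
import logging

_seen_messages: set = set()


def warning_once(logger: logging.Logger, msg: str) -> None:
    """Log `msg` at WARNING level only on its first occurrence."""
    if msg not in _seen_messages:
        _seen_messages.add(msg)
        logger.warning(msg)
```

Deduplicating on the message string keeps startup/config warnings informative without flooding logs during steady-state serving.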

NickLucche added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jan 27, 2026
DarkLight1337 merged commit 1f3a2c2 into vllm-project:main on Jan 27, 2026
56 of 57 checks passed
VedantMadane pushed a commit to VedantMadane/vllm that referenced this pull request Jan 28, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
khluu pushed a commit that referenced this pull request Jan 28, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 1f3a2c2)
apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 26, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 1f3a2c2)

Labels

bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed), v1
