
[Bugfix] Disable CG for Whisper+FA2#33164

Merged
DarkLight1337 merged 1 commit into vllm-project:main from NickLucche:disable-fa2-cg
Jan 27, 2026

Conversation

@NickLucche (Collaborator) commented Jan 27, 2026

Temporary fix to address #33091 for a tentative v0.15.0 release. Since the issue affects only accuracy and does not cause a crash, I believe it is better to patch it until properly fixed, so as not to mislead users deploying Whisper.

This PR disables CUDA graphs (CG) for encoder-decoder models when FA2 is detected.
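The guard described above can be sketched roughly as follows. All names here (`ModelConfig`, `resolve_cudagraph_mode`, the string modes) are hypothetical for illustration and do not match vLLM's actual internals; they only show the shape of the check: encoder-decoder model plus FlashAttention 2 forces eager execution.

```python
# Hypothetical sketch of the guard this PR adds (names are illustrative,
# not vLLM's real API): if the model is encoder-decoder and FlashAttention 2
# is the attention backend, fall back from CUDA graphs to eager execution.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    is_encoder_decoder: bool
    # None models the case where FlashAttention is not available at all.
    flash_attn_version: Optional[int]


def resolve_cudagraph_mode(cfg: ModelConfig, requested_mode: str) -> str:
    """Return the CUDA graph mode to use, disabling CG for encoder-decoder
    models running on FA2 (accuracy workaround for the Whisper issue)."""
    if cfg.is_encoder_decoder and cfg.flash_attn_version == 2:
        # Known accuracy bug with CG + FA2 on encoder-decoder models:
        # force eager until the underlying issue is fixed.
        return "NONE"
    return requested_mode
```

Note that the check is a no-op when FlashAttention is unavailable or a different FA version is in use, matching the edge cases the review below calls out.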

Test with

python -m pytest -v -s -x tests/entrypoints/openai/test_transcription_validation_whisper.py

# On FA2, with the following patch applied:
--- a/tests/entrypoints/openai/test_transcription_validation_whisper.py
+++ b/tests/entrypoints/openai/test_transcription_validation_whisper.py
@@ -16,7 +16,7 @@ import soundfile as sf
 from ...utils import RemoteOpenAIServer

 MODEL_NAME = "openai/whisper-large-v3-turbo"
-SERVER_ARGS = ["--enforce-eager"]
+SERVER_ARGS = ["--attention-config.flash_attn_version=2"]

Or run the manual script from the issue (failing on master):

from vllm.config import AttentionConfig
from vllm import LLM, SamplingParams
from vllm.config.compilation import CompilationConfig, CompilationMode, CUDAGraphMode
from vllm.assets.audio import AudioAsset

def main():
    model_name = "openai/whisper-large-v3-turbo"

    llm = LLM(
        model=model_name,
        tensor_parallel_size=1,
        enforce_eager=False,
        attention_config=AttentionConfig(
            flash_attn_version=2,
        ),
    )
    params = SamplingParams(temperature=0.0, max_tokens=3)
    outputs = llm.generate(
        [
            {
                "prompt": "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>",
                "multi_modal_data": {
                    "audio": AudioAsset("mary_had_lamb").audio_and_sample_rate,
                },
            },
        ],
        sampling_params=params,
    )
    for o in outputs:
        generated_text = o.outputs[0].text
        print("output:", generated_text)
        assert "The first" in generated_text

if __name__ == "__main__":
    main()

Signed-off-by: NickLucche <nlucches@redhat.com>
mergify bot added the v1 and bug (Something isn't working) labels on Jan 27, 2026
Contributor

gemini-code-assist bot left a comment

Code Review

This pull request introduces a temporary but effective fix for an accuracy issue observed with FlashAttention2 (FA2) when used with encoder-decoder models, specifically Whisper, as detailed in issue #33091. The change correctly disables CUDA graphs for this specific combination, mitigating the reported accuracy problems. The implementation is clear, uses logger.warning_once for informative logging, and appears to handle edge cases appropriately, such as when FlashAttention is not available or when the global CUDA graph mode is already set. This is a well-targeted solution to a critical accuracy bug.
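The review mentions that the fix logs via `logger.warning_once`, so the CG-disabled notice is emitted a single time rather than on every request. A minimal sketch of that warn-once pattern (illustrative only, not vLLM's actual implementation) could look like:

```python
# Illustrative warn-once pattern (not vLLM's real implementation):
# emit a given warning message only the first time it is seen, so a
# one-time configuration notice is not repeated per request.
import logging

_seen_messages: set = set()


def warning_once(logger: logging.Logger, msg: str) -> None:
    """Log `msg` at WARNING level only on its first occurrence."""
    if msg not in _seen_messages:
        _seen_messages.add(msg)
        logger.warning(msg)
```

Deduplicating on the message string keeps startup/config warnings informative without flooding logs during steady-state serving.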

NickLucche added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jan 27, 2026
DarkLight1337 merged commit 1f3a2c2 into vllm-project:main on Jan 27, 2026
56 of 57 checks passed
VedantMadane pushed a commit to VedantMadane/vllm that referenced this pull request Jan 28, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
khluu pushed a commit that referenced this pull request Jan 28, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 1f3a2c2)
apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 26, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
(cherry picked from commit 1f3a2c2)

Labels

bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed), v1
