[BugFix] Potential Fix for FA3 full-cudagraph IMA #25490

Merged
WoosukKwon merged 3 commits into main from lwilkinson/potential-full-CG-ima-fix on Sep 24, 2025
Conversation


@LucasWilkinson LucasWilkinson commented Sep 23, 2025

@WoosukKwon reported an IMA with FA3 full-CG that was fixed by doing https://github.com/vllm-project/vllm/compare/woosuk/fa3-ima?expand=1

The theory here is that get_scheduler_metadata was being called with a different max_num_splits than what was being passed to FlashAttentionMetadata.

This is an alternative solution that doesn't lose the logic to use max_num_splits=0 (i.e. use the heuristic) for batches larger than max_cudagraph_size.

We do not currently have a repro, so we cannot confirm this resolves @WoosukKwon's IMA, but it should be resolved regardless; we should always make sure the arguments to get_scheduler_metadata and FlashAttentionMetadata are in line.
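The consistency requirement above can be sketched as follows. This is a minimal illustration with hypothetical, simplified signatures for get_scheduler_metadata and FlashAttentionMetadata (the real vLLM/FA3 code differs); the point is that max_num_splits is computed once and the same value flows to both call sites:

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-in for the real FA3 helper.
def get_scheduler_metadata(num_reqs: int, max_num_splits: int) -> dict:
    # The FA3 scheduler plans its work (and buffer sizes) around
    # max_num_splits, so this must match what the kernel later sees.
    return {"num_reqs": num_reqs, "max_num_splits": max_num_splits}

@dataclass
class FlashAttentionMetadata:
    scheduler_metadata: dict
    max_num_splits: int

def build_metadata(num_reqs: int, max_cudagraph_size: int,
                   cudagraph_max_num_splits: int) -> FlashAttentionMetadata:
    # Compute max_num_splits exactly once, in one place, so the value
    # passed to get_scheduler_metadata always matches the value stored
    # on FlashAttentionMetadata. 0 means "use the internal heuristic",
    # which is kept for batches larger than the max cudagraph size.
    if num_reqs <= max_cudagraph_size:
        max_num_splits = cudagraph_max_num_splits
    else:
        max_num_splits = 0
    sched = get_scheduler_metadata(num_reqs, max_num_splits=max_num_splits)
    return FlashAttentionMetadata(scheduler_metadata=sched,
                                  max_num_splits=max_num_splits)
```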

vllm serve meta-llama/Meta-Llama-3-8B-Instruct -O.cudagraph_mode=FULL


lm_eval --model local-completions --model_args "base_url=http://0.0.0.0:8000/v1/completions,model=meta-llama/Meta-Llama-3-8B-Instruct,num_concurrent=256" --tasks gsm8k
...
local-completions (base_url=http://0.0.0.0:8000/v1/completions,model=meta-llama/Meta-Llama-3-8B-Instruct,num_concurrent=256), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7544|±  |0.0119|
|     |       |strict-match    |     5|exact_match|↑  |0.7559|±  |0.0118|

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a potential Invalid Memory Access in FlashAttention 3 with full CUDA graphs by ensuring the max_num_splits parameter is consistent. The change refactors the logic for setting max_num_splits to a common location. However, the current implementation introduces a critical flaw: it can lead to an UnboundLocalError because max_num_splits is not defined in all code paths. My review provides a fix for this issue to ensure the variable is always initialized. Addressing this will also help achieve the PR's goal of making the parameter consistent.
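The flaw the bot describes is the classic pattern of a variable assigned only on some branches. A hedged, self-contained illustration (not the actual vLLM code; the condition and values are hypothetical) of the bug and the fix of initializing the variable on every path:

```python
def choose_max_num_splits_buggy(use_cudagraph: bool, num_reqs: int,
                                max_cudagraph_size: int) -> int:
    # Bug: max_num_splits is only assigned inside the conditional, so
    # reading it afterwards raises UnboundLocalError on the other path.
    if use_cudagraph and num_reqs <= max_cudagraph_size:
        max_num_splits = 8
    return max_num_splits  # may raise UnboundLocalError

def choose_max_num_splits_fixed(use_cudagraph: bool, num_reqs: int,
                                max_cudagraph_size: int) -> int:
    # Fix: initialize before branching; 0 = "let FA3 use its heuristic",
    # so the fallback path stays consistent with the scheduler call.
    max_num_splits = 0
    if use_cudagraph and num_reqs <= max_cudagraph_size:
        max_num_splits = 8
    return max_num_splits
```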

@tlrmchlsmth tlrmchlsmth added this to the v0.11.0 milestone Sep 23, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

fix

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

fix

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

comment

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@LucasWilkinson LucasWilkinson force-pushed the lwilkinson/potential-full-CG-ima-fix branch from 58df19e to e1c19ca Compare September 23, 2025 16:53
@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025
@WoosukKwon WoosukKwon left a comment


Thanks for the fix!

@WoosukKwon

@LucasWilkinson Can you please check the CI again?

mgoin and others added 2 commits September 23, 2025 21:31
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@WoosukKwon WoosukKwon merged commit 2338daf into main Sep 24, 2025
45 checks passed
@WoosukKwon WoosukKwon deleted the lwilkinson/potential-full-CG-ima-fix branch September 24, 2025 09:04
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

4 participants