[cudagraphs] Refactor cudagraph capture loop #32946

Merged

LucasWilkinson merged 2 commits into vllm-project:main from neuralmagic:lwilkinson/cg-refactor on Jan 23, 2026

Conversation

@LucasWilkinson
Collaborator

Refactor the cudagraph capture loop to pave the way for distinct PIECEWISE and FULL capture sizes and for dynamic spec-decode sizes.

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Contributor

@gemini-code-assist (bot) left a comment


Code Review

The pull request refactors the CUDA graph capture loop by centralizing the logic for determining which graphs to capture into the CudagraphDispatcher. This significantly cleans up the capture_model method in gpu_model_runner.py. New test cases have been added to ensure the get_capture_descs method in the dispatcher works as expected. However, a critical issue was identified in how uniform_decode is determined for CUDA graph capture, which could lead to incorrect graph configurations.
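The review above describes the core of the change: `CudagraphDispatcher.get_capture_descs` becomes the single place that decides which graphs to capture. As a rough illustration of that idea only — the exact signature, the descriptor fields, and the `uniform_decode` rule here are assumptions, not the PR's actual code:

```python
from dataclasses import dataclass
from enum import Enum


class CUDAGraphMode(Enum):
    # Mirrors vLLM's piecewise vs. full cudagraph distinction.
    PIECEWISE = "piecewise"
    FULL = "full"


@dataclass(frozen=True)
class BatchDescriptor:
    """Hashable key describing one batch shape to capture a graph for."""
    num_tokens: int
    uniform_decode: bool = False


class CudagraphDispatcher:
    """Hypothetical sketch: centralizes 'which graphs to capture' logic."""

    def __init__(self, capture_sizes: list[int]):
        # Capture largest sizes first, as warmup typically does.
        self.capture_sizes = sorted(capture_sizes, reverse=True)

    def get_capture_descs(self, mode: CUDAGraphMode) -> list[BatchDescriptor]:
        # Assumption for illustration: FULL-mode graphs are keyed as
        # uniform-decode batches, PIECEWISE ones are not. The real rule
        # is exactly what the review flags as needing care.
        uniform = mode is CUDAGraphMode.FULL
        return [
            BatchDescriptor(num_tokens=n, uniform_decode=uniform)
            for n in self.capture_sizes
        ]
```

The payoff of this shape is that `capture_model` no longer needs its own key-construction logic; it just iterates whatever descriptors the dispatcher hands back, so capture-time and dispatch-time keys cannot drift apart.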

@LucasWilkinson LucasWilkinson added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 23, 2026
Collaborator

@ProExpertProg ProExpertProg left a comment


Nice, didn't realize we had logic for different keys in two places

      # We skip EPLB here since we don't want to record dummy metrics
-     for num_tokens, activate_lora in compilation_cases:
+     for batch_desc in batch_descriptors:
+         num_tokens = batch_desc.num_tokens
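The hunk above replaces iteration over loose `(num_tokens, activate_lora)` tuples with iteration over descriptor objects. A minimal sketch of that loop shape — `capture_model`'s real signature and the descriptor's fields are assumptions here, and `dummy_run` stands in for the model runner's warmup call:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchDescriptor:
    num_tokens: int
    uniform_decode: bool = False


def capture_model(batch_descriptors, dummy_run):
    # One capture per descriptor: the descriptor, not an ad-hoc tuple,
    # is the single source of truth for the graph's dispatch key.
    captured = []
    for batch_desc in batch_descriptors:
        num_tokens = batch_desc.num_tokens
        dummy_run(num_tokens, uniform_decode=batch_desc.uniform_decode)
        captured.append(batch_desc)
    return captured
```

Because each dummy run is driven entirely by the descriptor, extending capture to new shapes (e.g. dynamic spec-decode sizes) only means emitting more descriptors, not touching this loop.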
Collaborator


I feel like we're moving closer and closer to passing BatchDescriptor to dummy run directly...

Collaborator Author


next 😄

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Jan 23, 2026
@LucasWilkinson LucasWilkinson merged commit 3a41459 into vllm-project:main Jan 23, 2026
49 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 23, 2026
cwazai pushed a commit to cwazai/vllm that referenced this pull request Jan 25, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: 陈建华 <1647430658@qq.com>
lapy pushed a commit to lapy/vllm that referenced this pull request Jan 27, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Labels

nvidia · ready (ONLY add when PR is ready to merge/full CI is needed) · v1

Projects

Status: Done

2 participants