
[MRV2] Extensible CG dispatch rework#36541

Closed
WoosukKwon wants to merge 37 commits into main from lwilkinson/mrv2-cg-dispatch

Conversation

@WoosukKwon (Collaborator)
Forked from #35959 since the PR doesn't allow editing 😅

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
…hanics

- CudaGraphManager.capture() now handles iteration, warmup, and capture
- Subclasses provide callbacks that set up forward context
- Move EagleCudaGraphManager to spec_decode/eagle/cudagraph.py
- Rename forward_fn to generate_fn for Eagle
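
The reworked control flow described above can be sketched roughly as follows. This is a hypothetical illustration, not vLLM's actual API: class and method names (`CudaGraphManagerSketch`, `make_generate_fn`, the warmup count) are invented to show the shape of the callback-based design, and plain Python callables stand in for real CUDA graph capture.

```python
# Sketch: the base manager owns the iteration/warmup/capture loop; subclasses
# only supply a callback that sets up the forward context and returns the
# function to capture. All names here are illustrative.

class CudaGraphManagerSketch:
    def __init__(self, batch_sizes, num_warmup=2):
        self.batch_sizes = batch_sizes
        self.num_warmup = num_warmup
        self.graphs = {}  # batch_size -> captured callable (stand-in for a CUDA graph)

    def capture(self, make_forward_fn):
        # Base class drives the loop; subclass logic lives in make_forward_fn.
        for bs in sorted(self.batch_sizes, reverse=True):
            forward_fn = make_forward_fn(bs)   # subclass sets up forward context here
            for _ in range(self.num_warmup):   # warmup runs precede capture
                forward_fn()
            self.graphs[bs] = forward_fn       # real code would capture into a CUDA graph

    def replay(self, batch_size):
        return self.graphs[batch_size]()


class EagleManagerSketch(CudaGraphManagerSketch):
    def make_generate_fn(self, batch_size):
        # Eagle renames forward_fn to generate_fn; context setup would go here.
        def generate_fn():
            return ("draft", batch_size)
        return generate_fn


mgr = EagleManagerSketch(batch_sizes=[1, 2, 4])
mgr.capture(mgr.make_generate_fn)
print(mgr.replay(2))  # -> ('draft', 2)
```

The point of the split is that warmup/capture mechanics are written once in the base class, while Eagle and the main model differ only in the callback they hand in.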

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
- Remove PIECEWISE handling from EagleCudaGraphManager
- Move set_forward_context into run_model (like main)
- generate_draft takes params directly instead of using forward context
- Remove unused instance variables from CudaGraphManager
- Remove dead code in _init_candidates
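
The "params directly instead of forward context" change can be illustrated with a toy example. Everything here is hypothetical (the real `generate_draft` signature is not shown in this page); the sketch only contrasts ambient-context reads with explicit parameters.

```python
# Illustrative only: explicit parameters make the data dependency visible and
# remove reliance on hidden global state set up elsewhere.

_FORWARD_CONTEXT = {}  # stand-in for the old implicit forward-context global

def generate_draft_old():
    # Old style: reaches into module-level context populated by the caller.
    return _FORWARD_CONTEXT["num_tokens"] * 2

def generate_draft(num_tokens):
    # New style: the caller supplies the parameter directly.
    return num_tokens * 2

_FORWARD_CONTEXT["num_tokens"] = 8
print(generate_draft_old() == generate_draft(8))  # -> True
```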

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
LucasWilkinson and others added 7 commits March 6, 2026 00:01
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
The base CudaGraphManager uses a shared global pool for all cudagraphs.
When Eagle and the main model share the same pool, their internal
allocations (e.g., gumbel_sample temporaries like local_argmax/local_max)
can overlap in memory. This causes memory corruption during cudagraph
replay, leading to incorrect draft token sampling and broken verification.

Symptoms:
- Abnormally high acceptance rates (e.g., 76% instead of 62% at pos0)
- Low accuracy (46% instead of 78% on GSM8K)
- GPU-specific (appeared on H200 but not B200 due to allocation patterns)

Fix: Create a dedicated cudagraph pool for Eagle, matching main branch
behavior where each cudagraph manager has its own pool.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
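
The aliasing hazard in the commit message above can be modeled in miniature. This is not real CUDA code — it is a pure-Python caricature of a graph memory pool whose freed blocks are reused by the next capture in the same pool, which is the mechanism by which two graphs' temporaries can end up overlapping.

```python
# Toy model of the shared-pool hazard: temporaries freed at the end of one
# graph's capture are handed out again to the next graph captured in the same
# pool, so the two graphs alias the same memory on replay.

import itertools

_pool_ids = itertools.count()

class Pool:
    def __init__(self):
        self.id = next(_pool_ids)
        self.free_blocks = []

    def alloc(self):
        # Reuse a freed block if one exists in this pool (this is the hazard).
        return self.free_blocks.pop() if self.free_blocks else (self.id, object())

    def free(self, block):
        self.free_blocks.append(block)

def capture_graph(pool):
    tmp = pool.alloc()   # temporary used during the graph's captured work
    pool.free(tmp)       # freed at capture end; the block returns to the pool
    return tmp           # but the captured graph still touches tmp on replay

shared = Pool()
g1 = capture_graph(shared)
g2 = capture_graph(shared)
print(g1 is g2)  # True: both graphs alias the same block -> corruption on replay

eagle_pool, main_pool = Pool(), Pool()
print(capture_graph(main_pool) is capture_graph(eagle_pool))  # False: no aliasing
```

This is why giving Eagle its own pool (as the fix does) removes the corruption: blocks freed during the main model's capture can never be recycled into Eagle's graphs.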
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a significant and well-executed refactoring of the CUDA graph dispatch logic. The core of the changes is in vllm/v1/worker/gpu/cudagraph_utils.py, where the CudaGraphManager is reworked to be more extensible and robust.

Key improvements include:

  • Introduction of BatchExecutionDescriptor to encapsulate batch shape information for CUDA graph matching.
  • A new dispatch mechanism that uses pre-computed candidate graphs for different batch sizes, making the dispatch logic cleaner and more efficient.
  • Refactoring of the graph capture logic into a generic capture method that takes a factory function for the forward pass, decoupling the graph manager from model-specific details.
  • Introduction of ModelCudaGraphManager to handle model-specific aspects like hidden states, improving separation of concerns.
  • Updates to data parallelism utilities (dp_utils.py) to synchronize BatchExecutionDescriptor across ranks.
  • Consistent handling of padding for tokens and requests across various components, including block_table.py, model_runner.py, and model states.
  • Refactoring of EagleCudaGraphManager to inherit from the new CudaGraphManager, simplifying its implementation significantly.

Overall, these changes make the CUDA graph handling more modular, easier to understand, and more extensible for future features. The code quality is high, and the new design is a clear improvement over the previous implementation. I have not found any critical or high-severity issues in this pull request.
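
A descriptor-keyed dispatch of the kind the review describes could look roughly like this. The names below (`BatchExecutionDescriptor` fields, `Dispatcher`) are assumptions for illustration, not the actual vLLM implementation: the sketch shows a hashable descriptor for batch shape plus padding up to the nearest pre-captured candidate size.

```python
# Hedged sketch: dispatch pads the incoming batch up to the smallest
# pre-captured graph size that fits, falling back to eager (None) otherwise.

from bisect import bisect_left
from dataclasses import dataclass

@dataclass(frozen=True)
class BatchExecutionDescriptor:
    num_tokens: int
    uniform_decode: bool  # e.g. whether every request contributes one token

class Dispatcher:
    def __init__(self, candidate_sizes):
        self.sizes = sorted(candidate_sizes)  # pre-captured graph batch sizes

    def dispatch(self, desc):
        i = bisect_left(self.sizes, desc.num_tokens)
        return self.sizes[i] if i < len(self.sizes) else None

d = Dispatcher([1, 2, 4, 8])
print(d.dispatch(BatchExecutionDescriptor(3, True)))  # -> 4
print(d.dispatch(BatchExecutionDescriptor(9, True)))  # -> None
```

Making the descriptor frozen (hashable) is what lets pre-computed candidates be keyed and compared cheaply at dispatch time, and a serializable descriptor is also what the dp_utils change would synchronize across data-parallel ranks.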

@WoosukKwon WoosukKwon closed this Mar 9, 2026
@github-project-automation github-project-automation bot moved this to Done in NVIDIA Mar 9, 2026
@WoosukKwon WoosukKwon deleted the lwilkinson/mrv2-cg-dispatch branch March 9, 2026 20:59
2 participants