[Graph Optimization][Speculative Decoding] Fix the bug of CUDAGraph + MTP + EP #4456
Motivation
In CUDAGraph + MTP + EP scenarios, Draft Models on different GPUs may run empty, causing all_gather() to raise an error when obtaining step_use_cudagraph.
Modifications
The Draft Model directly uses the step_use_cudagraph flag of the Target Model.
Usage or Command
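To illustrate the change described under Modifications, here is a minimal, self-contained Python sketch. It is not the code from this PR. It assumes that the step_use_cudagraph decision is agreed across EP ranks through a collective, and that a rank whose Draft Model runs empty never reaches that collective. The names WORLD_SIZE, simulated_all_gather, target_step_use_cudagraph, draft_flag_before_fix and draft_flag_after_fix are hypothetical, and the collective is simulated with a plain dictionary rather than a real distributed call.

```python
WORLD_SIZE = 4  # hypothetical number of EP ranks


def simulated_all_gather(flags_by_rank):
    """Stand-in for a collective: every rank must contribute exactly once."""
    if len(flags_by_rank) != WORLD_SIZE:
        missing = sorted(set(range(WORLD_SIZE)) - set(flags_by_rank))
        raise RuntimeError(f"all_gather mismatch, ranks missing: {missing}")
    return [flags_by_rank[rank] for rank in range(WORLD_SIZE)]


def target_step_use_cudagraph(local_ok_by_rank):
    """Target Model: all ranks take part, so the collective succeeds."""
    return all(simulated_all_gather(local_ok_by_rank))


def draft_flag_before_fix(local_ok_by_rank, ranks_with_draft_work):
    """Assumed old behaviour: only ranks with draft work reach the collective."""
    participating = {rank: local_ok_by_rank[rank] for rank in ranks_with_draft_work}
    return all(simulated_all_gather(participating))


def draft_flag_after_fix(target_flag):
    """Behaviour after this PR: reuse the Target Model's flag, no extra collective."""
    return target_flag


if __name__ == "__main__":
    local_ok = {rank: True for rank in range(WORLD_SIZE)}
    target_flag = target_step_use_cudagraph(local_ok)
    try:
        # Rank 2 has an empty draft run and skips the collective.
        draft_flag_before_fix(local_ok, ranks_with_draft_work=[0, 1, 3])
    except RuntimeError as err:
        print("before fix:", err)
    print("after fix:", draft_flag_after_fix(target_flag))
```

In this sketch the mismatch is reported as an exception. With a real collective, a rank that skips the call would more typically cause an error or a hang on the other ranks. Reusing the flag already computed for the Target Model avoids the second collective, so an empty Draft Model run on one rank no longer affects the others.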
Accuracy Tests
Checklist
Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.