
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding#33624

Merged
robertgshaw2-redhat merged 2 commits into vllm-project:main from zou3519:speculative_fix
Feb 3, 2026

Conversation

zou3519 (Collaborator) commented Feb 3, 2026

I'm also open to turning this optimization off by default; just let me know.

I don't have a machine that can run DeepSeek V3.2 right now, so someone please test this.

…e is speculative decoding

Signed-off-by: Richard Zou <zou3519@gmail.com>

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses a potential silent incorrectness issue with the fast_moe_cold_start optimization when speculative decoding is enabled. It introduces a new configuration flag, fast_moe_cold_start, which is enabled by default but is now correctly disabled when speculative decoding is active. The documentation for the new flag clearly explains the assumptions and risks associated with this optimization. The implementation is clean and effectively prevents the issue by checking for a speculative decoding configuration and logging a warning when the optimization is consequently ignored. This is a crucial fix for ensuring correctness in MoE models using speculative decoding.
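The gating behavior described in this review can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `SketchConfig`, `effective_fast_moe_cold_start`, and the warning text are hypothetical names made up for this example. Only the flag name `fast_moe_cold_start`, its default of enabled, and the idea of a speculative decoding configuration come from the discussion above.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("fast_moe_sketch")


@dataclass
class SketchConfig:
    # Hypothetical stand-in for the real vLLM config objects.
    fast_moe_cold_start: bool = True           # enabled by default, per the review
    speculative_config: Optional[dict] = None  # None means no speculative decoding


def effective_fast_moe_cold_start(cfg: SketchConfig) -> bool:
    """Return whether the optimization should actually be applied."""
    if not cfg.fast_moe_cold_start:
        # The user turned the optimization off explicitly.
        return False
    if cfg.speculative_config is not None:
        # Speculative decoding is active, so the flag is ignored with a warning.
        logger.warning(
            "fast_moe_cold_start is ignored because speculative decoding is enabled."
        )
        return False
    return True
```

With this sketch, a default config keeps the optimization on, while a config carrying something like `{"method": "deepseek_mtp", "num_speculative_tokens": 1}` turns it off and logs a warning.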

robertgshaw2-redhat (Collaborator) commented

running now

robertgshaw2-redhat enabled auto-merge (squash) February 3, 2026 01:50
github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Feb 3, 2026

robertgshaw2-redhat (Collaborator) commented Feb 3, 2026

thanks for the fix!

MODEL := "nvidia/DeepSeek-R1-NVFP4"

GPUS := "4"
PORT := "8001"

launch_mtp:
	chg run --gpus {{GPUS}} -- vllm serve {{MODEL}} -tp {{GPUS}} --speculative_config '{"num_speculative_tokens":1, "method":"deepseek_mtp"}' --port {{PORT}} --enforce-eager

benchmark:
	vllm bench serve \
		--port {{PORT}} \
		--model {{MODEL}} \
		--dataset-name random \
		--input-len 1000 \
		--output-len 200 \
		--max-concurrency 10 \
		--num-prompts 50 \
		--seed $(date +%s) \
		--temperature 0.0

(APIServer pid=449916) INFO 02-02 21:04:46 [metrics.py:100] SpecDecoding metrics: Mean acceptance length: 1.93, Accepted throughput: 121.29 tokens/s, Drafted throughput: 129.99 tokens/s, Accepted: 1213 tokens, Drafted: 1300 tokens, Per-position acceptance rate: 0.933, Avg Draft acceptance rate: 93.3%
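As a sanity check, the figures in that log line are mutually consistent. The sketch below recomputes them under two stated assumptions: that mean acceptance length is one token from the target model plus the accepted draft tokens per draft, and that with num_speculative_tokens set to 1 the number of drafts equals the number of drafted tokens. The variable names are illustrative only.

```python
# Values copied from the SpecDecoding metrics log line.
accepted_tokens = 1213
drafted_tokens = 1300
accepted_throughput = 121.29  # tokens/s
drafted_throughput = 129.99   # tokens/s

# Assumption: one speculative token per draft, so drafts == drafted tokens.
num_drafts = drafted_tokens

# Fraction of drafted tokens that the target model accepted.
acceptance_rate = accepted_tokens / drafted_tokens

# Assumption: each step yields one target token plus the accepted draft tokens.
mean_acceptance_length = 1 + accepted_tokens / num_drafts

# Both throughputs should imply the same logging interval.
interval_from_accepted = accepted_tokens / accepted_throughput
interval_from_drafted = drafted_tokens / drafted_throughput

print(f"acceptance rate: {acceptance_rate:.3f}")
print(f"mean acceptance length: {mean_acceptance_length:.2f}")
print(f"interval (s): {interval_from_accepted:.2f} vs {interval_from_drafted:.2f}")
```

Under those assumptions the recomputed rate and acceptance length match the logged 0.933 and 1.93, and both throughputs point to a logging window of about ten seconds.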

mgoin added the "bug" label (Something isn't working) Feb 3, 2026
mgoin added this to the v0.15.1 Hotfix milestone Feb 3, 2026
robertgshaw2-redhat merged commit 5eac9a1 into vllm-project:main Feb 3, 2026
45 checks passed
khluu pushed a commit that referenced this pull request Feb 3, 2026
…e is speculative decoding (#33624)

Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
(cherry picked from commit 5eac9a1)
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
…e is speculative decoding (vllm-project#33624)

Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Pai <416932041@qq.com>
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
…e is speculative decoding (vllm-project#33624)

Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…e is speculative decoding (vllm-project#33624)

Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

Labels

bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed)

3 participants