
[Misc] Tidy up some spec decode logic in GPUModelRunner#31591

Merged
njhill merged 4 commits into vllm-project:main from njhill:tidy-spec-logic
Jan 8, 2026
Conversation

@njhill
Member

@njhill njhill commented Dec 31, 2025

Simplify messy top-level logic in GPUModelRunner.sample_tokens: avoid computing effective_drafter_max_model_len on every step, and only execute this spec-decoding-specific logic when spec decoding is actually enabled.

@mergify mergify bot added the v1 label Dec 31, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the speculative decoding logic in GPUModelRunner.sample_tokens. The changes simplify the code by grouping all speculative decoding logic within a check for spec_config is not None, which avoids unnecessary computations when speculative decoding is disabled. The logic for when to propose draft tokens has been clarified by introducing a new boolean flag, propose_drafts_after_bookkeeping, which correctly preserves the original behavior. The refactoring improves code readability and maintainability without introducing any functional changes. The changes look good.
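The boolean-flag pattern the review describes can be illustrated like this. Only the flag name propose_drafts_after_bookkeeping comes from the review above; the condition and function shape are invented for illustration:

```python
def sample_tokens_sketch(spec_enabled: bool, use_padded_batch: bool) -> list[str]:
    """Illustrative only: a single boolean decides whether draft proposal
    runs before or after host-side bookkeeping, instead of scattering the
    condition across the function."""
    # Hypothetical condition: with a padded batch the drafter can run
    # immediately; otherwise proposing waits until after bookkeeping.
    propose_drafts_after_bookkeeping = spec_enabled and not use_padded_batch
    order = ["sample"]
    if spec_enabled and not propose_drafts_after_bookkeeping:
        order.append("propose_drafts")
    order.append("bookkeeping")
    if propose_drafts_after_bookkeeping:
        order.append("propose_drafts")
    return order
```

The point of the flag is that the ordering decision is made once, up front, and the rest of the function just consults it.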

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 31, 2025
@njhill njhill marked this pull request as ready for review December 31, 2025 20:40
@njhill
Member Author

njhill commented Jan 2, 2026

CI failures are unrelated; they are also happening on main.

njhill added 2 commits January 3, 2026 20:19
Signed-off-by: njhill <nickhill123@gmail.com>
Signed-off-by: njhill <nickhill123@gmail.com>
@njhill njhill requested a review from WoosukKwon as a code owner January 4, 2026 04:50
@mergify

mergify bot commented Jan 6, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 6, 2026
# Conflicts:
#	vllm/v1/worker/gpu_model_runner.py

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@mergify mergify bot removed the needs-rebase label Jan 8, 2026
@njhill njhill requested a review from benchislett January 8, 2026 06:20
self.num_spec_tokens = 0
if self.speculative_config:
    self.num_spec_tokens = self.speculative_config.num_speculative_tokens
    draft_config = self.speculative_config.draft_model_config
Collaborator


Can you just set self.draft_config and shortcut the multiple None checks when we need to access it?

if draft_config is not None and draft_config.max_model_len is not None:
    self.effective_drafter_max_model_len = draft_config.max_model_len
else:
    self.effective_drafter_max_model_len = self.max_model_len
Collaborator


Thoughts on making this min(self.max_model_len, drafter max_model_len)? We have been seeing some logs where the drafter has a very high max model len even when the base model doesn't.

Also, if you do this clamping, you can move it into a helper fn to share the logic with the update function below.

Member Author


I was just aiming to keep the existing logic; I'm not sure what makes the most sense and would defer to your judgement.

Collaborator

@benchislett benchislett left a comment


Couple of nits, LGTM

@njhill
Member Author

njhill commented Jan 8, 2026

Going to merge since CI is green, and will open a follow-on PR for the nits.

@njhill njhill merged commit a3d909a into vllm-project:main Jan 8, 2026
48 checks passed
@njhill njhill deleted the tidy-spec-logic branch January 8, 2026 17:10
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…#31591)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1


2 participants