[Misc] Tidy up some spec decode logic in GPUModelRunner#31591
njhill merged 4 commits into vllm-project:main
Conversation
Code Review
This pull request refactors the speculative decoding logic in `GPUModelRunner.sample_tokens`. The changes simplify the code by grouping all speculative decoding logic within a check for `spec_config is not None`, which avoids unnecessary computations when speculative decoding is disabled. The logic for when to propose draft tokens has been clarified by introducing a new boolean flag, `propose_drafts_after_bookkeeping`, which correctly preserves the original behavior. The refactoring improves code readability and maintainability without introducing any functional changes. The changes look good.
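The control-flow pattern the review describes can be sketched as follows. This is an illustrative stand-in, not the actual vLLM code: the class, method bodies, and config attributes are hypothetical; only the gating structure and the `propose_drafts_after_bookkeeping` flag mirror the PR discussion.

```python
# Sketch of the refactored flow: all spec-decode work sits under a single
# `spec_config is not None` check, and a boolean flag decides whether draft
# tokens are proposed before or after bookkeeping. Names are illustrative.
class ModelRunnerSketch:
    def __init__(self, spec_config=None, drafts_after_bookkeeping=False):
        self.spec_config = spec_config  # hypothetical stand-in for vLLM's config
        self.drafts_after_bookkeeping = drafts_after_bookkeeping
        self.calls = []  # records the order of steps for illustration

    def _propose_drafts(self):
        self.calls.append("propose_drafts")

    def _bookkeeping(self):
        self.calls.append("bookkeeping")

    def sample_tokens(self):
        propose_after = False
        if self.spec_config is not None:
            # Spec-decode-only logic is computed here and nowhere else,
            # so nothing runs when speculative decoding is disabled.
            propose_after = self.drafts_after_bookkeeping
            if not propose_after:
                self._propose_drafts()
        self._bookkeeping()
        if self.spec_config is not None and propose_after:
            self._propose_drafts()
        return self.calls
```

With speculative decoding disabled, only bookkeeping runs; with it enabled, the flag controls whether drafting happens before or after bookkeeping.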
CI failures are unrelated; they are also happening on main.
Signed-off-by: njhill <nickhill123@gmail.com>
force-pushed from fece86d to f6229e0
This pull request has merge conflicts that must be resolved before it can be merged.
# Conflicts:
#   vllm/v1/worker/gpu_model_runner.py
Signed-off-by: Nick Hill <nickhill123@gmail.com>
self.num_spec_tokens = 0
if self.speculative_config:
    self.num_spec_tokens = self.speculative_config.num_speculative_tokens
    draft_config = self.speculative_config.draft_model_config
Can you just set `self.draft_config` and shortcut the multiple `None` checks when we need to access it?
if draft_config is not None and draft_config.max_model_len is not None:
    self.effective_drafter_max_model_len = draft_config.max_model_len
else:
    self.effective_drafter_max_model_len = self.max_model_len
Thoughts on making this `min(self.max_model_len, draft max_model_len)`? We have been seeing some logs where the drafter has a very high max_model_len even when the base model doesn't.
Also, if you do this clamping you can move it into a helper fn to share the logic with the update function below.
I was just aiming to keep the existing logic; I'm not sure what makes the most sense, so I'd defer to your judgement.
Going to merge since the CI is green, and will open a follow-on PR for the nits.
…#31591) Signed-off-by: Nick Hill <nickhill123@gmail.com>
…#31591) Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Simplify messy top-level logic in `GPUModelRunner.sample_tokens`, avoid computing `effective_drafter_max_model_len` every step, and only execute this spec-decoding-specific logic when spec decoding is actually enabled.