[Spec Decode] Update `extract_hidden_states` to use deferred kv_connector clear (#37013)
vllm-project#35158 defers kv_connector finalization until after draft models have run. `extract_hidden_states` previously re-initialized the kv_connector metadata as a workaround, but that is no longer needed.

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Code Review
This pull request refactors the extract_hidden_states speculative decoding method to align with recent changes in kv_connector finalization, removing a previous workaround. This simplifies the code in extract_hidden_states.py and gpu_model_runner.py by streamlining the kv_connector handling. Additionally, a bug in ExampleHiddenStatesConnector is fixed, preventing crashes when new_block_ids is None. The changes are well-implemented and improve code clarity and robustness. I have not identified any issues.
tlrmchlsmth left a comment:
One question, but otherwise LGTM and nice cleanup!
```python
if new_block_ids is None:
    continue
```
Why can new_block_ids be None?
Having difficulty tracking down the root cause for this. It happens when running a lot of async requests at once, and only infrequently (e.g. with 32 simultaneous requests I hit this after processing 700 requests successfully). It is a rare case on a rare section of the code.
I have verified that it is a real request that triggers it (isn't empty or something), but the request still produces the hidden states file output as expected with the correct token ids and hidden state shapes.
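The guard under discussion can be sketched as a small standalone function (a sketch only: the loop shape and the `build_connector_meta` name are assumptions, not the actual `ExampleHiddenStatesConnector` method signature):

```python
def build_connector_meta(requests):
    """Collect (request_id, new_block_ids) pairs, skipping unset entries.

    Hypothetical illustration of the guard added in this PR: under heavy
    async load, new_block_ids can occasionally be None, and we skip such
    requests rather than crash -- the request still completes normally.
    """
    metadata = []
    for req_id, new_block_ids in requests:
        if new_block_ids is None:
            continue
        metadata.append((req_id, new_block_ids))
    return metadata

# usage with toy data: request "b" has no block ids yet and is skipped
reqs = [("a", [1, 2]), ("b", None), ("c", [3])]
print(build_connector_meta(reqs))  # [('a', [1, 2]), ('c', [3])]
```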
@fynnsu input_fits_in_drafter failing due to long sequences is not a concern for this style of mock-drafting, right?
> @fynnsu `input_fits_in_drafter` failing due to long sequences is not a concern for this style of mock-drafting, right?
I think that is actually a risk with the current setup. I believe it would only be triggered if max_seq_len == max_model_len, since num_speculative_tokens=1, but with Eagle3 we just fall back to not drafting, which doesn't work for the extract_hidden_states method.
Claude's suggestion is to just always set this to True for this draft method:
> There's a more direct fix to the original problem: since extract_hidden_states doesn't actually speculate (the "draft" tokens are always the sampled tokens, always verify), the input_fits_in_drafter guard is protecting against a scenario that doesn't apply. The check exists because real drafters (Eagle, DraftModel) would write KV entries beyond max_model_len. But the extract_hidden_states drafter's KV cache is a separate cache-only layer whose writes are indexed by the target model's slot_mapping: it writes at the same positions the target already wrote at, so the max_model_len concern doesn't arise. A targeted fix would be to make input_fits_in_drafter always True for extract_hidden_states:
```python
input_fits_in_drafter = spec_decode_common_attn_metadata is not None and (
    spec_config.uses_extract_hidden_states()  # always fits
    or spec_decode_common_attn_metadata.max_seq_len + self.num_spec_tokens
    <= self.effective_drafter_max_model_len
)
```
That logic does make sense to me and we shouldn't actually need the extra position of the "drafted" token, we just can't set num_speculative_tokens=0 without updating a whole bunch of guards. What do you think?
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Force-pushed af10200 to c22855b.
Previously we needed to patch the `has_kv_transfer_group` import but that's no longer needed. Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
…or clear (vllm-project#37013) Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
Purpose
#35158 defers kv_connector finalization until after draft models have run.
`extract_hidden_states` previously re-initialized the kv_connector metadata as a workaround, but that is no longer needed. This PR also patches a small bug in `ExampleHiddenStatesConnector` that caused occasional crashes.
Test Plan
Ran/updated the `extract_hidden_states` tests. Ran `examples/offline_inference/extract_hidden_states.py`. Tested generating a few thousand samples and confirmed that output shapes and token ids always matched expectations, even when `new_block_ids` is None.

Test Result
See above
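The shape/token-id check described in the test plan could be sketched as follows (the `verify_sample` helper and the sample layout are assumptions for illustration, not the actual output format of `extract_hidden_states.py`):

```python
def verify_sample(token_ids, hidden_states, hidden_size):
    """Check one extracted sample: the hidden-states matrix should have
    one row per token id, each row of the expected hidden size."""
    assert len(hidden_states) == len(token_ids), (
        f"expected {len(token_ids)} rows, got {len(hidden_states)}"
    )
    assert all(len(row) == hidden_size for row in hidden_states)
    return True

# usage with toy data: 3 tokens, hidden size 4
sample = {
    "token_ids": [101, 42, 7],
    "hidden_states": [[0.0] * 4 for _ in range(3)],
}
verify_sample(sample["token_ids"], sample["hidden_states"], hidden_size=4)
```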
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.