Fix DFlash first prefill lookahead allocation by Yuyi-Ao · Pull Request #41971 · vllm-project/vllm

Yuyi-Ao · 2026-05-07T15:09:38Z

Purpose

Fix DFlash first-prefill lookahead allocation.

DFlash needs draft KV slots during the first prefill step. The scheduler should therefore allocate lookahead slots/blocks for DFlash even when num_computed_tokens == 0.
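The condition described above can be sketched as follows. This is a minimal illustration, assuming the scheduler previously skipped lookahead whenever a request had no computed tokens. The helper name and its parameters are hypothetical stand-ins, not the actual vLLM scheduler code.

```python
def effective_lookahead_tokens(
    num_computed_tokens: int,
    num_lookahead_tokens: int,
    use_dflash: bool,
) -> int:
    """Return how many lookahead tokens to reserve for one request.

    Hypothetical helper. Assumes lookahead is normally skipped on the
    first prefill step (num_computed_tokens == 0) and that DFlash is
    the exception, because it drafts in that same step.
    """
    is_first_prefill = num_computed_tokens == 0
    if is_first_prefill and not use_dflash:
        # No draft proposal runs during this step, so no extra slots.
        return 0
    return num_lookahead_tokens
```

Under these assumptions, a first-prefill request gets zero lookahead tokens without DFlash and the full `num_lookahead_tokens` with it.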

This PR also adds a focused test that connects scheduler allocation output to the real DFlash input expansion kernel and verifies the generated query slots are request-owned.

Test Plan

Run the new DFlash slot mapping regression test.
Compare behavior before and after the scheduler fix.

Test Result

.venv/bin/python -m pytest tests/v1/spec_decode/test_dflash_slot_mapping.py -v

pass

Signed-off-by: George-ao <1586028831@qq.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Yuyi-Ao · 2026-05-07T15:11:27Z

My current understanding is that DFlash needs lookahead KV/slot allocation during first prefill because it runs the draft proposal in the same model runner step as the target prefill. During that first draft proposal, DFlash already needs KV slots for the draft-token positions after the prompt.

I added tests/v1/spec_decode/test_dflash_slot_mapping.py to capture the validation I used for this issue. It looks more like a repro script than a long-term test. Before the scheduler change, it fails because the kernel maps those first-prefill DFlash query positions through a logical block that was not allocated for the request.
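The failure mode can be illustrated with a small paged-KV sketch. It assumes the usual block-table layout, where a token position maps to `block_table[position // block_size] * block_size + position % block_size`. All function names and block IDs here are hypothetical and do not come from the test file or the kernel.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def blocks_needed(num_prompt_tokens: int, num_lookahead_tokens: int,
                  block_size: int) -> int:
    """Blocks required to cover the prompt plus any lookahead tokens."""
    return ceil_div(num_prompt_tokens + num_lookahead_tokens, block_size)


def slot_for_position(position: int, block_table: list[int],
                      block_size: int) -> int:
    """Map a token position to a physical KV slot via the block table.

    Raises IndexError when the position falls in a logical block the
    request does not own. A real kernel would not necessarily raise
    here; the check only makes the problem visible in this sketch.
    """
    logical_block = position // block_size
    if logical_block >= len(block_table):
        raise IndexError(
            f"position {position} needs logical block {logical_block}, "
            f"but only {len(block_table)} block(s) are allocated"
        )
    return block_table[logical_block] * block_size + position % block_size


# Hypothetical numbers: a 16-token prompt that exactly fills one block,
# followed by 3 draft-token positions (16, 17, 18).
BLOCK_SIZE = 16
without_lookahead = [7]      # blocks_needed(16, 0, 16) == 1
with_lookahead = [7, 9]      # blocks_needed(16, 3, 16) == 2
```

With `with_lookahead`, position 16 resolves to a slot inside block 9, which the request owns. With `without_lookahead`, the same position refers to logical block 1, which was never allocated.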

Please correct me if I got any part of the DFlash flow wrong.

gemini-code-assist

Code Review

This pull request introduces DFlash support in the V1 scheduler, enabling lookahead token allocation during the first prefill step. It also adds a new test suite, test_dflash_slot_mapping.py, to verify that DFlash query slots address request-owned blocks. One critical comment concerns the initialization of num_lookahead_tokens when DFlash is enabled: if it were omitted, the effective number of lookahead tokens would be zero.

gemini-code-assist · 2026-05-07T15:14:45Z

+            if speculative_config.use_dflash():
+                self.use_dflash = True


The num_lookahead_tokens is not initialized when use_dflash is true, which causes effective_lookahead_tokens to be 0 even when use_dflash is enabled. It should be set to self.num_spec_tokens to ensure lookahead slots are allocated.

Suggested change

-            if speculative_config.use_dflash():
-                self.use_dflash = True
+            if speculative_config.use_dflash():
+                self.use_dflash = True
+                self.num_lookahead_tokens = self.num_spec_tokens

num_lookahead_tokens is already initialized for DFlash because SpeculativeConfig.use_eagle() currently returns true for "dflash", and that branch sets self.num_lookahead_tokens = self.num_spec_tokens before the new use_dflash() branch runs. The new self.use_dflash flag is only used later to keep first-prefill lookahead enabled for DFlash.
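The ordering argument in this reply can be sketched with stand-in classes. Both classes below are hypothetical and only mirror the behavior described in the reply: `use_eagle()` returns true for "dflash", and the EAGLE branch runs before the `use_dflash()` branch. The set of methods treated as EAGLE-like is an assumption for illustration.

```python
class SketchSpeculativeConfig:
    """Hypothetical stand-in for the speculative decoding config."""

    def __init__(self, method: str, num_speculative_tokens: int) -> None:
        self.method = method
        self.num_speculative_tokens = num_speculative_tokens

    def use_eagle(self) -> bool:
        # Assumption taken from the reply: "dflash" counts as EAGLE-like.
        return self.method in ("eagle", "dflash")

    def use_dflash(self) -> bool:
        return self.method == "dflash"


class SketchScheduler:
    """Hypothetical stand-in showing only the initialization order."""

    def __init__(self, speculative_config: SketchSpeculativeConfig) -> None:
        self.num_spec_tokens = speculative_config.num_speculative_tokens
        self.num_lookahead_tokens = 0
        self.use_dflash = False

        # The EAGLE branch runs first and already covers DFlash.
        if speculative_config.use_eagle():
            self.num_lookahead_tokens = self.num_spec_tokens

        # The new branch only records the flag for later use.
        if speculative_config.use_dflash():
            self.use_dflash = True


scheduler = SketchScheduler(SketchSpeculativeConfig("dflash", 4))
```

Under these assumptions, `scheduler.num_lookahead_tokens` is already 4 by the time the `use_dflash()` branch sets the flag, so the suggested extra assignment would be redundant.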

Fix DFlash first prefill lookahead allocation

774814f

Signed-off-by: George-ao <1586028831@qq.com>

Yuyi-Ao requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners May 7, 2026 15:09

claude Bot reviewed May 7, 2026

mergify Bot added speculative-decoding v1 labels May 7, 2026

gemini-code-assist Bot reviewed May 7, 2026

noonghunna mentioned this pull request May 8, 2026

[Spec Decode] Allow DFlash drafter to coexist with quantized target KV via independent KV groups + dtype override #42102

Open

Yuyi-Ao wants to merge 1 commit into vllm-project:main from Yuyi-Ao:yuyiao/verify-dflash-prefill-lookahead
