Skip to content

fix pd-mtp metadata buffer hidden size#23918

Merged
hnyls2002 merged 1 commit intodsv4-rebasefrom
lsyin/fix-pd-mtp-spec-hidden-size
Apr 28, 2026
Merged

fix pd-mtp metadata buffer hidden size#23918
hnyls2002 merged 1 commit intodsv4-rebasefrom
lsyin/fix-pd-mtp-spec-hidden-size

Conversation

@hnyls2002
Copy link
Copy Markdown
Collaborator

Use spec_hidden_size for the PD MetadataBuffers so the spec module's hidden state ferries correctly across prefill/decode. Adds matching EAGLE config to the PD-disagg NIXL test prefill side.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the disaggregation scheduler to use spec_hidden_size for EAGLE-based speculative algorithms and refactors the DSv4 Flash PD-disaggregation test. The test configuration now explicitly defines speculative decoding parameters for both prefill and decode stages and increases the GSM8K accuracy threshold to 0.95. Review feedback identifies an inconsistency in the scheduler where is_standalone() was omitted from a buffer sizing condition and suggests consolidating duplicated test arguments to improve maintainability.

Comment on lines +898 to 899
model_config.spec_hidden_size
if self.spec_algorithm.is_eagle()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The condition for using spec_hidden_size on the decode side is inconsistent with the prefill side (lines 952-954). It should also include is_standalone() to ensure that metadata buffers are sized correctly when using standalone speculative algorithms (such as MTP). Without this, a buffer size mismatch will occur during state transfer if is_standalone() is true. Note that hidden_states_dtype (lines 902-906) likely requires a similar update for consistency, although it is not part of this diff hunk.

Suggested change
model_config.spec_hidden_size
if self.spec_algorithm.is_eagle()
model_config.spec_hidden_size
if self.spec_algorithm.is_eagle() or self.spec_algorithm.is_standalone()

Comment on lines 71 to +96
prefill_args = [
*COMMON_ARGS,
"--trust-remote-code",
"--disaggregation-mode",
"prefill",
"--base-gpu-id",
"0",
"--tp",
"4",
"--cuda-graph-max-bs",
"128",
"--max-running-requests",
"256",
"--mem-fraction-static",
"0.7",
"--disaggregation-decode-tp",
"4",
"--disaggregation-decode-dp",
"4",
"--speculative-algorithm",
"EAGLE",
"--speculative-num-steps",
"3",
"--speculative-eagle-topk",
"1",
"--speculative-num-draft-tokens",
"4",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The speculative and runtime arguments are now duplicated between start_prefill and start_decode. To improve maintainability and prevent configuration drift, consider defining a shared list for common arguments (e.g., --cuda-graph-max-bs, --max-running-requests, and the --speculative-* flags) and unpacking it into both prefill_args and decode_args.

@hnyls2002 hnyls2002 merged commit 7e2faae into dsv4-rebase Apr 28, 2026
6 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/fix-pd-mtp-spec-hidden-size branch April 28, 2026 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant