[Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup #34529
Code Review
This pull request addresses an issue with speculative decoding in a disaggregated prefill/decode setup, where KV connector metadata was being cleared prematurely. The fix defers this clearing until after the draft model has run. This is achieved by introducing a clear_metadata parameter to the post_forward and _get_kv_connector_output methods, allowing for conditional clearing. A new method is also added to explicitly clear the metadata when needed. The changes are consistently applied across different model runner implementations and appear to be correct and robust. The use of a default value for the new parameter ensures backward compatibility. Overall, this is a well-executed fix.
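As a rough illustration of the ordering the review describes, the sketch below models a step with a target forward pass and an optional draft forward pass. It assumes, for illustration only, a connector that holds per-step metadata and a `post_forward` helper whose `clear_metadata` flag defaults to `True`. All names (`FakeKVConnector`, `bind_metadata`, `clear_metadata`, `execute_step`) are hypothetical and do not reproduce vLLM's model-runner or KV connector APIs. Only the idea of a defaulted `clear_metadata` parameter plus an explicit clear after the draft model comes from the review.

```python
from typing import Any, Callable, Optional


class FakeKVConnector:
    """Hypothetical stand-in for a KV connector; not vLLM's actual class."""

    def __init__(self) -> None:
        self.metadata: Optional[dict[str, Any]] = None

    def bind_metadata(self, metadata: dict[str, Any]) -> None:
        self.metadata = metadata

    def clear_metadata(self) -> None:
        self.metadata = None


def post_forward(connector: FakeKVConnector, clear_metadata: bool = True) -> None:
    # The default keeps the old behaviour: clear right after the target pass.
    if clear_metadata:
        connector.clear_metadata()


def execute_step(
    connector: FakeKVConnector,
    metadata: dict[str, Any],
    run_target: Callable[[FakeKVConnector], None],
    run_draft: Optional[Callable[[FakeKVConnector], None]] = None,
) -> None:
    connector.bind_metadata(metadata)
    run_target(connector)
    # With a draft model (e.g. EAGLE3), keep the metadata alive for the draft pass.
    post_forward(connector, clear_metadata=run_draft is None)
    if run_draft is not None:
        run_draft(connector)
        # Explicit clear once the draft model has finished.
        connector.clear_metadata()


seen = []
conn = FakeKVConnector()
execute_step(
    conn,
    {"req-0": "remote-prefill"},
    run_target=lambda c: None,
    run_draft=lambda c: seen.append(c.metadata),
)
print(seen)           # [{'req-0': 'remote-prefill'}]
print(conn.metadata)  # None
```

Because `clear_metadata` defaults to `True`, callers that never run a draft model keep the original clear-immediately behaviour, which matches the backward-compatibility point made in the review.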
Hi @zixi-qi, the pre-commit checks have failed. Please run:
    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files
Then, commit the changes and push to your branch. For future commits, …
…tive decoding Signed-off-by: qizixi <qizixi@meta.com>
0f1dce8 to 0bea1d6
houseroad left a comment
Looks good. Better to add some unit tests.
…tive decode + prefill / decode disagg setup (vllm-project#34529) Signed-off-by: qizixi <qizixi@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Will follow up with a small refactor on top of this, together with @ZhanqiuHu, in #34926.
> Better to add some unittest

I guess we'll pick this up.
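One possible shape for the requested unit test is to check call order rather than state: attach every collaborator to a single parent `unittest.mock.Mock`, so that `parent.mock_calls` records calls across all children in order, and assert that the clear comes after the draft pass. The step function `run_spec_decode_step` and the method names `bind_metadata` / `clear_metadata` are hypothetical placeholders for illustration, not vLLM's model-runner API; a real test would drive the actual runner instead.

```python
import unittest
from unittest import mock


def run_spec_decode_step(connector, run_target, run_draft):
    """Hypothetical step: bind metadata, target pass, draft pass, then clear."""
    connector.bind_metadata("meta")
    run_target()
    run_draft()
    connector.clear_metadata()


class DeferredClearTest(unittest.TestCase):
    def test_metadata_cleared_after_draft(self):
        # Child mocks reached via attribute access report into parent.mock_calls,
        # so this single list captures the relative order of every call.
        parent = mock.Mock()
        run_spec_decode_step(parent.connector, parent.run_target, parent.run_draft)
        self.assertEqual(
            parent.mock_calls,
            [
                mock.call.connector.bind_metadata("meta"),
                mock.call.run_target(),
                mock.call.run_draft(),
                mock.call.connector.clear_metadata(),
            ],
        )
```

Run it with `python -m unittest` on the file that contains it. Asserting on the full call list also catches a regression where the clear is moved back before `run_draft`.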
…tive decode + prefill / decode disagg setup (vllm-project#34529) Signed-off-by: qizixi <qizixi@meta.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
…tive decoding
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.