Conversation
add dflash prepare inputs logic and dummy run logic Signed-off-by: Stanislav <60266193+StanislavII@users.noreply.github.com>
original qwen3_dflash from PR Signed-off-by: Stanislav <60266193+StanislavII@users.noreply.github.com>
return corresponding dflash layers Signed-off-by: Stanislav <60266193+StanislavII@users.noreply.github.com>
|
|
Hi @StanislavII, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
|
@benchislett could you please take a look when you have a moment? |
|
This pull request has merge conflicts that must be resolved before it can be merged. |
|
Hi @StanislavII, thank you for the contribution. I've had a look through the PR and it seems reasonably well implemented. I have been tinkering with DFlash and opened a full-support PR today: #36847. I intend to solicit reviews and gather opinions based on that PR. Please consider both PRs and give my work a cursory review, if you are available. If you feel that your implementation has certain advantages and we should move forward with this branch, please let me know, as I am open to this possibility. Otherwise, please close this PR or change it to a draft and, if you can, leave a detailed review of my PR. Thanks again. |
|
Purpose
DFlash integration in the speculative decoding pipeline, related to PRs #32206 and #34014.
This PR adds working DFlash support in the proposer path by handling DFlash-specific attention metadata and slot mapping correctly during the first drafting pass, for supported batch sizes.
Main changes:
set_inputs_first_pass()
[next, mask, mask, ...]
CommonAttentionMetadata
This change restores correct DFlash behavior for both single-request and batched proposer execution.
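To illustrate the [next, mask, mask, ...] layout mentioned above, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `build_first_pass_inputs`, the `block_size` and `mask_token_id` parameters, and the returned `query_start_loc` offsets are all hypothetical names chosen for illustration, and the sketch assumes each request contributes one fixed-size block to a flattened batch.

```python
from itertools import accumulate


def build_first_pass_inputs(next_token_ids, block_size, mask_token_id):
    """Flatten a batch into [next, mask, mask, ...] blocks (illustrative only).

    next_token_ids: one already-sampled token id per request (assumption).
    block_size:     tokens per request in the first drafting pass (assumption).
    mask_token_id:  placeholder id for positions still to be drafted (assumption).
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")

    flat_input_ids = []
    for next_id in next_token_ids:
        # Each request's block starts with its known next token...
        flat_input_ids.append(next_id)
        # ...followed by mask placeholders for the positions to be drafted.
        flat_input_ids.extend([mask_token_id] * (block_size - 1))

    # Cumulative offsets marking where each request's block begins and ends
    # in the flattened batch; the leading 0 is the start of the first block.
    block_lengths = [block_size] * len(next_token_ids)
    query_start_loc = [0, *accumulate(block_lengths)]
    return flat_input_ids, query_start_loc


ids, starts = build_first_pass_inputs([11, 22], block_size=4, mask_token_id=0)
print(ids)     # [11, 0, 0, 0, 22, 0, 0, 0]
print(starts)  # [0, 4, 8]
```

Because every request gets a block of the same length, the single-request and batched cases share one code path; only the number of blocks changes.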
Test Result
Verified against the SGLang version of Qwen3-8B-DFlash-b16 on the GSM8K dataset.
Throughput / acceptance comparison
T = 0, Draft Tokens = 16, Max Tokens = 2048