[Spec Decoding] Streamline batch expansion tensor manipulation #7851
There are some inefficiencies in the spec decoding logic, particularly related to batch expansion and the handling of mixed spec decode / non spec decode batches:

- Make the `split_batch_by_proposal_len` method split the batch in a single iteration, rather than iterating separately to build the spec and non-spec lists
- Run `_run_no_spec` rather than `_run_speculative_decoding_step` in the case that all sequences have `max_proposal_len == 0`
- Add a `_contract_batch_all_spec` method for the (common) case that all batch sequences have spec decode enabled, which skips the logic to split and recombine the spec/non-spec sequences
- Streamline the `_split_scoring_output` method used in `_contract_batch` to avoid unnecessary intermediate tensor manipulation

In an anecdotal test of mlpspeculator with bs=1, this gives a consistent 2-3% increase in throughput.
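The single-pass split described in the first bullet can be sketched in plain Python. The dict records and the `proposal_len` key are illustrative stand-ins, not vLLM's actual sequence-group metadata types:

```python
# Partition a batch into spec / non-spec groups in ONE pass, instead of
# two separate filtering iterations over the same batch. The original
# indices are kept so the results can be recombined in batch order later.

def split_batch_by_proposal_len(batch):
    """Return ([(idx, seq) with proposal_len > 0], [(idx, seq) otherwise])."""
    spec, non_spec = [], []
    for i, seq in enumerate(batch):
        (spec if seq["proposal_len"] > 0 else non_spec).append((i, seq))
    return spec, non_spec

batch = [{"proposal_len": 3}, {"proposal_len": 0}, {"proposal_len": 5}]
spec, non_spec = split_batch_by_proposal_len(batch)
# spec holds original indices 0 and 2; non_spec holds index 1
```

A single loop touches each sequence once, whereas two list comprehensions (one filtering spec, one filtering non-spec) would iterate the batch twice.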
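The two fast paths (all-zero proposal lengths, and all-spec batches) amount to checking batch homogeneity before choosing a code path. A simplified, hypothetical dispatch illustrating the idea (the real control flow is spread across the methods named above, and the string returns here are just labels):

```python
# Illustrative dispatch only: pick a fast path when the batch is
# homogeneous, and fall back to the general split/recombine path for
# mixed batches. Not vLLM's actual control flow.

def choose_step(batch):
    proposal_lens = [seq["proposal_len"] for seq in batch]
    if all(p == 0 for p in proposal_lens):
        # No sequence has proposals: skip speculative machinery entirely.
        return "_run_no_spec"
    if all(p > 0 for p in proposal_lens):
        # Every sequence is speculative: no split/recombine needed.
        return "_contract_batch_all_spec"
    # Mixed batch: general path that splits and recombines.
    return "_contract_batch"
```

Since all-spec batches are the common case, routing them around the split/recombine logic avoids per-step tensor bookkeeping that only mixed batches actually need.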