[Spec Decoding] Streamline batch expansion tensor manipulation #7851

Merged: njhill merged 2 commits into vllm-project:main from avoid-splitting on Aug 25, 2024

njhill (Member) commented on Aug 25, 2024

There are some inefficiencies in the spec decoding logic, particularly related to batch expansion and handling of mixed spec decode / non spec decode batches.

  • Have the split_batch_by_proposal_len method split the batch in a single pass rather than iterating separately to build the spec and non-spec lists
  • Call _run_no_spec rather than _run_speculative_decoding_step when all sequences have max_proposal_len == 0
  • Add a fast-path _contract_batch_all_spec method for the (common) case where every sequence in the batch has spec decode enabled; it skips the logic that splits and recombines the spec and non-spec sequences
  • Simplify the _split_scoring_output method used in _contract_batch to avoid unnecessary intermediate tensor manipulation
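The single-pass split from the first bullet can be illustrated with a minimal sketch. This is not the code from the PR: the function name `split_by_proposal_len_sketch`, its generic item type, and its return shape are assumptions made for illustration only. It shows one loop producing both groups and their original indices, instead of two separate filtering passes.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical alias for illustration: (items, their original batch indices).
Group = Tuple[List[T], List[int]]


def split_by_proposal_len_sketch(
    items: Sequence[T],
    proposal_lens: Sequence[int],
) -> Tuple[Group, Group]:
    """Partition a batch into spec / non-spec groups in a single pass.

    Returns ((spec_items, spec_indices), (non_spec_items, non_spec_indices)),
    where "spec" means the proposal length is non-zero.
    """
    spec: Group = ([], [])
    non_spec: Group = ([], [])
    for i, (item, proposal_len) in enumerate(zip(items, proposal_lens)):
        # Pick the destination group once per element, then append to it.
        group_items, group_indices = spec if proposal_len != 0 else non_spec
        group_items.append(item)
        group_indices.append(i)
    return spec, non_spec


# Example: sequences "b" and "d" have no proposals.
spec_group, non_spec_group = split_by_proposal_len_sketch(
    ["a", "b", "c", "d"], [3, 0, 2, 0]
)
```

Keeping the original indices alongside each group is what lets a later step put the scored results back in batch order.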

In an anecdotal test of mlpspeculator with bs=1 this gives a consistent 2-3% increase in throughput.
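The path selection described in the second and third bullets above might look roughly like the following sketch. The names `Path`, `choose_path`, and `contract_all_spec_sketch` are hypothetical and do not come from vLLM. The contraction helper also assumes that each sequence was expanded into exactly k + 1 consecutive rows during batch expansion, which is an assumption made for this illustration, and it uses plain lists rather than tensors.

```python
from enum import Enum
from typing import List, Sequence


class Path(Enum):
    NO_SPEC = "no_spec"    # no sequence has proposals: skip speculation
    ALL_SPEC = "all_spec"  # every sequence has proposals: fast path
    MIXED = "mixed"        # split, score, then recombine


def choose_path(proposal_lens: Sequence[int]) -> Path:
    """Pick a code path from per-sequence proposal lengths (illustrative)."""
    num_spec = sum(1 for n in proposal_lens if n > 0)
    if num_spec == 0:
        return Path.NO_SPEC
    if num_spec == len(proposal_lens):
        return Path.ALL_SPEC
    return Path.MIXED


def contract_all_spec_sketch(
    flat_scores: List[float], batch_size: int, k: int
) -> List[List[float]]:
    """Regroup flat per-row scores into one row of k + 1 scores per sequence.

    Assumes (for illustration) that every sequence contributed exactly
    k + 1 consecutive entries, so no index bookkeeping is needed.
    """
    width = k + 1
    if len(flat_scores) != batch_size * width:
        raise ValueError("flat_scores length must equal batch_size * (k + 1)")
    return [flat_scores[i * width:(i + 1) * width] for i in range(batch_size)]


# Example: two sequences, k = 2 proposal tokens each.
rows = contract_all_spec_sketch([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], batch_size=2, k=2)
```

When every sequence is speculative, the contraction reduces to a plain regrouping, which is why a dedicated fast path can drop the split and recombine steps entirely.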

njhill requested a review from cadedaniel on August 25, 2024, 19:00

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

cadedaniel (Collaborator) left a comment

Thanks so much!

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Aug 25, 2024
njhill merged commit 1856aff into vllm-project:main on Aug 25, 2024
52 checks passed
njhill deleted the avoid-splitting branch on August 25, 2024, 22:45
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024