
[Speculative decoding] Adding configuration object for speculative decoding #3706

Merged
cadedaniel merged 8 commits into vllm-project:main from cadedaniel:spec-decode-llm-engine
Apr 3, 2024

Conversation

Collaborator

@cadedaniel cadedaniel commented Mar 29, 2024

This PR is a subset of PR 6/9 ("Integrate speculative decoding with LLMEngine") in the speculative decoding open-sourcing plan. It introduces a SpeculativeConfig and plumbs it through to the executors. The new flags are as follows:

parser.add_argument(
    '--speculative-model',
    type=str,
    default=None,
    help=
    'The name of the draft model to be used in speculative decoding.')

parser.add_argument(
    '--num-speculative-tokens',
    type=int,
    default=None,
    help='The number of speculative tokens to sample from '
    'the draft model in speculative decoding')

In the future we can extend these flags to support non-draft-model speculative decoding.
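
For illustration only, here is a minimal sketch of how the two flags might map onto a config object. The class and field names below are assumptions for this sketch, not necessarily the exact SpeculativeConfig API introduced by the PR:

# Hypothetical sketch: a config object built from the two new flags.
# Class and field names are illustrative, not the actual vLLM API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpecDecodeSettings:
    draft_model: Optional[str] = None             # --speculative-model
    num_speculative_tokens: Optional[int] = None  # --num-speculative-tokens

    def __post_init__(self):
        # Both flags must be provided together to enable speculative decoding.
        if (self.draft_model is None) != (self.num_speculative_tokens is None):
            raise ValueError(
                'Both --speculative-model and --num-speculative-tokens '
                'must be set to enable speculative decoding.')

    @property
    def enabled(self) -> bool:
        return self.draft_model is not None

# Example: values as they might arrive from the parsed CLI flags.
settings = SpecDecodeSettings(draft_model='facebook/opt-125m',
                              num_speculative_tokens=5)
assert settings.enabled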

Testing

  • We assert that the GPUExecutor raises an AssertionError when speculative decoding is enabled, which verifies that the config is plumbed through correctly; a sketch of the idea follows below.
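
A minimal sketch of the test idea, assuming a simplified stand-in for the executor (the real test constructs the engine through the vLLM test utilities):

# Hypothetical sketch of the test idea: enabling speculative decoding should
# currently trip an assertion in the GPU executor. Names are illustrative.
import pytest

class GPUExecutorStub:
    """Stand-in for the real executor, which asserts spec decode is disabled."""

    def __init__(self, speculative_config=None):
        assert speculative_config is None, (
            'Speculative decoding is not yet supported by the GPU executor.')

def test_spec_decode_config_rejected_by_gpu_executor():
    with pytest.raises(AssertionError):
        GPUExecutorStub(speculative_config={'num_speculative_tokens': 5})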

Misc.

  • This wraps the various engine config classes in an EngineConfig. This removes the need to do parallel_config = engine_configs[2] and device_config = engine_configs[4] (see the sketch below).
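
A rough sketch of that idea, with illustrative field names (see the PR's config changes for the actual definition):

# Illustrative sketch only: bundle the per-subsystem configs into one object
# so call sites use named attributes instead of positional tuple indexing.
from dataclasses import dataclass, fields
from typing import Any, Optional

@dataclass(frozen=True)
class EngineConfigSketch:
    model_config: Any
    cache_config: Any
    parallel_config: Any
    scheduler_config: Any
    device_config: Any
    speculative_config: Optional[Any] = None  # None when spec decode is off

    def to_dict(self):
        # Convenient for plumbing the whole bundle through to an executor.
        return {f.name: getattr(self, f.name) for f in fields(self)}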

@cadedaniel cadedaniel changed the title [WIP] [Speculative decoding] Adding configuration object for speculative decoding [Speculative decoding] Adding configuration object for speculative decoding Mar 29, 2024
@LiuXiaoxuanPKU LiuXiaoxuanPKU self-assigned this Mar 29, 2024
@cadedaniel cadedaniel marked this pull request as ready for review April 1, 2024 23:48
@cadedaniel
Collaborator Author

ready for review @LiuXiaoxuanPKU

@cadedaniel cadedaniel force-pushed the spec-decode-llm-engine branch from efc0278 to 7bc7532 Compare April 1, 2024 23:55
@cadedaniel cadedaniel force-pushed the spec-decode-llm-engine branch from 8fad4f5 to 1aefa81 Compare April 2, 2024 03:51
Collaborator

@LiuXiaoxuanPKU LiuXiaoxuanPKU left a comment


LGTM!

Co-authored-by: Lily Liu <lilyliupku@gmail.com>
@cadedaniel cadedaniel enabled auto-merge (squash) April 2, 2024 23:07
@cadedaniel cadedaniel merged commit 5757d90 into vllm-project:main Apr 3, 2024
@cadedaniel cadedaniel deleted the spec-decode-llm-engine branch April 3, 2024 17:37
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024
