Commit 3ed574b

Add DraftTarget speculative decoding support to AutoDeploy. The target model runs as ADEngine and the draft model runs as PyTorchModelEngine. Only two-model speculative decoding is supported.
Signed-off-by: Govind Ramnarayan <[email protected]>
1 parent fe569f0 commit 3ed574b

9 files changed: +995 -215 lines changed

tensorrt_llm/_torch/auto_deploy/llm_args.py

Lines changed: 6 additions & 0 deletions
@@ -185,6 +185,12 @@ class AutoDeployConfig(DynamicYamlMixInForSettings, BaseSettings):
         ),
     )
 
+    draft_checkpoint_loader: Optional[object] = Field(
+        default=None,
+        description=
+        "The checkpoint loader to use for the draft model when using speculative decoding with two models.",
+    )
+
     ### SEQUENCE INTERFACE CONFIG ##################################################################
     max_input_len: int = Field(default=1024, description="The maximum input length.")
     max_num_tokens: Optional[int] = Field(default=None, description="The maximum number of tokens.")
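
For readers unfamiliar with the two-model (draft/target) scheme named in the commit message, the general idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses greedy token matching as the acceptance rule, the names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and nothing here reflects how ADEngine or PyTorchModelEngine are implemented in this commit.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in type: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """Run one draft/target round and return the tokens accepted this round."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target model checks each proposal in order. A real engine would
    #    score all k positions in one batched forward pass; the loop here is
    #    written out only for clarity.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. All k proposals matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 4) -> List[int]:
    """Repeat draft/target rounds until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + max_new_tokens]


# Toy models (assumptions for the demo): the target counts upward modulo 10,
# and the draft agrees except when the prefix length is a multiple of 3.
def toy_target(prefix: Sequence[int]) -> int:
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: Sequence[int]) -> int:
    guess = (prefix[-1] + 1) % 10
    return guess if len(prefix) % 3 else (guess + 5) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [0], max_new_tokens=8))
```

Under this greedy acceptance rule, every emitted token is one the target model would have chosen on its own, so the draft model affects only how many target steps are needed, not the output.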
