Commit 3ed574b
Add DraftTarget speculative decoding support to AutoDeploy. The target model runs as an ADEngine and the draft model runs as a PyTorchModelEngine. Only two-model speculative decoding is supported.
Signed-off-by: Govind Ramnarayan <[email protected]>
1 parent fe569f0
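The commit message describes the two-model (draft/target) scheme only at a high level. The following is a minimal, framework-free sketch of what one round of greedy draft/target speculative decoding does. It does not use the AutoDeploy, ADEngine, or PyTorchModelEngine APIs. The `target` and `draft` callables, `speculative_step`, and `generate` are hypothetical stand-ins. Acceptance here is greedy (exact token match), not the probabilistic rejection sampling used by some implementations.

```python
from typing import Callable, List, Sequence

# Hypothetical model stand-in: maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """One draft/target round with greedy verification.

    The draft model proposes k tokens autoregressively. The target model
    checks each proposal in order, and the first mismatch is replaced by the
    target's own token. Returns between 1 and k + 1 accepted tokens.
    """
    # Draft phase: the cheaper model proposes k tokens.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # Verify phase: a real target engine would typically score all k
    # positions in a single forward pass; this sketch calls it per position.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # target's correction ends the round
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every draft token was accepted, so the target also yields a bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             max_new_tokens: int, k: int = 3) -> List[int]:
    """Repeat speculative rounds until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    # A round may overshoot, so trim to the requested length.
    return out[: len(prompt) + max_new_tokens]
```

With greedy acceptance, the output is identical to running the target model alone. The draft model only affects how many tokens each target round can accept.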
9 files changed (+995, -215 lines) under:
- tensorrt_llm/_torch/auto_deploy/shim
- tensorrt_llm/_torch/pyexecutor
- tensorrt_llm/_torch/speculative
- tensorrt_llm/llmapi
- tests/integration/defs/examples
- tests/unittest/_torch/auto_deploy/unit/singlegpu