feature: make trtllmsampler new_tokens format the universal format #4401
af06fcd to 27fd1d1
/bot run
PR_Github #6632 [ run ] triggered by Bot
PR_Github #6632 [ run ] completed with state
8345da8 to b07fa8e
/bot run
PR_Github #7516 [ run ] triggered by Bot
PR_Github #7516 [ run ] completed with state
/bot run
PR_Github #7698 [ run ] triggered by Bot
PR_Github #7698 [ run ] completed with state
f48672a to 71783a4
/bot run
PR_Github #9539 [ run ] triggered by Bot
PR_Github #9539 [ run ] completed with state
Pull Request Overview
This PR refactors how speculative samplers handle new token formatting by unifying on a single TorchSampler.Args structure, streamlining decoder factory logic, and replacing legacy sampler implementations.
- Refactored `get_spec_decoder` to accept `TorchSampler.Args` and updated MTP/Eagle3OneModel sampler constructors.
- Consolidated request iteration via `ScheduledRequests.all_requests()`, replacing `itertools.chain` across the codebase.
- Removed outdated Eagle3Sampler/Eagle3Decoder classes and integrated `SeqSlotManager` for draft slot management.
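The second bullet can be illustrated with a minimal sketch. The class below is a stand-in, not the TensorRT-LLM implementation: the field names `context_requests` and `generation_requests` are assumptions made for illustration, and only the `all_requests()` method name comes from the summary above.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass
class ScheduledRequestsSketch:
    """Hypothetical stand-in for ScheduledRequests; field names are assumed."""
    context_requests: list = field(default_factory=list)
    generation_requests: list = field(default_factory=list)

    def all_requests(self) -> list:
        # One method owns the concatenation, so call sites no longer need
        # to know which request groups exist.
        return self.context_requests + self.generation_requests


scheduled = ScheduledRequestsSketch(["ctx0", "ctx1"], ["gen0"])

# Before: every call site chained the groups itself.
old_style = list(chain(scheduled.context_requests, scheduled.generation_requests))

# After: call sites ask the container for a plain list.
new_style = scheduled.all_requests()
```

Returning a list rather than an iterator means callers can take `len()` of the result or iterate over it more than once, which a bare `itertools.chain` object does not allow.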
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| speculative/utils.py | Updated decoder factory signature and imports for spec decoders |
| speculative/mtp.py | Refactored MTPSampler constructor, updated stop criteria logic |
| speculative/eagle3.py | Removed legacy sampler/decoder classes, added new sampler class |
| pyexecutor/seq_slot_manager.py | Simplified resource prep using all_requests() |
| pyexecutor/scheduler.py | Changed all_requests to a method returning a list |
| pyexecutor/py_executor.py | Integrated SeqSlotManager, updated logits field assignments |
| pyexecutor/model_engine.py | Introduced BEAM_WIDTH, centralized batch indexing logic |
| pyexecutor/llm_request.py | Added py_is_draft flag to LlmRequest |
| pyexecutor/guided_decoder.py | Replaced itertools.chain with all_requests() |
| pyexecutor/_util.py | Centralized sampler instantiation with create_torch_sampler_args |
| auto_deploy/shim/ad_executor.py | Updated AD executor to use TorchSampler.Args and slot manager |
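
The pattern described for `speculative/utils.py` and `pyexecutor/_util.py`, one argument structure built in one place and handed to a factory, might look roughly like the sketch below. Every class, function, and field name here is hypothetical; the real `TorchSampler.Args` fields and the signature of `create_torch_sampler_args` are not shown on this page.

```python
from dataclasses import dataclass


class SamplerSketch:
    """Hypothetical sampler; not the real TorchSampler."""

    @dataclass(frozen=True)
    class Args:
        # Field names are illustrative assumptions.
        max_seq_len: int
        max_num_sequences: int
        max_draft_tokens: int = 0

    def __init__(self, args: "SamplerSketch.Args"):
        self.args = args


class DraftSamplerSketch(SamplerSketch):
    """Hypothetical speculative sampler that shares the same Args type."""


def create_sampler_args_sketch(max_seq_len, max_num_sequences, max_draft_tokens=0):
    # Single place that turns configuration values into sampler arguments.
    return SamplerSketch.Args(max_seq_len, max_num_sequences, max_draft_tokens)


def get_sampler_sketch(args: SamplerSketch.Args, speculative: bool) -> SamplerSketch:
    # The factory receives the shared Args object instead of loose parameters.
    return DraftSamplerSketch(args) if speculative else SamplerSketch(args)
```

With a shared argument type, adding a field touches the dataclass and the one builder function rather than every sampler constructor.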
Comments suppressed due to low confidence (3)
tensorrt_llm/_torch/speculative/mtp.py:314
- The returned SampleStateMTP no longer includes a logits field, which may be accessed downstream in the executor (e.g., in `_executor_loop_pp`). Consider preserving or setting `device.logits` and `host.logits` in `SampleStateMTP` to avoid missing attribute errors.
)
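One way to address the concern in this comment is to keep the field and default it to `None`. The sketch below uses hypothetical container classes; it does not reproduce the actual `SampleStateMTP` layout, which is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleTensorsSketch:
    """Hypothetical tensor container; the real field layout may differ."""
    new_tokens: list
    logits: Optional[list] = None  # always present, even when unused


@dataclass
class SampleStateSketch:
    device: SampleTensorsSketch
    host: SampleTensorsSketch


def consume_logits(state: SampleStateSketch):
    # Downstream code can test for None instead of hitting AttributeError.
    if state.host.logits is None:
        return "no logits"
    return state.host.logits


state = SampleStateSketch(
    device=SampleTensorsSketch(new_tokens=[1, 2]),
    host=SampleTensorsSketch(new_tokens=[1, 2]),
)
result = consume_logits(state)
```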
tensorrt_llm/_torch/speculative/utils.py:83
- [nitpick] The parameter name `sampler_args` is more verbose than other code that uses `args` for `TorchSampler` parameters. Consider renaming it to `args` for consistency and brevity.
def get_spec_decoder(sampler_args: TorchSampler.Args, spec_config: SpecConfig):
tensorrt_llm/_torch/pyexecutor/model_engine.py:1160
- [nitpick] The `nonlocal mtp_batch_idx` declaration appears after a conditional return in the nested `py_batch_idx` function. For clarity, move the `nonlocal` statement to the top of the function body before any logic.
nonlocal mtp_batch_idx
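The suggested placement can be shown with a small closure. This is a sketch only: the names `mtp_batch_idx` and `py_batch_idx` come from the comment above, while the enclosing function, the `is_dummy` parameter, and the counter logic are assumptions made for illustration.

```python
def make_batch_indexer():
    # Counter shared with the nested function through the closure.
    mtp_batch_idx = 0

    def py_batch_idx(is_dummy: bool):
        # Declared first, so readers see up front that the outer
        # variable is rebound somewhere in this function.
        nonlocal mtp_batch_idx
        if is_dummy:
            return None
        idx = mtp_batch_idx
        mtp_batch_idx += 1
        return idx

    return py_batch_idx


indexer = make_batch_indexer()
first = indexer(False)
skipped = indexer(True)
second = indexer(False)
```

Python accepts a `nonlocal` statement after an early return as long as the name is not used before it in the same function, so the change is about readability rather than behavior.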
AD changes LGTM
…er new_tokens format (NVIDIA#4401)" This reverts commit 58a8a8f. Signed-off-by: Netanel Haber <[email protected]>
…r new_tokens format (#4401)" (#5474) Signed-off-by: Netanel Haber <[email protected]>
…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>
…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>
No description provided.