- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.8k
[refactor] Clean up drafter/resource manager creation logic #5805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /bot run | 
| PR_Github #11177 [ run ] triggered by Bot | 
| PR_Github #11177 [ run ] completed with state  | 
25197f8    to
    39f1e6b      
    Compare
  
    | /bot run | 
| PR_Github #11311 [ run ] triggered by Bot | 
| PR_Github #11311 [ run ] completed with state  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
For user provided method,
user need to provide both drafter and resource_manager in the spec_config for certain method (e.g. NGram),
or only drafter if no resource_manager is needed (e.g. setting draft_token_ids is always [2,2,2,2]).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This refactor streamlines the creation order of speculative decoding components by ensuring resource managers are instantiated before drafters and by surfacing a resource_manager field in user-provided spec configs.
- Add resource_managerfield and defaulting logic to user-provided decoding/config classes.
- Simplify get_spec_resource_managerandget_spec_drafterto use mode-based logic rather than passing the drafter.
- Reorder creation in create_py_executorso resource managers are set up before drafters.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description | 
|---|---|
| tensorrt_llm/llmapi/llm_args.py | Added resource_managertoUserProvidedDecodingConfigand updated instantiation. | 
| tensorrt_llm/_torch/speculative/utils.py | Removed unused drafter param, split get_spec_resource_managerandget_spec_drafter. | 
| tensorrt_llm/_torch/speculative/user_provided.py | Added resource_managerattribute and defaulting logic inUserProvidedConfig. | 
| tensorrt_llm/_torch/speculative/ngram.py | Removed base-class init params; assign spec_resource_managerdirectly. | 
| tensorrt_llm/_torch/speculative/drafter.py | Dropped the now-unnecessary __init__fromDrafter. | 
| tensorrt_llm/_torch/pyexecutor/py_executor_creator.py | Reordered resource manager creation before drafter instantiation. | 
Comments suppressed due to low confidence (3)
tensorrt_llm/_torch/speculative/utils.py:99
- Add unit tests for the ngrambranch inget_spec_resource_managerto verify that anNGramPoolManageris returned with the correct parameters.
    if spec_dec_mode.is_ngram():
tensorrt_llm/_torch/speculative/utils.py:101
- Add unit tests for the user_providedbranch inget_spec_resource_managerto ensurespec_config.resource_manageris returned when configured.
    if spec_dec_mode.is_user_provided():
tensorrt_llm/_torch/speculative/user_provided.py:23
- Add tests for UserProvidedConfig.__post_init__to verify thatresource_managercorrectly defaults todrafter.spec_resource_managerwhen available.
    resource_manager: Optional[BaseResourceManager] = None
22990d0    to
    0e0fd50      
    Compare
  
    | /bot run | 
| PR_Github #11669 [ run ] triggered by Bot | 
| PR_Github #11669 [ run ] completed with state  | 
Signed-off-by: Mike Iovine <[email protected]>
0e0fd50    to
    3c6a609      
    Compare
  
    | /bot run | 
| PR_Github #11836 [ run ] triggered by Bot | 
| PR_Github #11836 [ run ] completed with state  | 
| /bot run | 
| PR_Github #11955 [ run ] triggered by Bot | 
| PR_Github #11955 [ run ] completed with state  | 
| /bot run | 
| PR_Github #11965 [ run ] triggered by Bot | 
| PR_Github #11965 [ run ] completed with state  | 
) Signed-off-by: Mike Iovine <[email protected]>
Description
The purpose of this small refactor is to prepare for the upcoming migration of EAGLE3/DRAFT_TARGET to the new
Drafterinterface.We want all of the resource managers to be created before the
Drafterbecause the drafter might rely on those resource managers. The way things are currently set up is pretty confusing: drafter depends on the spec resource manager, but you still have to create the drafter before the spec resource manager.In this PR:
Drafteris always created after all resource managers.SpecConfig.Even though it cleans up our internal code, I realize that (2) makes the user-facing API fairly clunky for the ngram use case. I have a bit of logic in the
UserProvidedSpecConfigto clean things up: if a drafter is provided and the drafter has aspec_resource_managerattribute,resouce_managerwill default todrafter.spec_resource_manager, you don't need to specify it twice.Test Coverage
Existing tests.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]to print this help message.See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]Launch build/test pipelines. All previously running jobs will be killed.
--disable-fail-fast(OPTIONAL) : Disable fail fast on build/tests/infra failures.--skip-test(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.--stage-list "A10-1, xxx"(OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.--gpu-type "A30, H100_PCIe"(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.--only-multi-gpu-test(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.--disable-multi-gpu-test(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.--add-multi-gpu-test(OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.--post-merge(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.md.kill
killKill all running builds associated with pull request.
skip
skip --comment COMMENTSkip testing for latest commit on pull request.
--comment "Reason for skipping build/test"is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipelineReuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.