@mikeiovine
Collaborator

Description

The purpose of this small refactor is to prepare for the upcoming migration of EAGLE3/DRAFT_TARGET to the new Drafter interface.

We want all of the resource managers to be created before the Drafter because the drafter might rely on those resource managers. The way things are currently set up is pretty confusing: drafter depends on the spec resource manager, but you still have to create the drafter before the spec resource manager.

In this PR:

  1. Move code around so that the Drafter is always created after all resource managers.
  2. For user-provided spec decode, the resource manager must be provided via the SpecConfig.

Even though it cleans up our internal code, I realize that (2) makes the user-facing API fairly clunky for the ngram use case. I have a bit of logic in the UserProvidedSpecConfig to clean things up: if a drafter is provided and the drafter has a spec_resource_manager attribute, resource_manager will default to drafter.spec_resource_manager, so you don't need to specify it twice.
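
A minimal sketch of the defaulting behavior described above, not the code from this PR. The class names SketchResourceManager, SketchNGramDrafter and SketchUserProvidedConfig are hypothetical stand-ins; only the drafter, resource_manager and spec_resource_manager attribute names come from the description.

```python
from dataclasses import dataclass
from typing import Any, Optional


class SketchResourceManager:
    """Hypothetical stand-in for a spec resource manager."""


class SketchNGramDrafter:
    """Hypothetical drafter that exposes the manager it depends on."""

    def __init__(self, spec_resource_manager: Optional[SketchResourceManager] = None):
        self.spec_resource_manager = spec_resource_manager


@dataclass
class SketchUserProvidedConfig:
    """Hypothetical config; the real class in the PR may differ."""

    drafter: Optional[Any] = None
    resource_manager: Optional[Any] = None

    def __post_init__(self) -> None:
        # Only fill in a default when the user did not pass a manager explicitly.
        if self.resource_manager is None and self.drafter is not None:
            # getattr with a default keeps drafters without the attribute valid.
            self.resource_manager = getattr(self.drafter, "spec_resource_manager", None)


manager = SketchResourceManager()
config = SketchUserProvidedConfig(drafter=SketchNGramDrafter(manager))
```

Here config.resource_manager ends up pointing at the drafter's own manager, while an explicitly passed resource_manager would be left untouched.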

Test Coverage

Existing tests.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@mikeiovine mikeiovine requested review from Funatiq and wili-65535 July 7, 2025 19:31
@mikeiovine mikeiovine requested review from a team as code owners July 7, 2025 19:31
@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11177 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11177 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8268 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11311 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11311 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8365 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@mikeiovine mikeiovine requested a review from Funatiq July 9, 2025 14:34
Collaborator

@wili-65535 wili-65535 left a comment

LGTM!
For the user-provided method,
users need to provide both drafter and resource_manager in the spec_config for certain methods (e.g. NGram),
or only drafter if no resource_manager is needed (e.g. when draft_token_ids is always [2,2,2,2]).
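
The two cases in this comment can be illustrated with a small sketch. Everything here is hypothetical (ConstantDrafter, PoolManager, PoolBackedDrafter, SpecConfigSketch and the propose method are illustration-only names, not the TensorRT-LLM API); it only shows a stateless drafter configured without a manager next to a stateful one configured with it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class ConstantDrafter:
    """Hypothetical drafter that always proposes the same tokens; it keeps no state."""

    def propose(self, num_tokens: int = 4) -> List[int]:
        return [2] * num_tokens


class PoolManager:
    """Hypothetical resource manager holding per-request n-gram pools."""

    def __init__(self) -> None:
        self.pools: Dict[int, List[int]] = {}


class PoolBackedDrafter:
    """Hypothetical drafter that reads from a shared PoolManager."""

    def __init__(self, manager: PoolManager) -> None:
        self.manager = manager


@dataclass
class SpecConfigSketch:
    """Hypothetical config with the two fields named in the comment."""

    drafter: object
    resource_manager: Optional[object] = None


# Case 1: stateless drafter, so no resource manager is supplied.
stateless = SpecConfigSketch(drafter=ConstantDrafter())

# Case 2: stateful drafter, so the manager is supplied alongside it.
pool = PoolManager()
stateful = SpecConfigSketch(drafter=PoolBackedDrafter(pool), resource_manager=pool)
```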

@Funatiq Funatiq requested a review from Copilot July 10, 2025 13:39
Contributor

Copilot AI left a comment

Pull Request Overview

This refactor streamlines the creation order of speculative decoding components by ensuring resource managers are instantiated before drafters and by surfacing a resource_manager field in user-provided spec configs.

  • Add resource_manager field and defaulting logic to user-provided decoding/config classes.
  • Simplify get_spec_resource_manager and get_spec_drafter to use mode-based logic rather than passing the drafter.
  • Reorder creation in create_py_executor so resource managers are set up before drafters.
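
The ordering summarized in these bullets could look roughly like the sketch below. It is an assumption-laden illustration, not the contents of utils.py or py_executor_creator.py: ModeSketch, the *_sketch functions, the dict used as a manager and the max_draft_len parameter are all hypothetical.

```python
from enum import Enum, auto
from types import SimpleNamespace
from typing import Any, Optional, Tuple


class ModeSketch(Enum):
    """Hypothetical stand-in for the speculative decoding mode."""

    NONE = auto()
    NGRAM = auto()
    USER_PROVIDED = auto()


def get_resource_manager_sketch(mode: ModeSketch, spec_config: Any) -> Optional[Any]:
    """Choose the manager from the mode alone; no drafter argument is needed."""
    if mode is ModeSketch.NGRAM:
        # A plain dict stands in for a real pool manager object.
        return {"kind": "ngram_pool", "max_draft_len": spec_config.max_draft_len}
    if mode is ModeSketch.USER_PROVIDED:
        return spec_config.resource_manager
    return None


def get_drafter_sketch(mode: ModeSketch, spec_config: Any, resource_manager: Any) -> Optional[Any]:
    """Build the drafter from the mode, handing it the already-created manager."""
    if mode is ModeSketch.NGRAM:
        return SimpleNamespace(spec_resource_manager=resource_manager)
    if mode is ModeSketch.USER_PROVIDED:
        return spec_config.drafter
    return None


def create_components_sketch(mode: ModeSketch, spec_config: Any) -> Tuple[Optional[Any], Optional[Any]]:
    # Resource managers first, drafter second, so the drafter can rely on them.
    resource_manager = get_resource_manager_sketch(mode, spec_config)
    drafter = get_drafter_sketch(mode, spec_config, resource_manager)
    return resource_manager, drafter


ngram_config = SimpleNamespace(max_draft_len=4)
manager, drafter = create_components_sketch(ModeSketch.NGRAM, ngram_config)
```

Because the manager exists before the drafter is built, the drafter can simply receive it instead of being asked to produce it.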

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tensorrt_llm/llmapi/llm_args.py | Added resource_manager to UserProvidedDecodingConfig and updated instantiation. |
| tensorrt_llm/_torch/speculative/utils.py | Removed unused drafter param, split get_spec_resource_manager and get_spec_drafter. |
| tensorrt_llm/_torch/speculative/user_provided.py | Added resource_manager attribute and defaulting logic in UserProvidedConfig. |
| tensorrt_llm/_torch/speculative/ngram.py | Removed base-class init params; assign spec_resource_manager directly. |
| tensorrt_llm/_torch/speculative/drafter.py | Dropped the now-unnecessary __init__ from Drafter. |
| tensorrt_llm/_torch/pyexecutor/py_executor_creator.py | Reordered resource manager creation before drafter instantiation. |
Comments suppressed due to low confidence (3)

tensorrt_llm/_torch/speculative/utils.py:99

  • Add unit tests for the ngram branch in get_spec_resource_manager to verify that an NGramPoolManager is returned with the correct parameters.
    if spec_dec_mode.is_ngram():

tensorrt_llm/_torch/speculative/utils.py:101

  • Add unit tests for the user_provided branch in get_spec_resource_manager to ensure spec_config.resource_manager is returned when configured.
    if spec_dec_mode.is_user_provided():

tensorrt_llm/_torch/speculative/user_provided.py:23

  • Add tests for UserProvidedConfig.__post_init__ to verify that resource_manager correctly defaults to drafter.spec_resource_manager when available.
    resource_manager: Optional[BaseResourceManager] = None

@mikeiovine mikeiovine force-pushed the drafter-changes branch 2 times, most recently from 22990d0 to 0e0fd50 July 11, 2025 16:48
@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11669 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11669 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8641 completed with status: 'FAILURE'

@mikeiovine mikeiovine requested a review from a team as a code owner July 14, 2025 17:35
@mikeiovine mikeiovine requested a review from syuoni July 14, 2025 17:35
@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11836 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11836 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8772 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11955 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11955 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8872 completed with status: 'FAILURE'

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11965 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11965 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8883 completed with status: 'SUCCESS'

@schetlur-nv schetlur-nv merged commit fa34cb7 into NVIDIA:main Jul 16, 2025
3 checks passed
yizhang-nv pushed a commit to yizhang-nv/TensorRT-LLM that referenced this pull request Jul 17, 2025
@mikeiovine mikeiovine deleted the drafter-changes branch July 23, 2025 18:01

6 participants