Conversation

@leslie-fang25 (Collaborator) commented Sep 2, 2025

Summary by CodeRabbit

  • New Features

    • Added optional PEFT cache configuration support.
    • Introduced per-call controls for max batch size, beam width, and token limits.
    • Exposed granular options for scheduler and KV cache handling.
  • Refactor

    • Simplified executor creation by replacing a monolithic config object with explicit parameters.
    • Streamlined propagation of configuration updates during executor setup.
  • Bug Fixes

    • None.
  • Notes

    • No user-visible behavior changes expected; enhancements provide finer-grained configuration control.

Description

This PR removes the use of executor_config in create_py_executor_instance, replacing it with individual field inputs.
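For illustration, a minimal sketch of the signature change (parameter names are taken from the change summary below; the defaults and the **kwargs stand-in are assumptions, not the real remaining parameter list):

```python
# Before (sketch): a single monolithic config object carried everything.
def create_py_executor_instance(executor_config, **kwargs): ...


# After (sketch): explicit, granular parameters replace executor_config.
def create_py_executor_instance(
        max_batch_size: int,
        max_beam_width: int,
        max_num_tokens: int,
        peft_cache_config=None,
        scheduler_config=None,
        cache_transceiver_config=None,
        **kwargs  # stand-in for the remaining, unchanged parameters
): ...
```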

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from the specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests in addition to the ordinary L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
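For example, an illustrative invocation combining several of the documented options (the stage name is taken from the examples above; substitute stages relevant to your change):

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1"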

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@leslie-fang25 force-pushed the leslie/remove_executor_config_in_create_py_executor_instance branch from eb64e57 to e159b4e on September 2, 2025 05:54
@leslie-fang25 (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #17309 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #17309 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13009 completed with status: 'SUCCESS'

@leslie-fang25 marked this pull request as ready for review September 3, 2025 23:32
@leslie-fang25 requested a review from a team as a code owner September 3, 2025 23:32
@coderabbitai (Contributor) commented Sep 3, 2025

📝 Walkthrough

Refactors PyExecutor creation to remove the top-level executor_config parameter. Adds granular parameters (batch/beam/token limits and specific config objects) to creation utilities and PyExecutor. Updates call sites to pass these parameters and captures potential peft_cache_config mutations from the created PyExecutor back into the original config holder.

Changes

• PyExecutor creation utility refactor — tensorrt_llm/_torch/pyexecutor/_util.py: Removed the executor_config argument from create_py_executor_instance. Added parameters: max_batch_size, max_beam_width, max_num_tokens, peft_cache_config, scheduler_config, cache_transceiver_config. Updated internal derivations, logging values, the max_num_sequences calculation, PEFT cache manager setup, KV cache transceiver creation, and PyExecutor construction to use the new parameters.
• PyExecutor constructor extension — tensorrt_llm/_torch/pyexecutor/py_executor.py: Added an optional peft_cache_config parameter to PyExecutor.__init__. Imported PeftCacheConfig as _PeftCacheConfig and stored the value as self.peft_cache_config. No behavioral changes beyond storing the config.
• Creator call-site updates and config propagation — tensorrt_llm/_torch/pyexecutor/py_executor_creator.py: Replaced the executor_config argument with explicit parameters in the create_py_executor_instance calls within drafting_loop_wrapper. After the first creation, propagates any peft_cache_config mutation back via executor_config.peft_cache_config = py_executor.peft_cache_config; the second creation call is updated similarly, with the control/data flow around executor construction adjusted to match. A sketch of the new call shape follows below.
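Read together, the call-site change and the propagation step amount to the following sketch (the variable bindings are assumptions; the propagation line is quoted from the change summary):

```python
# Sketch only: the granular call plus the propagation step described above.
py_executor = create_py_executor_instance(
    max_batch_size=max_batch_size,
    max_beam_width=max_beam_width,
    max_num_tokens=max_num_tokens,
    peft_cache_config=peft_cache_config,
    scheduler_config=scheduler_config,
    cache_transceiver_config=cache_transceiver_config,
    # ... remaining arguments unchanged
)

# Surface any PEFT cache mutation made during construction back into the
# original config holder, preserving the previous implicit behavior.
executor_config.peft_cache_config = py_executor.peft_cache_config
```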

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Creator as py_executor_creator.drafting_loop_wrapper
  participant Util as _util.create_py_executor_instance
  participant Exec as PyExecutor
  participant Cfg as executor_config (holder)

  Note over Creator: First executor creation (granular params)
  Creator->>Util: create_py_executor_instance(max_batch_size, max_beam_width,<br/>max_num_tokens, peft_cache_config, scheduler_config,<br/>cache_transceiver_config, ...)
  rect rgba(220,240,255,0.4)
    Util->>Exec: new PyExecutor(..., peft_cache_config)
    Exec-->>Util: instance (may have mutated peft_cache_config)
  end
  Util-->>Creator: PyExecutor

  Note over Creator: Surface potential PEFT cache mutation
  Creator->>Cfg: Cfg.peft_cache_config = Exec.peft_cache_config

  Note over Creator: Second executor creation with same granular params
  Creator->>Util: create_py_executor_instance(..., peft_cache_config, scheduler_config,<br/>cache_transceiver_config, ...)
  Util->>Exec: new PyExecutor(..., peft_cache_config)
  Exec-->>Util: instance
  Util-->>Creator: PyExecutor
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • QiJune
  • pcastonguay
  • lfr-0531

@coderabbitai (Contributor) bot left a review comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
tensorrt_llm/_torch/pyexecutor/_util.py (4)

10-16: Deduplicate ModelConfig import to avoid ambiguity

Both absolute and relative imports bring in ModelConfig; keep one.

Apply:

```diff
-from ..model_config import ModelConfig
```

Also applies to line 24.


533-533: Fix confusing log message (duplicate/max_num_requests label)

Currently logs max_batch_size twice and uses “max_num_requests” ambiguously.

Apply:

```diff
-        f"max_seq_len={max_seq_len}, max_num_requests={max_batch_size}, max_num_tokens={max_num_tokens}, max_batch_size={max_batch_size}"
+        f"max_seq_len={max_seq_len}, max_batch_size={max_batch_size}, max_beam_width={max_beam_width}, max_num_tokens={max_num_tokens}"
```

585-599: PEFT cache conversion path looks correct; minor naming nit

Flow from PeftCacheConfig (pydantic) → _to_pybind() is good. Consider renaming the reassigned peft_cache_config variable to avoid shadowing the input param for readability.

Apply:

```diff
-        peft_cache_config = peft_cache_config_model._to_pybind()
+        peft_cache_config_pybind = peft_cache_config_model._to_pybind()
```

And update the uses below in this function:

```diff
-        peft_cache_manager = PeftCacheManager(
-            peft_cache_config=peft_cache_config,
+        peft_cache_manager = PeftCacheManager(
+            peft_cache_config=peft_cache_config_pybind,
 ...
-        return PyExecutor(
+        return PyExecutor(
 ...
-            peft_cache_config=peft_cache_config)
+            peft_cache_config=peft_cache_config_pybind)
```

652-652: Remove no-op self-assignment

Redundant statement.

Apply:

```diff
-    cache_transceiver_config = cache_transceiver_config
```
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)

161-168: Document new parameters/attributes in constructor docstring

Add brief notes for max_seq_len and peft_cache_config to keep public API discoverable.
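One possible shape for that addition (a sketch in the Google style the guidelines call for; the description wording, surrounding parameters, and defaults are assumptions, not the real constructor):

```python
from typing import Optional


class PyExecutor:  # abridged sketch, not the real class
    def __init__(self,
                 max_seq_len: Optional[int] = None,
                 peft_cache_config: Optional["PeftCacheConfig"] = None):
        """Construct the executor (abridged).

        Args:
            max_seq_len: Maximum sequence length accepted per request.
            peft_cache_config: Optional PEFT cache configuration; stored as
                ``self.peft_cache_config`` so call sites can read back any
                mutation made during construction.
        """
        self.max_seq_len = max_seq_len
        self.peft_cache_config = peft_cache_config
```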

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 60df6b2 and e159b4e.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/pyexecutor/_util.py (7 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (2 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Filenames compiled into a target must be case-insensitively unique

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
**/*.{h,hpp,hh,hxx,cc,cpp,cxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use spaces, not tabs; indent 4 spaces

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs (Python)
Maintain module namespace on import: prefer from package.subpackage import foo; use foo.Symbol()
Python filenames use snake_case
Python class names use PascalCase
Python functions and methods use snake_case
Python local variables use snake_case; if starting with a number concept, prefix with k (e.g., k_99th_percentile)
Python global variables use G_ prefix with UPPER_SNAKE_CASE
Python constants use UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes
Initialize all externally visible class members in __init__
For public interfaces, prefer docstrings over comments; comments should be for in-function or file-local interfaces
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes and variables inline with docstrings immediately after assignment
Avoid reflection when a non-reflective approach suffices
Limit except clauses to specific exceptions where possible
When using try/except for duck-typing, keep try body minimal and move logic to else

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
**/*.{cpp,cc,cxx,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
🧠 Learnings (1)
📚 Learning: 2025-08-26T06:07:02.166Z
Learnt from: shaharmor98
PR: NVIDIA/TensorRT-LLM#7231
File: tensorrt_llm/_torch/pyexecutor/_util.py:504-509
Timestamp: 2025-08-26T06:07:02.166Z
Learning: In tensorrt_llm/_torch/pyexecutor/_util.py, when calling model_engine.set_lora_model_config(), pass model_binding_config.mlp_hidden_size directly without multiplying by mapping.tp_size, as the mlp_hidden_size from get_bindings_model_config() is already the per-TP rank value needed for LoRA weight packaging.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
🧬 Code graph analysis (1)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2)
cpp/include/tensorrt_llm/executor/types.h (3)
  • tensorrt_llm (36-39)
  • tensorrt_llm (41-663)
  • RequestStage (368-603)
tensorrt_llm/llmapi/llm_args.py (1)
  • PeftCacheConfig (829-895)
🔇 Additional comments (7)
tensorrt_llm/_torch/pyexecutor/_util.py (3)

620-624: max_num_sequences computation LGTM

Using pp_size multiplier is consistent with existing semantics.
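In other words, roughly the following (a reconstruction from the comment above; the function and variable names are hypothetical):

```python
def compute_max_num_sequences(max_batch_size: int, pp_size: int) -> int:
    # Each pipeline-parallel stage keeps its own in-flight micro-batches,
    # so total sequence capacity scales with the number of PP stages.
    return max_batch_size * pp_size
```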


665-676: Plumbing of sizing and PEFT args into PyExecutor looks correct

max_batch_size/max_beam_width/max_seq_len and peft_cache_config are correctly forwarded.


639-646: Remove unnecessary None check for scheduler_config
scheduler_config is always instantiated (with a default capacity_scheduler_policy) and never None, so guarding before .capacity_scheduler_policy is not needed.

Likely an incorrect or invalid review comment.

tensorrt_llm/_torch/pyexecutor/py_executor.py (2)

27-31: Importing _PeftCacheConfig alias here is appropriate

Keeps binding types explicit and avoids confusion with llm_args.PeftCacheConfig.


161-168: Expose peft_cache_config on PyExecutor

Addition is straightforward and low-risk.

tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2)

516-526: Creation call updated to granular params; PEFT config propagation back to executor_config is sensible

This preserves the previous implicit mutation behavior while aligning with the new API shape.


498-522: Confirm no lingering executor_config usage

I did not find any calls still passing an executor_config param to create_py_executor_instance, but please manually verify across all call sites that no caller still uses the old executor_config argument.

@leslie-fang25 (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #17596 [ run ] triggered by Bot

@QiJune (Collaborator) left a comment:


LGTM

@tensorrt-cicd (Collaborator) commented:

PR_Github #17596 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13232 completed with status: 'FAILURE'

@leslie-fang25 (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #17769 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #17769 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13302 completed with status: 'SUCCESS'

@leslie-fang25 merged commit 9eb3911 into NVIDIA:main on Sep 5, 2025
5 checks passed
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025
