Conversation

@leslie-fang25 (Collaborator) commented Sep 2, 2025

Summary by CodeRabbit

  • New Features

    • Added optional PEFT cache configuration support.
    • Introduced per-call controls for max batch size, beam width, and token limits.
    • Exposed granular options for scheduler and KV cache handling.
  • Refactor

    • Simplified executor creation by replacing a monolithic config object with explicit parameters.
    • Streamlined propagation of configuration updates during executor setup.
  • Bug Fixes

    • None.
  • Notes

    • No user-visible behavior changes expected; enhancements provide finer-grained configuration control.

Description

This PR removes the use of executor_config in create_py_executor_instance, replacing it with individual field inputs.
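For illustration, a minimal sketch of the signature change (parameter names are taken from the change summary below; the defaults and the **kwargs stand-in are assumptions, not the real remaining parameter list):

```python
# Before (sketch): a single monolithic config object carried everything.
def create_py_executor_instance(executor_config, **kwargs): ...


# After (sketch): explicit, granular parameters replace executor_config.
def create_py_executor_instance(
        max_batch_size: int,
        max_beam_width: int,
        max_num_tokens: int,
        peft_cache_config=None,
        scheduler_config=None,
        cache_transceiver_config=None,
        **kwargs  # stand-in for the remaining, unchanged parameters
): ...
```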

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from the specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests in addition to the ordinary L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
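For example, an illustrative invocation combining several of the documented options (the stage name is taken from the examples above; substitute stages relevant to your change):

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1"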

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@leslie-fang25 force-pushed the leslie/remove_executor_config_in_create_py_executor_instance branch from eb64e57 to e159b4e on September 2, 2025 05:54
@leslie-fang25 (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #17309 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #17309 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13009 completed with status: 'SUCCESS'

@leslie-fang25 marked this pull request as ready for review September 3, 2025 23:32
@leslie-fang25 requested a review from a team as a code owner September 3, 2025 23:32
@coderabbitai (Contributor) commented Sep 3, 2025

📝 Walkthrough

Refactors PyExecutor creation to remove the top-level executor_config parameter. Adds granular parameters (batch/beam/token limits and specific config objects) to creation utilities and PyExecutor. Updates call sites to pass these parameters and captures potential peft_cache_config mutations from the created PyExecutor back into the original config holder.

Changes

• PyExecutor creation utility refactor — tensorrt_llm/_torch/pyexecutor/_util.py: Removed the executor_config argument from create_py_executor_instance. Added parameters: max_batch_size, max_beam_width, max_num_tokens, peft_cache_config, scheduler_config, cache_transceiver_config. Updated internal derivations, logging values, the max_num_sequences calculation, PEFT cache manager setup, KV cache transceiver creation, and PyExecutor construction to use the new parameters.
• PyExecutor constructor extension — tensorrt_llm/_torch/pyexecutor/py_executor.py: Added an optional peft_cache_config parameter to PyExecutor.__init__. Imported PeftCacheConfig as _PeftCacheConfig and stored the value as self.peft_cache_config. No behavioral changes beyond storing the config.
• Creator call-site updates and config propagation — tensorrt_llm/_torch/pyexecutor/py_executor_creator.py: Replaced the executor_config argument with explicit parameters in the create_py_executor_instance calls within drafting_loop_wrapper. After the first creation, propagates any peft_cache_config mutation back via executor_config.peft_cache_config = py_executor.peft_cache_config; the second creation call is updated similarly, with the control/data flow around executor construction adjusted to match. A sketch of the new call shape follows below.
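Read together, the call-site change and the propagation step amount to the following sketch (the variable bindings are assumptions; the propagation line is quoted from the change summary):

```python
# Sketch only: the granular call plus the propagation step described above.
py_executor = create_py_executor_instance(
    max_batch_size=max_batch_size,
    max_beam_width=max_beam_width,
    max_num_tokens=max_num_tokens,
    peft_cache_config=peft_cache_config,
    scheduler_config=scheduler_config,
    cache_transceiver_config=cache_transceiver_config,
    # ... remaining arguments unchanged
)

# Surface any PEFT cache mutation made during construction back into the
# original config holder, preserving the previous implicit behavior.
executor_config.peft_cache_config = py_executor.peft_cache_config
```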

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Creator as py_executor_creator.drafting_loop_wrapper
  participant Util as _util.create_py_executor_instance
  participant Exec as PyExecutor
  participant Cfg as executor_config (holder)

  Note over Creator: First executor creation (granular params)
  Creator->>Util: create_py_executor_instance(max_batch_size, max_beam_width,<br/>max_num_tokens, peft_cache_config, scheduler_config,<br/>cache_transceiver_config, ...)
  rect rgba(220,240,255,0.4)
    Util->>Exec: new PyExecutor(..., peft_cache_config)
    Exec-->>Util: instance (may have mutated peft_cache_config)
  end
  Util-->>Creator: PyExecutor

  Note over Creator: Surface potential PEFT cache mutation
  Creator->>Cfg: Cfg.peft_cache_config = Exec.peft_cache_config

  Note over Creator: Second executor creation with same granular params
  Creator->>Util: create_py_executor_instance(..., peft_cache_config, scheduler_config,<br/>cache_transceiver_config, ...)
  Util->>Exec: new PyExecutor(..., peft_cache_config)
  Exec-->>Util: instance
  Util-->>Creator: PyExecutor
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • QiJune
  • pcastonguay
  • lfr-0531

@coderabbitai (Contributor) bot left a review comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
tensorrt_llm/_torch/pyexecutor/_util.py (4)

10-16: Deduplicate ModelConfig import to avoid ambiguity

Both absolute and relative imports bring in ModelConfig; keep one.

Apply:

```diff
-from ..model_config import ModelConfig
```

Also applies to line 24.


533-533: Fix confusing log message (duplicate/max_num_requests label)

Currently logs max_batch_size twice and uses “max_num_requests” ambiguously.

Apply:

```diff
-        f"max_seq_len={max_seq_len}, max_num_requests={max_batch_size}, max_num_tokens={max_num_tokens}, max_batch_size={max_batch_size}"
+        f"max_seq_len={max_seq_len}, max_batch_size={max_batch_size}, max_beam_width={max_beam_width}, max_num_tokens={max_num_tokens}"
```

585-599: PEFT cache conversion path looks correct; minor naming nit

Flow from PeftCacheConfig (pydantic) → _to_pybind() is good. Consider renaming the reassigned peft_cache_config variable to avoid shadowing the input param for readability.

Apply:

```diff
-        peft_cache_config = peft_cache_config_model._to_pybind()
+        peft_cache_config_pybind = peft_cache_config_model._to_pybind()
```

And update the uses below in this function:

```diff
-        peft_cache_manager = PeftCacheManager(
-            peft_cache_config=peft_cache_config,
+        peft_cache_manager = PeftCacheManager(
+            peft_cache_config=peft_cache_config_pybind,
 ...
-        return PyExecutor(
+        return PyExecutor(
 ...
-            peft_cache_config=peft_cache_config)
+            peft_cache_config=peft_cache_config_pybind)
```

652-652: Remove no-op self-assignment

Redundant statement.

Apply:

```diff
-    cache_transceiver_config = cache_transceiver_config
```
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)

161-168: Document new parameters/attributes in constructor docstring

Add brief notes for max_seq_len and peft_cache_config to keep public API discoverable.
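One possible shape for that addition (a sketch in the Google style the guidelines call for; the description wording, surrounding parameters, and defaults are assumptions, not the real constructor):

```python
from typing import Optional


class PyExecutor:  # abridged sketch, not the real class
    def __init__(self,
                 max_seq_len: Optional[int] = None,
                 peft_cache_config: Optional["PeftCacheConfig"] = None):
        """Construct the executor (abridged).

        Args:
            max_seq_len: Maximum sequence length accepted per request.
            peft_cache_config: Optional PEFT cache configuration; stored as
                ``self.peft_cache_config`` so call sites can read back any
                mutation made during construction.
        """
        self.max_seq_len = max_seq_len
        self.peft_cache_config = peft_cache_config
```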

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 60df6b2 and e159b4e.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/pyexecutor/_util.py (7 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (2 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Filenames compiled into a target must be case-insensitively unique

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
**/*.{h,hpp,hh,hxx,cc,cpp,cxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use spaces, not tabs; indent 4 spaces

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs (Python)
Maintain module namespace on import: prefer from package.subpackage import foo; use foo.Symbol()
Python filenames use snake_case
Python class names use PascalCase
Python functions and methods use snake_case
Python local variables use snake_case; if starting with a number concept, prefix with k (e.g., k_99th_percentile)
Python global variables use G_ prefix with UPPER_SNAKE_CASE
Python constants use UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes
Initialize all externally visible class members in __init__
For public interfaces, prefer docstrings over comments; comments should be for in-function or file-local interfaces
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes and variables inline with docstrings immediately after assignment
Avoid reflection when a non-reflective approach suffices
Limit except clauses to specific exceptions where possible
When using try/except for duck-typing, keep try body minimal and move logic to else

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
**/*.{cpp,cc,cxx,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
🧠 Learnings (1)
📚 Learning: 2025-08-26T06:07:02.166Z
Learnt from: shaharmor98
PR: NVIDIA/TensorRT-LLM#7231
File: tensorrt_llm/_torch/pyexecutor/_util.py:504-509
Timestamp: 2025-08-26T06:07:02.166Z
Learning: In tensorrt_llm/_torch/pyexecutor/_util.py, when calling model_engine.set_lora_model_config(), pass model_binding_config.mlp_hidden_size directly without multiplying by mapping.tp_size, as the mlp_hidden_size from get_bindings_model_config() is already the per-TP rank value needed for LoRA weight packaging.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/_util.py
  • tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
🧬 Code graph analysis (1)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2)
cpp/include/tensorrt_llm/executor/types.h (3)
  • tensorrt_llm (36-39)
  • tensorrt_llm (41-663)
  • RequestStage (368-603)
tensorrt_llm/llmapi/llm_args.py (1)
  • PeftCacheConfig (829-895)
🔇 Additional comments (7)
tensorrt_llm/_torch/pyexecutor/_util.py (3)

620-624: max_num_sequences computation LGTM

Using pp_size multiplier is consistent with existing semantics.
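In other words, roughly the following (a reconstruction from the comment above; the function and variable names are hypothetical):

```python
def compute_max_num_sequences(max_batch_size: int, pp_size: int) -> int:
    # Each pipeline-parallel stage keeps its own in-flight micro-batches,
    # so total sequence capacity scales with the number of PP stages.
    return max_batch_size * pp_size
```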


665-676: Plumbing of sizing and PEFT args into PyExecutor looks correct

max_batch_size/max_beam_width/max_seq_len and peft_cache_config are correctly forwarded.


639-646: Remove unnecessary None check for scheduler_config
scheduler_config is always instantiated (with a default capacity_scheduler_policy) and never None, so guarding before .capacity_scheduler_policy is not needed.

Likely an incorrect or invalid review comment.

tensorrt_llm/_torch/pyexecutor/py_executor.py (2)

27-31: Importing _PeftCacheConfig alias here is appropriate

Keeps binding types explicit and avoids confusion with llm_args.PeftCacheConfig.


161-168: Expose peft_cache_config on PyExecutor

Addition is straightforward and low-risk.

tensorrt_llm/_torch/pyexecutor/py_executor_creator.py (2)

516-526: Creation call updated to granular params; PEFT config propagation back to executor_config is sensible

This preserves the previous implicit mutation behavior while aligning with the new API shape.


498-522: Confirm no lingering executor_config usage

I did not find any calls still passing an executor_config param to create_py_executor_instance, but please manually verify across all call sites that no caller still uses the old executor_config argument.

@leslie-fang25 (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #17596 [ run ] triggered by Bot

@QiJune (Collaborator) left a comment:


LGTM

@tensorrt-cicd (Collaborator) commented:

PR_Github #17596 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13232 completed with status: 'FAILURE'

@leslie-fang25 (Collaborator, Author) commented:

/bot run

@tensorrt-cicd (Collaborator) commented:

PR_Github #17769 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #17769 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13302 completed with status: 'SUCCESS'

@leslie-fang25 merged commit 9eb3911 into NVIDIA:main on Sep 5, 2025
5 checks passed
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025
