[None][feat] AutoDeploy: kwargs-first pipeline #151

lucaslie · 2025-10-02T22:09:02Z

Will be transitioned to the main repo after NVIDIA#8126 is merged.

This PR switches to kwargs instead of positional args everywhere in our transform system. Why? With mixed nn.Modules and GraphModules coming up, handling positional args reliably is becoming an increasing burden.
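
To illustrate the motivation, here is a minimal pure-Python sketch (no torch required). The functions `inner_block` and `wrapped_block` are hypothetical stand-ins for a module and a wrapper around it that declare the same parameters in a different order; they are not taken from this PR.

```python
def inner_block(input_ids, position_ids):
    """Hypothetical stand-in for a module's forward."""
    return [tok * 10 + pos for tok, pos in zip(input_ids, position_ids)]


def wrapped_block(position_ids, input_ids):
    """Hypothetical wrapper that declares its parameters in a different order."""
    return inner_block(input_ids=input_ids, position_ids=position_ids)


inputs = {"input_ids": [1, 2, 3], "position_ids": [0, 1, 2]}

# Keyword calls give the same result no matter how each callee orders its parameters.
kw_inner = inner_block(**inputs)
kw_wrapped = wrapped_block(**inputs)

# Positional calls silently swap the two tensors for the wrapper.
pos_inner = inner_block(*inputs.values())
pos_wrapped = wrapped_block(*inputs.values())

print(kw_inner == kw_wrapped)    # keyword calls agree
print(pos_inner == pos_wrapped)  # positional calls do not
```

With keywords, each value travels with its name, so a mismatch surfaces as a `TypeError` instead of a silent swap.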

Copilot

Pull Request Overview

This PR transitions the AutoDeploy transform system from using positional arguments to keyword arguments throughout the pipeline. The change addresses challenges with handling mixed nn.Modules and GraphModules by standardizing on kwargs-based interfaces, making the system more robust and maintainable.

Key changes:

Replaced *cm.args with **cm.named_args across all model invocations
Removed strict forward method implementations and related configuration
Updated export functions to use kwargs instead of positional args
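
As a rough sketch of the first key change, the snippet below contrasts the two call styles. `FakeSequenceInterface` and `toy_model` are hypothetical stand-ins written for this illustration; only the attribute names `args` and `named_args` come from the summary above, and the real `cm` object may behave differently.

```python
from typing import Any, Dict, Tuple


class FakeSequenceInterface:
    """Hypothetical stand-in for `cm`; not the real AutoDeploy class."""

    def __init__(self, **inputs: Any) -> None:
        self._inputs: Dict[str, Any] = dict(inputs)

    @property
    def args(self) -> Tuple[Any, ...]:
        # Positional view: the meaning of each value depends on insertion order.
        return tuple(self._inputs.values())

    @property
    def named_args(self) -> Dict[str, Any]:
        # Keyword view: each value travels with its name.
        return dict(self._inputs)


def toy_model(input_ids, position_ids=None, cache_loc=None):
    """Toy callable standing in for a model's forward."""
    return {"input_ids": input_ids, "position_ids": position_ids, "cache_loc": cache_loc}


# Inputs registered in an order that differs from toy_model's signature.
cm = FakeSequenceInterface(input_ids=[1, 2], cache_loc=[7], position_ids=[0, 1])

old_style = toy_model(*cm.args)          # cache_loc lands in the position_ids slot
new_style = toy_model(**cm.named_args)   # every value reaches the intended parameter
```

In this toy setup the positional call succeeds but routes `cache_loc` into `position_ids`, which is the kind of silent mismatch the kwargs-based call avoids.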

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

File	Description
tensorrt_llm/_torch/auto_deploy/export/export.py	Updated torch_export_to_gm to make args optional and handle empty args tuple
tensorrt_llm/_torch/auto_deploy/transform/library/*.py	Replaced args-based model calls with kwargs-based calls in transforms
tensorrt_llm/_torch/auto_deploy/models/*.py	Removed strict forward method implementations and related infrastructure
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py	Updated model invocation to use kwargs interface
tests/unittest/_torch/auto_deploy/unit/singlegpu/**/*.py	Updated test cases to use kwargs-based model calls
tensorrt_llm/_torch/auto_deploy/config/*.yaml	Removed kwargs-related configuration options

Copilot · 2025-10-02T22:13:42Z

tensorrt_llm/_torch/auto_deploy/transformations/_graph.py

    gm: GraphModule,
    name: str,
-    add_kwargs: bool = False,
+    add_kwargs: bool = True,


Changing the default value of add_kwargs from False to True is a breaking change that could affect existing callers who rely on the previous default behavior. Consider using a deprecation period or explicit migration strategy.

Suggested change

-    add_kwargs: bool = True,
+    add_kwargs: bool = False,
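
One possible shape for the "deprecation period" mentioned in the comment above is a sentinel default that warns callers who rely on the implicit value. This is only a sketch: `register_input` is a hypothetical function invented for the example, not the function being changed in `_graph.py`.

```python
import warnings
from typing import Optional


def register_input(name: str, add_kwargs: Optional[bool] = None) -> dict:
    """Hypothetical helper showing a default flip behind a deprecation warning."""
    if add_kwargs is None:
        # The caller relied on the implicit default, so flag the behavior change.
        warnings.warn(
            "the default of add_kwargs changed from False to True; pass it explicitly",
            DeprecationWarning,
            stacklevel=2,
        )
        add_kwargs = True
    return {"name": name, "add_kwargs": add_kwargs}


# Callers that pass the flag explicitly are unaffected and see no warning.
explicit = register_input("input_ids", add_kwargs=False)
```

Callers that omit `add_kwargs` get the new behavior plus a warning pointing at their call site, which gives them a window to migrate.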

Copilot · 2025-10-02T22:13:42Z

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py

 DynamicShape = Dict[int, Dim]  # indicating the dynamic shape in tensor dimension
 DynamicShapeCallback = Callable[[], DynamicShape]

+Constant = Union[int, float, str, None]


The Constant type alias is moved from line 767 to line 27, but this creates a duplicate definition since the original at line 767 is not removed in the diff. This could lead to confusion about which definition to use.
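
A concern like this can be checked mechanically. The sketch below uses Python's standard `ast` module to list names assigned more than once at module level; the `sample` source is a made-up fragment for illustration, not the contents of `attention_interface.py`.

```python
import ast
from collections import Counter
from typing import List


def duplicated_top_level_names(source: str) -> List[str]:
    """Return names that are assigned more than once at module level."""
    counts: Counter = Counter()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            # Plain assignments such as `Constant = Union[...]`.
            for target in node.targets:
                if isinstance(target, ast.Name):
                    counts[target.id] += 1
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            # Annotated assignments such as `x: int = 0`.
            counts[node.target.id] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up module source with the alias defined twice.
sample = """
from typing import Union
Constant = Union[int, float, str, None]
DynamicShape = dict
Constant = Union[int, float, str, None]
"""

duplicates = duplicated_top_level_names(sample)
print(duplicates)
```

The check only looks at top-level assignments, so definitions inside functions or classes are deliberately ignored.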

Fridah-nv · 2025-10-03T17:32:28Z

Could you elaborate on what "using kwargs everywhere" means? Does it mainly mean that the graph inputs are purely kwargs-based because of torch_export_to_gm(gm, args=(), kwargs=cm.named_args), and that the ops we insert during graph transformations should not be affected?

I feel this is going to affect some pattern matcher behavior because we use torch_export_to_gm as the export function.

lucaslie · 2025-10-03T19:28:28Z

see NVIDIA#8137

Fridah-nv and others added 14 commits October 1, 2025 13:13

[NVIDIA#5860][feat] Add ModelOPT INT4 awq fake quant support in AutoD…

de99e23

…eploy (NVIDIA#7770) Signed-off-by: Frida Hou <[email protected]> Signed-off-by: Fridah-nv <[email protected]>

[None][fix] fix patchelf version issue (NVIDIA#8112)

e107749

Signed-off-by: Bo Deng <[email protected]>

[None][feat] Draft: Save state first pass (NVIDIA#7012)

1ad7bc4

Signed-off-by: Izzy Putterman <[email protected]>

[TRTLLM-7733][feat] Executor changes to support helix parallelism (NV…

bd3d0ad

…IDIA#7972) Signed-off-by: Balaram Buddharaju <[email protected]>

[https://nvbugs/5549081][fix] Fix device id assignment for some visio…

726ac07

…n models (NVIDIA#8070) Signed-off-by: Chang Liu (Enterprise Products) <[email protected]> Signed-off-by: Chang Liu <[email protected]>

[NVIDIA#7588][feat] lock gpu clocks in test_perf.py to reliably detec…

32c7f8c

…t perf regressions (NVIDIA#8099) Signed-off-by: Eran Geva <[email protected]>

[TRTLLM-8269][test] do not explicitly pass temperature=0 to select gr…

fc7f78c

…eedy sampling (NVIDIA#8110) Signed-off-by: ixlmar <[email protected]>

[https://nvbugs/5556020][chore] waive test_eagle3 (NVIDIA#8119)

293637e

Signed-off-by: Erin Ho <[email protected]>

[TRTLLM-6589][feat] Support CUDA graph for DeepEP (NVIDIA#7514)

34d158b

Signed-off-by: Yifei Zhang <[email protected]>

[TRTLLM-7775][feat] Integrate tinygemm2 for gpt-oss (NVIDIA#7916)

6568e56

Signed-off-by: Dongfeng Yu <[email protected]> Signed-off-by: dongfengy <[email protected]> Co-authored-by: Jin Li <[email protected]>

[None][feat] Support for cancelling requests with disaggregation (NVI…

fefa7d8

…DIA#8114) Signed-off-by: Shunkang <[email protected]> Signed-off-by: Patrice Castonguay <[email protected]> Co-authored-by: Shunkang <[email protected]>

[None][fix] Fix access to new tokens in sampler. (NVIDIA#7958)

ab433b7

Signed-off-by: Daniel Campora <[email protected]>

[None][chore] Adding install_tensorrt.sh script to pip wheel (NVIDIA#…

08a4791

…8116) Signed-off-by: Patrice Castonguay <[email protected]>

[NVIDIA#7588][fix] fixed the kv cache size parsing in test_perf.py AD…

4136942

… backend (NVIDIA#8092) Signed-off-by: Eran Geva <[email protected]>

lucaslie requested review from Fridah-nv, nvchenghaoz and suyoggupta October 2, 2025 22:09

lucaslie self-assigned this Oct 2, 2025

lucaslie requested a review from Copilot October 2, 2025 22:13

Copilot AI reviewed Oct 2, 2025

lucaslie mentioned this pull request Oct 2, 2025

[None][feat] AutoDeploy: per graph or whole module transform infrastructure #152

Closed

greg-kwasniewski1 and others added 9 commits October 2, 2025 18:41

[TRTLLM-6342][bug] Patched incorrect starcoder tp config (NVIDIA#8118)

a5b59fd

Signed-off-by: greg-kwasniewski1 <[email protected]>

[None][feat] perf_metrics endpoint functionality improvement (NVIDIA#…

01423ac

…8005) Signed-off-by: Yilin Fan <[email protected]> Signed-off-by: nv-yilinf <[email protected]>

[None][feat] Update TRT-LLM Gen MoE kernels (NVIDIA#7970)

9b3d7cc

Signed-off-by: Nikita Korobov <[email protected]>

[https://nvbugs/5548098][fix] Fix flakey unit test for dynamic spec d… (

ba3dbb6

NVIDIA#8129)

[None] [refactor] Minor cleanup and improvements (NVIDIA#7619)

e2f69c5

Signed-off-by: Robin Kobus <[email protected]>

[None][feat] AutoDeploy: dive deeper into token generation bugs + ena…

5faa5e9

…ble_block_reuse (NVIDIA#8108) Signed-off-by: Lucas Liebenwein <[email protected]>

[None][fix] Fix Qwen3 FP8 per-tensor when requesting TRTLLM-GEN MoE b…

9db4366

…ackend (NVIDIA#8075) Signed-off-by: Aurelien Chartier <[email protected]>

[None][feat] AutoDeploy add autotuning when capturing cudagraphs (NVI…

d821524

…DIA#8120) Signed-off-by: Suyog Gupta <[email protected]>

[https://nvbugs/5537878][fix] Reserve an extra slot for padded batch (N…

7bc2d9e

…VIDIA#7998) Signed-off-by: ziyixiong-nv <[email protected]>

lucaslie and others added 3 commits October 3, 2025 12:14

[None][feat] AutoDeploy: compiler backends based on nn.Module (NVIDIA…

aaf2c3c

…#8126) Signed-off-by: Lucas Liebenwein <[email protected]>

[None][fix] Fix MTP 2-model (NVIDIA#8115)

ca82911

Signed-off-by: Mike Iovine <[email protected]> Signed-off-by: Mike Iovine <[email protected]>

kwargs-first pipeline

0aed9b8

Signed-off-by: Lucas Liebenwein <[email protected]>

Fridah-nv approved these changes Oct 3, 2025

lucaslie force-pushed the ll/kwargs_first branch from 1dbb9fc to 0aed9b8 Compare October 3, 2025 19:27

lucaslie closed this Oct 3, 2025
