[None][feat] AutoDeploy: per graph or whole module transform infrastructure #152
Conversation
Pull Request Overview
Introduces per-graph vs whole-module transform infrastructure for AutoDeploy, allowing each transform to be applied either to individual subgraphs or the entire module. This provides more granular control over transformation application and enables better optimization strategies.
- Added a `run_per_gm` configuration field to control transform application scope (see the sketch after this overview)
- Refactored the transform interface to support both per-graph and whole-module operations
- Migrated appropriate existing transforms to operate on whole modules
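To make the per-graph vs. whole-module split concrete, here is a minimal sketch of how a transform base class could dispatch on `run_per_gm`. Only the `run_per_gm` field and the `_apply_to_full_model` name come from this PR; the config class, dispatch logic, and remaining names are illustrative assumptions, not the actual `interface.py` implementation.

```python
# Hypothetical sketch of per-graph vs. whole-module dispatch; only `run_per_gm`
# and `_apply_to_full_model` are names from this PR, the rest is illustrative.
from abc import ABC, abstractmethod

import torch.nn as nn
from pydantic import BaseModel
from torch.fx import GraphModule


class TransformConfig(BaseModel):
    # When True, the transform is applied to every fx.GraphModule found in the
    # module tree; when False, it runs once on the whole nn.Module.
    run_per_gm: bool = True


class BaseTransform(ABC):
    def __init__(self, config: TransformConfig):
        self.config = config

    def __call__(self, mod: nn.Module) -> nn.Module:
        if self.config.run_per_gm:
            # Visit each GraphModule submodule (including the root if it is one).
            for _, subgm in mod.named_modules():
                if isinstance(subgm, GraphModule):
                    self._apply(subgm)
            return mod
        # Whole-module transforms see the full nn.Module, not just its graphs.
        return self._apply_to_full_model(mod)

    @abstractmethod
    def _apply(self, gm: GraphModule) -> None: ...

    def _apply_to_full_model(self, mod: nn.Module) -> nn.Module:
        return mod
```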
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tensorrt_llm/_torch/auto_deploy/transform/interface.py | Core infrastructure changes: added `run_per_gm` config, refactored transform application logic, and updated method signatures |
| tensorrt_llm/_torch/auto_deploy/transform/optimizer.py | Updated optimizer to work with `nn.Module` instead of `GraphModule` |
| tensorrt_llm/_torch/auto_deploy/transform/library/ | Migrated transform implementations to use the `_apply_to_full_model` method and `nn.Module` parameters |
| tensorrt_llm/_torch/auto_deploy/transformations/_graph.py | Updated graph utility functions to accept `nn.Module` instead of `GraphModule` |
| tensorrt_llm/_torch/auto_deploy/custom_ops/ | Removed unused `input_ids` parameter from `prepare_metadata` functions |
| tensorrt_llm/_torch/auto_deploy/config/ | Updated configuration files to set `run_per_gm: false` for appropriate transforms |
| tests/unittest/_torch/auto_deploy/ | Updated test configurations and removed unused variables |
Comments suppressed due to low confidence (1)
tensorrt_llm/_torch/auto_deploy/transform/interface.py:1
- The comparison `subgm is mod` will always be False since `subgm` is a GraphModule from `named_graphmodules(mod)` but `mod` is an `nn.Module`. This should compare against the root GraphModule if that's the intended logic.

```python
"""The interface for all transforms.
```
```diff
 @prepare_fused_mla_metadata.register_fake
 def prepare_fused_mla_metadata_fake(
-    input_ids, position_ids, seq_len, input_pos, cache_loc, pages_per_seq, page_size
+    position_ids, seq_len, input_pos, cache_loc, pages_per_seq, slot_idx, page_size
```
Copilot AI (Oct 2, 2025):
Missing `slot_idx` parameter in the original function signature. The fake function signature should match the real function signature exactly.
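For context, here is a minimal sketch of the pattern this comment refers to, using the generic `torch.library` API with a made-up op name; it is not the actual `prepare_fused_mla_metadata` implementation. The point is only that the fake (meta) implementation must mirror the real signature, including `slot_idx`.

```python
# Generic illustration with a hypothetical op; not the real AutoDeploy op.
import torch
from torch.library import custom_op


@custom_op("autodeploy_demo::prepare_metadata", mutates_args=())
def prepare_metadata(
    position_ids: torch.Tensor,
    seq_len: torch.Tensor,
    input_pos: torch.Tensor,
    cache_loc: torch.Tensor,
    pages_per_seq: torch.Tensor,
    slot_idx: torch.Tensor,
    page_size: int,
) -> torch.Tensor:
    return torch.empty_like(position_ids)


@prepare_metadata.register_fake
def _(position_ids, seq_len, input_pos, cache_loc, pages_per_seq, slot_idx, page_size):
    # The fake signature must match the real op exactly (including slot_idx) so
    # tracing/export sees consistent argument handling.
    return torch.empty_like(position_ids)
```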
```python
# NOTE: for now we do _not_ include input_ids since we are not guaranteed that input_ids
# is part of the graph, e.g., in situations where the graph is a submodule of the overall
# model. In such instances, the graph usually sees inputs_embeds. However, we assume for
# now that position_ids is always part of the graph.
return ("position_ids",) + self._cached_arg_names
```
FYI: this is why I am relying on `position_ids` instead of `input_ids` from now on for some of these changes.
Why do we need to assume `position_ids` is always part of the input? In other words, why do we need to process it differently than other args like `input_ids`/`inputs_embeds`? Is it just for convenience?
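As a rough illustration of the distinction being discussed (a hypothetical helper, not code from this PR): the exported graph's placeholders can be inspected to see whether it takes `input_ids` or `inputs_embeds`, while `position_ids` is assumed to appear in either case.

```python
# Hypothetical helper: inspect an exported graph's placeholder names.
from torch.fx import GraphModule


def graph_input_names(gm: GraphModule) -> list[str]:
    return [n.target for n in gm.graph.nodes if n.op == "placeholder"]


# A top-level graph might expose ["input_ids", "position_ids", ...], while a
# submodule graph often sees ["inputs_embeds", "position_ids", ...]; only
# position_ids is assumed to be common to both.
```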
```yaml
    stage: pattern_matcher
  match_eager_attention:
    stage: pattern_matcher
    requires_shape_prop: true
```
Do you know why `match_eager_attention` requires shape propagation now?
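For reference, here is what shape propagation looks like with stock torch.fx (`ShapeProp`). This is a generic example under the assumption that `requires_shape_prop` runs an equivalent pass so node metadata carries shapes/dtypes before the pattern matcher inspects the graph; it is not the AutoDeploy code path itself.

```python
# Generic torch.fx shape propagation example; not the AutoDeploy implementation.
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp


class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.softmax(x @ x.transpose(-1, -2), dim=-1)


gm = symbolic_trace(Toy())
ShapeProp(gm).propagate(torch.randn(2, 4, 8))

for node in gm.graph.nodes:
    tm = node.meta.get("tensor_meta")
    if tm is not None:
        # Pattern matchers that need shapes/dtypes read them from node.meta.
        print(node.name, tuple(tm.shape), tm.dtype)
```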
See NVIDIA#8157.
This will be transitioned to the main repo after NVIDIA#8126 and #151 get merged.