
Conversation

@lucaslie (Collaborator) commented Oct 2, 2025

Will be transitioned to the main repo after NVIDIA#8126 and #151 are merged.

  • This PR adds infrastructure so that each transform can be applied either over the whole nn.Module or over each individual subgraph
  • For existing transforms, I have transitioned some of them to operate over the whole module where appropriate


Copilot AI left a comment


Pull Request Overview

Introduces per-graph vs whole-module transform infrastructure for AutoDeploy, allowing each transform to be applied either to individual subgraphs or the entire module. This provides more granular control over transformation application and enables better optimization strategies.

  • Added run_per_gm configuration field to control transform application scope
  • Refactored transform interface to support both per-graph and whole-module operations
  • Migrated appropriate existing transforms to operate on whole modules

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

Summary per file:

  • tensorrt_llm/_torch/auto_deploy/transform/interface.py: Core infrastructure changes: added run_per_gm config, refactored transform application logic, and updated method signatures
  • tensorrt_llm/_torch/auto_deploy/transform/optimizer.py: Updated optimizer to work with nn.Module instead of GraphModule
  • tensorrt_llm/_torch/auto_deploy/transform/library/: Migrated transform implementations to use the _apply_to_full_model method and nn.Module parameters
  • tensorrt_llm/_torch/auto_deploy/transformations/_graph.py: Updated graph utility functions to accept nn.Module instead of GraphModule
  • tensorrt_llm/_torch/auto_deploy/custom_ops/: Removed unused input_ids parameter from prepare_metadata functions
  • tensorrt_llm/_torch/auto_deploy/config/: Updated configuration files to set run_per_gm: false for appropriate transforms
  • tests/unittest/_torch/auto_deploy/: Updated test configurations and removed unused variables
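To make the new dispatch concrete, here is a minimal sketch of how a transform base class might branch on run_per_gm, using names that appear in this PR (run_per_gm, _apply_to_full_model, named_graphmodules). Everything else, including the class name, the _apply_to_graphmodule method, and the default value, is assumed for illustration; the actual interface in transform/interface.py differs in its details.

# Hedged sketch only, not the repo's implementation.
from typing import Iterator, Tuple

import torch.nn as nn
from torch.fx import GraphModule


def named_graphmodules(mod: nn.Module) -> Iterator[Tuple[str, GraphModule]]:
    # Yield every GraphModule nested inside mod (including mod itself if it is one).
    for name, sub in mod.named_modules():
        if isinstance(sub, GraphModule):
            yield name, sub


class BaseTransform:
    run_per_gm: bool = True  # config field added by this PR (default assumed here)

    def __call__(self, mod: nn.Module) -> nn.Module:
        if self.run_per_gm:
            # apply the transform to each subgraph individually
            for _, subgm in named_graphmodules(mod):
                self._apply_to_graphmodule(subgm)
        else:
            # apply the transform once over the whole module
            self._apply_to_full_model(mod)
        return mod

    def _apply_to_graphmodule(self, gm: GraphModule) -> None:
        raise NotImplementedError

    def _apply_to_full_model(self, mod: nn.Module) -> None:
        raise NotImplementedError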
Comments suppressed due to low confidence (1)

tensorrt_llm/_torch/auto_deploy/transform/interface.py:1

  • The comparison subgm is mod will always be False since subgm is a GraphModule from named_graphmodules(mod) but mod is an nn.Module. This should compare against the root GraphModule if that's the intended logic.
"""The interface for all transforms.


@prepare_fused_mla_metadata.register_fake
def prepare_fused_mla_metadata_fake(
-    input_ids, position_ids, seq_len, input_pos, cache_loc, pages_per_seq, page_size
+    position_ids, seq_len, input_pos, cache_loc, pages_per_seq, slot_idx, page_size

Copilot AI commented Oct 2, 2025


Missing slot_idx parameter in the original function signature. The fake function signature should match the real function signature exactly.
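As a generic illustration of the rule behind this comment (a hypothetical demo op, not the repo's prepare_fused_mla_metadata), the fake implementation registered via register_fake should take exactly the same parameters as the real custom op:

# Hypothetical example; op name, parameters, and shapes are invented for the demo.
import torch
from torch.library import custom_op


@custom_op("demo::prepare_metadata", mutates_args=())
def prepare_metadata(position_ids: torch.Tensor, seq_len: torch.Tensor,
                     slot_idx: torch.Tensor, page_size: int) -> torch.Tensor:
    return torch.arange(position_ids.numel(), dtype=torch.int32)


@prepare_metadata.register_fake
def _(position_ids, seq_len, slot_idx, page_size):
    # same parameter list as the real op; only output shape/dtype matter here
    return torch.empty(position_ids.numel(), dtype=torch.int32)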

Comment on lines +306 to +332
# NOTE: for now we do _not_ include input_ids since we are not guaranteed that input_ids
# is part of the graph, e.g., in situations where the graph is a submodule of the overall
# model. In such instances, the graph usually sees inputs_embeds. However, we assume for
# now that position_ids is always part of the graph.
return ("position_ids",) + self._cached_arg_names
@lucaslie (Collaborator, Author)

FYI: this is why I am relying on position_ids instead of input_ids from now on for some of these changes.


Why do we need to assume position_ids is always part of the input? In other words, why do we need to process it differently than other args like input_ids/inputs_embeds? Is it just for convenience?

  stage: pattern_matcher
match_eager_attention:
  stage: pattern_matcher
  requires_shape_prop: true
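For readers unfamiliar with the flag: requires_shape_prop presumably indicates that the pattern matcher needs shape/dtype metadata on the graph nodes before it runs. A generic torch.fx sketch of shape propagation (not this repo's implementation) looks like this:

# Generic torch.fx shape propagation example; the toy model is made up for the demo.
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp


class Toy(torch.nn.Module):
    def forward(self, q, k):
        return torch.softmax(q @ k.transpose(-1, -2), dim=-1)


gm = symbolic_trace(Toy())
ShapeProp(gm).propagate(torch.randn(2, 8, 16), torch.randn(2, 8, 16))

for node in gm.graph.nodes:
    meta = node.meta.get("tensor_meta")
    if meta is not None:
        print(node.name, tuple(meta.shape), meta.dtype)  # shape-aware matchers read this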


Do you know why match_eager_attention requires shape propagation now?

lucaslie and others added 11 commits October 3, 2025 15:39
…pattern matcher utility; remove fuse_collective (NVIDIA#7545)

Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Fridah-nv <[email protected]>
…DIA#5543)

Signed-off-by: Yan Chunwei <[email protected]>
Signed-off-by: chunweiy <[email protected]>
Signed-off-by: Superjomn <[email protected]>
Signed-off-by: chunweiy <[email protected]>
…_multi_lora, fix its API use with pytorch flow LoRA (NVIDIA#8146)

Signed-off-by: Amit Zuker <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
@lucaslie force-pushed the ll/module_or_graph_in_transform branch from f858382 to 4bd6884 on October 6, 2025 14:10
@lucaslie (Collaborator, Author) commented Oct 6, 2025

see NVIDIA#8157

@lucaslie closed this Oct 6, 2025