[None][feat] AutoDeploy: per graph or whole module transform infrastructure #152
Conversation
Pull Request Overview
Introduces per-graph vs whole-module transform infrastructure for AutoDeploy, allowing each transform to be applied either to individual subgraphs or the entire module. This provides more granular control over transformation application and enables better optimization strategies.
- Added a `run_per_gm` configuration field to control transform application scope (see the sketch after this overview)
- Refactored the transform interface to support both per-graph and whole-module operations
- Migrated appropriate existing transforms to operate on whole modules
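To make the per-graph vs. whole-module split concrete, here is a minimal sketch of how a transform base class could dispatch on `run_per_gm`. Only the `run_per_gm` field and the `_apply_to_full_model` name come from this PR; the config class, dispatch logic, and remaining names are illustrative assumptions, not the actual `interface.py` implementation.

```python
# Hypothetical sketch of per-graph vs. whole-module dispatch; only `run_per_gm`
# and `_apply_to_full_model` are names from this PR, the rest is illustrative.
from abc import ABC, abstractmethod

import torch.nn as nn
from pydantic import BaseModel
from torch.fx import GraphModule


class TransformConfig(BaseModel):
    # When True, the transform is applied to every fx.GraphModule found in the
    # module tree; when False, it runs once on the whole nn.Module.
    run_per_gm: bool = True


class BaseTransform(ABC):
    def __init__(self, config: TransformConfig):
        self.config = config

    def __call__(self, mod: nn.Module) -> nn.Module:
        if self.config.run_per_gm:
            # Visit each GraphModule submodule (including the root if it is one).
            for _, subgm in mod.named_modules():
                if isinstance(subgm, GraphModule):
                    self._apply(subgm)
            return mod
        # Whole-module transforms see the full nn.Module, not just its graphs.
        return self._apply_to_full_model(mod)

    @abstractmethod
    def _apply(self, gm: GraphModule) -> None: ...

    def _apply_to_full_model(self, mod: nn.Module) -> nn.Module:
        return mod
```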
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tensorrt_llm/_torch/auto_deploy/transform/interface.py | Core infrastructure changes: added `run_per_gm` config, refactored transform application logic, and updated method signatures |
| tensorrt_llm/_torch/auto_deploy/transform/optimizer.py | Updated optimizer to work with `nn.Module` instead of `GraphModule` |
| tensorrt_llm/_torch/auto_deploy/transform/library/ | Migrated transform implementations to use the `_apply_to_full_model` method and `nn.Module` parameters |
| tensorrt_llm/_torch/auto_deploy/transformations/_graph.py | Updated graph utility functions to accept `nn.Module` instead of `GraphModule` |
| tensorrt_llm/_torch/auto_deploy/custom_ops/ | Removed unused `input_ids` parameter from `prepare_metadata` functions |
| tensorrt_llm/_torch/auto_deploy/config/ | Updated configuration files to set `run_per_gm: false` for appropriate transforms |
| tests/unittest/_torch/auto_deploy/ | Updated test configurations and removed unused variables |
Comments suppressed due to low confidence (1)
tensorrt_llm/_torch/auto_deploy/transform/interface.py:1
- The comparison `subgm is mod` will always be False since `subgm` is a GraphModule from `named_graphmodules(mod)` but `mod` is an `nn.Module`. This should compare against the root GraphModule if that's the intended logic.

```python
"""The interface for all transforms.
```
```diff
 @prepare_fused_mla_metadata.register_fake
 def prepare_fused_mla_metadata_fake(
-    input_ids, position_ids, seq_len, input_pos, cache_loc, pages_per_seq, page_size
+    position_ids, seq_len, input_pos, cache_loc, pages_per_seq, slot_idx, page_size
```
Copilot AI (Oct 2, 2025):
Missing `slot_idx` parameter in the original function signature. The fake function signature should match the real function signature exactly.
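For context, here is a minimal sketch of the pattern this comment refers to, using the generic `torch.library` API with a made-up op name; it is not the actual `prepare_fused_mla_metadata` implementation. The point is only that the fake (meta) implementation must mirror the real signature, including `slot_idx`.

```python
# Generic illustration with a hypothetical op; not the real AutoDeploy op.
import torch
from torch.library import custom_op


@custom_op("autodeploy_demo::prepare_metadata", mutates_args=())
def prepare_metadata(
    position_ids: torch.Tensor,
    seq_len: torch.Tensor,
    input_pos: torch.Tensor,
    cache_loc: torch.Tensor,
    pages_per_seq: torch.Tensor,
    slot_idx: torch.Tensor,
    page_size: int,
) -> torch.Tensor:
    return torch.empty_like(position_ids)


@prepare_metadata.register_fake
def _(position_ids, seq_len, input_pos, cache_loc, pages_per_seq, slot_idx, page_size):
    # The fake signature must match the real op exactly (including slot_idx) so
    # tracing/export sees consistent argument handling.
    return torch.empty_like(position_ids)
```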
```python
# NOTE: for now we do _not_ include input_ids since we are not guaranteed that input_ids
# is part of the graph, e.g., in situations where the graph is a submodule of the overall
# model. In such instances, the graph usually sees inputs_embeds. However, we assume for
# now that position_ids is always part of the graph.
return ("position_ids",) + self._cached_arg_names
```
FYI: this is why I am relying on `position_ids` instead of `input_ids` from now on for some of these changes.
Why do we need to assume `position_ids` is always part of the input? In other words, why do we need to process it differently than other args like `input_ids`/`inputs_embeds`? Is it just for convenience?
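As a rough illustration of the distinction being discussed (a hypothetical helper, not code from this PR): the exported graph's placeholders can be inspected to see whether it takes `input_ids` or `inputs_embeds`, while `position_ids` is assumed to appear in either case.

```python
# Hypothetical helper: inspect an exported graph's placeholder names.
from torch.fx import GraphModule


def graph_input_names(gm: GraphModule) -> list[str]:
    return [n.target for n in gm.graph.nodes if n.op == "placeholder"]


# A top-level graph might expose ["input_ids", "position_ids", ...], while a
# submodule graph often sees ["inputs_embeds", "position_ids", ...]; only
# position_ids is assumed to be common to both.
```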
```yaml
    stage: pattern_matcher
  match_eager_attention:
    stage: pattern_matcher
    requires_shape_prop: true
```
Do you know why `match_eager_attention` requires shape propagation now?
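For reference, here is what shape propagation looks like with stock torch.fx (`ShapeProp`). This is a generic example under the assumption that `requires_shape_prop` runs an equivalent pass so node metadata carries shapes/dtypes before the pattern matcher inspects the graph; it is not the AutoDeploy code path itself.

```python
# Generic torch.fx shape propagation example; not the AutoDeploy implementation.
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp


class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.softmax(x @ x.transpose(-1, -2), dim=-1)


gm = symbolic_trace(Toy())
ShapeProp(gm).propagate(torch.randn(2, 4, 8))

for node in gm.graph.nodes:
    tm = node.meta.get("tensor_meta")
    if tm is not None:
        # Pattern matchers that need shapes/dtypes read them from node.meta.
        print(node.name, tuple(tm.shape), tm.dtype)
```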
See NVIDIA#8157.
This will be transitioned to the main repo after NVIDIA#8126 and #151 get merged.