[compile][cuda_graph] Add sym_size handling by folding them to constant #32960
fxdawnn wants to merge 2 commits into vllm-project:main
Conversation
This reverts commit a409cf4.
…t to allow graph transferring. Signed-off-by: Xiao Fu <xiaofu@meta.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    else:
        break

fold_sym_size_to_constants(self.graph, concrete_inputs)
Graph mutation causes wrong constants for subsequent compilations
High Severity
The fold_sym_size_to_constants function mutates the shared self.graph by calling node.replace_all_uses_with(const_value), which replaces all uses of sym_size nodes with constants. After the first single-size compilation, these nodes have zero users (as the test explicitly verifies). Subsequent calls for different sizes find the same sym_size nodes but replace_all_uses_with has no effect since there are no users left to replace. This causes all single-size compilations after the first to use incorrect constant values from the initial compilation.
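The failure mode can be reproduced on a toy torch.fx graph. The sketch below is illustrative, not the PR's code: it uses `fx.symbolic_trace`, where `x.size(0)` traces to a `size` call_method node standing in for the dynamo `sym_size` node, and a hypothetical `fold` helper that mutates the graph the same way the PR does, via `node.replace_all_uses_with(const_value)`:

```python
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, x):
        # x.size(0) traces to a `size` call_method node (stand-in for sym_size)
        return x + x.size(0)

gm = fx.symbolic_trace(M())

def fold(graph: fx.Graph, n_value: int) -> None:
    # Illustrative fold pass: replace every use of a `size` node with a
    # constant, mirroring the PR's node.replace_all_uses_with(const_value).
    for node in list(graph.nodes):
        if node.op == "call_method" and node.target == "size":
            node.replace_all_uses_with(n_value)

fold(gm.graph, 4)   # first single-size compilation: folds size -> 4
fold(gm.graph, 8)   # second compilation: the size node now has zero users,
                    # so replace_all_uses_with is silently a no-op
gm.recompile()
out = gm(torch.zeros(8))  # adds the stale constant 4, not the current size 8
```

Because the first fold leaves the `size` node without users, the second fold finds nothing to replace, and the graph keeps computing with the constant from the first compilation.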
Code Review
The pull request introduces functionality to fold symbolic sizes to constants within FX graphs, which is crucial for CUDA graph capture. It includes a new test case to validate this folding and integrates the functionality into the piecewise compilation backend. The changes improve the robustness of CUDA graph capture by ensuring sym_size values are constants, preventing potential address mismatch issues during replay. The PR also optimizes debugging by making input address tracking conditional on the debugging mode.
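Based on that description, the folding pass can be sketched roughly as follows. This is a hedged sketch, not the PR's implementation: it matches the `size` call_method nodes that `fx.symbolic_trace` produces (the real pass targets aten `sym_size` nodes in the piecewise backend), and it looks up constants from a `concrete_inputs` dict keyed by placeholder name, as in the PR's test:

```python
import torch
import torch.fx as fx

def fold_sym_size_to_constants(
    graph: fx.Graph, concrete_inputs: dict[str, torch.Tensor]
) -> None:
    """Replace size queries on placeholders with ints from example inputs.

    Sketch only: matches `size` call_method nodes produced by
    fx.symbolic_trace; the PR folds aten sym_size nodes in dynamo graphs.
    """
    for node in list(graph.nodes):
        if node.op == "call_method" and node.target == "size" and len(node.args) == 2:
            src, dim = node.args
            if isinstance(src, fx.Node) and src.op == "placeholder" \
                    and src.name in concrete_inputs:
                const = concrete_inputs[src.name].size(dim)
                node.replace_all_uses_with(const)
                # Erase the dead node so a later pass cannot find it again
                # and fold it with stale values.
                graph.erase_node(node)
    graph.lint()

class M(torch.nn.Module):
    def forward(self, x):
        return x.reshape(x.size(0) // 2, 2)

gm = fx.symbolic_trace(M())
x = torch.arange(8.0)
fold_sym_size_to_constants(gm.graph, {"x": x})
gm.recompile()
out = gm(x)  # the reshape now uses the folded constant size
```

Erasing the folded node (rather than leaving it with zero users, as the Bugbot comment above describes) is one way to keep repeated single-size compilations from silently reusing constants from an earlier compilation.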
concrete_inputs: dict[str, torch.Tensor] = {}
for node in gm.graph.nodes:
    if node.op == "placeholder":
        concrete_inputs[node.name] = x
        break
The construction of concrete_inputs only adds the first placeholder's concrete value and then breaks. If the model_fn were to accept multiple input tensors (e.g., def model_fn(x, y): ...), this logic would incorrectly only provide the concrete input for x, potentially leading to fold_sym_size_to_constants failing or behaving unexpectedly for y's symbolic size operations. To ensure robustness for multi-input models, all placeholder nodes should be iterated over and their corresponding concrete inputs added.
Suggested change:

concrete_inputs: dict[str, torch.Tensor] = {}
for node in gm.graph.nodes:
    if node.op == "placeholder":
        # Assuming all placeholders should receive the same concrete tensor 'x' for this test.
        # If different inputs are needed, this logic would require adjustment.
        concrete_inputs[node.name] = x
assert concrete_inputs, "No placeholder found in the graph."
Purpose
Fixes #31043
Test Plan
Test Result
Pass
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.