
[compile][cuda_graph] Add sym_size handling by folding them to constant #32960

Closed
fxdawnn wants to merge 2 commits into vllm-project:main from fxdawnn:fold_sym_size_const

Conversation

@fxdawnn
Contributor

@fxdawnn fxdawnn commented Jan 23, 2026

Purpose

Fix #31043

Test Plan

  • Comparing against the repro in the issue after reverting the previous temporary fix
  • Adding a local test

Test Result

Pass

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…t to allow graph transferring

Signed-off-by: Xiao Fu <xiaofu@meta.com>
@mergify

mergify bot commented Jan 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fxdawnn.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



else:
break

fold_sym_size_to_constants(self.graph, concrete_inputs)


Graph mutation causes wrong constants for subsequent compilations

High Severity

The fold_sym_size_to_constants function mutates the shared self.graph by calling node.replace_all_uses_with(const_value), which replaces all uses of sym_size nodes with constants. After the first single-size compilation, these nodes have zero users (as the test explicitly verifies). Subsequent calls for different sizes find the same sym_size nodes but replace_all_uses_with has no effect since there are no users left to replace. This causes all single-size compilations after the first to use incorrect constant values from the initial compilation.

Additional Locations (1)

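The failure mode Bugbot describes can be reproduced outside vLLM. The sketch below is hypothetical code, not the PR's implementation: it uses a plain `call_method "size"` node as a stand-in for `sym_size`, and a `fold_sizes_to_constants` helper invented for illustration in place of the PR's `fold_sym_size_to_constants`. It shows why folding in place is not idempotent (after the first fold, the size nodes have zero users, so a second fold is a silent no-op), and how folding on a per-size copy of the graph avoids the stale constant:

```python
import copy
import torch
import torch.fx as fx
from torch.fx.node import map_arg

class Model(torch.nn.Module):
    def forward(self, x):
        n = x.size(0)            # traced as a call_method "size" node (stand-in for sym_size)
        return x.reshape(n, -1)

def fold_sizes_to_constants(graph: fx.Graph, example: torch.Tensor) -> None:
    """Replace every use of an `x.size(dim)` node with the concrete dim of `example`.

    Mutates `graph` in place: after the first call the size nodes have zero
    users, so a second call with a different `example` silently does nothing.
    """
    for node in list(graph.nodes):
        if node.op == "call_method" and node.target == "size" and len(node.args) == 2:
            const = example.size(node.args[1])
            for user in list(node.users):
                # Swap the size node for the concrete int in the user's args.
                user.args = map_arg(user.args, lambda a: const if a is node else a)

gm = fx.symbolic_trace(Model())
fold_sizes_to_constants(gm.graph, torch.randn(2, 4))   # first size: folds n -> 2
fold_sizes_to_constants(gm.graph, torch.randn(3, 4))   # no users left: silent no-op
gm.recompile()
stale = gm(torch.randn(3, 4))                          # reshape(2, -1) -> shape (2, 6), wrong

# Fix: fold each size on its own copy of the traced graph module.
base = fx.symbolic_trace(Model())
for n in (2, 3):
    per_size = copy.deepcopy(base)
    fold_sizes_to_constants(per_size.graph, torch.randn(n, 4))
    per_size.recompile()
ok = per_size(torch.randn(3, 4))                       # shape (3, 4), as expected
print(stale.shape, ok.shape)
```

Copying before mutating keeps `base` reusable, so every single-size compilation folds against its own size instead of inheriting constants from the first one.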

@fxdawnn fxdawnn marked this pull request as draft January 23, 2026 19:35
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces functionality to fold symbolic sizes to constants within FX graphs, which is crucial for CUDA graph capture. It includes a new test case to validate this folding and integrates the functionality into the piecewise compilation backend. The changes improve the robustness of CUDA graph capture by ensuring sym_size values are constants, preventing potential address mismatch issues during replay. The PR also optimizes debugging by making input address tracking conditional on the debugging mode.

Comment on lines +157 to +161
concrete_inputs: dict[str, torch.Tensor] = {}
for node in gm.graph.nodes:
    if node.op == "placeholder":
        concrete_inputs[node.name] = x
        break

high

The construction of concrete_inputs only adds the first placeholder's concrete value and then breaks. If the model_fn were to accept multiple input tensors (e.g., def model_fn(x, y): ...), this logic would incorrectly only provide the concrete input for x, potentially leading to fold_sym_size_to_constants failing or behaving unexpectedly for y's symbolic size operations. To ensure robustness for multi-input models, all placeholder nodes should be iterated over and their corresponding concrete inputs added.

Suggested change

Before:

concrete_inputs: dict[str, torch.Tensor] = {}
for node in gm.graph.nodes:
    if node.op == "placeholder":
        concrete_inputs[node.name] = x
        break

After:

concrete_inputs: dict[str, torch.Tensor] = {}
for node in gm.graph.nodes:
    if node.op == "placeholder":
        # Assuming all placeholders should receive the same concrete tensor 'x' for this test.
        # If different inputs are needed, this logic would require adjustment.
        concrete_inputs[node.name] = x
assert concrete_inputs, "No placeholder found in the graph."
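The multi-input concern can be checked with a small standalone harness (hypothetical names: `TwoInput` and `examples` are invented for illustration, not taken from the PR). Iterating all placeholder nodes without an early `break` collects one concrete input per graph input:

```python
import torch
import torch.fx as fx

class TwoInput(torch.nn.Module):
    def forward(self, x, y):
        return x + y

gm = fx.symbolic_trace(TwoInput())
examples = {"x": torch.randn(2, 4), "y": torch.randn(2, 4)}

# Collect a concrete tensor for every placeholder -- no early break,
# so a two-input model yields two entries instead of one.
concrete_inputs: dict[str, torch.Tensor] = {}
for node in gm.graph.nodes:
    if node.op == "placeholder":
        concrete_inputs[node.name] = examples[node.name]
assert concrete_inputs, "No placeholder found in the graph."

print(sorted(concrete_inputs))  # ['x', 'y']
```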


Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[BugFix]: move torch.Size across graphs in split_graph

1 participant