[compile][graph_partition] Remove unused subgraph inputs after split_module #35251

fxdawnn wants to merge 30 commits into vllm-project:main
Conversation
Code Review
This pull request introduces a graph pass that moves `torch.ops.aten.sym_size.int` operations to sit immediately after their tensor operand. This optimization ensures that when the graph is split, the `sym_size` operation lands in the producer subgraph, so the full tensor does not need to be passed to consumer subgraphs just for a size lookup. The change is implemented in `vllm/compilation/backends.py` and is accompanied by new tests.

My review found a couple of issues in the new tests: an unused pytest fixture and a verification loop that has no effect. I've left specific comments on these.

It's also worth noting that the PR title and description describe a different feature: removing unused inputs from subgraphs after splitting. The implemented code repositions `sym_size` nodes before splitting. You may want to update the PR description to reflect the actual changes.
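The repositioning idea the review describes can be illustrated with a small, FX-free model. This is only a sketch of the technique: the real pass operates on `torch.fx` graph nodes, and every name below is made up for illustration.

```python
# Sketch: move each size-lookup node so it sits immediately after the node
# that produces its tensor operand, so a later graph split keeps the size
# lookup in the producer subgraph. Nodes are modeled as plain dicts.

def move_size_nodes_after_producers(nodes):
    """nodes: list of {"name", "op", "arg"} dicts in topological order.
    Returns a new order where every "sym_size" node directly follows the
    node named by its "arg"."""
    size_nodes = [n for n in nodes if n["op"] == "sym_size"]
    others = [n for n in nodes if n["op"] != "sym_size"]
    out = []
    for n in others:
        out.append(n)
        # place any size lookup on this node right after its producer
        out.extend(s for s in size_nodes if s["arg"] == n["name"])
    return out

graph = [
    {"name": "x", "op": "placeholder", "arg": None},
    {"name": "y", "op": "sigmoid", "arg": "x"},
    {"name": "s", "op": "sym_size", "arg": "x"},  # currently after y
]
reordered = move_size_nodes_after_producers(graph)
assert [n["name"] for n in reordered] == ["x", "s", "y"]
```

After the reorder, a split between `s` and `y` leaves the size lookup next to `x`, so only the SymInt crosses the subgraph boundary rather than the whole tensor.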
```python
@pytest.fixture
def vllm_compile_env(monkeypatch):
    """Set up vLLM compilation environment variables for testing."""
    monkeypatch.setenv("VLLM_ALL2ALL_BACKEND", "deepep_high_throughput")
    monkeypatch.setenv("VLLM_USE_DEEP_GEMM", "1")
    monkeypatch.setenv("VLLM_LOGGING_LEVEL", "debug")
    yield
```
```python
for ph in view_placeholders:
    ev = ph.meta.get("example_value")
    if isinstance(ev, torch.Tensor) and ev.shape == x.shape:
        # This placeholder matches x's shape — it should be y or z,
        # not x itself being passed just for .size()
        pass  # Allow tensors that are actually used for computation
```
This loop, intended to verify that the original tensor `x` is not passed to the consumer subgraph, is currently ineffective: it only contains a `pass` statement and performs no assertions. That is misleading, since it looks like a verification is being done. The primary assertion `assert len(sym_size_in_view_subgraph) == 0` already covers the main goal of this test. I recommend removing this loop to avoid confusion, unless a reliable assertion can be added.
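One possible shape for a reliable assertion is to count placeholders that match `x`'s shape but have no consumers. The sketch below models this with a stub class, since `torch.fx` placeholders carry `meta` and `users` attributes; the class, function, and example shapes are all hypothetical stand-ins, not the test's real objects.

```python
# Sketch of a checkable replacement for the no-op loop. A placeholder
# whose value matches x's shape but has no users would indicate that x
# was threaded into the subgraph just for .size().

class FakeNode:
    def __init__(self, shape, users):
        self.meta = {"example_value": shape}  # stand-in for a tensor's meta
        self.users = users                    # nodes consuming this placeholder

def count_pass_through_placeholders(placeholders, x_shape):
    """Count placeholders that match x's shape yet are never consumed."""
    return sum(
        1
        for ph in placeholders
        if ph.meta.get("example_value") == x_shape and not ph.users
    )

# y and z legitimately use their inputs, so nothing is flagged:
placeholders = [FakeNode((4, 8), users=["view"]), FakeNode((2, 8), users=["add"])]
assert count_pass_through_placeholders(placeholders, (4, 8)) == 0
```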
This pull request has merge conflicts that must be resolved before it can be merged.
Follow-up of #32747
Summary
`split_module` threads values through all subgraphs in the chain between producer and consumer, even subgraphs that don't reference them. For example, a SymInt value needed by a compute subgraph gets threaded through intermediate sigmoid subgraphs that never use it. These unused inputs add noise to subgraph signatures.

This PR adds a cleanup pass in `split_graph` that strips unused placeholder inputs from submodules and their corresponding args from `call_module` nodes in the parent graph.

Before (without cleanup):
After (with cleanup):
Change
The cleanup is merged into the existing output-building loop in `split_graph` (+14 net lines in `backends.py`): for each submodule, it finds placeholders with no users, erases them, and removes the corresponding positional args from the parent's `call_module` node to keep signatures in sync.

Test plan
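The per-submodule stripping described under Change can be modeled in a few lines. This is a runnable sketch with plain dicts standing in for FX placeholders and call-site args; the function and field names are illustrative, not vLLM's actual code.

```python
# Plain-Python model of the cleanup: drop each unused submodule input and
# the matching positional arg at the parent call site, in lockstep.

def strip_unused_inputs(placeholders, call_args):
    """placeholders: list of {"name", "users"} dicts in positional order.
    call_args: parallel list of argument names at the parent call site.
    Removes every placeholder with no users and its positional arg, so
    the submodule signature and the call site stay in sync."""
    kept_ph, kept_args = [], []
    for ph, arg in zip(placeholders, call_args):
        if ph["users"]:  # placeholder is actually consumed in the submodule
            kept_ph.append(ph)
            kept_args.append(arg)
    return kept_ph, kept_args

placeholders = [
    {"name": "x", "users": ["sigmoid"]},
    {"name": "s_symint", "users": []},  # threaded through, never used here
]
ph2, args2 = strip_unused_inputs(placeholders, ["x", "s"])
assert [p["name"] for p in ph2] == ["x"] and args2 == ["x"]
```

The invariant the tests below check follows directly: after the cleanup, every remaining placeholder has at least one user.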
- `test_unused_subgraph_inputs_removed` — single split with `make_fx` symbolic tracing; verifies every subgraph placeholder has at least one user after cleanup
- `test_unused_symint_inputs_removed_multi_split` — multi split with `torch.compile` + `mark_dynamic`; verifies sigmoid subgraphs don't receive SymInt inputs they don't use (depends on the sym_size producer PR)

Graph Outlook