[Bugfix] Eliminate tuple inputs to submodules in graph partitioning #28533

Merged
zou3519 merged 7 commits into vllm-project:main from gmagogsfm:cache_key
Nov 13, 2025
Conversation

Contributor

@gmagogsfm gmagogsfm commented Nov 12, 2025

Move getitem operations to the same subgraph as their input to prevent submodules from receiving tuple arguments.

This fixes issues like #24915, where split_graph partitioned the graph at a node that produces a tuple, which was then passed to subsequent submodules as an input. That does not conform to the calling convention of AoTAutograd modules, which requires flattened arguments.

Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
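For readers unfamiliar with the partitioning pass, the idea can be sketched in plain Python. This is a simplified stand-in, not vLLM's actual code: `Node`, `assign_subgraphs`, and the split-point handling are illustrative substitutes for torch.fx nodes and `split_module`.

```python
# Simplified sketch of the getitem-reassignment idea from this PR, using
# minimal stand-in objects instead of real torch.fx nodes. All names here
# (Node, assign_subgraphs, split_points) are illustrative, not vLLM's API.
import operator
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity hashing so Nodes can be dict keys
class Node:
    name: str
    op: str              # "placeholder", "call_function", ...
    target: object = None
    args: tuple = ()


def assign_subgraphs(nodes, split_points):
    """Assign each node a subgraph id, bumping the id at each split point,
    then pull getitem nodes into the same subgraph as their producer so a
    tuple never crosses a submodule boundary."""
    node_to_subgraph_id = {}
    subgraph_id = 0
    for node in nodes:
        if node.op == "placeholder":
            continue  # placeholders are never assigned a subgraph
        if node.name in split_points:
            subgraph_id += 1
        node_to_subgraph_id[node] = subgraph_id

    # Second pass: a getitem must live with the node that produced its input.
    for node in nodes:
        if node.op == "call_function" and node.target is operator.getitem:
            input_node = node.args[0]
            if input_node.op != "placeholder":
                assert input_node in node_to_subgraph_id
                node_to_subgraph_id[node] = node_to_subgraph_id[input_node]
    return node_to_subgraph_id


# Example: an op producing a tuple, a split right after it, then getitems.
x = Node("x", "placeholder")
attn = Node("attn", "call_function", target="fused_attn", args=(x,))
g0 = Node("g0", "call_function", target=operator.getitem, args=(attn, 0))
g1 = Node("g1", "call_function", target=operator.getitem, args=(attn, 1))
mm = Node("mm", "call_function", target="matmul", args=(g0, g1))

ids = assign_subgraphs([x, attn, g0, g1, mm], split_points={"g0"})
# The getitems are pulled back into attn's subgraph (0); only mm is in 1.
print(ids[attn], ids[g0], ids[g1], ids[mm])  # → 0 0 0 1
```

Without the second pass, the split at `g0` would leave `attn`'s tuple output crossing the submodule boundary, which is exactly the condition this PR eliminates.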

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses tuple inputs to submodules in graph partitioning by moving getitem operations into the producer's subgraph. The changes in vllm/compilation/backends.py implement this logic correctly, and the new test file tests/compile/test_graph_partition.py covers both single- and multiple-consumer scenarios. Adding the test to .buildkite/test-pipeline.yaml ensures continuous validation. Overall, the change improves the robustness of graph partitioning for AoTAutograd modules. No critical or high-severity issues were found.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


mergify bot commented Nov 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @gmagogsfm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Move getitem operations to the same subgraph as their input to prevent
submodules from receiving tuple arguments. This is done by reassigning
getitem nodes during partition assignment before calling split_module.

PyTorch AoT compile expects graphs that conform to AoTAutograd spec,
which prohibits tuple-type inputs to (sub)modules.

Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Comment on lines +323 to +325
    if input_node in node_to_subgraph_id:
        node_to_subgraph_id[node] = node_to_subgraph_id[input_node]
        continue
Collaborator


This should always be true, right?

Suggested change
    if input_node in node_to_subgraph_id:
        node_to_subgraph_id[node] = node_to_subgraph_id[input_node]
        continue
    assert input_node in node_to_subgraph_id
    node_to_subgraph_id[node] = node_to_subgraph_id[input_node]
    continue

Contributor Author


I think placeholder nodes would not be in the node_to_subgraph_id map.

Collaborator

@zou3519 zou3519 Nov 12, 2025


Hmm, I think the claim is that we should never call getitem on a placeholder node or an output node, so we can just assert that input_node is in node_to_subgraph_id?

Because Dynamo produces a graph where all placeholders should be Tensors or symints or scriptobjects

So I agree with Luka's suggestion

Contributor Author


I see, changed.

Contributor Author


@zou3519 @ProExpertProg I had to relax the assert to apply only when the producer node is not a placeholder, because getitem is also used to fetch an item/slice from a tensor. In that case there can be legitimate getitem calls on placeholders. This is seen in qwen2.

PTAL.
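For context on why the relaxed assert is needed: `operator.getitem` serves double duty in FX graphs, unpacking tuples and indexing tensors. A hypothetical plain-Python illustration (using a list as a stand-in for a tensor placeholder):

```python
# operator.getitem serves double duty: on a tuple it unpacks an element,
# on a tensor-like object it indexes/slices. The stand-in values below are
# illustrative; in a real graph these would be fx node outputs.
import operator

fused_out = ("out_tensor", "aux_tensor")  # stand-in for a tuple-producing op
first = operator.getitem(fused_out, 0)    # tuple unpacking -> "out_tensor"

matrix = [[1, 2], [3, 4]]                 # stand-in for a tensor placeholder
row = operator.getitem(matrix, 1)         # tensor-style indexing -> [3, 4]

print(first, row)  # → out_tensor [3, 4]
```

The second form is the legitimate getitem-on-a-placeholder case seen in qwen2, so an unconditional assert would fire there.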

Collaborator


oh no, this is dynamo IR we're talking about... let me think about this

Collaborator


Do we have a good way of distinguishing between getitem calls on Tensors and getitem calls on Tuples?

Contributor Author

@gmagogsfm gmagogsfm Nov 13, 2025


Checking the type of node.meta['val'] is the way. However, I don't think there is a strong guarantee about its presence.

I think the current assert is strict and correct because it is by design that all placeholder nodes are not in node_to_subgraph_id, and this check asserts exactly that.

Contributor Author


OK, I tested with node.meta['val']. It is not always available, so we can't rely on it to tell whether something is a tuple or a tensor.

The next best thing is what is currently implemented, i.e. checking input_node.op == "placeholder" and relying on Dynamo doing the right thing, which is currently the case according to CI results.
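The val-based check the thread ruled out could look roughly like this. This is a sketch with a stand-in `Node` class, not torch.fx; it assumes `meta['val']` holds a recorded fake value when present, which, as noted above, is not guaranteed:

```python
# Sketch of the rejected meta['val'] approach: classify a getitem's input by
# the type of its recorded value, with an explicit "unknown" result when the
# value is absent. Node is a minimal stand-in for torch.fx.Node.
class Node:
    def __init__(self, op, meta=None):
        self.op = op
        self.meta = meta or {}


def produces_tuple(node):
    """Best-effort check: True/False when meta['val'] is present, None when
    it is missing and we cannot decide."""
    if "val" not in node.meta:
        return None  # no guarantee 'val' exists, so we cannot classify
    return isinstance(node.meta["val"], (tuple, list))


print(produces_tuple(Node("call_function", {"val": (1, 2)})))  # → True
print(produces_tuple(Node("placeholder", {"val": object()})))  # → False
print(produces_tuple(Node("placeholder")))                     # → None
```

The `None` branch is exactly the gap that made this approach unreliable, which is why the merged fix checks `input_node.op == "placeholder"` instead.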

Collaborator


I am fine with loosening the assert. I ran into this a while ago with something in standard torch.compile

Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Collaborator

@zou3519 zou3519 left a comment


Code looks reasonable; I think Luka's comment makes sense.

Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
@zou3519 zou3519 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 13, 2025
@zou3519 zou3519 enabled auto-merge (squash) November 13, 2025 00:07
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
auto-merge was automatically disabled November 13, 2025 05:40

Head branch was pushed to by a user without write access

@zou3519 zou3519 merged commit 262d263 into vllm-project:main Nov 13, 2025
48 checks passed
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025

Labels

ci/build, ready (ONLY add when PR is ready to merge/full CI is needed)
