[bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph by ChenCangtao · Pull Request #6015 · vllm-project/vllm-ascend

ChenCangtao · 2026-01-19T14:09:18Z

What this PR does / why we need it?

When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model.
However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this.
Previously, we manually modified the fx graph, which introduced an abnormality in the model output type.
In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies.

Does this PR introduce any user-facing change?

No

How was this patch tested?

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@2c24bc6

Signed-off-by: chencangtao <chencangtao@huawei.com>

github-actions · 2026-01-19T14:09:36Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

gemini-code-assist

Code Review

This pull request addresses a bug where manual modification of the FX graph to ensure a tuple output was causing an incorrect output type at runtime. The changes replace the manual graph surgery with a call to a dedicated function, which is a much cleaner approach. Consequently, the workarounds in the model runner to handle the incorrect output type have been removed. The changes are logical, improve code quality by removing hacks, and should resolve the underlying issue. The code looks good.

Signed-off-by: chencangtao <chencangtao@huawei.com>

github-actions · 2026-01-20T13:38:56Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

# Conflicts: # vllm_ascend/compilation/compiler_interface.py

…to qwen3next_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits) [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032) [BugFix] fix 3vl dense model load quant weight (vllm-project#6100) [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641) [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145) [CI]Install clang in dokerfile for triton ascend (vllm-project#4409) [Main] Upgrade PTA to 2.9.0 (vllm-project#6112) [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721) [P/D][PCP]bugfix pcp force free twice caused logger error (vllm-project#6124) [BugFix]converting pa get_workspace back to capturing (vllm-project#5833) [CI] optimize lint term (vllm-project#5986) [Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6042) [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (vllm-project#6015) [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097) [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110) [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758) [bugfix] adapt_remote_request_id (vllm-project#6051) [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143) [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702) [CI] Upgrade CANN to 8.5.0 (vllm-project#6070) Default enable MLAPO (vllm-project#5952) ...

…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>

…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>

…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>

fix the model output type issue caused by manually modify FX graph

192b151

Signed-off-by: chencangtao <chencangtao@huawei.com>

ChenCangtao requested review from MengqingCao and yiz-liu as code owners January 19, 2026 14:09

gemini-code-assist bot reviewed Jan 19, 2026

View reviewed changes

update note

621efe3

Signed-off-by: chencangtao <chencangtao@huawei.com>

wangxiyuan approved these changes Jan 20, 2026

View reviewed changes

wangxiyuan added ready read for review ready-for-test start test by label for PR labels Jan 20, 2026

wangxiyuan enabled auto-merge (squash) January 20, 2026 08:48

github-actions bot added the merge-conflicts label Jan 20, 2026

Merge branch 'main' into fix-model-output-type-issue

5a9228b

# Conflicts: # vllm_ascend/compilation/compiler_interface.py

github-actions bot removed the merge-conflicts label Jan 21, 2026

chencangtao added 2 commits January 21, 2026 17:35

Merge branch 'main' into fix-model-output-type-issue

1add443

Merge branch 'main' into fix-model-output-type-issue

f20de39

wangxiyuan merged commit 38edfd5 into vllm-project:main Jan 22, 2026
20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph#6015

[bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph#6015
wangxiyuan merged 5 commits intovllm-project:mainfrom
ChenCangtao:fix-model-output-type-issue

ChenCangtao commented Jan 19, 2026 •

edited by github-actions bot

Loading

Uh oh!

github-actions bot commented Jan 19, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

github-actions bot commented Jan 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ChenCangtao commented Jan 19, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

github-actions bot commented Jan 19, 2026

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

github-actions bot commented Jan 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ChenCangtao commented Jan 19, 2026 •

edited by github-actions bot

Loading