[bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph#6015
Conversation
Signed-off-by: chencangtao <chencangtao@huawei.com>
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Code Review
This pull request addresses a bug where manual modification of the FX graph to ensure a tuple output was causing an incorrect output type at runtime. The changes replace the manual graph surgery with a call to a dedicated function, which is a much cleaner approach. Consequently, the workarounds in the model runner to handle the incorrect output type have been removed. The changes are logical, improve code quality by removing hacks, and should resolve the underlying issue. The code looks good.
Signed-off-by: chencangtao <chencangtao@huawei.com>
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
# Conflicts: # vllm_ascend/compilation/compiler_interface.py
…to qwen3next_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits) [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032) [BugFix] fix 3vl dense model load quant weight (vllm-project#6100) [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641) [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145) [CI]Install clang in dokerfile for triton ascend (vllm-project#4409) [Main] Upgrade PTA to 2.9.0 (vllm-project#6112) [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721) [P/D][PCP]bugfix pcp force free twice caused logger error (vllm-project#6124) [BugFix]converting pa get_workspace back to capturing (vllm-project#5833) [CI] optimize lint term (vllm-project#5986) [Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6042) [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (vllm-project#6015) [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097) [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110) [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758) [bugfix] adapt_remote_request_id (vllm-project#6051) [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143) [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702) [CI] Upgrade CANN to 8.5.0 (vllm-project#6070) Default enable MLAPO (vllm-project#5952) ...
…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>
…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>
…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…ly modify FX graph (vllm-project#6015) ### What this PR does / why we need it? When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model. However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this. Previously, we manually modified the fx graph, which introduced an abnormality in the model output type. In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>
What this PR does / why we need it?
When using the full_decode_only mode, the vllm framework will still use the torch.fx.passes.split_module.split_module API to process the corresponding GraphModule of the model.
However, the output of this API may cause the output of the fx graph to no longer be a tuple, and torch.compile enforces strict checks on this.
Previously, we manually modified the fx graph, which introduced an abnormality in the model output type.
In this PR, we switched to using PyTorch's native API to modify the FX graph, and removed the code that was previously added to handle output type anomalies.
Does this PR introduce any user-facing change?
No
How was this patch tested?