
[Main2Main] Upgrade vllm commit to 0102#5546

Closed
wjunLu wants to merge 16 commits intovllm-project:mainfrom
wjunLu:main_upgrade

Conversation


@wjunLu wjunLu commented Dec 31, 2025

What this PR does / why we need it?

Upgrade vllm commit to 0102

  1. Remove the maybe_padded_num_tokens arg in model_runner_v1.py, following [Core] Remove unused num_tokens parameter from _init_model_kwargs vllm#31517
  2. Remove Qwen/Qwen3-0.6B from tests/e2e/multicard/test_aclgraph_capture_replay.py, because offline data parallel mode is not supported/useful for dense models

Does this PR introduce any user-facing change?

How was this patch tested?


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the compatible vLLM commit hash for the main branch in the versioning policy documentation. The change is straightforward, but to improve long-term maintainability, I've suggested abstracting the hardcoded commit hash into a substitution variable. This aligns with the project's own documented policy for managing version-specific information in the documentation.

| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|-------------|--------------|------------------|-------------|--------------------|
| main | 7157596103666ee7ccb7008acee8bff8a8ff1731, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |
| main | ecd49ce7e69a50892be7f9841941ca2d7e3b12ea, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |

Severity: high

For better maintainability, this commit hash should be defined as a substitution variable in docs/source/conf.py. This aligns with the project's documentation policy stated in this same file on line 138: 'To reduce maintenance costs, all branch documentation content should remain consistent, and version differences can be controlled via variables in docs/source/conf.py'.

You would need to add a new key to the myst_substitutions dictionary in docs/source/conf.py, for example:

```python
myst_substitutions = {
    # ... existing substitutions
    'vllm_main_commit': 'ecd49ce7e69a50892be7f9841941ca2d7e3b12ea',
}
```

Then you can use the substitution here as suggested.

Suggested change
| main | ecd49ce7e69a50892be7f9841941ca2d7e3b12ea, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |
| main | {{ vllm_main_commit }}, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |


jianzs commented Dec 31, 2025

Can we use GitHub Actions to automatically sync the latest commits in vllm?

@vllm-ascend-ci vllm-ascend-ci added ready read for review ready-for-test start test by label for PR labels Dec 31, 2025

wjunLu commented Dec 31, 2025

Can we use GitHub Actions to automatically sync the latest commits in vllm?

We have this workflow https://github.com/vllm-project/vllm-ascend/actions/workflows/schedule_test_vllm_main.yaml to sync and verify automatically, but there are always some errors, so we have to handle them manually.

@github-actions github-actions bot added documentation Improvements or additions to documentation ci/build labels Dec 31, 2025
@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@wjunLu wjunLu changed the title [Main2Main] Upgrade vllm commit to 1231 [Main2Main] Upgrade vllm commit to 0102 Jan 4, 2026
wjunLu and others added 16 commits January 4, 2026 11:43
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
…efill scenario (vllm-project#3072)

By converting the KV cache from ND to NZ format when the decode node
receives it, this PR ensures that the KV NZ feature works correctly
during the decoding phase in disagg-prefill scenario.
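The fix can be pictured as a post-receive hook on the decode side. A minimal sketch with hypothetical names (on Ascend the real conversion would be a format cast such as torch_npu.npu_format_cast, and the blocks would be real tensors):

```python
# Hypothetical sketch: after the decode node receives KV blocks from the
# prefill node, cast each block from ND to NZ so the attention kernels
# that expect the NZ layout see a consistent format.
def on_kv_received(kv_blocks, cast_to_nz):
    # cast_to_nz stands in for the real ND -> NZ format cast
    return [cast_to_nz(block) for block in kv_blocks]

# Toy demo: tag strings instead of real tensors.
converted = on_kv_received(["nd:layer0", "nd:layer1"],
                           lambda b: b.replace("nd:", "nz:"))
print(converted)  # ['nz:layer0', 'nz:layer1']
```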

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: ghphotoframe <854746559@qq.com>
Co-authored-by: alex101-ops <alex1015718386@gmail.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
)

Currently in the Fused MoE module, functions of classes like
MoECommMethod and MoETokenDispatcher output data in dictionary or tuple
format, which hampers code maintainability, readability, and
extensibility. This PR introduces dataclasses for these key output types
to address these issues.
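The shape of that change can be sketched as follows; the class and field names here are illustrative, not the PR's actual ones:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical dataclass replacing an anonymous tuple/dict return value
# from a token dispatcher in the fused MoE module.
@dataclass
class DispatchOutput:
    hidden_states: Any                  # tokens routed to local experts
    expert_ids: Any                     # which expert each token goes to
    recv_counts: Optional[Any] = None   # per-rank receive counts (EP only)

def dispatch_tokens(hidden_states, expert_ids):
    # Returning a named dataclass instead of (hidden_states, expert_ids, None)
    # lets callers use field names rather than positional indices.
    return DispatchOutput(hidden_states=hidden_states, expert_ids=expert_ids)

out = dispatch_tokens([0.1, 0.2], [3, 7])
print(out.expert_ids)  # field access replaces out[1]
```

Named fields make call sites self-documenting and let new outputs be added without breaking existing unpacking.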

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@5326c89

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
Improve the performance of the Layerwise Connector, mainly in the following ways:
1. Use event synchronization instead of stream synchronization.
2. Access the metaserver during scheduling.
3. Transfer the KV cache for each chunked-prefill segment.
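Point 3 can be sketched schematically: instead of shipping the whole KV cache after prefill finishes, each chunk's blocks are sent as soon as that chunk completes, overlapping transfer with compute. A pure-Python stand-in (the real code moves KV blocks over the connector):

```python
def chunk_ranges(seq_len: int, chunk_size: int):
    """Yield (start, end) token ranges, one per chunked-prefill segment."""
    for start in range(0, seq_len, chunk_size):
        yield start, min(start + chunk_size, seq_len)

def transfer_kv_per_chunk(seq_len, chunk_size, send):
    # `send` stands in for the connector's per-chunk KV transfer call;
    # each segment is shipped as soon as it is computed.
    for start, end in chunk_ranges(seq_len, chunk_size):
        send(start, end)

sent = []
transfer_kv_per_chunk(10, 4, lambda s, e: sent.append((s, e)))
print(sent)  # [(0, 4), (4, 8), (8, 10)]
```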

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.
- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@5fbfa8d

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
… reuse of the workspace in certain scenarios (vllm-project#5522)

### What this PR does / why we need it?

In the current process of implementing attention updates, the FIA
operator shares a single workspace among different layers within the
same computation graph. To enable memory reuse, we adopt the
weak_ref_tensor mechanism. However, this approach may lead to precision
anomalies in certain scenarios. To address this issue, different layers
in the same computation graph are assigned independent workspaces.
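A hedged sketch of the fix, with hypothetical names and plain buffers standing in for NPU workspace tensors:

```python
# Each attention layer gets its own workspace instead of weak-referencing
# one shared buffer, so a later layer's FIA call inside the same captured
# graph can never clobber an earlier layer's data.
class WorkspaceManager:
    def __init__(self, workspace_bytes: int):
        self.workspace_bytes = workspace_bytes
        self._per_layer = {}

    def get(self, layer_idx: int) -> bytearray:
        # One independent buffer per layer, allocated on first use and
        # reused on every subsequent replay of the graph.
        if layer_idx not in self._per_layer:
            self._per_layer[layer_idx] = bytearray(self.workspace_bytes)
        return self._per_layer[layer_idx]

mgr = WorkspaceManager(1024)
assert mgr.get(0) is not mgr.get(1)   # distinct workspaces per layer
assert mgr.get(0) is mgr.get(0)       # stable across calls
```

This trades some extra memory for correctness, which is the trade-off the PR describes.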

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@45c1ca1

Signed-off-by: WithHades <244036962@qq.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
Add LongCat-Flash support.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
CI passed

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: chuyuelin <923822139@qq.com>
Co-authored-by: chuyuelin <chuyuelin1@huawei.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
This PR builds upon PR vllm-project#5011 and aims to further enhance the
npu_graph_ex_passes module. Based on prior work, we have added graph
optimization support for the add_rms_quant fused operator in scenarios
where a bias term is present, ensuring the fusion pattern is correctly
registered and matched in the computation graph.

For validation, we switched to the Qwen3-235B-A22B-W8A8 model. Benchmark
results show that, compared to the unfused baseline, enabling this
fusion pass significantly improves inference throughput for W8A8
quantized models. For more details, refer to the
RFC: vllm-project#4715
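The core idea of the fusion pass can be illustrated with a toy matcher over a flat op list (the real registration uses torch.fx pattern matching in npu_graph_ex_passes; all names here are illustrative):

```python
# Match an (add, rms_norm, quant) chain and replace it with one fused
# node; with this PR the pattern also matches when a bias feeds the add.
def fuse_add_rms_quant(ops):
    fused, i = [], 0
    while i < len(ops):
        if ops[i:i + 3] == ["add", "rms_norm", "quant"]:
            fused.append("add_rms_quant")  # single fused operator
            i += 3
        else:
            fused.append(ops[i])
            i += 1
    return fused

print(fuse_add_rms_quant(["bias_add", "add", "rms_norm", "quant", "matmul"]))
# ['bias_add', 'add_rms_quant', 'matmul']
```

Fusing the chain saves two kernel launches and two intermediate tensors per matched site, which is where the throughput gain comes from.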

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
```
llm = LLM(
    model=model,
    tensor_parallel_size=GPUs_per_dp_rank,
    enforce_eager=False,
    enable_expert_parallel=enable_expert_parallel,
    trust_remote_code=trust_remote_code,
    gpu_memory_utilization=0.98,
    max_num_batched_tokens=512,
    # load_format="dummy",
    max_model_len=2048,
    max_num_seqs=16,
    quantization="ascend",
    additional_config={
        "refresh": True,
        "enable_npugraph_ex": True
    },
    compilation_config={
        "cudagraph_capture_sizes": [8, 16],
        "cudagraph_mode": "FULL_DECODE_ONLY",
    },
)
if profile_dir:
    llm.start_profile()
outputs = llm.generate(prompts, sampling_params)
if profile_dir:
    llm.stop_profile()
for i, output in enumerate(outputs):
    if i >= 5:
        break
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(
        f"DP rank {global_dp_rank}, Prompt: {prompt!r}, "
        f"Generated text: {generated_text!r}"
    )
```
- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@5326c89

Signed-off-by: cjian <2318164299@qq.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
Currently, when the MooncakeConnector interacts via ZeroMQ, send/receive failures cause the following problems:
**Issue 1:** The currently used `zmq.REQ` socket follows a strict
request-reply pattern, requiring an alternating sequence of send →
receive → send → receive... If either a send() or receive() operation
fails, the ZeroMQ socket becomes unusable.
**Solution:** When a send() or receive() exception occurs, close and
delete the ZeroMQ socket, and recreate it upon next use.

**Issue 2:** In `_handle_request`, if `_send_done_recv_signal` raises an
exception, the exception is thrown immediately and subsequent code is
not executed, causing the decode logic to fail to properly release the
request.
**Solution:** Move the call to `_send_done_recv_signal` to the end of
the function.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@45c1ca1

Signed-off-by: LCAIZJ <leichao139636@163.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
### What this PR does / why we need it?
We should also trigger an image build when nightly-test-related files are
changed, to ensure the image is valid for nightly tests. Please note that
this only applies to images with the tag `main*` (which means builds
triggered by PR).
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Bumps
[actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 6.

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@5326c89

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Bumps
[actions/download-artifact](https://github.com/actions/download-artifact) from 4 to 7.

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@5326c89

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
…replay.py

Signed-off-by: wjunLu <wjunlu217@gmail.com>
…-W8A8 (vllm-project#5381)

### What this PR does / why we need it?
Add DeepSeek-R1-W8A8 and Qwen3-235B-W8A8 configs for multi-node and
long-sequence (PCP & DCP) scenarios.

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@bc0a5a0
---------
Signed-off-by: daishixun <dsxsteven@sina.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>

Labels

ci/build documentation Improvements or additions to documentation ready read for review ready-for-test start test by label for PR


10 participants