[Main2Main] Upgrade vllm commit to 0105 #5595

Merged: wangxiyuan merged 1 commit into vllm-project:main from wjunLu:main0105 on Jan 6, 2026

@wjunLu (Collaborator) commented Jan 5, 2026

What this PR does / why we need it?

Upgrade vllm commit to 0105 (8be6432bdaf6275664d857b1e5e9bf8ed1ce299e)

  1. Remove the maybe_padded_num_tokens arg in model_runner_v1.py, since [Core] Remove unused num_tokens parameter from _init_model_kwargs (vllm#31517) deleted the unused arg.

  2. Remove the dense model Qwen/Qwen3-0.6B from tests/e2e/multicard/test_aclgraph_capture_replay.py and tests/e2e/multicard/test_data_parallel.py, because after [BugFix] Support online dense model DP without overhead (vllm#30739) offline data parallel mode will not be supported/useful for dense models.

  3. Adapt vllm_ascend/worker/worker.py to [BugFix] Fix async scheduling for pooling models (vllm#31584).

  4. Adapt the use of self.block_size to [Bugfix] Fix block size used in EAGLE slot mapping (vllm#31540).

  5. Modify test_mla_v1.py to follow [Core] Parse vLLM engine required fields from hf_config to model_arch_config (vllm#28454), which refactored get_head_size().
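Item 1 is an instance of a common situation: an upstream function drops a parameter and downstream callers have to stop passing it. This PR simply removes the argument. For readers who need one code path that works against both the old and new upstream signatures, the following is a minimal sketch of an alternative approach, not what this PR does. The helper `call_with_supported_kwargs` and the stand-in functions `old_helper` and `new_helper` are hypothetical and are not part of vLLM or vllm-ascend; only the keyword name `maybe_padded_num_tokens` comes from the description above.

```python
import inspect
from typing import Any, Callable


def call_with_supported_kwargs(func: Callable[..., Any], /, *args: Any, **kwargs: Any) -> Any:
    """Call func, dropping keyword arguments that its signature does not accept."""
    params = inspect.signature(func).parameters

    # A function that declares **kwargs accepts any keyword; pass everything through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)

    # Otherwise keep only keywords that name a parameter which can be passed by keyword.
    by_keyword = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    supported = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in by_keyword
    }
    return func(*args, **supported)


# Hypothetical stand-ins for an upstream helper before and after a parameter removal.
def old_helper(maybe_padded_num_tokens: int) -> dict:
    return {"padded": maybe_padded_num_tokens}


def new_helper() -> dict:
    return {}


# The same call site works against either version of the helper.
result_old = call_with_supported_kwargs(old_helper, maybe_padded_num_tokens=8)
result_new = call_with_supported_kwargs(new_helper, maybe_padded_num_tokens=8)
```

Silently dropping arguments can hide real mistakes, so a shim like this is best limited to parameters known to be unused, as was the case for the argument removed here.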

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request upgrades the vllm dependency to a newer commit. The changes primarily consist of adaptations to the updated vllm API, including modifications to function signatures and return types in the worker and model runner components. Additionally, some tests have been updated to reflect changes in model attributes and to improve test robustness. I've identified a potential bug in vllm_ascend/spec_decode/eagle_proposer.py where an incorrect set of layers is used to determine the attention layer name, which could lead to runtime errors. Other changes appear to be correct and necessary for the upgrade.

@wjunLu added the ready-for-test (start test by label for PR) and ready (read for review) labels Jan 5, 2026
@github-actions bot added the documentation (Improvements or additions to documentation), ci/build, and module:tests labels Jan 5, 2026
@github-actions bot (Contributor) commented Jan 5, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.

@github-actions bot (Contributor) commented Jan 5, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

Signed-off-by: wjunLu <wjunlu217@gmail.com>
@wangxiyuan wangxiyuan merged commit 3cf059a into vllm-project:main Jan 6, 2026
19 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 6, 2026
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (58 commits)
  [Main2Main] Upgrade vllm commit to 0106 (vllm-project#5617)
  [CI]update bisheng version (vllm-project#5621)
  [UT][PCP&DCP] UT for block_table.py (vllm-project#5032)
  [Main2Main] Upgrade vllm commit to 0105 (vllm-project#5595)
  [CI] mv ops to correct path (vllm-project#5615)
  [BugFix] Fix Smoke Testing Bug for DSR1 longseq (vllm-project#5613)
  Revert "[Feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5545)" (vllm-project#5611)
  [TRITON][TEST]Add nightly test for triton split_qkv_rmsnorm_rope (vllm-project#5267)
  [perf] Fix MLAPO weight disposal for KV-consumer MLA in PD-mix deploy... (vllm-project#5192)
  [docs] Correct image about prefill phase of PCP (vllm-project#5598)
  [CI] update triton-ascend version (vllm-project#5584)
  [P/D]Remove mooncake kvpool unused parameter `local_hostname` (vllm-project#5574)
  [Bugfix] record cos and sin cache in AscendRotaryEmbedding (vllm-project#5516)
  [bugfix] fix test_camem failed with triton-ascend (vllm-project#5492)
  [UT]add triton ops ut :  test_fused_qkvzba_split_reshape_cat (vllm-project#5474)
  [CI] Download models from ms (vllm-project#5405)
  Docs: Add A3 Docker image guidance for Atlas A3 machines (vllm-project#5256)
  [Doc] Add NNAL installation guide and requirements (vllm-project#5235)
  Add the requirement of arctic-inference which  speculative decoding with suffix_decode  (vllm-project#5045)
  [BugFix][Fusion] Fix graph fusion failure problem (vllm-project#5253)
  ...
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
### What this PR does / why we need it?

Upgrade vllm commit to 0105 (8be6432bdaf6275664d857b1e5e9bf8ed1ce299e)

1. Remove `maybe_padded_num_tokens` arg in `model_runner_v1.py` since
vllm-project/vllm#31517 deleted unused arg

2. Remove dense `Qwen/Qwen3-0.6B` in
`tests/e2e/multicard/test_aclgraph_capture_replay.py` and
`tests/e2e/multicard/test_data_parallel.py` due to
vllm-project/vllm#30739
where offline data parallel mode will not be supported/useful for dense
models

3. Adapt `vllm_ascend/worker/worker.py` due to
vllm-project/vllm#31584

4. Adapt `self.block_size` calling due to
vllm-project/vllm#31540

5. Modify `test_mla_v1.py` due to
vllm-project/vllm#28454, which refactored
`get_head_size()`

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

Signed-off-by: wjunLu <wjunlu217@gmail.com>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026