[main2main] upgrade vllm main 0202 by Meihan-chen · Pull Request #6560 · vllm-project/vllm-ascend

Meihan-chen · 2026-02-05T07:35:56Z

What this PR does / why we need it?

Fix TypeError: FusedMoEParallelConfig.__init__() missing 1 required positional argument: 'is_sequence_parallel' due to [MoE Refactor] Integrate Naive Prepare Finalize into MK vllm#32567
Fix TypeError: '>' not supported between instances of 'MagicMock' and 'int' due to feature: support eagle3 for HunyuanVL & Hunyuan vllm#33035
Fix TypeError: Can't instantiate abstract class AscendMLAImpl with abstract methods forward_mha, forward_mqa due to [Attention] Move MLA forward from backend to layer vllm#33284
Fix AttributeError: 'AscendMLAImpl' object has no attribute 'W_UK_T' and AttributeError: 'bool' object has no attribute 'process_weights_after_loading' due to [5/N][Attention] Finish eliminating vllm/attention folder vllm#32064
Fix 'AscendSharedFusedMoE' object has no attribute '_routed_input_transform' due to [MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) vllm#32790
Fix NPUModelRunner._dummy_run() got an unexpected keyword argument 'num_active_loras' due to Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. vllm#32005
Fix the problem caused by 'tuple' object has no attribute 'job_id' due to [Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next vllm#27492
Fix the problem that all_moe_layers is not equal to vllm.moe_forward, vllm.moe_forward_shared due to [torch.compile] Speed up MOE handling in forward_context vllm#33184
Add patch to fix the problem "got multiple values for keyword argument 'add_special_tokens'" due to [Frontend] Use new Renderer for Completions and Tokenize API vllm#32863
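
The second failure in the list above can be reproduced in isolation. The following is a minimal sketch, assuming the upstream code compares a mocked config attribute such as `uses_xdrope_dim` against an integer. The `uses_xdrope` helper is hypothetical and only stands in for the real comparison site.

```python
from unittest.mock import MagicMock

# Stand-in for a mocked model config; the real tests build a VllmConfig mock.
model_config = MagicMock()


def uses_xdrope(config) -> bool:
    # Hypothetical check standing in for the upstream comparison
    # that triggered the error.
    return config.uses_xdrope_dim > 0


try:
    uses_xdrope(model_config)
    error_message = None
except TypeError as exc:
    # MagicMock attributes do not support ordering comparisons with int.
    error_message = str(exc)

# Giving the mock a real integer makes the comparison well defined.
model_config.uses_xdrope_dim = 0
result_after_fix = uses_xdrope(model_config)
```

Setting the attribute explicitly on the mock, as the unit test changes in this PR do, keeps the test independent of how many comparisons upstream performs on it.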

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.15.0
vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

github-actions · 2026-02-05T07:36:14Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Fill in the PR description so that the commit message helps reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist · 2026-02-05T07:36:38Z

Summary of Changes

Hello @Meihan-chen, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on adapting the vllm-ascend project to maintain compatibility with evolving vLLM upstream versions, specifically targeting changes introduced in v0.15.0. It involves widespread adjustments to API usage, conditional logic for different vLLM versions, and specific fixes for model behaviors, including a new patch for the HunYuanVL model. The changes aim to ensure the stability and correctness of the vllm-ascend implementation across various vLLM environments.

Highlights

vLLM Version Compatibility: Extensive modifications have been implemented across various files to ensure compatibility with different versions of the upstream vLLM library, particularly addressing API changes and class relocations in v0.15.0. This is achieved through conditional imports and logic based on the vllm_version_is utility.
Speculative Decoding Enhancements: Updates to speculative decoding tests and core logic in eagle_proposer.py and mtp_proposer.py include the addition of uses_xdrope_dim and data_parallel_rank to VllmConfig and conditional resetting of moe_layer_index to maintain correct behavior across vLLM versions.
Attention Mechanism Refinements: Explicit NotImplementedError methods (forward_mha, forward_mqa) have been added to MLA and SFA attention implementations, clarifying that these specific forward paths are not supported. Additionally, a wrapper for process_weights_after_loading was introduced in MLA attention for improved weight processing.
HunYuanVL Model Patch: A new patch file (patch_huanyuan_vl.py) has been added to modify the HunYuanVLProcessor to remove the add_special_tokens requirement, improving support for the HunYuanVL model.
Test Infrastructure Improvements: Unit tests for MLA attention now use more realistic mock objects, and the vl_config fixture in end-to-end tests supports conditional skipping based on configuration, enhancing testing flexibility.
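
The abstract-method failure and its fix can be illustrated without vLLM. This is a sketch under stated assumptions: `MLAImplBase`, `IncompleteImpl`, and `PatchedImpl` are hypothetical stand-ins, not the real upstream or vllm-ascend classes. Only the method names `forward_mha` and `forward_mqa` come from this PR.

```python
from abc import ABC, abstractmethod


class MLAImplBase(ABC):
    """Hypothetical stand-in for the upstream MLA implementation base."""

    @abstractmethod
    def forward_mha(self, *args, **kwargs): ...

    @abstractmethod
    def forward_mqa(self, *args, **kwargs): ...


class IncompleteImpl(MLAImplBase):
    """Mirrors the pre-fix state: the abstract methods are not overridden."""


class PatchedImpl(MLAImplBase):
    """Mirrors the fix: both methods exist but reject direct calls."""

    def forward_mha(self, *args, **kwargs):
        raise NotImplementedError("forward_mha is not supported here")

    def forward_mqa(self, *args, **kwargs):
        raise NotImplementedError("forward_mqa is not supported here")


try:
    IncompleteImpl()
    instantiation_error = None
except TypeError as exc:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    instantiation_error = str(exc)

impl = PatchedImpl()  # instantiation now succeeds
```

Raising `NotImplementedError` satisfies the interface while making it explicit that the backend routes its forward pass elsewhere.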

Changelog

docs/source/community/versioning_policy.md
- Updated the vLLM version reference in the versioning policy table to include a specific commit hash for clarity.
tests/e2e/conftest.py
- Modified the vl_config pytest fixture to allow conditional skipping of tests based on a 'skip' key in the prompt configuration.
tests/ut/eplb/core/test_eplb_utils.py
- Imported vllm_version_is for version-dependent logic.
- Added conditional initialization for FusedMoEParallelConfig to include is_sequence_parallel=False for vLLM versions other than v0.15.0.
tests/ut/ops/test_mla.py
- Updated MLAAttention mock objects in test_initialization and test_forward to include process_weights_after_loading and impl.process_weights_after_loading methods, making mocks more accurate.
tests/ut/quantization/test_modelslim_config.py
- Implemented version-conditional import for the Attention class, switching between vllm.attention.layer and vllm.model_executor.layers.attention based on vLLM version.
tests/ut/spec_decode/test_eagle_proposer.py
- Added uses_xdrope_dim = 0 to model_config and draft_model_config in VllmConfig setup for speculative decoding tests.
- Added data_parallel_rank = 0 to parallel_config in VllmConfig setup for speculative decoding tests.
tests/ut/spec_decode/test_mtp_proposer.py
- Added uses_xdrope_dim = 0 to model_config and draft_model_config in VllmConfig setup for speculative decoding tests.
- Added data_parallel_rank = 0 to parallel_config in VllmConfig setup for speculative decoding tests.
vllm_ascend/attention/mla_v1.py
- Added forward_mha and forward_mqa methods that raise NotImplementedError, indicating these are not directly supported for MLA attention.
vllm_ascend/attention/sfa_v1.py
- Added forward_mha and forward_mqa methods that raise NotImplementedError, indicating these are not directly supported for SFA attention.
vllm_ascend/compilation/npugraph_ex_passes/graphex_qknorm_rope_fusion_pass.py
- Updated the import of the Attention class to be conditional on the vLLM version.
vllm_ascend/compilation/passes/qknorm_rope_fusion_pass.py
- Updated the import of the Attention class to be conditional on the vLLM version.
vllm_ascend/distributed/kv_transfer/kv_pool/cpu_offload/cpu_offload_connector.py
- Updated imports for Attention and MLAAttention to be conditional on the vLLM version.
vllm_ascend/kv_offload/cpu_npu.py
- Imported vllm_version_is to enable version-specific logic.
- Modified the get_finished method to handle TransferResult differently based on the vLLM version (v0.15.0 vs. others).
vllm_ascend/ops/mla.py
- Updated the import of MLAAttention to be conditional on the vLLM version.
- Wrapped mla_attn.process_weights_after_loading to ensure impl.process_weights_after_loading is called, with a check for AscendSFAImpl.
vllm_ascend/patch/worker/__init__.py
- Added an import for the new patch_huanyuan_vl module.
vllm_ascend/patch/worker/patch_huanyuan_vl.py
- New file added to patch HunYuanVLProcessor.__call__ to remove the add_special_tokens argument.
vllm_ascend/patch/worker/patch_qwen3_next_mtp.py
- Updated the import of the Attention class to be conditional on the vLLM version.
vllm_ascend/quantization/modelslim_config.py
- Updated the import of the Attention class to be conditional on the vLLM version.
vllm_ascend/spec_decode/eagle_proposer.py
- Imported vllm_version_is for version-dependent logic.
- Added conditional resetting of forward_context.moe_layer_index to 0 during speculative decoding runs when the vLLM version is not v0.15.0.
vllm_ascend/spec_decode/mtp_proposer.py
- Imported vllm_version_is for version-dependent logic.
- Added conditional resetting of forward_context.moe_layer_index to 0 during MTP speculative decoding runs when the vLLM version is not v0.15.0.
vllm_ascend/worker/model_runner_v1.py
- Updated imports for Attention and MLAAttention to be conditional on the vLLM version.
- Added num_active_loras: int = 0 as a new parameter to the _dummy_run method signature.
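
Many of the changelog entries above apply the same version-gated import pattern. The following is a rough sketch of it, not the actual vllm-ascend code. The two-argument `vllm_version_is` is a hypothetical stand-in (the real utility reads the installed version itself), and the mapping of v0.15.0 to `vllm.attention.layer` is an assumption inferred from the module names listed above.

```python
import importlib


def vllm_version_is(target: str, installed: str) -> bool:
    # Hypothetical stand-in: the real utility reads the installed vLLM
    # version itself instead of taking it as an argument.
    return installed == target


def attention_module_path(installed: str) -> str:
    # Assumption: v0.15.0 still exposes vllm.attention.layer, and other
    # builds use the relocated module.
    if vllm_version_is("v0.15.0", installed):
        return "vllm.attention.layer"
    return "vllm.model_executor.layers.attention"


def load_attention_class(installed: str):
    """Import Attention from the module matching the given vLLM version."""
    module = importlib.import_module(attention_module_path(installed))
    return module.Attention
```

Keeping the path selection in one function means a later vLLM relocation needs a change in one place rather than in every importing file.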

Ignored Files

Ignored by pattern: .github/workflows/** (6)
- .github/workflows/_pre_commit.yml
- .github/workflows/bot_pr_create.yaml
- .github/workflows/dockerfiles/Dockerfile.lint
- .github/workflows/pr_test_full.yaml
- .github/workflows/pr_test_light.yaml
- .github/workflows/schedule_codecov_refresh.yaml

Activity

No specific review comments or activity have been recorded for this pull request yet.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces several changes to ensure compatibility with vLLM v0.15.0 and a newer commit from the vLLM main branch. The modifications primarily involve conditional imports, version-specific logic for API changes, and updates to tests. While the changes are necessary and look correct, I've identified an opportunity to improve maintainability by refactoring duplicated code.

As per the repository's style guide, the pull request title and summary should be updated for clarity. Here are my suggestions:

Suggested PR Title:

[Misc][BugFix] Fix compatibility with vLLM v0.15.0

Suggested PR Summary:

### What this PR does / why we need it?
This PR introduces several changes to ensure compatibility with vLLM v0.15.0 and a recent main branch commit (`d7e17aaacd5...`). The changes include:
- Conditional imports for modules that have been moved in vLLM.
- Version-specific logic to handle API changes in `FusedMoEParallelConfig` and `TransferResult`.
- Updates to unit tests to mock new attributes and satisfy new interfaces.
- Addition of `forward_mha` and `forward_mqa` methods to attention implementations to conform to the new interface.
- A patch for `HunYuanVLProcessor` to handle changes in its `__call__` method signature.
- Resetting `moe_layer_index` for compatibility with newer vLLM versions.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests. Tested with vLLM v0.15.0.
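
The `add_special_tokens` collision mentioned in this PR can be sketched with a toy class. Everything here is illustrative: `Processor`, `_tokenize`, and `drop_add_special_tokens` are hypothetical names, and the real `HunYuanVLProcessor` patch may differ in detail.

```python
import functools


class Processor:
    """Hypothetical stand-in for a processor whose __call__ forwards kwargs."""

    def _tokenize(self, text, add_special_tokens=True, **kwargs):
        return {"text": text, "add_special_tokens": add_special_tokens}

    def __call__(self, text, **kwargs):
        # Passes add_special_tokens explicitly and also forwards caller
        # kwargs, so a caller-supplied add_special_tokens collides.
        return self._tokenize(text, add_special_tokens=False, **kwargs)


def drop_add_special_tokens(original_call):
    """Wrap __call__ so a caller-supplied add_special_tokens is discarded."""

    @functools.wraps(original_call)
    def patched_call(self, *args, **kwargs):
        kwargs.pop("add_special_tokens", None)
        return original_call(self, *args, **kwargs)

    return patched_call


processor = Processor()
try:
    processor("hi", add_special_tokens=True)
    collision = None
except TypeError as exc:
    collision = str(exc)

# Apply the monkey patch at class level, as a worker patch module would.
Processor.__call__ = drop_add_special_tokens(Processor.__call__)
patched_result = processor("hi", add_special_tokens=True)
```

Patching at class level matters because Python looks up `__call__` on the type, not on the instance.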

gemini-code-assist · 2026-02-05T07:39:55Z

+            if not vllm_version_is("v0.15.0"):
+                # Reset MOE layer index before first model call
+                forward_context = get_forward_context()
+                if forward_context is not None:
+                    forward_context.moe_layer_index = 0


This block of code to reset moe_layer_index is duplicated in multiple places within this file (in _propose and _run_merged_draft) and also in vllm_ascend/spec_decode/mtp_proposer.py. To improve maintainability and reduce redundancy, consider extracting this logic into a helper function. For example:

    def _reset_moe_layer_index_if_needed():
        if not vllm_version_is("v0.15.0"):
            forward_context = get_forward_context()
            if forward_context is not None:
                forward_context.moe_layer_index = 0

This would make the code cleaner and easier to maintain in the future.

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

wangxiyuan and others added 9 commits February 5, 2026 15:32

[Main2Main] Upgrade to newest vLLM

5136b7a

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

fix import error

7a6ee8b

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

fix ut

9978421

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

[Main2Main] Upgrade to newest vLLM 0228

e2706b9

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

fix AscendMLAImpl Can't instantiate

d23e410

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

update vllm to 0205

8104ca2

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

fix unexpected num_active_loras and skip hunyuan-vl

87ca18f

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

add hunyuan_vl patch to fix multiple values for keyword argument 'add_special_tokens'

46365b0

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

revert skip npugraph_ex

d56c254

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

Meihan-chen requested review from LCAIZJ, MengqingCao, Yikun, nalinaly, realliujiaxu, wangxiyuan, weijinqian0, whx-sjtu, yiz-liu and zzzzwwjj as code owners February 5, 2026 07:35

github-actions bot added the documentation, ci/build, module:tests, module:ops, and module:quantization labels Feb 5, 2026

vllm-ascend-ci added the ready and ready-for-test labels Feb 5, 2026

gemini-code-assist bot reviewed Feb 5, 2026

fix no attribute routed_input_transform

19691ba

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>

vllm-ascend-ci changed the title ~~Main0202~~ [main2main] upgrade vllm main 0202 Feb 5, 2026

wangxiyuan merged commit 922e5c1 into vllm-project:main Feb 5, 2026
26 checks passed

This was referenced Feb 9, 2026

[Main2Main] Upgrade to newest vLLM 0205 #6511

Closed

[bugfix]Fix no attribute 'data' when MLAPO is enable #6601

Merged

wangxiyuan mentioned this pull request Feb 24, 2026

[Misc]: test #6787

Closed

wangxiyuan merged 10 commits into vllm-project:main from Meihan-chen:main0202

4 participants
