
[main2main] upgrade vllm main 0202 #6560

Merged
wangxiyuan merged 10 commits into vllm-project:main from Meihan-chen:main0202
Feb 5, 2026
Conversation

Contributor

@Meihan-chen Meihan-chen commented Feb 5, 2026

What this PR does / why we need it?

  1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required positional argument: 'is_sequence_parallel'` due to [MoE Refactor] Integrate Naive Prepare Finalize into MK vllm#32567
  2. Fix `TypeError: '>' not supported between instances of 'MagicMock' and 'int'` due to feature: support eagle3 for HunyuanVL & Hunyuan vllm#33035
  3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with abstract methods forward_mha, forward_mqa` due to [Attention] Move MLA forward from backend to layer vllm#33284
  4. Fix `AttributeError: 'AscendMLAImpl' object has no attribute 'W_UK_T'` and `AttributeError: 'bool' object has no attribute 'process_weights_after_loading'` due to [5/N][Attention] Finish eliminating vllm/attention folder vllm#32064
  5. Fix `'AscendSharedFusedMoE' object has no attribute '_routed_input_transform'` due to [MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) vllm#32790
  6. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument 'num_active_loras'` due to "Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras." vllm#32005
  7. Fix the problem caused by `'tuple' object has no attribute 'job_id'` due to [Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next vllm#27492
  8. Fix the problem that `all_moe_layers` is not equal to `vllm.moe_forward`, `vllm.moe_forward_shared` due to [torch.compile] Speed up MOE handling in forward_context vllm#33184
  9. Add a patch to fix "got multiple values for keyword argument 'add_special_tokens'" due to [Frontend] Use new Renderer for Completions and Tokenize API vllm#32863
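Most of these breakages are upstream constructor or signature changes. The PR itself branches on `vllm_version_is`; the self-contained sketch below shows an alternative, generic way to express the same idea by probing the constructor signature. All names here (`FusedMoEParallelConfigStub`, `build_config`) are illustrative stand-ins, not the real vllm-ascend code.

```python
import inspect

class FusedMoEParallelConfigStub:
    """Stand-in for an upstream config whose __init__ gained a new
    required argument (`is_sequence_parallel`) in newer vLLM."""
    def __init__(self, tp_size, is_sequence_parallel):
        self.tp_size = tp_size
        self.is_sequence_parallel = is_sequence_parallel

def build_config(cls, tp_size):
    # Probe the constructor signature so one call site works with both
    # the old signature (no flag) and the new one (flag required).
    params = inspect.signature(cls.__init__).parameters
    kwargs = {"tp_size": tp_size}
    if "is_sequence_parallel" in params:
        kwargs["is_sequence_parallel"] = False
    return cls(**kwargs)

cfg = build_config(FusedMoEParallelConfigStub, tp_size=4)
print(cfg.is_sequence_parallel)  # False on the newer signature
```

Version gating (as done in this PR) is usually preferred in practice because it makes the supported upstream versions explicit rather than inferring them from signatures.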

Does this PR introduce any user-facing change?

How was this patch tested?

wangxiyuan and others added 9 commits February 5, 2026 15:32
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
…_special_tokens'

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Contributor

github-actions bot commented Feb 5, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Meihan-chen, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on adapting the vllm-ascend project to maintain compatibility with evolving vLLM upstream versions, specifically targeting changes introduced in v0.15.0. It involves widespread adjustments to API usage, conditional logic for different vLLM versions, and specific fixes for model behaviors, including a new patch for the HunYuanVL model. The changes aim to ensure the stability and correctness of the vllm-ascend implementation across various vLLM environments.

Highlights

  • vLLM Version Compatibility: Extensive modifications have been implemented across various files to ensure compatibility with different versions of the upstream vLLM library, particularly addressing API changes and class relocations in v0.15.0. This is achieved through conditional imports and logic based on the vllm_version_is utility.
  • Speculative Decoding Enhancements: Updates to speculative decoding tests and core logic in eagle_proposer.py and mtp_proposer.py include the addition of uses_xdrope_dim and data_parallel_rank to VllmConfig and conditional resetting of moe_layer_index to maintain correct behavior across vLLM versions.
  • Attention Mechanism Refinements: Explicit NotImplementedError methods (forward_mha, forward_mqa) have been added to MLA and SFA attention implementations, clarifying that these specific forward paths are not supported. Additionally, a wrapper for process_weights_after_loading was introduced in MLA attention for improved weight processing.
  • HunYuanVL Model Patch: A new patch file (patch_huanyuan_vl.py) has been added to modify the HunYuanVLProcessor to remove the add_special_tokens requirement, improving support for the HunYuanVL model.
  • Test Infrastructure Improvements: Unit tests for MLA attention now use more realistic mock objects, and the vl_config fixture in end-to-end tests supports conditional skipping based on configuration, enhancing testing flexibility.
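The attention-interface change highlighted above can be illustrated with a minimal sketch. The class names below are simplified stand-ins, not the real vLLM/vllm-ascend classes: the upstream base made `forward_mha`/`forward_mqa` abstract, so the Ascend implementation satisfies the interface with explicit `NotImplementedError` stubs.

```python
from abc import ABC, abstractmethod

class MLAImplBase(ABC):
    """Stand-in for the upstream base class that made these methods abstract."""
    @abstractmethod
    def forward_mha(self, *args, **kwargs): ...
    @abstractmethod
    def forward_mqa(self, *args, **kwargs): ...

class AscendMLAImplSketch(MLAImplBase):
    """Implements the abstract methods as explicit stubs, since the Ascend
    MLA implementation routes through its own forward path instead."""
    def forward_mha(self, *args, **kwargs):
        raise NotImplementedError("MHA forward is not used by MLA on Ascend")
    def forward_mqa(self, *args, **kwargs):
        raise NotImplementedError("MQA forward is not used by MLA on Ascend")

impl = AscendMLAImplSketch()  # instantiation no longer raises TypeError
```

Without the stubs, `AscendMLAImplSketch()` would fail with the same `Can't instantiate abstract class` error that item 3 of the PR description fixes.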


Changelog
  • docs/source/community/versioning_policy.md
    • Updated the vLLM version reference in the versioning policy table to include a specific commit hash for clarity.
  • tests/e2e/conftest.py
    • Modified the vl_config pytest fixture to allow conditional skipping of tests based on a 'skip' key in the prompt configuration.
  • tests/ut/eplb/core/test_eplb_utils.py
    • Imported vllm_version_is for version-dependent logic.
    • Added conditional initialization for FusedMoEParallelConfig to include is_sequence_parallel=False for vLLM versions older than v0.15.0.
  • tests/ut/ops/test_mla.py
    • Updated MLAAttention mock objects in test_initialization and test_forward to include process_weights_after_loading and impl.process_weights_after_loading methods, making mocks more accurate.
  • tests/ut/quantization/test_modelslim_config.py
    • Implemented version-conditional import for the Attention class, switching between vllm.attention.layer and vllm.model_executor.layers.attention based on vLLM version.
  • tests/ut/spec_decode/test_eagle_proposer.py
    • Added uses_xdrope_dim = 0 to model_config and draft_model_config in VllmConfig setup for speculative decoding tests.
    • Added data_parallel_rank = 0 to parallel_config in VllmConfig setup for speculative decoding tests.
  • tests/ut/spec_decode/test_mtp_proposer.py
    • Added uses_xdrope_dim = 0 to model_config and draft_model_config in VllmConfig setup for speculative decoding tests.
    • Added data_parallel_rank = 0 to parallel_config in VllmConfig setup for speculative decoding tests.
  • vllm_ascend/attention/mla_v1.py
    • Added forward_mha and forward_mqa methods that raise NotImplementedError, indicating these are not directly supported for MLA attention.
  • vllm_ascend/attention/sfa_v1.py
    • Added forward_mha and forward_mqa methods that raise NotImplementedError, indicating these are not directly supported for SFA attention.
  • vllm_ascend/compilation/npugraph_ex_passes/graphex_qknorm_rope_fusion_pass.py
    • Updated the import of the Attention class to be conditional on the vLLM version.
  • vllm_ascend/compilation/passes/qknorm_rope_fusion_pass.py
    • Updated the import of the Attention class to be conditional on the vLLM version.
  • vllm_ascend/distributed/kv_transfer/kv_pool/cpu_offload/cpu_offload_connector.py
    • Updated imports for Attention and MLAAttention to be conditional on the vLLM version.
  • vllm_ascend/kv_offload/cpu_npu.py
    • Imported vllm_version_is to enable version-specific logic.
    • Modified the get_finished method to handle TransferResult differently based on the vLLM version (v0.15.0 vs. others).
  • vllm_ascend/ops/mla.py
    • Updated the import of MLAAttention to be conditional on the vLLM version.
    • Wrapped mla_attn.process_weights_after_loading to ensure impl.process_weights_after_loading is called, with a check for AscendSFAImpl.
  • vllm_ascend/patch/worker/init.py
    • Added an import for the new patch_huanyuan_vl module.
  • vllm_ascend/patch/worker/patch_huanyuan_vl.py
    • New file added to patch HunYuanVLProcessor.__call__ to remove the add_special_tokens argument.
  • vllm_ascend/patch/worker/patch_qwen3_next_mtp.py
    • Updated the import of the Attention class to be conditional on the vLLM version.
  • vllm_ascend/quantization/modelslim_config.py
    • Updated the import of the Attention class to be conditional on the vLLM version.
  • vllm_ascend/spec_decode/eagle_proposer.py
    • Imported vllm_version_is for version-dependent logic.
    • Added conditional resetting of forward_context.moe_layer_index to 0 for vLLM versions older than v0.15.0 during speculative decoding runs.
  • vllm_ascend/spec_decode/mtp_proposer.py
    • Imported vllm_version_is for version-dependent logic.
    • Added conditional resetting of forward_context.moe_layer_index to 0 for vLLM versions older than v0.15.0 during MTP speculative decoding runs.
  • vllm_ascend/worker/model_runner_v1.py
    • Updated imports for Attention and MLAAttention to be conditional on the vLLM version.
    • Added num_active_loras: int = 0 as a new parameter to the _dummy_run method signature.
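The `patch_huanyuan_vl.py` entry above uses a standard monkey-patch technique: wrap the processor's `__call__` and drop the keyword argument that newer vLLM also passes, avoiding "got multiple values for keyword argument". A minimal self-contained sketch follows; `DummyProcessor` is an illustrative stand-in, not the real `HunYuanVLProcessor`.

```python
class DummyProcessor:
    """Stand-in for a processor whose __call__ accepts add_special_tokens."""
    def __call__(self, text, add_special_tokens=True):
        return {"text": text, "add_special_tokens": add_special_tokens}

# Keep a reference to the original before patching.
_orig_call = DummyProcessor.__call__

def _patched_call(self, *args, **kwargs):
    # Drop the conflicting kwarg, then delegate to the original method.
    kwargs.pop("add_special_tokens", None)
    return _orig_call(self, *args, **kwargs)

DummyProcessor.__call__ = _patched_call

# The caller may still pass the kwarg; the patch silently discards it.
out = DummyProcessor()("hi", add_special_tokens=False)
print(out)  # {'text': 'hi', 'add_special_tokens': True}
```

Patching on the class (not the instance) matters here, because Python looks up `__call__` on the type.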
Ignored Files
  • Ignored by pattern: .github/workflows/** (6)
    • .github/workflows/_pre_commit.yml
    • .github/workflows/bot_pr_create.yaml
    • .github/workflows/dockerfiles/Dockerfile.lint
    • .github/workflows/pr_test_full.yaml
    • .github/workflows/pr_test_light.yaml
    • .github/workflows/schedule_codecov_refresh.yaml
Activity
  • No specific review comments or activity have been recorded for this pull request yet.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@vllm-ascend-ci added the `ready` (read for review) and `ready-for-test` (start test by label for PR) labels Feb 5, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several changes to ensure compatibility with vLLM v0.15.0 and a newer commit from the vLLM main branch. The modifications primarily involve conditional imports, version-specific logic for API changes, and updates to tests. While the changes are necessary and look correct, I've identified an opportunity to improve maintainability by refactoring duplicated code.

As per the repository's style guide, the pull request title and summary should be updated for clarity. Here are my suggestions:

Suggested PR Title:

[Misc][BugFix] Fix compatibility with vLLM v0.15.0

Suggested PR Summary:

### What this PR does / why we need it?
This PR introduces several changes to ensure compatibility with vLLM v0.15.0 and a recent main branch commit (`d7e17aaacd5...`). The changes include:
- Conditional imports for modules that have been moved in vLLM.
- Version-specific logic to handle API changes in `FusedMoEParallelConfig` and `TransferResult`.
- Updates to unit tests to mock new attributes and satisfy new interfaces.
- Addition of `forward_mha` and `forward_mqa` methods to attention implementations to conform to the new interface.
- A patch for `HunYuanVLProcessor` to handle changes in its `__call__` method signature.
- Resetting `moe_layer_index` for compatibility with newer vLLM versions.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test. Tested with vLLM v0.15.0.

Comment on lines +403 to +407

    if not vllm_version_is("v0.15.0"):
        # Reset MOE layer index before first model call
        forward_context = get_forward_context()
        if forward_context is not None:
            forward_context.moe_layer_index = 0


Severity: high

This block of code to reset moe_layer_index is duplicated in multiple places within this file (in _propose and _run_merged_draft) and also in vllm_ascend/spec_decode/mtp_proposer.py. To improve maintainability and reduce redundancy, consider extracting this logic into a helper function. For example:

def _reset_moe_layer_index_if_needed():
    if not vllm_version_is("v0.15.0"):
        forward_context = get_forward_context()
        if forward_context is not None:
            forward_context.moe_layer_index = 0

This would make the code cleaner and easier to maintain in the future.

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
@vllm-ascend-ci vllm-ascend-ci changed the title Main0202 [main2main] upgrade vllm main 0202 Feb 5, 2026
@wangxiyuan wangxiyuan merged commit 922e5c1 into vllm-project:main Feb 5, 2026
26 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Feb 6, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (59 commits)
  [Feat.]: 310p support MOE models (vllm-project#6530)
  [Doc] backport 0.13.0 release note (vllm-project#6584)
  [CI] Update UT CANN version to 8.5.0 for main branch (vllm-project#6564)
  [CI] Change A2 runner (vllm-project#6557)
  [Bugfix] Fix the incorrect use of the output parameter in _forward_fia_slidingwindow (vllm-project#6469)
  [main2main] upgrade vllm main 0202 (vllm-project#6560)
  [CI][npugraph_ex]Fix npugraph ex e2e test (vllm-project#6553)
  [Feature]KV pool supports sparse attention (vllm-project#6339)
  [bugfix]Fix accuracy issue in PCP/DCP with speculative decoding (vllm-project#6491)
  perf: adaptive block size selection in linear_persistent kernel (vllm-project#6537)
  [ModelRunner][Fix] Pads query_start_loc to satisfy FIA/TND constraint (vllm-project#6475)
  [Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide (vllm-project#6126)
  [Fusion] Add rmsnorm dynamic quant fusion pass (vllm-project#6274)
  [Bugfix] Synchronize only the current stream to avoid device sync (vllm-project#6432)
  [CI] Add long and short prompt tests for DeepSeek-V3.2 (vllm-project#6499)
  [Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (vllm-project#6442)
  [bugfix][npugraph_ex]duplicate pattern issue (vllm-project#6513)
  [bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass (vllm-project#6430)
  [Quant] GLM4.7-Flash Support W8A8 (vllm-project#6492)
  [Nightly][BugFix] Remove kv_cache nz test case for test_mla_preprocess_nq.py (vllm-project#6505)
  ...
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
### What this PR does / why we need it?
1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required positional argument: 'is_sequence_parallel'` due to vllm-project/vllm#32567
2. Fix `TypeError: '>' not supported between instances of 'MagicMock' and 'int'` due to vllm-project/vllm#33035
3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with abstract methods forward_mha, forward_mqa` and `AttributeError: 'bool' object has no attribute 'process_weights_after_loading'` due to vllm-project/vllm#33284
4. Fix `'AscendSharedFusedMoE' object has no attribute '_routed_input_transform'` due to vllm-project/vllm#32790
5. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument 'num_active_loras'` due to vllm-project/vllm#32005
6. Fix the problem caused by `'tuple' object has no attribute 'job_id'` due to vllm-project/vllm#27492
7. Fix the problem that `all_moe_layers` is not equal to `vllm.moe_forward`, `vllm.moe_forward_shared` due to vllm-project/vllm#33184
8. Add a patch to fix "got multiple values for keyword argument 'add_special_tokens'" due to vllm-project/vllm#32863
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>
@wangxiyuan wangxiyuan mentioned this pull request Feb 24, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

ci/build, documentation (Improvements or additions to documentation), module:ops, module:quantization, module:tests, ready (read for review), ready-for-test (start test by label for PR)



4 participants