Merged
Conversation
Yikun
reviewed
Feb 6, 2025
README.zh.md
Outdated
</p>

<h3 align="center">

Member
Suggested change:
- vLLM 昇腾插件
+ vLLM Ascend Plugin
README.zh.md
Outdated
vLLM 昇腾插件 (`vllm-ascend`) 是一个运行在昇腾NPU上的后端插件。

Member
Suggested change:
- 此插件是 vLLM 社区中支持 Ascend 后端推荐的方法。它遵循[[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162)中概述的原则:硬件可插拔,提供硬件可插拔接口,解耦 Ascend NPU 与 vLLM 的集成。
+ 此插件是 vLLM 社区中支持昇腾后端的推荐方式。它遵循[[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162)所述原则:通过解耦的方式提供了vLLM对Ascend NPU的支持。
README.zh.md
Outdated
</p>

<p align="center">

Member
Suggested change:
- <a href="README.md"><b>English</b></a> | <a href="README.zh.md"><b>中文</b></a>
+ <a href="README.md"><b>English</b></a> | <a><b>中文</b></a>
README.zh.md
Outdated
---
## 总览

Member
Suggested change:
- vLLM 昇腾插件 (`vllm-ascend`) 是一个运行在昇腾NPU上的后端插件。
+ vLLM 昇腾插件 (`vllm-ascend`) 是一个让vLLM在Ascend NPU无缝运行的后端插件。
README.zh.md
Outdated
此插件是 vLLM 社区中支持 Ascend 后端推荐的方法。它遵循[[RFC]: Hardware pluggable](https://github.com/vllm-project/vllm/issues/11162)中概述的原则:硬件可插拔,提供硬件可插拔接口,解耦 Ascend NPU 与 vLLM 的集成。

Member
Suggested change:
- 使用 vLLM Ascend 插件,包括类Transformer、混合专家(MOE)、嵌入、多模态等类型大语言模型在内的流行开源模型可以在 Ascend NPU 上无缝运行。
+ 使用 vLLM 昇腾插件,可以让类Transformer、混合专家(MOE)、嵌入、多模态等流行的大语言模型在 Ascend NPU 上无缝运行。
CONTRIBUTING.zh.md
Outdated
# 构建:
# - 仅支持Linux (torch_npu 限制)
# pip install -e .

Member
Suggested change:
- # - 在其他操作系统上进行调试构建(无需安装依赖)
+ # - 在其他操作系统上构建安装,需要跳过依赖
CONTRIBUTING.zh.md
Outdated
bash format.sh

# 构建:
# - 仅支持Linux (torch_npu 限制)
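The build comments under review distinguish a full editable install on Linux (torch_npu only ships for Linux) from a dependency-skipping install on other operating systems. A minimal sketch of that platform choice, assuming `pip install -e . --no-deps` as the dependency-skipping variant — the exact flag is illustrative, not something the reviewed document mandates:

```shell
# Sketch: pick the pip command matching the CONTRIBUTING comments.
# torch_npu is Linux-only, so a full editable install only works on Linux;
# elsewhere, Linux-only dependencies must be skipped (--no-deps is one way).
pick_install() {
    if [ "$1" = "Linux" ]; then
        echo "pip install -e ."            # full install, deps included
    else
        echo "pip install -e . --no-deps"  # skip Linux-only deps
    fi
}

pick_install "$(uname -s)"
```

On a Linux box this prints the full install command; on macOS or Windows (MSYS) it prints the dependency-skipping one.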
CONTRIBUTING.zh.md
Outdated
## 其他

Member
Suggested change:
- 您可以在 [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html) 上找到有关为 vLLM Ascend 后端插件做出贡献的更多信息。
+ 您可以在 [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html) 上找到更多有关为 vLLM 昇腾插件贡献的信息。
docs/environment.zh.md
Outdated
#### 手动安装

Member
Suggested change:
- 按照[昇腾安装指南](https://ascend.github.io/docs/sources/ascend/quick_install.html)中提供的说明配置环境。
+ 您也可以选择手动安装,按照[昇腾安装指南](https://ascend.github.io/docs/sources/ascend/quick_install.html)中提供的说明配置环境。
CONTRIBUTING.zh.md
Outdated
# vllm昇腾插件贡献

## 构建与测试

Member
Suggested change:
- 在提交PR之前建议在本地开发环境进行构建和测试。
+ 我们推荐您在提交PR之前在本地开发环境进行构建和测试。
Yikun
reviewed
Feb 6, 2025
README.zh.md
Outdated
## 开始使用

Member
Suggested change:
- > [!注意]
+ > [!NOTE]
Collaborator
The README is updated, please rebase.
Signed-off-by: wangli <wangli858794774@gmail.com>
Collaborator
Author
done
wangxiyuan
approved these changes
Feb 6, 2025
README.zh.md
Outdated
</p>

<p align="center">
<a href="README.md"><b>English</b></a> | <a href="README.zh.md"><b>中文</b></a>

Collaborator
The link for 中文 is useless here.
ttanzhiqiang
pushed a commit
to ttanzhiqiang/vllm-ascend
that referenced
this pull request
Apr 27, 2025
hust17yixuan
pushed a commit
to hust17yixuan/vllm-ascend
that referenced
this pull request
Feb 5, 2026
remove sync when building attn metadata
wangxiyuan
added a commit
that referenced
this pull request
Feb 6, 2026
### What this PR does / why we need it?

**Scope of Changes**:

- `vllm_ascend/ops/layer_shard_linear.py`
- `vllm_ascend/ops/linear.py`
- `vllm_ascend/ops/linear_op.py`
- `vllm_ascend/worker/worker.py`
- `vllm_ascend/patch/worker/patch_bert.py`
- `vllm_ascend/patch/worker/patch_deepseek.py`
- `vllm_ascend/patch/worker/patch_distributed.py`
- `vllm_ascend/patch/worker/patch_module.py`
- `vllm_ascend/patch/worker/patch_multimodal_merge.py`
- `vllm_ascend/patch/worker/patch_qwen3_next.py`
- `vllm_ascend/patch/worker/patch_qwen3_next_mtp.py`
- `vllm_ascend/patch/worker/patch_rejection_sampler.py`
- `vllm_ascend/patch/worker/patch_rope.py`
- `vllm_ascend/patch/worker/patch_triton.py`
- `vllm_ascend/patch/worker/patch_unquantized_gemm.py`
- `vllm_ascend/patch/worker/patch_v2_egale.py`
- `vllm_ascend/worker/npu_input_batch.py`
- `vllm_ascend/worker/v2/aclgraph_utils.py`
- `vllm_ascend/worker/v2/attn_utils.py`
- `vllm_ascend/worker/v2/model_runner.py`
- `vllm_ascend/worker/v2/sample/gumbel.py`
- `vllm_ascend/worker/v2/sample/penalties.py`
- `vllm_ascend/worker/v2/sample/sampler.py`
- `vllm_ascend/worker/v2/spec_decode/__init__.py`
- `vllm_ascend/worker/v2/spec_decode/eagle.py`
- `vllm_ascend/worker/v2/states.py`

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.0
- vLLM main: vllm-project/vllm@d682094

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: SILONG ZENG <2609716663@qq.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
845473182
pushed a commit
to 845473182/vllm-ascend
that referenced
this pull request
Feb 9, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Patch] Remove the patch of MiniCPM (vllm-project#5975)
  [P/D] layerwise connector support recompute scheduler (vllm-project#5900)
  [CI] Add workflow support for lint image build (vllm-project#6489)
  [Bugfix] Fix problematic dummy_run & improper input_batch_size in eagle (vllm-project#6517)
  [Refactor] 310p_e2e test case update (vllm-project#6539)
  [Refactor] refactor p2p connector (vllm-project#6551)
  [Refactor] refactor 310p attention impl and add ut (vllm-project#6579)
  [Refactor] refactor 310p ops and add ut (vllm-project#6591)
  [Ops][Refactor] Remove custom rotary_embedding operator (vllm-project#6523)
  [Lint] Style: Convert `vllm-ascend/` to ruff format (new Batch vllm-project#8) (vllm-project#6604)
  [Test] Add initial multi modal cases of Qwen2.5-VL-7B-Instruct for disaggregated encoder (vllm-project#5301)
  [CI] Fix broken CI (vllm-project#6599)
  [Lint] Style: Convert `vllm-ascend/` to ruff format (Batch vllm-project#10) (vllm-project#6173)
  [Lint] Style: Convert `vllm-ascend/` to ruff format (Batch vllm-project#11) (vllm-project#6176)
  [Lint] Style: Convert `vllm-ascend/` to ruff format (Batch vllm-project#8) (vllm-project#6129)
  [Lint] Style: Convert `vllm-ascend/` to ruff format (Batch vllm-project#7) (vllm-project#6023)
  [CI][Misc] Some improvement for github action (vllm-project#6587)
  [Image] Bump mooncake version to v0.3.8.post1 (vllm-project#6428)
chenchuw886
pushed a commit
to chenchuw886/vllm-ascend
that referenced
this pull request
Feb 12, 2026
ZRJ026
pushed a commit
to ZRJ026/vllm-ascend
that referenced
this pull request
Feb 28, 2026
maoxx241
pushed a commit
to maoxx241/vllm-ascend
that referenced
this pull request
Mar 2, 2026
ZRJ026
pushed a commit
to ZRJ026/vllm-ascend
that referenced
this pull request
Mar 4, 2026
LCAIZJ
pushed a commit
to LCAIZJ/vllm-ascend
that referenced
this pull request
Mar 7, 2026
### What this PR does / why we need it?

This PR adds Chinese documents for vllm-ascend for Chinese-speaking developers.

### Does this PR introduce any user-facing change?

Change as follows:
- add README.zh.md
- add environment.zh.md
- add CONTRIBUTING.zh.md

### How was this patch tested?

By CI