[Feature]: Support 310P device run qwen2.5/3 dense and qwen2.5vl models #5776
wangxiyuan merged 17 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces support for the 310P device, enabling it to run qwen2.5/3 dense and qwen2.5vl models. The changes are extensive, adding a new _310p directory with device-specific implementations and updating the build system and platform code to conditionally use these new components. While the overall approach is sound, I've identified a critical issue in the attention implementation that could lead to incorrect outputs, and another high-severity issue related to a hardcoded sequence length. Addressing these will be important for the stability and correctness of the 310P support.
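The review above describes platform code that conditionally selects the new `_310p` components. A minimal sketch of that dispatch pattern is shown below; the function and class names (`get_soc_version`, `select_attention_backend`, the backend strings) are illustrative stand-ins, not the actual vllm-ascend API.

```python
# Hypothetical sketch of device-conditional backend selection.
# In a real setup the SoC version would come from a device query
# (e.g. through torch_npu); here it is passed in explicitly.

def is_310p(soc_version: str) -> bool:
    # 310P SoC names share the "Ascend310P" prefix (e.g. Ascend310P1/P3).
    return soc_version.startswith("Ascend310P")

def select_attention_backend(soc_version: str) -> str:
    # Route 310P devices to the dedicated implementation; every other
    # Ascend device keeps the default backend.
    if is_310p(soc_version):
        return "AscendAttentionBackend310P"
    return "AscendAttentionBackend"
```

The same prefix check can gate any other device-specific component (build flags, op implementations) in one place, so the 310P path stays isolated from the default path.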
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: Tflowers-0129 <2906339855@qq.com>
Signed-off-by: Shaoxu Cheng <2906339855@qq.com>
…ls (vllm-project#5776)
### What this PR does / why we need it?
Add basic 310P support. Only dense models work with eager mode now.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654
Signed-off-by: Tflowers-0129 <2906339855@qq.com>
Signed-off-by: Shaoxu Cheng <2906339855@qq.com>
What this PR does / why we need it?
Add basic 310P support. Only dense models work, and only in eager mode, for now.
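The support matrix stated above (the listed model families, eager mode only) can be expressed as a simple guard. This is a hypothetical sketch: the function name `supports_on_310p` and the architecture strings are illustrative, not part of vllm-ascend.

```python
# Hypothetical guard encoding the PR's stated constraints for 310P:
# only the supported model families, and only in eager mode.

SUPPORTED_310P_ARCHS = {
    "Qwen2ForCausalLM",                    # qwen2.5 dense
    "Qwen3ForCausalLM",                    # qwen3 dense
    "Qwen2_5_VLForConditionalGeneration",  # qwen2.5vl
}

def supports_on_310p(model_arch: str, enforce_eager: bool) -> bool:
    # Graph mode is not supported on 310P yet, so eager must be forced.
    return model_arch in SUPPORTED_310P_ARCHS and enforce_eager
```

A platform layer could use such a check to fail fast with a clear error instead of producing incorrect outputs on an unsupported configuration.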