Skip to content

[CustomOp] Register AscendApplyRotaryEmb CustomOp and remove related patch#4667

Merged
wangxiyuan merged 8 commits intovllm-project:mainfrom
shen-shanshan:vit
Dec 23, 2025
Merged

[CustomOp] Register AscendApplyRotaryEmb CustomOp and remove related patch#4667
wangxiyuan merged 8 commits intovllm-project:mainfrom
shen-shanshan:vit

Conversation

@shen-shanshan
Copy link
Copy Markdown
Collaborator

@shen-shanshan shen-shanshan commented Dec 3, 2025

What this PR does / why we need it?

Following vllm-project/vllm#29873, register AscendApplyRotaryEmb CustomOp and remove related patch.

Does this PR introduce any user-facing change?

How was this patch tested?

✅ Test Qwen2.5-VL

Run:

vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \
--max_model_len 16384

Output:

{"id":"chatcmpl-b02c1ff3415d2462","object":"chat.completion","created":1766129265,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-In struct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is \"TONGYI Qwen.\" The word \"TONGYI\" is writ  ten in blue, and \"Qwen\" is written in gray. The text appears to be part of a logo or branding design.","refusal":null,"annotations":null,"audio":   null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"tok    en_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":78,"total_tokens":129,"completion_tokens":51,"prompt_tokens_d

✅ Test Qwen3-VL

Run:

vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct \
--max_model_len 16384

Output:

{"id":"chatcmpl-a3a7de5a900a9321","object":"chat.completion","created":1766129586,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is **“TONGYI Qwen”**.\n\n### How it looks:\n- **“TONGYI”** is written in **uppercase letters** in a **bold, modern sans-serif font**, colored **blue**.\n- **“Qwen”** is written in **lowercase letters** in a **slightly thinner, elegant sans-serif font**, colored **dark gray**.\n- The two lines of text are stacked vertically, with “TONG","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":112,"total_tokens":212,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 3, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a custom operator AscendApplyRotaryEmb for applying rotary embeddings on Ascend hardware and registers it. The implementation refactors existing logic from patch_qwen2_5_vl.py. However, I've found a critical bug in the new AscendApplyRotaryEmb implementation where incorrect tensor shape manipulation will lead to a runtime error. The logic for preparing cos and sin tensors was copied from an implementation for a different model and is not compatible with the input tensor shapes for Qwen2.5-VL.

Comment thread vllm_ascend/ops/rotary_embedding.py Outdated
Comment thread vllm_ascend/patch/worker/patch_qwen2_5_vl.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 4, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@shen-shanshan shen-shanshan changed the title [CustomOp] Implement ApplyRotaryEmb CustomOp and register it [CustomOp] Register ApplyRotaryEmb CustomOp and remove related patch Dec 19, 2025
@shen-shanshan shen-shanshan changed the title [CustomOp] Register ApplyRotaryEmb CustomOp and remove related patch [CustomOp] Register AscendApplyRotaryEmb CustomOp and remove related patch Dec 19, 2025
@github-actions
Copy link
Copy Markdown
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@shen-shanshan shen-shanshan added ready read for review ready-for-test start test by label for PR and removed merge-conflicts labels Dec 22, 2025
@wangxiyuan
Copy link
Copy Markdown
Collaborator

wangxiyuan commented Dec 22, 2025

https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/patch/__init__.py should be updated as well. And delete related comment by this #5196 as well.

@github-actions
Copy link
Copy Markdown
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
@shen-shanshan shen-shanshan added ready read for review ready-for-test start test by label for PR and removed ready read for review ready-for-test start test by label for PR merge-conflicts labels Dec 22, 2025
@shen-shanshan
Copy link
Copy Markdown
Collaborator Author

shen-shanshan commented Dec 22, 2025

CC @wangxiyuan All CI passed.

Signed-off-by: shen-shanshan <467638484@qq.com>
@wangxiyuan wangxiyuan merged commit 6c47853 into vllm-project:main Dec 23, 2025
14 checks passed
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…patch (vllm-project#4667)

### What this PR does / why we need it?

Following vllm-project/vllm#29873, register
`AscendApplyRotaryEmb` CustomOp and remove related patch.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

#### ✅ Test Qwen2.5-VL

Run:

```bash
vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \
--max_model_len 16384
```

Output:

```
{"id":"chatcmpl-b02c1ff3415d2462","object":"chat.completion","created":1766129265,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-In struct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is \"TONGYI Qwen.\" The word \"TONGYI\" is writ  ten in blue, and \"Qwen\" is written in gray. The text appears to be part of a logo or branding design.","refusal":null,"annotations":null,"audio":   null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"tok    en_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":78,"total_tokens":129,"completion_tokens":51,"prompt_tokens_d
```

#### ✅ Test Qwen3-VL

Run:

```bash
vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct \
--max_model_len 16384
```

Output:

```
{"id":"chatcmpl-a3a7de5a900a9321","object":"chat.completion","created":1766129586,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is **“TONGYI Qwen”**.\n\n### How it looks:\n- **“TONGYI”** is written in **uppercase letters** in a **bold, modern sans-serif font**, colored **blue**.\n- **“Qwen”** is written in **lowercase letters** in a **slightly thinner, elegant sans-serif font**, colored **dark gray**.\n- The two lines of text are stacked vertically, with “TONG","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":112,"total_tokens":212,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
```

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…patch (vllm-project#4667)

### What this PR does / why we need it?

Following vllm-project/vllm#29873, register
`AscendApplyRotaryEmb` CustomOp and remove related patch.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

#### ✅ Test Qwen2.5-VL

Run:

```bash
vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \
--max_model_len 16384
```

Output:

```
{"id":"chatcmpl-b02c1ff3415d2462","object":"chat.completion","created":1766129265,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-In struct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is \"TONGYI Qwen.\" The word \"TONGYI\" is writ  ten in blue, and \"Qwen\" is written in gray. The text appears to be part of a logo or branding design.","refusal":null,"annotations":null,"audio":   null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"tok    en_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":78,"total_tokens":129,"completion_tokens":51,"prompt_tokens_d
```

#### ✅ Test Qwen3-VL

Run:

```bash
vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct \
--max_model_len 16384
```

Output:

```
{"id":"chatcmpl-a3a7de5a900a9321","object":"chat.completion","created":1766129586,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is **“TONGYI Qwen”**.\n\n### How it looks:\n- **“TONGYI”** is written in **uppercase letters** in a **bold, modern sans-serif font**, colored **blue**.\n- **“Qwen”** is written in **lowercase letters** in a **slightly thinner, elegant sans-serif font**, colored **dark gray**.\n- The two lines of text are stacked vertically, with “TONG","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":112,"total_tokens":212,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
```

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:core module:ops ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants