Add the requirement of arctic-inference for speculative decoding with suffix_decode #5045
wangxiyuan merged 9 commits into vllm-project:main
Conversation
…irements Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
…irements and pyproject.toml Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
Code Review
This pull request adds arctic-inference==0.1.1 as a new dependency to both pyproject.toml and requirements.txt. The change is straightforward and correct. I have one suggestion for requirements.txt: add a comment explaining the purpose of the new dependency, which improves maintainability and follows the file's existing convention.
  transformers<=4.57.1
+ arctic-inference==0.1.1
For better maintainability, and to follow the existing convention in this file, it's good practice to add a comment explaining the purpose of this new dependency. The PR title suggests it's for 'speculative decoding with suffix_decode':
# Required for speculative decoding with suffix_decode
arctic-inference==0.1.1
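As a side note for reviewers, a pin like this can be sanity-checked at runtime with the standard library; the sketch below is illustrative and the helper name `is_pinned_version` is hypothetical, not part of this PR:

```python
from importlib import metadata


def is_pinned_version(dist: str, required: str) -> bool:
    """Return True if the distribution `dist` is installed at exactly `required`.

    Returns False when the package is missing or the installed version differs,
    mirroring what a strict pin such as arctic-inference==0.1.1 demands.
    """
    try:
        return metadata.version(dist) == required
    except metadata.PackageNotFoundError:
        # Not installed at all, so the pinned requirement is not satisfied.
        return False
```

For example, `is_pinned_version("arctic-inference", "0.1.1")` would return True only in an environment where the new requirement has actually been installed.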
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Please consider Gemini's review.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: frankie <wangyongsheng686@gmail.com>
OK, I have resolved this conflict.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: frankie <wangyongsheng686@gmail.com>
Add the requirement of arctic-inference for speculative decoding with suffix_decode (vllm-project#5045)

### Does this PR introduce _any_ user-facing change?
The suffix spec decode method relies on the `arctic-inference` library. This PR adds it to the requirements so the feature works by default.

### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
Signed-off-by: frankie <wangyongsheng686@gmail.com>
Does this PR introduce any user-facing change?
The suffix spec decode method relies on the `arctic-inference` library. This PR adds it to the requirements so the feature works by default.
How was this patch tested?
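For context, suffix decoding would typically be selected through vLLM's speculative decoding configuration. The sketch below shows an assumed config shape; the exact key names and the value 8 are illustrative, not taken from this PR, and may differ across vLLM versions:

```python
# Assumed shape of a speculative decoding config selecting suffix decoding,
# the mode that this PR's arctic-inference dependency backs.
suffix_spec_config = {
    "method": "suffix",           # suffix-based speculative decoding
    "num_speculative_tokens": 8,  # illustrative draft length, not from the PR
}

# Hypothetical usage with vLLM (requires the runtime and a model):
# from vllm import LLM
# llm = LLM(model="some/model", speculative_config=suffix_spec_config)
```

With `arctic-inference` now pinned in the requirements, a config like this should work out of the box instead of failing on a missing import.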