Add the requirement of arctic-inference for speculative decoding with suffix_decode #5045
wangxiyuan merged 9 commits into vllm-project:main
Conversation
…irements Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
…irements and pyproject.toml Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
Code Review
This pull request adds arctic-inference==0.1.1 as a new dependency to both pyproject.toml and requirements.txt. The change is straightforward and correct. I have one suggestion for requirements.txt: add a comment explaining the purpose of the new dependency, which improves maintainability and follows the file's existing convention.
  transformers<=4.57.1
+ arctic-inference==0.1.1
For better maintainability, and to follow the existing convention in this file, it's good practice to add a comment explaining the purpose of this new dependency. The PR title suggests it's for 'speculative decoding with suffix_decode':
# Required for speculative decoding with suffix_decode
arctic-inference==0.1.1
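As a side note for reviewers, a pin like this can be sanity-checked at runtime with the standard library; the sketch below is illustrative and the helper name `is_pinned_version` is hypothetical, not part of this PR:

```python
from importlib import metadata


def is_pinned_version(dist: str, required: str) -> bool:
    """Return True if the distribution `dist` is installed at exactly `required`.

    Returns False when the package is missing or the installed version differs,
    mirroring what a strict pin such as arctic-inference==0.1.1 demands.
    """
    try:
        return metadata.version(dist) == required
    except metadata.PackageNotFoundError:
        # Not installed at all, so the pinned requirement is not satisfied.
        return False
```

For example, `is_pinned_version("arctic-inference", "0.1.1")` would return True only in an environment where the new requirement has actually been installed.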
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Please consider Gemini's review.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: frankie <wangyongsheng686@gmail.com>
OK, I have resolved this conflict.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: frankie <wangyongsheng686@gmail.com>
Add the requirement of arctic-inference for speculative decoding with suffix_decode (vllm-project#5045)

### Does this PR introduce _any_ user-facing change?
The suffix spec decode method relies on the `arctic-inference` library. This PR adds it to the requirements so the feature works by default.

### How was this patch tested?
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: frankie-ys <yongshengwang@cmbchina.com>
Signed-off-by: frankie <wangyongsheng686@gmail.com>
Does this PR introduce any user-facing change?
The suffix spec decode method relies on the `arctic-inference` library. This PR adds it to the requirements so the feature works by default.
How was this patch tested?
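For context, suffix decoding would typically be selected through vLLM's speculative decoding configuration. The sketch below shows an assumed config shape; the exact key names and the value 8 are illustrative, not taken from this PR, and may differ across vLLM versions:

```python
# Assumed shape of a speculative decoding config selecting suffix decoding,
# the mode that this PR's arctic-inference dependency backs.
suffix_spec_config = {
    "method": "suffix",           # suffix-based speculative decoding
    "num_speculative_tokens": 8,  # illustrative draft length, not from the PR
}

# Hypothetical usage with vLLM (requires the runtime and a model):
# from vllm import LLM
# llm = LLM(model="some/model", speculative_config=suffix_spec_config)
```

With `arctic-inference` now pinned in the requirements, a config like this should work out of the box instead of failing on a missing import.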