[Main2Main] Upgrade vllm commit to 0109#5752
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request updates the vllm dependency and introduces compatibility code. The changes look reasonable, but there is a critical issue with how the vllm version is being checked. The new code uses vllm_version_is, which performs an exact version match. This is very brittle and will likely break with older or newer patch/minor versions of vllm. I've left comments with suggestions to use version range comparisons for more robust and future-proof code.
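To make the reviewer's point concrete, here is a minimal sketch of the difference between an exact version match and a range comparison. The function bodies below are illustrative assumptions, not the actual vllm-ascend helpers; real code should use a proper version parser (e.g. `packaging.version`) rather than this simplified tuple parsing.

```python
def parse_version(v: str) -> tuple:
    # "0.13.1" -> (0, 13, 1); ignores pre-release suffixes for simplicity.
    return tuple(int(p) for p in v.split(".")[:3])

def vllm_version_is(current: str, target: str) -> bool:
    # Exact match: breaks as soon as a patch release bumps the version.
    return current == target

def vllm_version_at_least(current: str, minimum: str) -> bool:
    # Range comparison: keeps working across patch/minor releases.
    return parse_version(current) >= parse_version(minimum)

# The exact check stops matching on the very next patch release,
# while the range check still selects the new code path.
print(vllm_version_is("0.13.1", "0.13.0"))        # False
print(vllm_version_at_least("0.13.1", "0.13.0"))  # True
```

This is why gating new-API code paths on `vllm_version_at_least`-style comparisons is more future-proof than exact pinning.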
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend:
  - [CI] Fix lint CI (vllm-project#5880)
  - [Feature] implement eagle spec decoding for model runner v2 (vllm-project#5840)
  - [Quantization] Support compressed tensors moe w8a8 int8 dynamic weight (vllm-project#5718)
  - [EPLB][Bugfix] Get expert map from layers (vllm-project#5817)
  - [Bugfix] Fixed an accuracy problem of sp with eagle3 (vllm-project#5816)
  - [P/D] bugfix for p node force free requset (vllm-project#5431)
  - [Lint]Style: Convert `example` to `ruff format` (vllm-project#5863)
  - [Main2Main] Upgrade vllm commit to 0109 (vllm-project#5752)
  - [Bugfix][P/D] fix layerwise connector for decoder tp size > num kv heads (vllm-project#5846)
  - [Test][e2e][LoRA] Add more e2e tests to cover scenarios of LoRA (vllm-project#4075)
  - [CustomOp][Perf] Merge Q/K split to simplify AscendApplyRotaryEmb for better performance (vllm-project#5799)
  - [Lint]Style: Convert `root`, `benchmarks`, `tools` and `docs` to `ruff format` (vllm-project#5843)
  - enable ep32 for dispatch_ffn_combine (vllm-project#5787)
### What this PR does / why we need it?

Upgrade vllm commit to 0109 (bde38c11df0ea066a740efe9b77fff5418be45df)

1. Remove `init_cached_hf_modules` due to vllm-project/vllm#31786
2. Fix spec_decode e2e test due to the vllm-project/vllm#29821 break
3. Fix `vllm.v1.attention.backends.utils` due to vllm-project/vllm#31891
4. Fix `self.seq_lens - query_lens` to run on the same device due to vllm-project/vllm#31773
5. Skip model_runner_v2 e2e test due to `'_OpNamespace' '_C' object has no attribute 'get_cuda_view_from_cpu_tensor'`

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
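Fix (4) above is about a device mismatch: after the upstream change, `seq_lens` can live on the accelerator while `query_lens` is built on the CPU, and subtracting tensors on different devices raises a RuntimeError. A minimal sketch of the alignment, with illustrative tensor names and shapes (not the actual model-runner code):

```python
import torch

def num_computed_tokens(seq_lens: torch.Tensor,
                        query_lens: torch.Tensor) -> torch.Tensor:
    # Move query_lens onto seq_lens' device before the elementwise
    # subtraction, so both operands of `-` live on the same device.
    return seq_lens - query_lens.to(seq_lens.device)

seq_lens = torch.tensor([8, 16, 32])    # total tokens per sequence
query_lens = torch.tensor([1, 1, 4])    # tokens being computed this step
print(num_computed_tokens(seq_lens, query_lens).tolist())  # [7, 15, 28]
```

On CPU-only tensors the `.to()` is a no-op, so the alignment costs nothing when the devices already match.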
- Remove is_skipped flag from tests/e2e/singlecard/model_runner_v2/test_basic.py
- Test was originally skipped due to the get_cuda_view_from_cpu_tensor error (vllm-project#5752)
- Recent model_runner_v2 improvements may have resolved the issue:
  - vllm-project#7110: Added aclgraph support
  - vllm-project#7496: Optimized post_update performance
  - vllm-project#7221: Optimized _topk_log_softmax_kernel performance
- CI will verify whether the test now passes

Signed-off-by: hejianping <hejianping7@huawei.com>
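For readers unfamiliar with the pattern, un-skipping a test usually means dropping a skip marker so CI exercises the test again. The stand-in below is purely illustrative; the real test lives in tests/e2e/singlecard/model_runner_v2/test_basic.py and its body is assumed here:

```python
# Before the change, the test carried a skip marker roughly like:
#
#   @pytest.mark.skip(reason="'_OpNamespace' '_C' object has no attribute "
#                            "'get_cuda_view_from_cpu_tensor'")
#
# Removing the marker re-enables collection, so the next CI run reports a
# real pass/fail instead of a skip.
def test_basic():
    # Placeholder body; the real test runs a model end to end on one card.
    assert 1 + 1 == 2
```

If the underlying `get_cuda_view_from_cpu_tensor` error still reproduces, CI will surface it as a failure rather than silently skipping.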
What this PR does / why we need it?
Upgrade vllm commit to 0109 (bde38c11df0ea066a740efe9b77fff5418be45df)

1. Remove `init_cached_hf_modules` due to [Chore] Try remove `init_cached_hf_modules` (vllm#31786)
2. Fix spec_decode e2e test due to the vllm#29821 break
3. Fix `vllm.v1.attention.backends.utils` due to [Chore] Migrate V0 attention utils (vllm#31891)
4. Fix `self.seq_lens - query_lens` to run on the same device due to [Attention][1/n] Remove usage of deprecated `seq_lens_cpu` and `num_computed_tokens_cpu` CommonAttentionMetadata properties (vllm#31773)
5. Skip model_runner_v2 e2e test due to `'_OpNamespace' '_C' object has no attribute 'get_cuda_view_from_cpu_tensor'`

Does this PR introduce any user-facing change?
How was this patch tested?