[bugfix] fix the complex and potentially problematic generate_kv_idx.#5957
[bugfix] fix the complex and potentially problematic generate_kv_idx.#5957wangxiyuan merged 1 commit intovllm-project:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Code Review
This pull request refactors the logic for generating KV cache indices in chunked prefill scenarios by removing the complex generate_kv_idx function. The new approach derives the necessary indices directly from existing metadata, which simplifies the code and fixes a dimension misalignment issue. The changes look good and improve maintainability. I've identified one area where the new implementation can be made more efficient by avoiding a redundant argsort operation.
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
eb48feb to
871ab62
Compare
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
…atic generate_kv_idx. (#5955) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. ref: #5957 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
…vllm-project#5957) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: huangning1995 <huangning12@huawei.com>
…_kv_idx. (vllm-project#5957)" This reverts commit 3d916f4.
…to FIA_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: [CI] Upgrade CANN to 8.5.0 (vllm-project#6070) Default enable MLAPO (vllm-project#5952) [Doc] Supplement PD separation parameters of DeepSeek V3.1 (vllm-project#6053) [Ascend] perf: optimize rope embedding with triton kernel for huge performance gain (vllm-project#5918) [Ops] update causal_conv1d_update (vllm-project#5984) [CI]Update triton ascend version in 3.2.0 (vllm-project#6067) [bugfix] fix the complex and potentially problematic generate_kv_idx. (vllm-project#5957)
…vllm-project#5957) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
…atic generate_kv_idx. (vllm-project#5955) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. ref: vllm-project#5957 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
…vllm-project#5957) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
…atic generate_kv_idx. (vllm-project#5955) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. ref: vllm-project#5957 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
…atic generate_kv_idx. (vllm-project#5955) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. ref: vllm-project#5957 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
…vllm-project#5957) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…vllm-project#5957) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
…vllm-project#5957) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…vllm-project#5957) ### What this PR does / why we need it? In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations. - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@2c24bc6 Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
What this PR does / why we need it?
In long-sequence scenarios, the chunked-prefill component may encounter dimension misalignment issues, which previously occurred during precision testing on the code_generate_lite dataset. This PR removes redundant computations and instead derives the value using existing results and straightforward calculations.