
[Performance] Change the shape of kv_cache to avoid view of k_cache and v_cache.#204

Merged
wangxiyuan merged 1 commit into vllm-project:main from whx-sjtu:change_kv_cache_shape
Mar 5, 2025

Conversation

@whx-sjtu
Collaborator

This PR changes the shape of the kv cache to avoid the view of k_cache and v_cache.
In addition, it caches the metadata of k_cache and v_cache to avoid duplicative slice operations and improve performance.
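As an illustration (a minimal sketch only; the shapes and names below are assumptions, not the actual vllm-ascend layout), the idea is to allocate the kv cache so that k_cache and v_cache fall out as plain index slices, and to take those slices once up front instead of re-slicing or calling .view() on every attention step:

import torch

# Assumed example shape: (2, num_blocks, block_size, num_heads, head_size);
# the leading dim of 2 holds K and V.
num_blocks, block_size, num_heads, head_size = 128, 16, 8, 64
kv_cache = torch.zeros(2, num_blocks, block_size, num_heads, head_size)

# Index the leading dim once and cache the results, rather than viewing or
# slicing kv_cache inside every forward pass.
k_cache, v_cache = kv_cache[0], kv_cache[1]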

@whx-sjtu whx-sjtu force-pushed the change_kv_cache_shape branch 2 times, most recently from 894b170 to fb6686e on February 28, 2025 07:50
Comment thread on vllm_ascend/worker.py (Outdated)
num_layers = len(self.cache_engine[ve].gpu_cache)
for i in range(num_layers):
    self.cache_engine[ve].gpu_cache[i] = torch_npu.npu_format_cast(
        self.cache_engine[ve].gpu_cache[i], 2)
Collaborator

The official doc states that torch_npu.npu_format_cast is an inplace op, so maybe there is no need to receive the return value.

And the next question: why 2 here?
https://www.hiascend.com/document/detail/zh/Pytorch/600/apiref/apilist/ptaoplist_000449.html#ZH-CN_TOPIC_0000002173377585__zh-cn_topic_0000002137977492_section1261113545317

Collaborator Author

That makes sense. I'll change it to inplace.
Using 2 here casts the format from NCDHW to ND (2 indicates ND). This is because when we construct a tensor with more than 4 dimensions, its format is NCDHW, while our operations only support data in ND format.
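A minimal sketch of the in-place usage discussed above (assuming, per the linked doc, that torch_npu.npu_format_cast modifies its input so the return value can be dropped; the helper name and cache_engine structure are illustrative, and if the cast is not actually in place the assignment from the snippet above should be kept):

import torch_npu

def cast_gpu_cache_to_nd(cache_engine, ve):
    # Format id 2 selects ND; tensors constructed with more than 4 dims
    # default to NCDHW, which the attention ops here do not support.
    for layer_cache in cache_engine[ve].gpu_cache:
        torch_npu.npu_format_cast(layer_cache, 2)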

@whx-sjtu whx-sjtu force-pushed the change_kv_cache_shape branch 3 times, most recently from fbb81fb to cb3c3ab on February 28, 2025 10:07
@wangxiyuan
Collaborator

cc @ganyi1996ppo

@wangxiyuan
Collaborator

worker.py has been moved to the worker module; please rebase.

…_cache; cache the metadata of k_cache and v_cache to avoid replicated slice operation

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
@whx-sjtu whx-sjtu force-pushed the change_kv_cache_shape branch from cb3c3ab to a2277a7 on March 5, 2025 02:00
@wangxiyuan wangxiyuan merged commit 0d34634 into vllm-project:main Mar 5, 2025
wangxiyuan pushed a commit to wangxiyuan/vllm-ascend that referenced this pull request Mar 7, 2025
…nd v_cache. (vllm-project#204)

This PR changes the shape of kv cache to avoid the view of k_cache and
v_cache.
What's more, cache the metadata of k_cache and v_cache to avoid
duplicative slice operations to improve performance.

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
wangxiyuan pushed a commit to wangxiyuan/vllm-ascend that referenced this pull request Mar 7, 2025
…nd v_cache. (vllm-project#204)

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025
…nd v_cache. (vllm-project#204)

Signed-off-by: hw_whx <wanghexiang7@huawei.com>