[Performance] Change the shape of kv_cache to avoid view of k_cache and v_cache. by whx-sjtu · Pull Request #204 · vllm-project/vllm-ascend

whx-sjtu · 2025-02-28T07:37:15Z

This PR changes the shape of the kv cache so that k_cache and v_cache no longer need to be viewed.
It also caches the metadata of k_cache and v_cache to avoid repeated slice operations, which improves performance.
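A minimal PyTorch sketch of the two ideas follows, under stated assumptions. The "before" layout (heads flattened into the last dimension) and the "after" layout `(2, num_blocks, block_size, num_kv_heads, head_size)` are illustrative guesses, not necessarily the exact shapes in this PR, and `CachedKV` and `old_style_split` are hypothetical names. Allocating the cache in the shape attention consumes removes the per-call `.view()`, and slicing key/value once and reusing the result removes the repeated indexing.

```python
import torch

# Hypothetical sizes, for illustration only.
NUM_BLOCKS, BLOCK_SIZE, NUM_KV_HEADS, HEAD_SIZE = 4, 16, 2, 8


def old_style_split(kv_cache: torch.Tensor):
    # Assumed "before" layout: (2, blocks, block_size, heads * head_size).
    # Every call indexes each half and then reshapes it with .view().
    k = kv_cache[0].view(NUM_BLOCKS, BLOCK_SIZE, NUM_KV_HEADS, HEAD_SIZE)
    v = kv_cache[1].view(NUM_BLOCKS, BLOCK_SIZE, NUM_KV_HEADS, HEAD_SIZE)
    return k, v


class CachedKV:
    """Hypothetical per-layer holder: the cache is allocated directly in the
    shape attention needs, and key/value are sliced once and then reused."""

    def __init__(self) -> None:
        # Leading dimension 2 holds key (index 0) and value (index 1).
        self.kv_cache = torch.zeros(
            2, NUM_BLOCKS, BLOCK_SIZE, NUM_KV_HEADS, HEAD_SIZE
        )
        self._kv = None

    def get(self):
        # Slice only on the first call; later calls return the same tensors.
        if self._kv is None:
            self._kv = (self.kv_cache[0], self.kv_cache[1])
        return self._kv
```

Because `kv_cache[0]` is itself a view, the cached key tensor shares storage with the full cache, so writes through it are visible in `kv_cache`.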

MengqingCao · 2025-02-28T07:53:13Z

+            num_layers = len(self.cache_engine[ve].gpu_cache)
+            for i in range(num_layers):
+                self.cache_engine[ve].gpu_cache[i] = torch_npu.npu_format_cast(
+                    self.cache_engine[ve].gpu_cache[i], 2)


The official docs state that torch_npu.npu_format_cast is an in-place op, so we may not need to assign the return value.

Next question: why 2 here?
https://www.hiascend.com/document/detail/zh/Pytorch/600/apiref/apilist/ptaoplist_000449.html#ZH-CN_TOPIC_0000002173377585__zh-cn_topic_0000002137977492_section1261113545317

That makes sense. I'll change it to in-place.
The 2 casts the format from NCDHW to ND (2 indicates ND). When we construct a tensor with more than 4 dimensions, its format is NCDHW, while our operations only support data in ND format.
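One way to act on this thread is to give the magic number a name. The sketch below is hypothetical: `ACL_FORMAT_ND` and `cast_caches_to_nd` are illustrative names, and the only fact it relies on is the one stated above, that format code 2 means ND. The cast function is passed in so the loop can be exercised without an NPU; on Ascend it would presumably be `torch_npu.npu_format_cast`, called with the same `(tensor, format)` arguments as in the original diff.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Format code for ND, per the review discussion ("2 indicates ND").
ACL_FORMAT_ND = 2


def cast_caches_to_nd(gpu_cache: List[T], cast_fn: Callable[[T, int], T]) -> None:
    """Cast every per-layer cache tensor in gpu_cache to ND format.

    cast_fn is assumed to behave like torch_npu.npu_format_cast(tensor, fmt).
    The list is updated with whatever cast_fn returns.
    """
    for i, layer_cache in enumerate(gpu_cache):
        gpu_cache[i] = cast_fn(layer_cache, ACL_FORMAT_ND)
```

Keeping the assignment from the original diff is harmless if the op also works in place, and it stays correct if the op returns a new tensor instead.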

wangxiyuan · 2025-03-03T01:21:30Z

cc @ganyi1996ppo

wangxiyuan · 2025-03-04T08:23:13Z

worker.py has been moved into the worker module; please rebase.

…_cache; cache the metadata of k_cache and v_cache to avoid replicated slice operation Signed-off-by: hw_whx <wanghexiang7@huawei.com>

…nd v_cache. (vllm-project#204) This PR changes the shape of kv cache to avoid the view of k_cache and v_cache. What's more, cache the metadata of k_cache and v_cache to avoid duplicative slice operations to improve performance. Signed-off-by: hw_whx <wanghexiang7@huawei.com>

…nd v_cache. (vllm-project#204) This PR changes the shape of kv cache to avoid the view of k_cache and v_cache. What's more, cache the metadata of k_cache and v_cache to avoid duplicative slice operations to improve performance. Signed-off-by: hw_whx <wanghexiang7@huawei.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>


github-actions Bot added module:core module:quantization labels Feb 28, 2025

whx-sjtu force-pushed the change_kv_cache_shape branch 2 times, most recently from 894b170 to fb6686e, February 28, 2025 07:50

MengqingCao reviewed Feb 28, 2025


whx-sjtu force-pushed the change_kv_cache_shape branch 3 times, most recently from fbb81fb to cb3c3ab, February 28, 2025 10:07

wangxiyuan approved these changes Mar 3, 2025


feat: change the shape of kv cache to avoid the view of k_cache and v…

a2277a7


whx-sjtu force-pushed the change_kv_cache_shape branch from cb3c3ab to a2277a7, March 5, 2025 02:00

wangxiyuan approved these changes Mar 5, 2025


wangxiyuan merged commit 0d34634 into vllm-project:main Mar 5, 2025

wangxiyuan mentioned this pull request Jan 26, 2026

[Community] Nominate whx-sjtu as maintainer #6268

Merged


wangxiyuan merged 1 commit into vllm-project:main from whx-sjtu:change_kv_cache_shape

whx-sjtu commented Feb 28, 2025

MengqingCao Feb 28, 2025

whx-sjtu Feb 28, 2025

wangxiyuan commented Mar 3, 2025

wangxiyuan commented Mar 4, 2025

3 participants
