[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue #21005

JialinOuyang-Meta · 2025-07-15T17:37:58Z

Summary:

Optimizations

Adding fake head and tail nodes is a common trick in doubly linked list implementations. It significantly reduces implementation overhead and makes it easy to get rid of dataclass.__eq__ comparisons.

- No dataclass.__eq__ invocations
- Shorter code
- Branchless

Combined, these changes should yield a significant performance improvement for this code path.
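A minimal sketch of the sentinel idea follows. It uses a simplified `Block` dataclass in place of `KVCacheBlock` and a hypothetical `FreeBlockQueue` class, so it is not the actual vLLM code. The fake head and tail are never handed out, so every real block always has non-None neighbours. As a result, `remove` can unlink unconditionally, and the only comparison left in `popleft` is an identity check (`is`), never `==`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical stand-in for KVCacheBlock (keeps the default eq=True)."""

    block_id: int
    prev_free_block: Optional["Block"] = None
    next_free_block: Optional["Block"] = None


class FreeBlockQueue:
    """Hypothetical doubly linked free list with fake head/tail nodes."""

    def __init__(self, blocks: list[Block]) -> None:
        # Sentinels are never handed out, so real blocks always have neighbours.
        self.fake_head = Block(block_id=-1)
        self.fake_tail = Block(block_id=-1)
        self.fake_head.next_free_block = self.fake_tail
        self.fake_tail.prev_free_block = self.fake_head
        self.num_free_blocks = 0
        for block in blocks:
            self.append(block)

    def append(self, block: Block) -> None:
        # Insert between the current last node and the fake tail.
        last = self.fake_tail.prev_free_block
        assert last is not None  # invariant: the fake tail always has a predecessor
        last.next_free_block = block
        block.prev_free_block = last
        block.next_free_block = self.fake_tail
        self.fake_tail.prev_free_block = block
        self.num_free_blocks += 1

    def remove(self, block: Block) -> None:
        # No head/tail special cases and no == comparisons: just unlink.
        prev, nxt = block.prev_free_block, block.next_free_block
        assert prev is not None and nxt is not None
        prev.next_free_block = nxt
        nxt.prev_free_block = prev
        block.prev_free_block = block.next_free_block = None
        self.num_free_blocks -= 1

    def popleft(self) -> Block:
        first = self.fake_head.next_free_block
        assert first is not None  # invariant: the fake head always has a successor
        if first is self.fake_tail:  # identity check, not ==
            raise ValueError("No free blocks available")
        self.remove(first)
        return first

    def block_ids(self) -> list[int]:
        # Walk from the fake head to the fake tail, excluding both sentinels.
        ids: list[int] = []
        node = self.fake_head.next_free_block
        while node is not None and node is not self.fake_tail:
            ids.append(node.block_id)
            node = node.next_free_block
        return ids


blocks = [Block(block_id=i) for i in range(4)]
queue = FreeBlockQueue(blocks)
queue.remove(blocks[2])
print(queue.popleft().block_id, queue.block_ids())  # 0 [1, 3]
```

The sentinels cost two extra objects, but they remove the head/tail special cases that a None-terminated list needs in `remove`. According to the profiling described in this PR, those are the kind of checks that ended up in dataclass `__eq__`.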

Observations

Per vLLM profiling, kv_cache_manager.allocate_slots accounts for a non-negligible cost in each prefill.

[Profiling screenshots F1980260529, F1980260481 and F1980260497 not reproduced]

Zooming in shows that the FreeKVCacheBlockQueue.popleft stack is non-trivial: popleft -> remove -> string.__eq__, which mainly comes from dataclass (i.e., KVCacheBlock) equality comparisons.

Per the Python dataclasses library documentation (https://docs.python.org/3/library/dataclasses.html#dataclasses.dataclass):

dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

eq: If true (the default), an __eq__() method will be generated. This method compares the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type.

If the class already defines __eq__(), this parameter is ignored.
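The quoted behaviour is easy to observe. The sketch below uses a simplified `Block` dataclass in place of `KVCacheBlock` and wraps the generated `__eq__` only so that calls can be counted. `==` runs the generated method, which compares field values. `is` is a pure identity check that never calls `__eq__`, and identity is all a linked list needs to tell nodes apart.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for KVCacheBlock; eq=True is the default."""

    block_id: int
    ref_cnt: int = 0


# Wrap the dataclass-generated __eq__ purely to count invocations.
calls = {"__eq__": 0}
_generated_eq = Block.__eq__


def _counting_eq(self, other):
    calls["__eq__"] += 1
    return _generated_eq(self, other)


Block.__eq__ = _counting_eq

a, b = Block(block_id=7), Block(block_id=7)
value_equal = a == b  # generated __eq__ compares fields: True, one call
same_object = a is b  # identity only: False, no __eq__ call
print(value_equal, same_object, calls["__eq__"])  # True False 1
```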

Test Plan:

Result

block_size is typically set to 16, so in production usage we are likely to allocate 10 to 1000 blocks. In this range, the optimization saved up to ~1 ms of TTFT, and the improvement is larger on longer inputs.
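To relate these block counts to prompt lengths: if a request needs ceil(tokens / block_size) blocks, then 10 to 1000 blocks at block_size 16 corresponds to roughly 160 to 16,000 tokens. That formula is an assumption that ignores prefix-cache hits and any extra lookahead slots. A quick sketch of the arithmetic follows; `blocks_needed` is a hypothetical helper, not a vLLM function.

```python
import math

BLOCK_SIZE = 16  # the typical block_size mentioned above


def blocks_needed(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    # Assumption: a partially filled last block still occupies a whole block,
    # and prefix-cache hits / lookahead slots are ignored.
    return math.ceil(num_tokens / block_size)


for num_tokens in (160, 1_000, 16_000):
    print(num_tokens, blocks_needed(num_tokens))  # 10, 63 and 1000 blocks
```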

Benchmark

[Benchmark screenshots (After / Before) not reproduced]

Stack

[Stack screenshots (After / Before) not reproduced]

Rollback Plan:

Reviewed By: CuiCoco

Differential Revision: D78292345

github-actions · 2025-07-15T17:38:06Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

facebook-github-bot · 2025-07-15T17:38:13Z

This pull request was exported from Phabricator. Differential Revision: D78292345

gemini-code-assist

Code Review

This pull request introduces a significant performance optimization to the FreeKVCacheBlockQueue by implementing a doubly linked list with sentinel nodes. This change effectively removes expensive __eq__ comparisons on KVCacheBlock dataclasses, which should improve performance as demonstrated by the new benchmark. The implementation is a classic and well-executed approach.

My review focuses on ensuring the robustness of this new implementation. I've identified a couple of areas where adding validation checks could prevent potential crashes from state inconsistencies, making the system more resilient. These changes should have a negligible performance impact while significantly improving debuggability and correctness guarantees.

vllm/v1/core/kv_cache_utils.py



Signed-off-by: Jialin Ouyang <[email protected]>

yeqcharlotte

LGTM

@JialinOuyang-Meta some graphs in your test plan are broken, could you fix them?

cc: @njhill @WoosukKwon could you take another look?

vllm/v1/core/kv_cache_utils.py

benchmarks/kv_cache/benchmark_block_pool.py

Jialin · 2025-07-17T21:02:03Z

resolve #21141

Signed-off-by: Jialin Ouyang <[email protected]>

njhill

Thanks @JialinOuyang-Meta! I have a few small comments, perhaps could be done as a follow-on.

Sorry for the delay, I was on vacation for the last week.

njhill · 2025-07-19T08:13:13Z

vllm/v1/core/kv_cache_utils.py

-        if not self.free_list_head:
+        if (self.fake_free_list_head.next_free_block
+                is self.fake_free_list_tail
+                or self.fake_free_list_head.next_free_block is None):


Why is this second check needed? Would self.fake_free_list_head.next_free_block ever be None?

Logically, it should NOT be needed, but it keeps pyre happy; otherwise pyre complains that we can't assign Optional[KVCacheBlock] to KVCacheBlock in L256.

Just curious, what's the typical way in vLLM to satisfy pyre without such extra checks?

Use asserts or selective ignore directives.
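The difference between the suggested options can be shown in a small sketch. `Block` and the two helper functions are hypothetical and not vLLM code. An `assert x is not None` narrows `Optional[Block]` to `Block` for type checkers and also checks the invariant at runtime. `typing.cast`, shown here for contrast and not mentioned in the thread, only informs the type checker and checks nothing. A selective ignore directive (for example mypy's `# type: ignore[...]`) behaves like `cast` at runtime.

```python
from dataclasses import dataclass
from typing import Optional, cast


@dataclass
class Block:
    """Hypothetical stand-in for KVCacheBlock."""

    block_id: int
    next_free_block: Optional["Block"] = None


def first_free_checked(fake_head: Block) -> Block:
    first = fake_head.next_free_block
    # assert narrows Optional[Block] to Block for the type checker and
    # documents the invariant instead of adding a hand-written branch.
    assert first is not None, "fake head must always have a successor"
    return first


def first_free_cast(fake_head: Block) -> Block:
    # cast() only silences the type checker; nothing is checked at runtime.
    return cast(Block, fake_head.next_free_block)


head, tail = Block(block_id=-1), Block(block_id=-2)
head.next_free_block = tail
print(first_free_checked(head) is tail, first_free_cast(head) is tail)  # True True
```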

njhill · 2025-07-19T08:16:02Z

vllm/v1/core/kv_cache_utils.py

+        if self.fake_free_list_tail.prev_free_block is None:
+            raise RuntimeError(
+                "prev_free_block of fake_free_list_tail should always exist")


I don't think this check is needed; if it is, it should be an assert.

Ditto: this was only added to stop pyre from complaining about Optional[KVCacheBlock] -> KVCacheBlock assignments in L305.

That's fine but it should be an assert in this case.

njhill · 2025-07-19T08:18:11Z

vllm/v1/core/kv_cache_utils.py

+        if self.fake_free_list_head.next_free_block is None:
+            raise RuntimeError(
+                "next_free_block of fake_free_list_head should always exist")


Same comment. If we want integrity check here it should be an assert.

Same reply as above. I'd love to hear the best practice for satisfying pyre without extra checks or computations.

I will definitely address all your comments once I get some guidance on this point :)

assert should work the same way and is more correct since this is an integrity check.
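In normal runs, both forms catch the invariant violation. One practical difference is that Python strips `assert` statements when it runs with `-O`, while `raise RuntimeError(...)` always executes. That fits the "integrity check" framing: an assert documents an invariant rather than control flow the program depends on. The sketch below demonstrates this in a fresh interpreter. The checked snippet is illustrative and not taken from vLLM.

```python
import subprocess
import sys

# Illustrative integrity check mirroring the review thread: the fake head's
# successor must never be None. Here the invariant is deliberately violated.
SNIPPET = (
    "next_free_block = None\n"
    "assert next_free_block is not None, 'fake head must have a successor'\n"
    "print('continued')\n"
)


def run(*flags: str) -> subprocess.CompletedProcess:
    # Execute SNIPPET in a fresh interpreter with the given flags.
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


normal = run()  # assert active: AssertionError, non-zero exit code
optimized = run("-O")  # assert compiled out: execution continues
print(normal.returncode != 0, optimized.stdout.strip())  # True continued
```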

njhill · 2025-07-19T08:20:56Z

vllm/v1/core/kv_cache_utils.py

+        # Create a fake head and a tail block for the doubly linked list to
+        # reduce branching in the code
+        #
+        # The implementation garenteed that the fake head and tail
+        # are NEVER got popped, so we could safely assume each real blocks
+        # in the queue has prev and next blocks.


suggested rewording of the comment

# Create fake head and tail blocks for the doubly linked list to
# reduce branching in the code.
#
# The implementation guarantees that the fake head and tail
# are NEVER popped, so we can safely assume each real block
# in the queue has prev and next blocks.

Appreciate that! Will address all your comments in a followup PR.

Jialin · 2025-07-19T09:53:15Z

> Thanks @JialinOuyang-Meta! I have a few small comments, perhaps could be done as a follow-on.
>
> Sorry for the delay, I was on vacation for the last week.

No worries, and I appreciate your input!

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: x22x22 <[email protected]>

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]>

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: Paul Pak <[email protected]>

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: Diego-Castan <[email protected]>


JialinOuyang-Meta requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners July 15, 2025 17:37

mergify bot added the performance (Performance-related issues) and v1 labels Jul 15, 2025

gemini-code-assist bot reviewed Jul 15, 2025

vllm/v1/core/kv_cache_utils.py (outdated, resolved)

vllm/v1/core/kv_cache_utils.py (outdated, resolved)

JialinOuyang-Meta force-pushed the export-D78292345 branch from 574454a to 53e5f05 on July 15, 2025 17:49

JialinOuyang-Meta force-pushed the export-D78292345 branch from 53e5f05 to f0a6c84 on July 15, 2025 17:49

JialinOuyang-Meta force-pushed the export-D78292345 branch from f0a6c84 to 36411c3 on July 15, 2025 17:54

JialinOuyang-Meta changed the title from "Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue" to "[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue" Jul 15, 2025

JialinOuyang-Meta force-pushed the export-D78292345 branch from 36411c3 to 608f1f5 on July 15, 2025 18:32

JialinOuyang-Meta added 2 commits July 15, 2025 12:33

Address precommit errors

13254b4

Signed-off-by: Jialin Ouyang <[email protected]>

Address precommit failures

b061697

Signed-off-by: Jialin Ouyang <[email protected]>

yeqcharlotte approved these changes Jul 16, 2025

vllm/v1/core/kv_cache_utils.py (resolved)

benchmarks/kv_cache/benchmark_block_pool.py (resolved)

yeqcharlotte requested review from houseroad and zou3519 July 16, 2025 20:10

houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jul 16, 2025

JialinOuyang-Meta mentioned this pull request Jul 17, 2025

[Performance]: Reduce overhead of FreeKVCacheBlockQueue #21140

Closed

4 tasks

Jialin mentioned this pull request Jul 17, 2025

[Performance]: Opportunities to speed up BlockPool processing #21141

Closed

5 tasks

JialinOuyang-Meta added 2 commits July 17, 2025 15:43

Fix unit tests

9446d1b

Signed-off-by: Jialin Ouyang <[email protected]>

Fix more unit tests

b25b558

Signed-off-by: Jialin Ouyang <[email protected]>

simon-mo merged commit 0f199f1 into vllm-project:main Jul 18, 2025
63 of 65 checks passed

njhill reviewed Jul 19, 2025


Jialin mentioned this pull request Jul 22, 2025

[Core] Minor comments and asserts changes in block pool #21351

Closed

4 tasks

x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025

[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (…

4842821

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: x22x22 <[email protected]>

Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025

[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (…

9c95036

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]>

npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025

[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (…

a57ec06

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]>

paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025

[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (…

4a59192

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]> Signed-off-by: Paul Pak <[email protected]>

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025

[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (…

07c2308

…vllm-project#21005) Signed-off-by: Jialin Ouyang <[email protected]>
