[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs by Jialin · Pull Request #24964 · vllm-project/vllm

Jialin · 2025-09-16T12:31:33Z

Purpose

The majority of the objects collected in Generation 0 GC passes are related to KVCacheBlocks, and we found that most of them actually refer to empty blocks.

This PR makes two main changes:

Change KVCacheBlocks.blocks from tuple[list] to tuple[tuple] to indicate that it is immutable after creation.
Pre-construct an empty KVCacheBlock member variable in KVCacheManager for reuse (to avoid GC overhead).
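
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `KVCacheBlock` stand-in, the `empty_kv_cache_blocks` attribute name, and the `wrap` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class KVCacheBlock:
    # Hypothetical stand-in for the real block type; only an id is modeled.
    block_id: int


@dataclass(frozen=True)
class KVCacheBlocks:
    # One inner tuple per KV cache group; tuples signal "immutable after creation".
    blocks: tuple[tuple[KVCacheBlock, ...], ...]


class KVCacheManager:
    def __init__(self, num_kv_cache_groups: int) -> None:
        self.num_kv_cache_groups = num_kv_cache_groups
        # Built once and handed out whenever there is nothing to return,
        # so the empty case allocates no new container objects.
        self.empty_kv_cache_blocks = KVCacheBlocks(
            tuple(() for _ in range(num_kv_cache_groups))
        )

    def wrap(self, blocks_per_group: list[list[KVCacheBlock]]) -> KVCacheBlocks:
        # Hypothetical helper: reuse the shared empty instance when every
        # group is empty, otherwise freeze the lists into tuples.
        if not any(blocks_per_group):
            return self.empty_kv_cache_blocks
        return KVCacheBlocks(tuple(tuple(group) for group in blocks_per_group))
```

Because the shared instance is immutable, handing the same object to many callers is safe; with mutable inner lists, one caller appending to it would affect all the others.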

Test Plan & Test Result

Model: facebook/opt-125m
Prefill-Heavy workload: Input 2000, Output 48
Decode-Heavy workload: Input 48, Output 2000

GC costs are significantly lower with this PR, especially for the decode-heavy workload. This translates to roughly a 3-4% throughput improvement.
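
For readers who want to observe GC costs themselves, one option is the standard `gc.callbacks` hook. The sketch below is an assumption about how such a measurement could be done, not the benchmark used for the numbers in this PR; `GCStats` and `measure` are hypothetical names.

```python
import gc
import time


class GCStats:
    """Accumulates collection counts and elapsed time per generation."""

    def __init__(self) -> None:
        self.counts = [0, 0, 0]
        self.seconds = [0.0, 0.0, 0.0]
        self._start = 0.0

    def __call__(self, phase: str, info: dict) -> None:
        # The interpreter calls this with phase "start" before a collection
        # and "stop" after it; info["generation"] is the generation collected.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop":
            gen = info["generation"]
            self.counts[gen] += 1
            self.seconds[gen] += time.perf_counter() - self._start


def measure(workload) -> GCStats:
    # Register the callback only for the duration of the workload.
    stats = GCStats()
    gc.callbacks.append(stats)
    try:
        workload()
    finally:
        gc.callbacks.remove(stats)
    return stats
```

Calling `measure(fn)` before and after a change and comparing `stats.counts[0]` and `stats.seconds[0]` gives a rough view of Generation 0 activity.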

Jialin · 2025-09-16T16:18:38Z

Resolve #24321

Jialin · 2025-09-19T18:49:43Z

Discussed with @heheda12345 offline. The main discussion point was how to minimize changes to code signatures and avoid if/else branches:

A potential solution would be to have the allocate_new_blocks interface return a static KVCacheBlocks containing an ImmutableEmptyList (where ImmutableEmptyList extends list with all mutating methods removed). This way, all signatures are kept as is, and all object allocations around empty KVCacheBlocks are eliminated.
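
A minimal sketch of what such an ImmutableEmptyList could look like is shown below. It only illustrates the idea from this comment and is an assumption about its shape; per the later commit "Get rid of custom type and use Sequence[KVCacheBlock] directly", the merged change did not end up relying on a custom list type.

```python
class ImmutableEmptyList(list):
    """An always-empty list whose element-adding methods raise TypeError."""

    def __init__(self) -> None:
        # Accept no initial items, so the instance can only start empty.
        super().__init__()

    def _blocked(self, *args, **kwargs):
        raise TypeError("ImmutableEmptyList cannot be modified")

    # Every method that could add an element is replaced. Methods such as
    # clear() or sort() are harmless on an empty list and are left alone.
    append = extend = insert = _blocked
    __iadd__ = __setitem__ = _blocked
```

Since the type still passes `isinstance(x, list)` checks, existing signatures annotated with `list[KVCacheBlock]` would not need to change, which is the point of the proposal.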

Changing the PR to draft until this concern is addressed.

heheda12345

Nice catch. Can't imagine it can be implemented in such a clean way.

vllm/v1/core/kv_cache_manager.py

njhill

Thanks @Jialin

vllm/v1/core/kv_cache_coordinator.py

vllm/v1/core/kv_cache_manager.py

mergify · 2025-10-02T18:50:08Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Jialin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

heheda12345 · 2025-10-06T03:34:48Z

@Jialin can you rebase the PR?
See https://vllm-dev.slack.com/archives/C07R5Q1Q2BB/p1759663228844749 for details.

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

vllm/v1/core/kv_cache_coordinator.py

houseroad

Looks good to me.

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

njhill

Thanks @Jialin, LGTM too, sorry for the delay.

vllm/v1/core/kv_cache_manager.py

…mitigate GC costs (vllm-project#24964) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev>

…mitigate GC costs (vllm-project#24964) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

…mitigate GC costs (vllm-project#24964) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

mergify bot added the frontend, multi-modality, v1 and tpu labels and removed the tpu label Sep 16, 2025

Jialin changed the title ~~Gc none block~~ [Core] Replace empty list with None in KVCacheBlocks for GC optimization Sep 16, 2025

Jialin marked this pull request as ready for review September 16, 2025 14:14

Jialin requested a review from heheda12345 as a code owner September 16, 2025 14:14

Jialin force-pushed the gc_none_block branch from 200514f to aa2e98d Compare September 16, 2025 16:14

Jialin requested review from ApostaC, WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners September 19, 2025 12:28

Jialin force-pushed the gc_none_block branch 2 times, most recently from 3967910 to 01c383f Compare September 19, 2025 12:37

Jialin marked this pull request as draft September 19, 2025 18:50

Jialin force-pushed the gc_none_block branch from 98c812b to ab0ecb6 Compare September 28, 2025 14:42

Jialin marked this pull request as ready for review September 28, 2025 14:55

heheda12345 reviewed Sep 30, 2025

vllm/v1/core/kv_cache_manager.py (outdated, resolved)

vllm/v1/core/kv_cache_manager.py (outdated, resolved)

njhill reviewed Oct 1, 2025

vllm/v1/core/kv_cache_coordinator.py (outdated, resolved)

vllm/v1/core/kv_cache_manager.py (outdated, resolved)

vllm/v1/core/kv_cache_manager.py (outdated, resolved)

mergify bot added the needs-rebase label Oct 2, 2025

Jialin force-pushed the gc_none_block branch from 4ffa0db to 2bbb718 Compare October 3, 2025 18:39

mergify bot removed the needs-rebase label Oct 3, 2025

Jialin added 7 commits October 13, 2025 11:42

code comment

ba94126

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Address comments

371f2f7

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

tuple[list[KVCacheBlock]] -> tuple[Sequence[KVCacheBlock]]

dbfbc6c

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Fix import

e08e898

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

format

e1bba14

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Aoivd unnecessary iterations and copies in KVCacheBlocks.__add__

692acf6

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Get rid of custom type and use Sequence[KVCacheBlock] directly

4b781af

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Jialin force-pushed the gc_none_block branch from 0479b95 to 4b781af Compare October 13, 2025 18:45

mergify bot removed the needs-rebase label Oct 13, 2025

fix precommit error

80d758b

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

houseroad reviewed Oct 14, 2025

vllm/v1/core/kv_cache_coordinator.py (outdated, resolved)

houseroad approved these changes Oct 14, 2025

houseroad added the ready label Oct 14, 2025

Jialin added 2 commits October 14, 2025 00:02

Address comments

bad5bf3

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

Remove unrelated changes

1f110b1

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

njhill approved these changes Oct 14, 2025

vllm/v1/core/kv_cache_manager.py (resolved)

njhill merged commit acaa2c0 into vllm-project:main Oct 14, 2025
45 checks passed

Jialin mentioned this pull request Oct 14, 2025

[Easy] Get rid of unnecessary paraenthesis in kv_cache_manager #26842

Merged

5 tasks

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[Core] Reuse empty block lists whenever possible in KVCacheBlocks to …

2ec139d

…mitigate GC costs (vllm-project#24964) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025

[Core] Reuse empty block lists whenever possible in KVCacheBlocks to …

d8be403

…mitigate GC costs (vllm-project#24964) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

[Core] Reuse empty block lists whenever possible in KVCacheBlocks to …

85a6c50

…mitigate GC costs (vllm-project#24964) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

[Core] Reuse empty block lists whenever possible in KVCacheBlocks to …

68ed91c

…mitigate GC costs (vllm-project#24964) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>

ywang96 mentioned this pull request Feb 9, 2026

[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching #34183

Merged

5 tasks

njhill merged 12 commits into vllm-project:main from Jialin:gc_none_block

4 participants
