
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs#24964

Merged
njhill merged 12 commits into vllm-project:main from Jialin:gc_none_block
Oct 14, 2025

Collaborator

@Jialin Jialin commented Sep 16, 2025

Purpose

The majority of the objects collected in generation 0 GC are related to KVCacheBlocks, and we found that most of them actually refer to empty blocks.
Screenshot 2025-09-28 at 7 52 23 AM

This PR makes two main changes:

  • Change KVCacheBlocks.blocks from tuple[list] to tuple[tuple] to indicate that it is immutable after creation.
  • Pre-construct an empty KVCacheBlocks member variable in KVCacheManager for reuse (to avoid GC).
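
To illustrate the two changes above, here is a minimal sketch, not the actual vLLM code. `Block`, the simplified `KVCacheBlocks` and `KVCacheManager` classes, the `empty_kv_cache_blocks` attribute, and the `create_kv_cache_blocks` helper are hypothetical stand-ins chosen for this example. It shows how an immutable, pre-constructed empty object can be shared so that the "no blocks" case allocates nothing new.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a single KV cache block."""
    block_id: int


@dataclass(frozen=True)
class KVCacheBlocks:
    """Simplified stand-in: one immutable tuple of blocks per KV cache group."""
    blocks: tuple[tuple[Block, ...], ...]


class KVCacheManager:
    """Simplified stand-in that only demonstrates the reuse pattern."""

    def __init__(self, num_kv_cache_groups: int) -> None:
        self.num_kv_cache_groups = num_kv_cache_groups
        # Built once. Nested tuples make the shared object safe to hand out,
        # because no caller can mutate it.
        self.empty_kv_cache_blocks = KVCacheBlocks(
            tuple(() for _ in range(num_kv_cache_groups))
        )

    def create_kv_cache_blocks(self, blocks) -> KVCacheBlocks:
        # Hypothetical helper: return the shared instance when every group
        # is empty, otherwise freeze the per-group sequences into tuples.
        if not any(blocks):
            return self.empty_kv_cache_blocks
        return KVCacheBlocks(tuple(tuple(group) for group in blocks))


manager = KVCacheManager(num_kv_cache_groups=2)
first = manager.create_kv_cache_blocks(([], []))
second = manager.create_kv_cache_blocks(([], []))
print(first is second)  # both calls return the same pre-built object
```

Because the shared object is built from tuples, handing the same instance to many callers cannot lead to one caller's mutation leaking into another's view.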

Test Plan & Test Result

Model: facebook/opt-125m
Prefill-Heavy workload: Input 2000, Output 48
Decode-Heavy workload: Input 48, Output 2000

GC costs are significantly lower with this PR, especially for the decode-heavy workload. This roughly translates to a 3-4% throughput improvement.
Screenshot 2025-09-28 at 7 41 05 AM
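
For readers who want to observe GC cost in their own runs, here is a rough sketch using Python's standard `gc.callbacks` hook. It is not the measurement setup used for the numbers above, which is not described in this PR. The `GCStats` and `track_gc` names and the toy workload are assumptions made for illustration.

```python
import gc
import time
from contextlib import contextmanager


class GCStats:
    """Accumulates the number of collections and wall time spent in them."""

    def __init__(self) -> None:
        self.collections = 0
        self.seconds = 0.0
        self._start = None

    def callback(self, phase: str, info: dict) -> None:
        # CPython calls each entry in gc.callbacks with phase "start" before
        # a collection and "stop" after it.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            self.seconds += time.perf_counter() - self._start
            self.collections += 1
            self._start = None


@contextmanager
def track_gc():
    stats = GCStats()
    gc.callbacks.append(stats.callback)
    try:
        yield stats
    finally:
        gc.callbacks.remove(stats.callback)


with track_gc() as stats:
    # Toy workload: keep many small container objects alive at once.
    kept = [([],) for _ in range(50_000)]
    gc.collect()  # force at least one collection so the hook fires

print(f"{stats.collections} collections, {stats.seconds * 1e3:.2f} ms in GC")
```

The absolute numbers depend on the interpreter version and GC thresholds, so they are only useful for before/after comparisons on the same setup.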



@mergify mergify bot added the frontend, multi-modality (Related to multi-modality, #4194), v1, and tpu (Related to Google TPUs) labels and removed the tpu label Sep 16, 2025
@Jialin Jialin changed the title from "Gc none block" to "[Core] Replace empty list with None in KVCacheBlocks for GC optimization" Sep 16, 2025
@Jialin Jialin marked this pull request as ready for review September 16, 2025 14:14
@Jialin Jialin requested a review from heheda12345 as a code owner September 16, 2025 14:14
Collaborator Author

Jialin commented Sep 16, 2025

Resolve #24321

Collaborator Author

Jialin commented Sep 19, 2025

Discussed with @heheda12345 offline. The main point was to find a solution that minimizes signature changes and avoids if/else branches:

  • A potential solution would be to let the allocate_new_blocks interface return a static KVCacheBlocks that contains an ImmutableEmptyList (where ImmutableEmptyList extends list with all mutating methods removed). This way, all signatures are kept as is, and all object allocations around empty KVCacheBlocks are eliminated.

Changing the PR to draft until this concern is addressed.
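
The ImmutableEmptyList idea mentioned in this comment could look roughly like the sketch below. This is only an illustration of the concept and not code from the PR. The class body, the choice of `TypeError`, and the `EMPTY_BLOCK_LIST` constant are assumptions. Per the PR description, the final change uses tuples for KVCacheBlocks.blocks instead.

```python
class ImmutableEmptyList(list):
    """A list subclass that is always empty and rejects every mutation."""

    __slots__ = ()

    def __init__(self) -> None:
        # Never accept initial contents, so the list starts (and stays) empty.
        super().__init__()

    def _blocked(self, *args, **kwargs):
        raise TypeError("ImmutableEmptyList does not support mutation")

    # Route every mutating list method to the same failing implementation.
    append = extend = insert = _blocked
    pop = remove = clear = sort = reverse = _blocked
    __setitem__ = __delitem__ = __iadd__ = __imul__ = _blocked


# A single shared instance can be returned wherever an empty list is needed.
EMPTY_BLOCK_LIST = ImmutableEmptyList()

print(isinstance(EMPTY_BLOCK_LIST, list), len(EMPTY_BLOCK_LIST))
try:
    EMPTY_BLOCK_LIST.append(1)
except TypeError as exc:
    print("rejected:", exc)
```

Since the object still passes `isinstance(..., list)` checks, existing `list`-typed signatures would keep working, which is the property the comment is after.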

@Jialin Jialin marked this pull request as draft September 19, 2025 18:50
@Jialin Jialin marked this pull request as ready for review September 28, 2025 14:55
Collaborator

@heheda12345 heheda12345 left a comment


Nice catch. Can't imagine it can be implemented in such a clean way.

Member

@njhill njhill left a comment


Thanks @Jialin

Contributor

mergify bot commented Oct 2, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Jialin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@heheda12345
Collaborator

@Jialin can you rebase the PR?
See https://vllm-dev.slack.com/archives/C07R5Q1Q2BB/p1759663228844749 for details.

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Collaborator

@houseroad houseroad left a comment


Looks good to me.

@houseroad houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 14, 2025
Member

@njhill njhill left a comment


Thanks @Jialin, LGTM too, sorry for the delay.

@njhill njhill merged commit acaa2c0 into vllm-project:main Oct 14, 2025
45 checks passed