
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead#28245

Merged
zhuohan123 merged 5 commits into vllm-project:main from Jialin:logprob-nested-list
Nov 11, 2025

Conversation

@Jialin
Collaborator

@Jialin Jialin commented Nov 6, 2025

Purpose

LogprobsLists introduced 3 nested list[list[int]]-style fields, which incur severe GC costs for large-batch-size use cases.

In this PR, we simply change the nested lists to np.ndarray; the interface should remain mostly identical to the original one.
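For context, a minimal sketch of the shape of this change (the field names and dtypes here are assumptions based on the PR title, not copied from the vLLM source):

```python
from typing import NamedTuple

import numpy as np


class LogprobsLists(NamedTuple):
    # Before: list[list[int]] / list[list[float]] / list[int], which allocate
    # one Python object per element plus one per row, all of which the cyclic
    # GC must track. After: flat ndarrays backed by a single buffer each.
    logprob_token_ids: np.ndarray    # shape (num_seqs, num_logprobs), int
    logprobs: np.ndarray             # shape (num_seqs, num_logprobs), float
    sampled_token_ranks: np.ndarray  # shape (num_seqs,), int


lp = LogprobsLists(
    logprob_token_ids=np.array([[1, 2], [3, 4]], dtype=np.int64),
    logprobs=np.array([[-0.1, -2.3], [-0.5, -1.7]], dtype=np.float32),
    sampled_token_ranks=np.array([0, 1], dtype=np.int64),
)

# Row access still works much like the nested-list version; per-row Python
# objects are only materialized on demand.
first_row = lp.logprob_token_ids[0].tolist()
```

Callers that indexed the old nested lists row by row can keep doing so, converting to a Python list only at the point of use.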

Test Plan & Test Result

Ensure logprob e2e testing still passes; we've confirmed the types are changed in LogprobsProcessor._update_sample_logprobs.

HF_HUB_DISABLE_XET=1 pytest -s tests/samplers/test_ranks.py 
[Screenshot: passing test output, Nov 6, 2025]
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Nov 6, 2025
@Jialin
Collaborator Author

Jialin commented Nov 6, 2025

Resolve #28239

@Jialin Jialin marked this pull request as ready for review November 6, 2025 20:44

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@zhuohan123 zhuohan123 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 6, 2025
@Jialin Jialin force-pushed the logprob-nested-list branch from cad1650 to ed8546f Compare November 6, 2025 23:28
@zhuohan123 zhuohan123 enabled auto-merge (squash) November 6, 2025 23:36
@njhill
Member

njhill commented Nov 8, 2025

Thanks @Jialin, the change LGTM, but a numpy array isn't always a GC win, since Python objects are created when its elements are accessed (it depends on how/where it's used).

For these optimizations, are there workloads where we can demonstrate a measurable perf improvement?

@Jialin
Collaborator Author

Jialin commented Nov 8, 2025

> Thanks @Jialin, the change LGTM, but a numpy array isn't always a GC win, since Python objects are created when its elements are accessed (it depends on how/where it's used).

You're right. If later on we keep calling .tolist() on the numpy array, then we're just delaying the GC cost instead. But I think most of the time, we're using the nested list in the following way:

```python
nested_list = tensor_numpy.tolist()
for row in nested_list:
    # process row
```

The ideal usage is the following, as each row_list is short-lived and deleted right after its iteration:

```python
for row in tensor_numpy:
    row_list = row.tolist()
    # process row_list
```

As GC0 is triggered when (# allocated) - (# deallocated) >= threshold, the former is guaranteed to trigger GC0 if the batch size is larger than the threshold, while the latter will most likely avoid GC0, as (# allocated) - (# deallocated) stays around 1 in that approach.
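The two patterns above can be put side by side in a small runnable sketch (the array and totals here are illustrative, not from the PR):

```python
import numpy as np

# 1000 rows of 2 ints; stands in for a large logprobs batch.
arr = np.arange(2000, dtype=np.int64).reshape(1000, 2)

# Pattern 1: materialize the whole nested list up front. All ~1000 row
# lists (plus their int elements) are alive at once, so the net
# (# allocated) - (# deallocated) easily exceeds the gen-0 threshold
# (700 by default) and triggers a collection.
nested_list = arr.tolist()
total_1 = sum(row[0] for row in nested_list)

# Pattern 2: convert one row at a time. Each row_list becomes garbage
# right after its iteration, so the net allocation delta stays around 1
# and gen-0 collections are mostly avoided.
total_2 = 0
for row in arr:
    row_list = row.tolist()
    total_2 += row_list[0]

assert total_1 == total_2
```

Both patterns compute the same result; they differ only in how many Python objects are alive simultaneously, which is what drives the gen-0 trigger.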

> For these optimizations, are there workloads where we can demonstrate a measurable perf improvement?

We had an internal RL use case to justify the win. But let me also verify the win with a small-model, large-batch-size setup with logprob enabled.

auto-merge was automatically disabled November 10, 2025 14:22

Head branch was pushed to by a user without write access

@Jialin Jialin force-pushed the logprob-nested-list branch from ed8546f to b1d02bd Compare November 10, 2025 14:22
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@Jialin Jialin force-pushed the logprob-nested-list branch from b1d02bd to d9ba4cb Compare November 10, 2025 20:38
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@zhuohan123 zhuohan123 merged commit 4228be7 into vllm-project:main Nov 11, 2025
46 checks passed
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (vllm-project#28245)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: George D. Torres <gdavtor@gmail.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1


3 participants