
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead#28245

Merged
zhuohan123 merged 5 commits into vllm-project:main from Jialin:logprob-nested-list
Nov 11, 2025

Conversation

@Jialin
Collaborator

@Jialin Jialin commented Nov 6, 2025

Purpose

LogprobsLists introduced 3 nested list[list[int]]-style fields, which incur severe GC costs for large-batch-size use cases.

In this PR, we simply change the nested lists to np.ndarray; the interface should remain mostly identical to the original one.
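For context, a minimal sketch of the shape of this change (the field names and dtypes here are assumptions based on the PR title, not copied from the vLLM source):

```python
from typing import NamedTuple

import numpy as np


class LogprobsLists(NamedTuple):
    # Before: list[list[int]] / list[list[float]] / list[int], which allocate
    # one Python object per element plus one per row, all of which the cyclic
    # GC must track. After: flat ndarrays backed by a single buffer each.
    logprob_token_ids: np.ndarray    # shape (num_seqs, num_logprobs), int
    logprobs: np.ndarray             # shape (num_seqs, num_logprobs), float
    sampled_token_ranks: np.ndarray  # shape (num_seqs,), int


lp = LogprobsLists(
    logprob_token_ids=np.array([[1, 2], [3, 4]], dtype=np.int64),
    logprobs=np.array([[-0.1, -2.3], [-0.5, -1.7]], dtype=np.float32),
    sampled_token_ranks=np.array([0, 1], dtype=np.int64),
)

# Row access still works much like the nested-list version; per-row Python
# objects are only materialized on demand.
first_row = lp.logprob_token_ids[0].tolist()
```

Callers that indexed the old nested lists row by row can keep doing so, converting to a Python list only at the point of use.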

Test Plan & Test Result

Ensure logprob e2e testing still passes; we've confirmed the types are changed in LogprobsProcessor._update_sample_logprobs.

HF_HUB_DISABLE_XET=1 pytest -s tests/samplers/test_ranks.py 
[Screenshot: passing test output, Nov 6, 2025]
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Nov 6, 2025
@Jialin
Collaborator Author

Jialin commented Nov 6, 2025

Resolve #28239

@Jialin Jialin marked this pull request as ready for review November 6, 2025 20:44

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@zhuohan123 zhuohan123 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 6, 2025
@Jialin Jialin force-pushed the logprob-nested-list branch from cad1650 to ed8546f Compare November 6, 2025 23:28
@zhuohan123 zhuohan123 enabled auto-merge (squash) November 6, 2025 23:36
@njhill
Member

njhill commented Nov 8, 2025

Thanks @Jialin, the change LGTM, but a numpy array isn't always a GC win, since Python objects are created when its elements are accessed (it depends on how/where it's used).

For these optimizations, are there workloads where we can demonstrate a measurable perf improvement?

@Jialin
Collaborator Author

Jialin commented Nov 8, 2025

> Thanks @Jialin, the change LGTM, but a numpy array isn't always a GC win, since Python objects are created when its elements are accessed (it depends on how/where it's used).

You're right. If later on we keep calling .tolist() on the numpy array, then we're just delaying the GC cost instead. But I think most of the time, we're using the nested list in the following way:

```python
nested_list = tensor_numpy.tolist()
for row in nested_list:
    # process row
```

The ideal usage is the following, as each row_list is short-lived and deleted right after its iteration:

```python
for row in tensor_numpy:
    row_list = row.tolist()
    # process row_list
```

As GC0 is triggered when (# allocated) - (# deallocated) >= threshold, the former is guaranteed to trigger GC0 if the batch size is larger than the threshold, while the latter will most likely avoid GC0, as (# allocated) - (# deallocated) stays around 1 in that approach.
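The two patterns above can be put side by side in a small runnable sketch (the array and totals here are illustrative, not from the PR):

```python
import numpy as np

# 1000 rows of 2 ints; stands in for a large logprobs batch.
arr = np.arange(2000, dtype=np.int64).reshape(1000, 2)

# Pattern 1: materialize the whole nested list up front. All ~1000 row
# lists (plus their int elements) are alive at once, so the net
# (# allocated) - (# deallocated) easily exceeds the gen-0 threshold
# (700 by default) and triggers a collection.
nested_list = arr.tolist()
total_1 = sum(row[0] for row in nested_list)

# Pattern 2: convert one row at a time. Each row_list becomes garbage
# right after its iteration, so the net allocation delta stays around 1
# and gen-0 collections are mostly avoided.
total_2 = 0
for row in arr:
    row_list = row.tolist()
    total_2 += row_list[0]

assert total_1 == total_2
```

Both patterns compute the same result; they differ only in how many Python objects are alive simultaneously, which is what drives the gen-0 trigger.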

> For these optimizations, are there workloads where we can demonstrate a measurable perf improvement?

We had an internal RL use case to justify the win. But let me also verify the win with a small-model, large-batch-size setup with logprob enabled.

auto-merge was automatically disabled November 10, 2025 14:22

Head branch was pushed to by a user without write access

@Jialin Jialin force-pushed the logprob-nested-list branch from ed8546f to b1d02bd Compare November 10, 2025 14:22
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@Jialin Jialin force-pushed the logprob-nested-list branch from b1d02bd to d9ba4cb Compare November 10, 2025 20:38
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@zhuohan123 zhuohan123 merged commit 4228be7 into vllm-project:main Nov 11, 2025
46 checks passed
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead (vllm-project#28245)

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: George D. Torres <gdavtor@gmail.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1


3 participants