[Core] Avoid list[int] in EngineCoreOutput for GC efficiency #29033
Jialin wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Code Review
This pull request effectively addresses garbage collection overhead by replacing list[int] with np.ndarray for new_token_ids in EngineCoreOutput. The changes are consistently applied across the scheduler, engine, output processor, and metrics components, which is great. My review includes one suggestion to further optimize performance by avoiding an unnecessary ndarray -> list -> ndarray conversion cycle in the scheduler, which could reduce overhead in this hot path.
```diff
  EngineCoreOutput(
      request_id=req_id,
-     new_token_ids=new_token_ids,
+     new_token_ids=np.array(new_token_ids),
```
While this change is correct, there's an opportunity to further improve performance by avoiding the np.ndarray -> list -> np.ndarray conversion cycle.
Currently, sampled_token_ids[req_index] (which is a numpy array) is converted to a list using .tolist(), then processed by _update_request_with_output, and finally converted back to a numpy array here.
Consider refactoring _update_request_with_output to operate directly on numpy arrays. This would eliminate the intermediate list conversion, reducing overhead.
A refactored _update_request_with_output might look like this:

```python
def _update_request_with_output(
    self,
    request: Request,
    new_token_ids: np.ndarray,
) -> tuple[np.ndarray, bool]:
    stopped = False
    for num_new, output_token_id in enumerate(new_token_ids, 1):
        request.append_output_token_ids(int(output_token_id))
        stopped = check_stop(request, self.max_model_len)
        if stopped:
            # Truncate the array and break
            new_token_ids = new_token_ids[:num_new]
            break
    return new_token_ids, stopped
```

Then, in update_from_output, you could work with numpy arrays throughout, avoiding the .tolist() and np.array() calls.
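As a self-contained illustration of that truncation pattern: the sketch below uses a hypothetical `should_stop` callback in place of `check_stop`, since `Request` and the scheduler state are not available outside vLLM.

```python
import numpy as np


def consume_tokens(new_token_ids: np.ndarray, should_stop) -> tuple[np.ndarray, bool]:
    # Iterate over the sampled ids; on a stop condition, slice the
    # ndarray instead of rebuilding a Python list.
    stopped = False
    for num_new, token_id in enumerate(new_token_ids, 1):
        stopped = should_stop(int(token_id))
        if stopped:
            new_token_ids = new_token_ids[:num_new]
            break
    return new_token_ids, stopped
```

Slicing returns a view onto the original array, so the early-stop path allocates no new per-token objects.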
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
- new_token_ids = engine_core_output.new_token_ids
+ new_token_ids: list[int] = engine_core_output.new_token_ids.tolist()
```
Handle non-ndarray EngineCoreOutput token_ids
In process_outputs we now call engine_core_output.new_token_ids.tolist(), assuming every EngineCoreOutput carries a NumPy array. Several producers still emit plain Python lists (e.g., MockEngineCore.get_outputs in tests/v1/engine/utils.py builds EngineCoreOutput(new_token_ids=[...])), so when those outputs are processed—such as in the log-stats variants of the output-processor tests or any external mock engine—this line raises AttributeError before detokenization/logprob handling. Either convert producers to np.ndarray or accept list inputs here.
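One way to "accept list inputs here" is a small defensive helper in the output processor. This is only a sketch of that option, not code from the PR; the helper name is invented.

```python
import numpy as np


def as_token_id_list(new_token_ids) -> list[int]:
    # Producers may emit either np.ndarray (after this PR) or a plain
    # Python list (legacy callers and test mocks such as MockEngineCore).
    if isinstance(new_token_ids, np.ndarray):
        return new_token_ids.tolist()
    return list(new_token_ids)
```

The alternative, converting all producers (including test mocks) to emit np.ndarray, keeps the hot path branch-free.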
Pull request was closed
This pull request has merge conflicts that must be resolved before it can be merged.
@bangshengtang found that gc.disable plus periodic manual gc.collect seems to yield better TPGS, so we will wind down the ongoing GC optimization efforts for now. CC @njhill @zhuohan123
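For reference, the gc.disable-plus-periodic-collect pattern mentioned above might look roughly like this; the interval and placement are hypothetical, not taken from any vLLM code.

```python
import gc

gc.disable()  # turn off automatic generational collections

COLLECT_INTERVAL = 1000  # hypothetical: collect once per N decode steps


def maybe_collect(step: int) -> None:
    # Run a full collection at a controlled cadence, keeping GC pauses
    # off the per-token hot path.
    if step % COLLECT_INTERVAL == 0:
        gc.collect()
```

The trade-off is that memory from reference cycles accumulates between manual collections, so the interval has to be tuned against peak memory.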
Purpose
There are <batch_size> EngineCoreOutput objects generated per decode batch. To reduce GC overhead, we should avoid GC-tracked objects (e.g. list) inside EngineCoreOutput in large-batch scenarios.
This is a continued effort on top of #28245
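The motivation can be seen with gc.is_tracked: a Python list participates in CPython's cyclic collector (adding scan work per output), while a flat numeric ndarray does not. A minimal sketch:

```python
import gc

import numpy as np

token_list = [101, 102, 103]
token_array = np.array([101, 102, 103])

# Lists are tracked by CPython's cyclic GC; a numeric ndarray is not,
# so per-batch EngineCoreOutput objects add fewer GC-visible objects.
print(gc.is_tracked(token_list))   # True
print(gc.is_tracked(token_array))  # False
```

With thousands of requests per decode batch, each list[int] field multiplies the number of containers every collection must traverse.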
Test Plan & Test Result
CI signals
As a follow-up, we should introduce e2e tests to ensure GC costs do not regress.
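Such a check could be built on a timed full collection after a representative workload; the micro-benchmark below is a hypothetical sketch (workload and any threshold would need to be calibrated against a baseline).

```python
import gc
import time


def timed_full_collect(n_objects: int = 100_000) -> float:
    # Allocate many small GC-tracked containers, then time one full
    # collection; a regression test could bound this against a recorded
    # baseline for the same workload.
    junk = [[i] for i in range(n_objects)]
    start = time.perf_counter()
    gc.collect()
    elapsed = time.perf_counter() - start
    del junk
    return elapsed
```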
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.