[Core] Avoid list[int] in EngineCoreOutput for GC efficiency #29033
Jialin wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Code Review
This pull request effectively addresses garbage collection overhead by replacing list[int] with np.ndarray for new_token_ids in EngineCoreOutput. The changes are consistently applied across the scheduler, engine, output processor, and metrics components, which is great. My review includes one suggestion to further optimize performance by avoiding an unnecessary ndarray -> list -> ndarray conversion cycle in the scheduler, which could reduce overhead in this hot path.
```diff
  EngineCoreOutput(
      request_id=req_id,
-     new_token_ids=new_token_ids,
+     new_token_ids=np.array(new_token_ids),
```
While this change is correct, there's an opportunity to further improve performance by avoiding the np.ndarray -> list -> np.ndarray conversion cycle.
Currently, sampled_token_ids[req_index] (which is a numpy array) is converted to a list using .tolist(), then processed by _update_request_with_output, and finally converted back to a numpy array here.
Consider refactoring _update_request_with_output to operate directly on numpy arrays. This would eliminate the intermediate list conversion, reducing overhead.
A refactored _update_request_with_output might look like this:

```python
def _update_request_with_output(
    self,
    request: Request,
    new_token_ids: np.ndarray,
) -> tuple[np.ndarray, bool]:
    stopped = False
    for num_new, output_token_id in enumerate(new_token_ids, 1):
        request.append_output_token_ids(int(output_token_id))
        stopped = check_stop(request, self.max_model_len)
        if stopped:
            # Truncate the array and break
            new_token_ids = new_token_ids[:num_new]
            break
    return new_token_ids, stopped
```

Then, in update_from_output, you could work with numpy arrays throughout, avoiding the .tolist() and np.array() calls.
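As a self-contained illustration of that truncation pattern: the sketch below uses a hypothetical `should_stop` callback in place of `check_stop`, since `Request` and the scheduler state are not available outside vLLM.

```python
import numpy as np


def consume_tokens(new_token_ids: np.ndarray, should_stop) -> tuple[np.ndarray, bool]:
    # Iterate over the sampled ids; on a stop condition, slice the
    # ndarray instead of rebuilding a Python list.
    stopped = False
    for num_new, token_id in enumerate(new_token_ids, 1):
        stopped = should_stop(int(token_id))
        if stopped:
            new_token_ids = new_token_ids[:num_new]
            break
    return new_token_ids, stopped
```

Slicing returns a view onto the original array, so the early-stop path allocates no new per-token objects.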
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
- new_token_ids = engine_core_output.new_token_ids
+ new_token_ids: list[int] = engine_core_output.new_token_ids.tolist()
```
Handle non-ndarray EngineCoreOutput token_ids
In process_outputs we now call engine_core_output.new_token_ids.tolist(), assuming every EngineCoreOutput carries a NumPy array. Several producers still emit plain Python lists (e.g., MockEngineCore.get_outputs in tests/v1/engine/utils.py builds EngineCoreOutput(new_token_ids=[...])), so when those outputs are processed—such as in the log-stats variants of the output-processor tests or any external mock engine—this line raises AttributeError before detokenization/logprob handling. Either convert producers to np.ndarray or accept list inputs here.
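One way to "accept list inputs here" is a small defensive helper in the output processor. This is only a sketch of that option, not code from the PR; the helper name is invented.

```python
import numpy as np


def as_token_id_list(new_token_ids) -> list[int]:
    # Producers may emit either np.ndarray (after this PR) or a plain
    # Python list (legacy callers and test mocks such as MockEngineCore).
    if isinstance(new_token_ids, np.ndarray):
        return new_token_ids.tolist()
    return list(new_token_ids)
```

The alternative, converting all producers (including test mocks) to emit np.ndarray, keeps the hot path branch-free.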
Pull request was closed
This pull request has merge conflicts that must be resolved before it can be merged.
@bangshengtang found that gc.disable plus periodic manual gc.collect seems to yield better TPGS, so we will wind down the ongoing GC optimization efforts for now. CC @njhill @zhuohan123
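For reference, the gc.disable-plus-periodic-collect pattern mentioned above might look roughly like this; the interval and placement are hypothetical, not taken from any vLLM code.

```python
import gc

gc.disable()  # turn off automatic generational collections

COLLECT_INTERVAL = 1000  # hypothetical: collect once per N decode steps


def maybe_collect(step: int) -> None:
    # Run a full collection at a controlled cadence, keeping GC pauses
    # off the per-token hot path.
    if step % COLLECT_INTERVAL == 0:
        gc.collect()
```

The trade-off is that memory from reference cycles accumulates between manual collections, so the interval has to be tuned against peak memory.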
Purpose
There are <batch_size> EngineCoreOutput objects generated per decode batch. To reduce GC overhead, we should avoid GC-tracked objects (e.g. list) inside EngineCoreOutput in large-batch scenarios.
This is a continued effort on top of #28245
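The motivation can be seen with gc.is_tracked: a Python list participates in CPython's cyclic collector (adding scan work per output), while a flat numeric ndarray does not. A minimal sketch:

```python
import gc

import numpy as np

token_list = [101, 102, 103]
token_array = np.array([101, 102, 103])

# Lists are tracked by CPython's cyclic GC; a numeric ndarray is not,
# so per-batch EngineCoreOutput objects add fewer GC-visible objects.
print(gc.is_tracked(token_list))   # True
print(gc.is_tracked(token_array))  # False
```

With thousands of requests per decode batch, each list[int] field multiplies the number of containers every collection must traverse.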
Test Plan & Test Result
CI signals
As a follow-up, we should introduce e2e tests to ensure GC costs do not regress.
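Such a check could be built on a timed full collection after a representative workload; the micro-benchmark below is a hypothetical sketch (workload and any threshold would need to be calibrated against a baseline).

```python
import gc
import time


def timed_full_collect(n_objects: int = 100_000) -> float:
    # Allocate many small GC-tracked containers, then time one full
    # collection; a regression test could bound this against a recorded
    # baseline for the same workload.
    junk = [[i] for i in range(n_objects)]
    start = time.perf_counter()
    gc.collect()
    elapsed = time.perf_counter() - start
    del junk
    return elapsed
```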
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.