[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead #28245
zhuohan123 merged 5 commits into vllm-project:main
Resolve #28239
Thanks @Jialin, the change LGTM, but a numpy array isn't always a GC win, since objects are created when its elements are accessed (it depends on how and where it's used). For these optimizations, are there workloads where we can demonstrate a measurable perf improvement?
You're right. If we later keep calling .tolist() on the whole numpy array, we're just delaying the GC cost. But I think most of the time we consume the nested list row by row, so the ideal usage is to convert one row at a time: each row_list is then short-lived and deleted right after its iteration. GC0 is triggered when (# allocated) - (# deallocated) >= threshold, so the former is guaranteed to trigger GC0 if the batch size is larger than the threshold, while the latter would most likely avoid GC0, as (# allocated) - (# deallocated) is only 1 in that approach.
We had an internal RL use case that justified the win. But let me also verify the win with a small-model, large-batch-size setup with logprobs enabled.
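A rough sketch of the two access patterns described above, for anyone who wants to try it locally. The function names and the batch shape are made up for illustration and are not taken from the vLLM code; the counter read here is CPython's `gc.get_stats()`, and the exact numbers it reports vary with the Python version and the configured thresholds.

```python
import gc

import numpy as np


def gen0_runs(fn):
    """Run fn and return (result, generation-0 collections observed meanwhile)."""
    gc.collect()  # start from a clean allocation counter
    before = gc.get_stats()[0]["collections"]
    result = fn()
    return result, gc.get_stats()[0]["collections"] - before


def consume_nested(arr):
    """Former pattern (assumed): materialize every row list up front."""
    total = 0
    nested = arr.tolist()  # all row lists are alive at the same time
    for row_list in nested:
        total += row_list[0]
    return total


def consume_rowwise(arr):
    """Latter pattern (assumed): one short-lived row list per iteration."""
    total = 0
    for row in arr:
        row_list = row.tolist()  # previous row_list is freed when rebound
        total += row_list[0]
    return total


# Hypothetical "large batch": 20,000 rows of 4 token ids each.
batch = np.arange(20_000 * 4, dtype=np.int64).reshape(20_000, 4)

nested_total, nested_gc = gen0_runs(lambda: consume_nested(batch))
rowwise_total, rowwise_gc = gen0_runs(lambda: consume_rowwise(batch))
print("gen0 collections (nested, row-wise):", nested_gc, rowwise_gc)
```

Both functions compute the same value; only the number of container objects alive at once differs, which is what the GC0 trigger condition reacts to.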
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Purpose
LogprobsLists introduces 3 nested list[list[int]], which incur severe GC costs for large-batch-size use cases.
In this PR, we simply change the nested lists to np.ndarray; ideally, the interface stays mostly identical to the original one.
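To illustrate what "mostly identical" can mean in practice, here is a small sketch comparing a nested list with the equivalent 2-D array. The values and the helper name `same_rows` are hypothetical and do not come from the PR; the two differences noted at the end are general NumPy behavior that callers may need to account for.

```python
import numpy as np


def same_rows(nested, arr):
    """Check that len(), indexing, and iteration agree between the two forms."""
    if len(nested) != len(arr):
        return False
    for i, row in enumerate(arr):  # iterating a 2-D array yields its rows
        if row.tolist() != nested[i]:
            return False
        if int(arr[i][0]) != nested[i][0]:  # chained indexing still works
            return False
    return True


# Hypothetical token ids for two requests with three candidates each.
nested_ids = [[11, 12, 13], [21, 22, 23]]
array_ids = np.array(nested_ids, dtype=np.int64)

print(same_rows(nested_ids, array_ids))

# Difference 1: elements are NumPy scalars rather than Python ints.
element_is_python_int = isinstance(array_ids[0][0], int)

# Difference 2: truthiness of a multi-element array is ambiguous, so
# `if array_ids:` raises; an explicit length check is needed instead.
try:
    bool(array_ids)
    truthiness_raises = False
except ValueError:
    truthiness_raises = True
```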
Test Plan & Test Result
Ensured the logprob e2e tests are still running, and confirmed that the types are changed in LogprobsProcessor._update_sample_logprobs.