[Core] Optimize expensive deepcopy in GPU model runner #31723
GOavi101 wants to merge 1 commit into vllm-project:main
Code Review
This pull request introduces a significant performance optimization by replacing an expensive deepcopy of the scheduler_output object with a more efficient selective copy. This change is applied when using asynchronous scheduling with speculative decoding. The implementation correctly identifies the mutable fields that are modified and creates shallow copies of them, which is sufficient to prevent side effects while being much faster than a full deep copy. This is a well-executed optimization that should deliver the performance and memory benefits described.
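To illustrate why a shallow copy of the two dicts can be enough, here is a minimal sketch using plain dicts as stand-ins. The request IDs and values are made up for illustration and are not taken from vLLM. The point is that `-=` on an int value rebinds the key in the copy, and `pop()` removes a key from the copy only, so neither operation reaches the original dict.

```python
# Hypothetical stand-ins for the two mutated fields; not real vLLM data.
num_scheduled_tokens = {"req-0": 4, "req-1": 3}
scheduled_spec_decode_tokens = {"req-0": [11, 12, 13], "req-1": [21, 22]}

# Shallow copies: new dict objects whose values are shared with the originals.
tokens_copy = dict(num_scheduled_tokens)
spec_copy = dict(scheduled_spec_decode_tokens)

# The two mutation patterns named in the PR description.
tokens_copy["req-0"] -= 3          # rebinds the key in the copy only
popped = spec_copy.pop("req-0")    # removes the key from the copy only

print(num_scheduled_tokens)
print(sorted(scheduled_spec_decode_tokens))
```

This only holds while the values themselves are not mutated in place. If some code path appended to one of the shared lists, the change would be visible through both dicts.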
|
Hi @GOavi101, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
|
99d2968 to f96e8b0
|
Hello @njhill, could you please take a look and review it?
|
Replace expensive deepcopy() with selective shallow copying for scheduler_output when using async scheduling with speculative decoding.

The optimization:
- Shallow copies the SchedulerOutput dataclass
- Only deep copies the 2 dict fields that get modified:
  * num_scheduled_tokens (modified via dict[key] -= value)
  * scheduled_spec_decode_tokens (modified via dict.pop())
- Shares read-only fields (memory efficient and safe)

Fixes the TODO comment at line 3108.

Signed-off-by: GOavi101 <avishek.official12@gmail.com>
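The selective copy described in this commit message can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the PR: `FakeSchedulerOutput`, `read_only_payload`, and `selective_copy` are hypothetical names, and the real `SchedulerOutput` has more fields than shown here.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class FakeSchedulerOutput:
    """Hypothetical stand-in; the real SchedulerOutput has more fields."""
    num_scheduled_tokens: dict[str, int]
    scheduled_spec_decode_tokens: dict[str, list[int]]
    # Stand-in for the fields that are only read, never mutated.
    read_only_payload: list[int] = field(default_factory=list)


def selective_copy(so: FakeSchedulerOutput) -> FakeSchedulerOutput:
    # Shallow-copy the dataclass: every field still points at the original.
    out = copy.copy(so)
    # Give the copy its own instances of the two dicts that get mutated.
    out.num_scheduled_tokens = copy.deepcopy(so.num_scheduled_tokens)
    out.scheduled_spec_decode_tokens = copy.deepcopy(
        so.scheduled_spec_decode_tokens
    )
    return out


original = FakeSchedulerOutput({"req-0": 4}, {"req-0": [11, 12, 13]}, [1, 2, 3])
copied = selective_copy(original)

# Apply the two mutation patterns to the copy only.
copied.num_scheduled_tokens["req-0"] -= 3
copied.scheduled_spec_decode_tokens.pop("req-0")
```

After these mutations, `original` keeps its two dicts unchanged, while `copied.read_only_payload` is still the same object as `original.read_only_payload`, which is where the savings over a full `deepcopy()` would come from.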
f96e8b0 to b15c224
Summary
Replace expensive deepcopy() with selective shallow copying for scheduler_output when using async scheduling with speculative decoding. This optimization addresses the TODO comment at line 3108.
Performance Improvement
Changes
Replace deepcopy(scheduler_output) with selective copy:
- Shallow copy the SchedulerOutput dataclass
- Copy only the dict fields that get modified:
  * num_scheduled_tokens (modified via dict[key] -= value)
  * scheduled_spec_decode_tokens (modified via dict.pop())
Testing
Related
Fixes the TODO comment at line 3108 (Ronald1995).