Skip to content

[Core] Optimize expensive deepcopy in GPU model runner#31723

Closed
GOavi101 wants to merge 1 commit intovllm-project:mainfrom
GOavi101:optimize-deepcopy-scheduler-output
Closed

[Core] Optimize expensive deepcopy in GPU model runner#31723
GOavi101 wants to merge 1 commit intovllm-project:mainfrom
GOavi101:optimize-deepcopy-scheduler-output

Conversation

@GOavi101
Copy link
Copy Markdown
Contributor

@GOavi101 GOavi101 commented Jan 5, 2026

Summary

Replace expensive deepcopy() with selective shallow copying for scheduler_output when using async scheduling with speculative decoding. This optimization addresses the TODO comment at line 3108.

Performance Improvement

  • 13-37x faster copy operations (measured in unit tests)
  • Scales better with more requests (performance gap widens with larger workloads)
  • Reduces memory usage by ~90% (only copies 2 dicts instead of entire object graph)

Changes

  • Replaced deepcopy(scheduler_output) with selective copy:
    • Shallow copy the SchedulerOutput dataclass
    • Only deep copy the 2 dict fields that get modified:
      • num_scheduled_tokens (modified via dict[key] -= value)
      • scheduled_spec_decode_tokens (modified via dict.pop())
    • Share read-only fields (memory efficient and safe)

Testing

  • Verified correctness with unit tests comparing optimized copy vs deepcopy
  • Performance benchmarks confirm significant speedup

Related

Fixes the TODO comment at line 3108 (Ronald1995).

@mergify mergify bot added the v1 label Jan 5, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant performance optimization by replacing an expensive deepcopy of the scheduler_output object with a more efficient selective copy. This change is applied when using asynchronous scheduling with speculative decoding. The implementation correctly identifies the mutable fields that are modified and creates shallow copies of them, which is sufficient to prevent side effects while being much faster than a full deep copy. This is a well-executed optimization that should deliver the performance and memory benefits described.

@mergify
Copy link
Copy Markdown

mergify bot commented Jan 5, 2026

Hi @GOavi101, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@GOavi101 GOavi101 force-pushed the optimize-deepcopy-scheduler-output branch from 99d2968 to f96e8b0 Compare January 5, 2026 16:42
@GOavi101
Copy link
Copy Markdown
Contributor Author

GOavi101 commented Jan 5, 2026

Hello @njhill, could you please take a look and review it?

@mergify
Copy link
Copy Markdown

mergify bot commented Jan 5, 2026

Hi @GOavi101, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Replace expensive deepcopy() with selective shallow copying for
scheduler_output when using async scheduling with speculative decoding.

The optimization:
- Shallow copies the SchedulerOutput dataclass
- Only deep copies the 2 dict fields that get modified:
  * num_scheduled_tokens (modified via dict[key] -= value)
  * scheduled_spec_decode_tokens (modified via dict.pop())
- Shares read-only fields (memory efficient and safe)

Fixes the TODO comment at line 3108.

Signed-off-by: GOavi101 <avishek.official12@gmail.com>
@GOavi101 GOavi101 force-pushed the optimize-deepcopy-scheduler-output branch from f96e8b0 to b15c224 Compare January 5, 2026 18:20
@njhill
Copy link
Copy Markdown
Member

njhill commented Jan 6, 2026

Thanks for this @GOavi101, actually we're already simplifying this as part of #29821 so the code in question is disappearing anyhow.

@GOavi101 GOavi101 closed this Jan 6, 2026
@GOavi101 GOavi101 deleted the optimize-deepcopy-scheduler-output branch January 6, 2026 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants