[Minor] Optimize cuda graph memory usage by esmeetu · Pull Request #2437 · vllm-project/vllm

esmeetu · 2024-01-14T09:33:29Z

This PR reduces the memory used by captured CUDA graphs by taking the max_num_seqs parameter into account. For personal and test use cases, there is no need to keep CUDA graph caches for every batch size level ([1,2...256]), since batches larger than max_num_seqs are never scheduled. Users can therefore lower max_num_seqs to save memory.
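The idea can be illustrated with a minimal sketch. This is not the PR's actual diff: the capture-size list (1, 2, 4, then multiples of 8 up to 256) and the helper names `round_up_to_captured_size` and `sizes_to_capture` are assumptions made for illustration. It only shows how capping the capture list at max_num_seqs shrinks the number of graphs that would need to be captured.

```python
# Assumed capture sizes for illustration: 1, 2, 4, then multiples of 8 up to 256.
BATCH_SIZES_TO_CAPTURE = [1, 2, 4] + [8 * i for i in range(1, 33)]


def round_up_to_captured_size(batch_size: int) -> int:
    """Round a batch size up to the nearest size in the assumed capture list.

    Hypothetical helper; values above 256 are simply rounded up to a
    multiple of 8 and end up selecting the whole list.
    """
    if batch_size <= 2:
        return batch_size
    if batch_size <= 4:
        return 4
    return (batch_size + 7) // 8 * 8


def sizes_to_capture(max_num_seqs: int) -> list[int]:
    """Keep only the capture sizes that a batch of at most max_num_seqs can hit."""
    limit = round_up_to_captured_size(max_num_seqs)
    return [bs for bs in BATCH_SIZES_TO_CAPTURE if bs <= limit]


if __name__ == "__main__":
    for max_num_seqs in (1, 8, 10, 256):
        sizes = sizes_to_capture(max_num_seqs)
        print(f"max_num_seqs={max_num_seqs}: {len(sizes)} graphs, largest={sizes[-1]}")
```

Under these assumptions, a setting of max_num_seqs=8 would leave four batch sizes to capture instead of the full list, which is where the memory saving would come from.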

Yard1

LGTM, thanks!

optimize memory usage by max_num_seqs parameter

02f5278

Yard1 approved these changes Jan 14, 2024

Yard1 merged commit 9f659bf into vllm-project:main Jan 14, 2024

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024

[Minor] Optimize cuda graph memory usage (vllm-project#2437)

fed9ac8

esmeetu deleted the optimize-graph-memory branch February 3, 2024 04:13

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

[Minor] Optimize cuda graph memory usage (vllm-project#2437)

f7dc794

Yard1 merged 1 commit into vllm-project:main from esmeetu:optimize-graph-memory