[Minor] Optimize cuda graph memory usage#2437

Merged
Yard1 merged 1 commit into vllm-project:main from esmeetu:optimize-graph-memory
Jan 14, 2024

Conversation

@esmeetu
Member

@esmeetu esmeetu commented Jan 14, 2024

This PR reduces the memory used by captured CUDA graphs by respecting the max_num_seqs parameter. For personal and testing use cases, there is no need to cache CUDA graphs for every batch size in the full ladder ([1, 2, ..., 256]), so we can inform users that tuning max_num_seqs down will save memory.
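The idea can be sketched roughly as follows (the function name and the exact batch-size ladder are illustrative assumptions, not the actual vLLM internals): instead of capturing a CUDA graph for every batch size up to the maximum, only capture graphs for batch sizes that max_num_seqs allows.

```python
# Hypothetical sketch of the optimization in this PR. The ladder
# [1, 2, 4, 8, 16, ..., 256] is an assumed set of capture sizes;
# the real list in vLLM may differ.

def capture_batch_sizes(max_num_seqs: int, max_batch_size: int = 256) -> list[int]:
    """Return the batch sizes worth capturing as CUDA graphs.

    Each captured graph holds onto GPU memory, so truncating the ladder
    at max_num_seqs avoids caching graphs for batch sizes that can
    never occur.
    """
    ladder = [1, 2, 4] + [8 * i for i in range(1, max_batch_size // 8 + 1)]
    return [bs for bs in ladder if bs <= max_num_seqs]


# With max_num_seqs=8 (e.g. a personal/test deployment), only four
# graphs are captured instead of the full set of 35.
print(capture_batch_sizes(8))  # [1, 2, 4, 8]
```

With a small max_num_seqs, most of the ladder is skipped, which is where the memory saving comes from.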

Collaborator

@Yard1 Yard1 left a comment


LGTM, thanks!

@Yard1 Yard1 merged commit 9f659bf into vllm-project:main Jan 14, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Jan 18, 2024
@esmeetu esmeetu deleted the optimize-graph-memory branch February 3, 2024 04:13
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

2 participants