Optimize maximum threads to saturate decoding capacity #3

Merged
JoeZijunZhou merged 1 commit into main from zijun/optimize-thread
Mar 5, 2024

JoeZijunZhou
Collaborator

  • The number of RPC handler worker threads should be at least the decoding batch size; otherwise the decoding queue cannot be fully saturated.
  • Default the thread count to the total number of concurrently allowed decodes, so that the model can be fully saturated.
  • Set the default minimum thread count to 64.
  • Add error handling for when the queue is out of capacity.
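
The sizing rule and the capacity check described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from the merged commit: `default_num_threads`, `QueueAtCapacityError`, and `enqueue_request` are hypothetical names, and the number of concurrently allowed decodes is assumed to be supplied by the caller.

```python
import queue

# Default minimum taken from the PR description.
DEFAULT_MIN_THREADS = 64


def default_num_threads(concurrent_decodes: int,
                        minimum: int = DEFAULT_MIN_THREADS) -> int:
    """Return a worker-thread count that is never below `minimum` and
    never below the number of concurrently allowed decodes.

    Hypothetical helper: how `concurrent_decodes` is derived is left to
    the caller.
    """
    if concurrent_decodes < 0:
        raise ValueError("concurrent_decodes must be non-negative")
    return max(minimum, concurrent_decodes)


class QueueAtCapacityError(RuntimeError):
    """Hypothetical error raised when the decoding queue is full."""


def enqueue_request(decode_queue: queue.Queue, request: object) -> None:
    """Add a request without blocking; fail loudly if the queue is full."""
    try:
        decode_queue.put_nowait(request)
    except queue.Full as exc:
        raise QueueAtCapacityError(
            "decoding queue is out of capacity; retry later"
        ) from exc
```

With these assumptions, a server allowing 96 concurrent decodes would get 96 worker threads, while one allowing 32 would still get the 64-thread minimum. Rejecting a request immediately when the queue is full, instead of blocking the handler thread, keeps handlers available for requests that can be served.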

JoeZijunZhou merged commit ccdb782 into main on Mar 5, 2024
3 checks passed
JoeZijunZhou deleted the zijun/optimize-thread branch on March 5, 2024 10:08