
Conversation


@gshtras gshtras commented Jul 31, 2024

Fix for the issue where a method from the multiprocessing GPU executor was being called on a regular GPU executor.

Also adds toggles to tweak the sync OpenAI server request batching, so that decode is not interrupted by prefill too often.
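The batching toggle can be pictured as a cap on how many new (prefill) requests are admitted into the running batch per scheduling step, so in-flight decodes are not starved. A minimal sketch; the names `admit_new_requests` and `max_new_seqs_per_step` are illustrative, not the actual parameters added in this PR:

```python
from collections import deque

def admit_new_requests(waiting: deque, running: list,
                       max_new_seqs_per_step: int) -> list:
    """Move at most max_new_seqs_per_step waiting (prefill) requests into
    the running batch, leaving the rest queued so ongoing decodes keep
    making progress. Hypothetical sketch, not vLLM's actual scheduler."""
    admitted = []
    while waiting and len(admitted) < max_new_seqs_per_step:
        admitted.append(waiting.popleft())
    running.extend(admitted)
    return admitted
```

Raising the cap favors throughput on new requests; lowering it favors decode latency for requests already in flight.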

```diff
 logger.info("No unfinished requests. Waiting...")
 (request_id, prompt, sampling_params) = self.input_queue.get()
-if self.need_restart:
+if self.need_restart and isinstance(
```
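The fix amounts to an `isinstance` guard: `need_restart` is only acted on when the executor is the multiprocessing variant, since a plain single-GPU executor has no worker processes to restart. A minimal sketch of the pattern, assuming hypothetical class and method names rather than the actual vLLM API:

```python
class GPUExecutor:
    """Single-GPU executor: has no worker processes to restart."""

class MultiprocessingGPUExecutor(GPUExecutor):
    """Multi-GPU executor that owns worker processes."""
    def __init__(self):
        self.restarted = False

    def restart_workers(self):
        self.restarted = True

class Engine:
    def __init__(self, executor: GPUExecutor):
        self.executor = executor
        self.need_restart = False

    def maybe_restart(self):
        # Only the multiprocessing executor exposes restart_workers();
        # guarding with isinstance avoids calling a method that does not
        # exist on a plain GPUExecutor.
        if self.need_restart and isinstance(self.executor,
                                            MultiprocessingGPUExecutor):
            self.executor.restart_workers()
        self.need_restart = False
```

With a plain `GPUExecutor` the flag is simply cleared; with the multiprocessing variant the restart actually happens.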


Would suggest syncing this logic with line 101, or at least a comment clarifying that `need_restart` is produced but not consumed in the single-GPU `GPUExecutor` case.

Collaborator Author


Not having the `if isinstance` where this value is set is a micro-optimization :p


Comments have no performance impact 😼 Will leave this convo up in the event some confused person noses around this part of the code.


@shajrawi shajrawi left a comment


ship it

@gshtras gshtras merged commit 3e480e9 into main Aug 2, 2024
@gshtras gshtras deleted the greg/server_tweaks branch August 2, 2024 18:26
gshtras added a commit that referenced this pull request Aug 13, 2024
…r request batching parameters (#114)

* Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters

* Adding HTTP headers
