[BugFix] Fix frontend multiprocessing hang #7217
Conversation
If the server dies, the frontend keeps waiting for it to come up forever. Signed-off-by: Max de Bayser <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
Thanks @maxdebayser
I wonder if there is a way we can send a health check rather than waiting on a timeout. 1000 seconds is almost 17 minutes, which is a very long time to wait before detecting a failure. But 1000 seconds is also not that long if we are downloading L405.
Actually, the unit is milliseconds.
I see. I think this could be a problem b/c download times can be much longer than the timeout here. I'm going to post an alternate proposal. ^ Ignore, I see the polling loop now.
Yeah, this is effectively just polling once per second; maybe add a comment, since that confused me too initially… I think some of this can be refactored a bit soon anyhow.
Okay, I see now where the polling is.
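To make the point about polling concrete, here is an illustrative standalone snippet; the endpoint, socket type, and message bytes below are assumptions for the sketch, not vLLM's actual code. It shows how a 1000 ms receive timeout inside a loop amounts to checking once per second:

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("ipc:///tmp/rpc_demo")  # hypothetical endpoint
sock.send(b"IS_SERVER_READY")        # readiness probe

while True:
    # poll() blocks for at most 1000 ms, so each iteration of this
    # loop is effectively one "is it up yet?" check per second.
    if sock.poll(timeout=1000):
        reply = sock.recv()
        break
    # No reply yet: this is the spot where the frontend can verify
    # that the backend process is still alive before waiting again.
```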
Alternatively, we could go with the following:
I think your approach is better than mine. Can you please add a test case for this?
Thanks! I've added a new file for the test; I wasn't sure if there's an existing test file that would be a good fit for this.
I'll wait for Nick's final sign-off, but LGTM. Thanks for the fix!
Thanks @maxdebayser
Now that the OpenAI server has optional multiprocessing, there is a hang that can happen if the backend dies during initialization and never replies to the `IS_SERVER_READY` message sent in `async_engine_client.setup()`.

This PR adds an optional timeout to `_send_one_way_rpc_request` so that during initialization we can periodically check whether the server process is still running while we wait for the reply to the `IS_SERVER_READY` message.

FIX #7213

FYI @njhill @robertgshaw2-neuralmagic
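A minimal, self-contained sketch of the pattern this description outlines, assuming hypothetical names: `send_ready_probe` stands in for the one-way RPC request issued with the new optional timeout, and `server_proc` for the backend process handle. This is not vLLM's actual `_send_one_way_rpc_request` implementation:

```python
import asyncio
from multiprocessing import Process

async def wait_for_server_ready(send_ready_probe, server_proc: Process,
                                poll_interval_s: float = 1.0):
    """Wait for the backend's readiness reply, checking between polls
    that the backend is still alive instead of blocking forever."""
    while True:
        try:
            # Wait at most one poll interval for the readiness reply.
            await asyncio.wait_for(send_ready_probe(),
                                   timeout=poll_interval_s)
            return  # Backend replied: initialization can proceed.
        except asyncio.TimeoutError:
            # No reply within this interval; bail out if the backend
            # died rather than wait for a reply that will never come.
            if not server_proc.is_alive():
                raise RuntimeError(
                    "Backend died during startup "
                    f"(exit code {server_proc.exitcode})")
```

With this shape, a healthy-but-slow backend (e.g. one still downloading weights) keeps the loop running indefinitely, while a dead backend is detected within one poll interval.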