fix hang with pause and collectives#38761
Draft
hao-aaron wants to merge 2 commits intovllm-project:mainfrom
Draft
fix hang with pause and collectives#38761hao-aaron wants to merge 2 commits intovllm-project:mainfrom
hao-aaron wants to merge 2 commits intovllm-project:mainfrom
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request simplifies the request wave coordination logic by removing the explicit "FIRST_REQ" notification from the engine core client. Instead, the engine core now handles notifying the coordinator when an idle, unpaused engine receives a request, ensuring consistent state transitions. I have no feedback to provide as there were no review comments to evaluate.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
The problem can be described as follows:
So we need to respect the pause state when sending “FIRST_REQ" from the frontend. If we are paused, we shouldn’t send a start_dp_wave.
Okay, so the next obvious step is we query the engines, if any are in a paused state, then we send the request but don’t broadcast start wave. This can result in problems if our engines are not in a consistent state.
We could also keep a paused state variable on the frontend side, but this would result in the same problem above.
The point is, even if we are an engine core who is paused, we still need to respect dp wave start requests because other engines may already be in the model call and need us to participate.
The deadlock in both of the above cases is a result of sending a start_dp_wave when we are not supposed to be stepping, or not sending start_dp_wave when we are supposed to be stepping. The frontend is unable to obtain the information of whether we should actually step or not, this information is only known by the engine that the request is eventually routed to. Thus, we need to handle the sending of "FIRST_REQ" on the engine core side rather than the frontend side.
Lets say we are an engine, and we get a request while engines_running = False. There are two possibilities:
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.