[P/D Disaggregation] PDController and PDWorker Prototype (1p1d)
#15343
Signed-off-by: clark <[email protected]>
2. To more accurately reflect its purpose, we will rename connect.py to disagg_connector.py. Signed-off-by: clark <[email protected]>
…oy(linger=0) for immediate termination Signed-off-by: clark <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
instead of having a connector to connect 1P1D, can we have a manager that listens to P/D engine or entrypoint process register? then we can have dynamic XpYd.
I think we should do this after https://github.com/vllm-project/vllm/pull/12957/commits
One other thing. Another design would be to have static P and D services whose workers scale out from there. This might work better inside k8s. I need to think more about it and review the design docs in more detail.
instead of having a connector to connect 1P1D, can we have a manager that listens to P/D engine or entrypoint process register? then we can have dynamic XpYd.
will it be merged?
    connector_addr: str,
    model_name: str
) -> AsyncIterator[PDEngine]:
    engine = PDEngine(prefill_addr, decode_addr, connector_addr, model_name)
You can set up a side thread that runs a busy loop to receive register / unregister / heartbeat signals from the entrypoints (I would call them API servers) and from the engines.
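To make the suggestion above concrete, here is a minimal sketch of such a side thread. Everything in it is hypothetical and not part of vLLM: the `WorkerRegistry` class, the `(kind, role, addr)` message tuple, and the `heartbeat_timeout_s` parameter are illustrative names. A `queue.Queue` stands in for whatever transport the real system would use (for example a ZMQ socket), and the loop blocks with a short timeout instead of spinning.

```python
import queue
import threading
import time


class WorkerRegistry:
    """Hypothetical registry of P/D workers; not a vLLM class."""

    def __init__(self, heartbeat_timeout_s: float = 5.0):
        self.heartbeat_timeout_s = heartbeat_timeout_s
        # Assumption: an in-process queue stands in for the real transport.
        self.inbox: queue.Queue = queue.Queue()
        self._last_seen = {}  # (role, addr) -> time of last signal
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Side-thread loop: drain register / unregister / heartbeat messages.
        while not self._stop.is_set():
            try:
                kind, role, addr = self.inbox.get(timeout=0.05)
            except queue.Empty:
                continue
            with self._lock:
                if kind in ("register", "heartbeat"):
                    # A heartbeat from an unknown worker is treated as a register.
                    self._last_seen[(role, addr)] = time.monotonic()
                elif kind == "unregister":
                    self._last_seen.pop((role, addr), None)
            self.inbox.task_done()

    def alive(self, role: str) -> list:
        """Return addresses for `role` whose last signal is recent enough."""
        now = time.monotonic()
        with self._lock:
            return sorted(
                addr
                for (r, addr), seen in self._last_seen.items()
                if r == role and now - seen <= self.heartbeat_timeout_s
            )
```

With a registry like this, a controller could pick any live prefill address and any live decode address per request, which is one way to get from a fixed 1P1D pairing to dynamic XpYd.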
Signed-off-by: [email protected] <[email protected]>
@robertgshaw2-redhat Thanks Robert, it'd be great if you can move this PR to our account. Let me know if you don't have the write permission.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
Any progress?

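SUMMARY:
- PDController - which implements the EngineClient protocol (thus can be used from OpenAI Server)
  - sends the request to the prefill PDWorker, waits
  - sends the request to the decode PDWorker, forwards responses to user
- PDWorker - which wraps AsyncLLM to get messages from PDController
Design Considerations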
NOTE: these are intended as prototypes; we may ultimately need to move these implementations into higher-level frameworks like DYNAMO or otherwise
Successor to: https://github.com/vllm-project/vllm/pull/11791/files - @panf2333 happy to move this PR under your account or add you as co-author, whatever you prefer
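The request flow in the SUMMARY (hand the request to one PDWorker and wait, then hand it to the other and forward responses to the user) can be sketched roughly as below. This is not the PR's code: `FakeWorker`, `FakeController`, and `collect` are hypothetical names, the workers do no real inference, and the sketch assumes the first worker is the prefill side and the second the decode side, as the 1p1d title suggests.

```python
import asyncio
from typing import AsyncIterator


class FakeWorker:
    """Hypothetical stand-in for a PDWorker; a real one would wrap AsyncLLM."""

    def __init__(self, name: str):
        self.name = name

    async def prefill(self, request_id: str, prompt: str) -> None:
        # Pretend to run prefill; a real worker would produce the KV cache here.
        await asyncio.sleep(0)

    async def decode(self, request_id: str, prompt: str) -> AsyncIterator[str]:
        # Pretend to decode: emit one upper-cased "token" per input word.
        for word in prompt.split():
            await asyncio.sleep(0)
            yield word.upper()


class FakeController:
    """Hypothetical controller for one prefill and one decode worker."""

    def __init__(self, prefill_worker: FakeWorker, decode_worker: FakeWorker):
        self.prefill_worker = prefill_worker
        self.decode_worker = decode_worker

    async def generate(self, request_id: str, prompt: str) -> AsyncIterator[str]:
        # Step 1: send to the prefill worker and wait for it to finish.
        await self.prefill_worker.prefill(request_id, prompt)
        # Step 2: send to the decode worker and forward each response.
        async for chunk in self.decode_worker.decode(request_id, prompt):
            yield chunk


async def collect(prompt: str) -> list:
    controller = FakeController(FakeWorker("p0"), FakeWorker("d0"))
    return [chunk async for chunk in controller.generate("req-0", prompt)]
```

Running `asyncio.run(collect("hello pd world"))` drives one request through both stages. Exposing the flow as an async generator mirrors how a streaming client would consume responses, though the actual EngineClient interface has more methods than this sketch shows.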