@robertgshaw2-redhat
Collaborator

@robertgshaw2-redhat robertgshaw2-redhat commented Mar 22, 2025

SUMMARY:

  • Implement PDController
    • Conforms to the EngineClient protocol (so it can be used from the OpenAI server)
    • Sends the prefill request to the prefill PDWorker and waits for it to complete
    • Sends the decode request to the decode PDWorker and forwards the responses to the user
  • Implement PDWorker, which wraps AsyncLLM to receive messages from PDController
  • ZMQ-based messaging between the two
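
To make the request flow above concrete, here is a minimal sketch of the prefill-then-decode ordering. It is not the code in this PR: `SketchPDController`, `fake_prefill`, `fake_decode`, and `collect` are hypothetical names, and the two workers are plain in-process coroutines rather than ZMQ peers wrapping `AsyncLLM`.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

# Hypothetical stand-ins for the two PDWorker endpoints. In the PR these
# are reached over ZMQ; here they are ordinary in-process callables.
PrefillFn = Callable[[str, str], Awaitable[None]]
DecodeFn = Callable[[str, str], AsyncIterator[str]]

events: List[str] = []  # records call order for the demo only


class SketchPDController:
    """Illustrative only: run prefill first, then stream decode outputs."""

    def __init__(self, prefill: PrefillFn, decode: DecodeFn) -> None:
        self._prefill = prefill
        self._decode = decode

    async def generate(self, request_id: str, prompt: str) -> AsyncIterator[str]:
        # Step 1: send the request to the prefill worker and wait for it.
        await self._prefill(request_id, prompt)
        # Step 2: send the request to the decode worker and forward each
        # response to the caller as soon as it arrives.
        async for token in self._decode(request_id, prompt):
            yield token


async def fake_prefill(request_id: str, prompt: str) -> None:
    events.append("prefill")
    await asyncio.sleep(0)  # placeholder for the real prefill work


async def fake_decode(request_id: str, prompt: str) -> AsyncIterator[str]:
    events.append("decode")
    for word in prompt.split():
        await asyncio.sleep(0)  # placeholder for one decode step
        yield word.upper()


async def collect(prompt: str) -> List[str]:
    controller = SketchPDController(fake_prefill, fake_decode)
    return [tok async for tok in controller.generate("req-0", prompt)]
```

Calling `asyncio.run(collect("hello world"))` drives one request through both stages. The point of the sketch is only the ordering: the decode stream is not opened until the prefill call has returned.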

Design Considerations

NOTE: these are intended as prototypes; we may ultimately need to move these implementations into a higher-level framework such as DYNAMO.

Successor to: https://github.com/vllm-project/vllm/pull/11791/files - @panf2333 happy to move this PR under your account or add you as co-author, whichever you prefer

image

panf2333 and others added 30 commits March 21, 2025 08:15
Signed-off-by: clark <[email protected]>
2.To more accurately reflect its purpose, we will rename connect.py to disagg_connector.py.

…oy(linger=0) for immediate termination

Signed-off-by: Robert Shaw <[email protected]>
Robert Shaw added 2 commits March 22, 2025 18:51
Member

Instead of having a connector that wires up 1P1D, can we have a manager that listens for P/D engines or entrypoint processes to register? Then we can support dynamic XpYd.

Collaborator Author

One other thing: another design would be to have static P and D services, with workers scaling out from there. This might work better inside k8s. I need to think more about it and review the design docs in more detail.

Instead of having a connector that wires up 1P1D, can we have a manager that listens for P/D engines or entrypoint processes to register? Then we can support dynamic XpYd.

will it be merged?

    connector_addr: str,
    model_name: str
) -> AsyncIterator[PDEngine]:
    engine = PDEngine(prefill_addr, decode_addr, connector_addr, model_name)
Member

You can set up a side thread that runs a busy loop to receive register / unregister / heartbeat signals from the entrypoints (or, as I would name them, API servers) and the engines.
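
One way to picture this suggestion is the sketch below. It is a sketch under stated assumptions, not part of the PR: `SketchRegistry`, the `(kind, role, address)` message shape, and the timeout value are all hypothetical, and a standard-library `queue.Queue` stands in for the ZMQ socket that would carry these signals in practice. The loop blocks briefly on the queue instead of spinning, which is a small departure from a literal busy loop.

```python
import queue
import threading
import time
from typing import Dict, List, Tuple

# Hypothetical message shape: (kind, role, address), where kind is
# "register", "unregister", or "heartbeat" and role is "prefill" or "decode".
Message = Tuple[str, str, str]


class SketchRegistry:
    """Illustrative manager that tracks live P/D engines on a side thread."""

    def __init__(self, heartbeat_timeout_s: float = 5.0) -> None:
        self._inbox: "queue.Queue[Message]" = queue.Queue()
        self._last_seen: Dict[Tuple[str, str], float] = {}
        self._timeout = heartbeat_timeout_s
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def send(self, kind: str, role: str, addr: str) -> None:
        # Stand-in for a message arriving on the manager's socket.
        self._inbox.put((kind, role, addr))

    def drain(self) -> None:
        # Block until every message sent so far has been processed.
        self._inbox.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                kind, role, addr = self._inbox.get(timeout=0.05)
            except queue.Empty:
                continue
            key = (role, addr)
            with self._lock:
                if kind == "register":
                    self._last_seen[key] = time.monotonic()
                elif kind == "heartbeat" and key in self._last_seen:
                    # Heartbeats only refresh engines that have registered.
                    self._last_seen[key] = time.monotonic()
                elif kind == "unregister":
                    self._last_seen.pop(key, None)
            self._inbox.task_done()

    def live(self, role: str) -> List[str]:
        # Engines of the given role whose last signal is recent enough.
        now = time.monotonic()
        with self._lock:
            return sorted(
                addr
                for (r, addr), seen in self._last_seen.items()
                if r == role and now - seen <= self._timeout
            )
```

With a registry like this, a controller could ask for `live("prefill")` and `live("decode")` on each request instead of being constructed with one fixed prefill address and one fixed decode address, which is what would allow the set of engines to change at runtime.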

@robertgshaw2-redhat robertgshaw2-redhat changed the title [P/D Disaggregation][Prototype] ZMQ Proxy [P/D Disaggregation][Prototype] Proxy Mar 23, 2025
@robertgshaw2-redhat robertgshaw2-redhat changed the title [P/D Disaggregation][Prototype] Proxy [P/D Disaggregation] ZMQ Controller and Worker Mar 23, 2025
@robertgshaw2-redhat robertgshaw2-redhat changed the title [P/D Disaggregation] ZMQ Controller and Worker [P/D Disaggregation] PDController and PDWorker Prototype Mar 24, 2025
@robertgshaw2-redhat robertgshaw2-redhat changed the title [P/D Disaggregation] PDController and PDWorker Prototype [P/D Disaggregation] PDController and PDWorker Prototype (1p1d) Mar 24, 2025
@panf2333
panf2333 commented Mar 24, 2025

@robertgshaw2-redhat Thanks Robert, it'd be great if you can move this PR to our account. Let me know if you don't have the write permission.

@github-actions
github-actions bot commented Aug 4, 2025

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label (Over 90 days of inactivity) Aug 4, 2025
@puppetm4st3r
🚀

@github-actions github-actions bot added the unstale label (Received activity after being labelled stale) and removed the stale label (Over 90 days of inactivity) Aug 14, 2025
@pplmx
pplmx commented Aug 27, 2025

Any progress?

Labels

documentation (Improvements or additions to documentation), frontend, kv-connector, unstale (Received activity after being labelled stale)
