[test/doc] make NixlConnector example more clear#24249
Conversation
There was a problem hiding this comment.
Code Review
This pull request improves the clarity of the NixlConnector integration test script. The changes correctly distinguish between kv_producer and kv_consumer roles, which is more explicit than the previous kv_both. Additionally, the inclusion of the UCX_NET_DEVICES=all environment variable serves as a useful hint for users transitioning from NCCL to UCX for communication configuration. The formatting of the command construction has also been improved for better readability. The changes are sound and achieve the stated goal of making the example clearer.
chaunceyjiang
left a comment
There was a problem hiding this comment.
Thanks~
#21800 (comment)
It seems there’s still a lack of tutorials on NixlConnector’s xPyD. Could you help add one?
NickLucche
left a comment
There was a problem hiding this comment.
Thanks for the PR @panpan0000 !
I agree with you the lack of documentation for the NixlConnector can be frustrating, but I don't think editing this script is the best way towards clarity.
I would be happier to see a more specific docs page on the topic, with a list of references eg nixl unit tests, llm-d guides https://github.com/llm-d/llm-d/tree/dev/guides/pd-disaggregation and basic nixl setup.
I can help with some Justfiles to get started too.
(1) we can tell between consumer and producer role
(2) tell user NCCL_* environment variables are no longer applicable to NixlConnector, but UCX replaces NCCL, so UCX_* variable should be used instead.
1 - the fact you can specify kv_both here is a "feature not a bug" as the connector makes no assumption about its role providing symmetric functionality.
2 - This isn't generally true as nixl supports backends other than ucx, although ucx is indeed the main transport library.
d4a3eeb to
85a463f
Compare
|
Thanks @NickLucche , the doc already added per @chaunceyjiang's suggestion. can you please help to review again, thanks |
85a463f to
76b39e0
Compare
c89986a to
2af5028
Compare
|
Thank you @hmellor , all fixed |
NickLucche
left a comment
There was a problem hiding this comment.
Thank you for the doc page!!
Left some comments
|
Thank you very much for the work of all the experts. I have tried to understand your work, but I find it somewhat difficult. SGLang and Dynamo have already provided PD-disaggregated inference tutorials that are very user-friendly for readers. Perhaps these can bring better inspiration to everyone. https://docs.sglang.ai/advanced_features/pd_disaggregation.html |
|
Great point @Alan-D-Chen ! |
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
4c44ac0 to
cd77ea7
Compare
|
Thank you for your time again, @NickLucche , your comments are all fixed :-) |
NickLucche
left a comment
There was a problem hiding this comment.
lgtm, thanks for contributing @panpan0000 !
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Lots of thanks for kind developers. 🏵️🏵️🏵️🏵️🏵️🏵️ |
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Signed-off-by: Peter Pan <peter.pan@daocloud.io> Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Purpose
NixlConnector are lack of tutorial documents, and the only reference,
per https://github.com/vllm-project/vllm/blob/main/docs/features/disagg_prefill.md?plain=1#L26
is the test code.
So it's better to make this code more clear.
NCCL_*environment variables are no longer applicable to NixlConnector, but UCX replaces NCCL, soUCX_*variable should be used instead.example,
UCX_TLSorUCX_NET_DEVICESare the way to configurate underlaying communication device or method,NCCL_IB_HCANCCL_SOCKET_IFNAMEare not applicable.So in my PR,
UCX_NET_DEVICES=allis just a "Hint" to people to be aware of that.If you want me to add 2 more additional Tips:
VLLM_NIXL_SIDE_CHANNEL_HOSTvariables , which is helpful when P and D are in diff machinesVLLM_NIXL_SIDE_CHANNEL_*for Decoder , since it's just needed for Prefiller.I'm glad to follow up