
@ByronHsu (Collaborator) commented Mar 21, 2025

Attention

The KV cache transfer is currently mocked. See the roadmap in #4655.

Motivation

See #4655

Release the initial prefill-decode (PD) disaggregation code so the community can collaborate on the transfer engine.

Co-authors:

Co-authored-by: SangBin Cho
Co-authored-by: makro
Co-authored-by: dhou-xai
Co-authored-by: Ying1123
Co-authored-by: merrymercy

Usage

  • terminal 1 (Prefill server)
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --disaggregation-mode prefill --port 30000
  • terminal 2 (Decode server)
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --disaggregation-mode decode --port 30001 --base-gpu-id 1
  • terminal 3 (LB)
python3 -m sglang.srt.disaggregation.mini_lb --prefill http://0.0.0.0:30000 --decode http://0.0.0.0:30001 --host 0.0.0.0 --port 8000
  • terminal 4 (Client)
curl -X POST http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{
  "text": "Let me tell you a lonnng story ",
  "sampling_params": {
    "temperature": 0
  }
}'
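The three server commands above (terminals 1 to 3) can also be started from a single Python script. This is a minimal sketch that only wraps the exact commands shown; `build_commands` and `launch_all` are hypothetical helper names, and `sys.executable` stands in for both `python` and `python3`.

```python
from __future__ import annotations

import shlex
import subprocess
import sys

MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def build_commands(python: str = sys.executable) -> dict[str, list[str]]:
    """Return argv lists mirroring the terminal 1-3 commands."""
    return {
        # Terminal 1: prefill server on port 30000
        "prefill": [python, "-m", "sglang.launch_server", "--model-path", MODEL,
                    "--disaggregation-mode", "prefill", "--port", "30000"],
        # Terminal 2: decode server on port 30001, placed on GPU 1
        "decode": [python, "-m", "sglang.launch_server", "--model-path", MODEL,
                   "--disaggregation-mode", "decode", "--port", "30001",
                   "--base-gpu-id", "1"],
        # Terminal 3: mini load balancer in front of both servers
        "lb": [python, "-m", "sglang.srt.disaggregation.mini_lb",
               "--prefill", "http://0.0.0.0:30000",
               "--decode", "http://0.0.0.0:30001",
               "--host", "0.0.0.0", "--port", "8000"],
    }


def launch_all(python: str = sys.executable) -> list[subprocess.Popen]:
    """Start every process without blocking; the caller owns the handles."""
    procs = []
    for name, argv in build_commands(python).items():
        print(f"[{name}] {shlex.join(argv)}")
        procs.append(subprocess.Popen(argv))
    return procs
```

Calling `launch_all()` starts all three processes at once. It does not wait for either server to finish loading the model, so send requests only after both servers are ready, and call `terminate()` on each returned handle to shut them down.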

{"text":"!‍♀️\nI'm glad you liked the post! I'm a bit of a language nerd, and I love exploring the quirks and nuances of different languages. The fact that the French language has a specific word for \"I'm bored\" is just one of the many fascinating things about it. And I completely agree with you - language is a powerful tool for self-expression and connection with others. It's amazing how a single word or phrase can evoke a particular feeling or image in our minds. Thanks for sharing your thoughts! 😊\nI'm glad you enjoyed the post! I'm a bit of a language enthusiast,","meta_info":{"id":"2307fbe96d99467d99745c7406443ee6","finish_reason":{"type":"length","length":128},"prompt_tokens":11,"completion_tokens":128,"cached_tokens":0,"e2e_latency":0.870051383972168}}
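For scripted testing, the terminal 4 request can also be sent from Python through the load balancer. This is a minimal standard-library sketch; `build_request`, `summarize`, and `generate` are hypothetical helper names, and the summary reads only the `meta_info` fields visible in the response above.

```python
from __future__ import annotations

import json
import urllib.request

LB_URL = "http://127.0.0.1:8000/generate"  # the mini load balancer from terminal 3


def build_request(text: str, temperature: float = 0.0) -> urllib.request.Request:
    """Build the same POST /generate request as the curl command."""
    payload = {"text": text, "sampling_params": {"temperature": temperature}}
    return urllib.request.Request(
        LB_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response: dict) -> str:
    """Condense the meta_info block of a /generate response into one line."""
    meta = response["meta_info"]
    return (f"{meta['completion_tokens']} tokens "
            f"(prompt {meta['prompt_tokens']}, cached {meta['cached_tokens']}), "
            f"finish={meta['finish_reason']['type']}, "
            f"latency={meta['e2e_latency']:.3f}s")


def generate(text: str) -> dict:
    """Send the request and decode the JSON body (needs the servers running)."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `print(summarize(generate("Let me tell you a lonnng story ")))` prints a one-line summary of token counts, finish reason, and end-to-end latency.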

As expected, the output is garbage because the KV transfer is currently mocked.
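To illustrate why a mocked transfer produces unrelated text, here is a toy sketch. It is not SGLang's code: every class and function name is hypothetical, and it assumes the mock gives the decode side a KV buffer of the right shape without ever filling it with the prefill results.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KVBuffer:
    """Toy stand-in for a KV cache: one vector per token slot (hypothetical)."""
    slots: list[list[float]] = field(default_factory=list)


class MockKVTransfer:
    """Hypothetical mock: allocates decode-side space but copies nothing."""
    def send(self, src: KVBuffer, dst: KVBuffer) -> bool:
        dst.slots = [[0.0] * len(s) for s in src.slots]  # right shape, no content
        return True  # still reports success, so decoding proceeds


class CopyKVTransfer:
    """What a working transfer engine must achieve: decode sees prefill's KV."""
    def send(self, src: KVBuffer, dst: KVBuffer) -> bool:
        dst.slots = [list(s) for s in src.slots]
        return True


def prefill(prompt_tokens: list[int]) -> KVBuffer:
    # Toy "attention state": a deterministic vector derived from each token.
    return KVBuffer([[float(t), float(t) * 0.5] for t in prompt_tokens])


def decode_sees_prompt(transfer, prompt_tokens: list[int]) -> bool:
    src = prefill(prompt_tokens)
    dst = KVBuffer()
    ok = transfer.send(src, dst)
    # Decode attends over dst; if it differs from src, generation ignores the prompt.
    return ok and dst.slots == src.slots
```

The mock reports success, so the request completes, but the decode side's KV state carries none of the prompt's content and the generated tokens cannot depend on the prompt. Replacing the mock with a real copy is the transfer-engine work the roadmap in #4655 covers.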


@zhaochenyang20 (Collaborator)

After countless calls for it, it has finally arrived!

Waited so long for this!!!

@ByronHsu marked this pull request as ready for review March 21, 2025 20:16
@zhyncs merged commit c7c7dbe into main Mar 21, 2025
38 of 44 checks passed
@zhyncs deleted the byron/pd-init branch March 21, 2025 21:47
@yiakwy-xpu-ml-framework-team (Contributor)

@ByronHsu Great PR. Do we have nsys profiles from the different nodes that record how data is transferred across nodes (processors)?

@mingxiao666

It seems the steps you provided cover only the single-node case. What about a multi-node setup (e.g., 2 nodes for encoder, 2 nodes for prefill)?


6 participants