[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support #24845
Signed-off-by: Sage Moore <[email protected]>
…pping asserts Signed-off-by: Sage Moore <[email protected]>
…sult in an empty second ubatch Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
This pull request has merge conflicts that must be resolved before it can be …
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
@LucasWilkinson this commit introduces weird behaviour: the first HTTP request with a larger context works normally, but subsequent requests are significantly slower. I have verified that the regression comes from commit cc1dc7e, which should be this PR; a903669 works normally.
… and Prefill support (vllm-project#24845) Signed-off-by: Sage Moore <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: yewentao256 <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]> Co-authored-by: Sage Moore <[email protected]> Co-authored-by: yewentao256 <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]>
… and Prefill support (#24845) Signed-off-by: Sage Moore <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: yewentao256 <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]> Co-authored-by: Sage Moore <[email protected]> Co-authored-by: yewentao256 <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: yewentao256 <[email protected]>
    if not should_ubatch:
        num_pad, num_tokens_across_dp = self.get_dp_padding(num_tokens)
        num_tokens += num_pad
removing this doesn't make the padding happen.
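To make the padding under discussion concrete, here is a minimal sketch. It assumes that data-parallel (DP) padding means raising each rank's token count to the maximum across ranks; the helper `dp_padding` and the wrapper `pad_if_not_ubatching` are hypothetical stand-ins written for illustration, not vLLM's `get_dp_padding` implementation.

```python
from typing import List, Optional, Tuple


def dp_padding(local_num_tokens: int,
               tokens_per_rank: List[int]) -> Tuple[int, List[int]]:
    """Hypothetical stand-in for a DP padding helper (an assumption).

    Returns how many pad tokens this rank needs to reach the largest
    token count among all DP ranks, plus the padded count per rank.
    """
    max_tokens = max(tokens_per_rank)
    num_pad = max_tokens - local_num_tokens
    return num_pad, [max_tokens] * len(tokens_per_rank)


def pad_if_not_ubatching(
        num_tokens: int,
        tokens_per_rank: List[int],
        should_ubatch: bool) -> Tuple[int, Optional[List[int]]]:
    """Mirrors the shape of the quoted snippet: pad only when not ubatching."""
    if not should_ubatch:
        num_pad, num_tokens_across_dp = dp_padding(num_tokens, tokens_per_rank)
        num_tokens += num_pad
        return num_tokens, num_tokens_across_dp
    # In this sketch, no padding is applied on the ubatch path.
    return num_tokens, None
```

In this sketch, if the guarded block were deleted, `num_tokens` would never be increased on the non-ubatch path, which is one way to read the reviewer's remark.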
… and Prefill support (vllm-project#24845) Signed-off-by: Sage Moore <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: yewentao256 <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]> Co-authored-by: Sage Moore <[email protected]> Co-authored-by: yewentao256 <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: gaojc <[email protected]>
… and Prefill support (vllm-project#24845) Signed-off-by: Sage Moore <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: yewentao256 <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: Tyler Michael Smith <[email protected]> Co-authored-by: Sage Moore <[email protected]> Co-authored-by: yewentao256 <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Purpose
Test Plan
lm_eval
Test Result
export VLLM_ALL2ALL_BACKEND=deepep_high_throughput
export VLLM_ALL2ALL_BACKEND=deepep_low_latency
HT Overlap Trace (2x8xH100)
