Deepseek-v4-Pro share expert tp1 on H20 #23911
Closed
zhangxiaolei123456 wants to merge 79 commits into
Co-authored-by: Baizhou Zhang <baizhouzhang@radixark.ai>
Co-authored-by: Baizhou Zhang <baizhou.zhang@radixark.ai>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: DarkSharpness <2040703891@qq.com>
Co-authored-by: DarkSharpness <76582120+DarkSharpness@users.noreply.github.com>
Co-authored-by: Fridge003 <sobereddiezhang@gmail.com>
Co-authored-by: Ke Bao <26454835+ispobock@users.noreply.github.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
Co-authored-by: Mingyi Lu <wisclmy0611@gmail.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
Co-authored-by: Yueming Yuan <yy28@illinois.edu>
Co-authored-by: Yueming Yuan <yym022502@gmail.com>
Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
This reverts commit d40ca83.
This reverts commit 0d6856b.
…s bypassing) (cherry picked from commit f2fb9795d1b4f0609bdf5c1339b542551e66ad69)
… events + debug_prev_state + silence none

Re-port from feat/debug_prefill_delayer commit 1b02e2d4f.

Changes:
- _NegotiateOutput.debug_prev_state field for wait_success/wait_timeout timing
- _record_single_pass_result: print no_wait/wait_success/delay/wait_timeout events
- silence prefillable_status==none branch (was log explosion under decode-log-interval=1)
- Computed _dbg_wait_seconds/_dbg_forward_passes from next_state OR debug_prev_state

Gated by SGLANG_PREFILL_DELAYER_DEBUG_LOG=1. forward_pass_id alignment via existing built-in SGLANG_LOG_FORWARD_ITERS=1 (no extra patch needed).
Motivation
PR #23686 implements a TP16 deployment of DeepSeek-V4-Pro on SM90. Because the shared expert cannot be deployed with TP16, this PR deploys the shared expert with TP1.
Co-authored-by: shiyu7
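For readers unfamiliar with the distinction, the following is a minimal pure-Python sketch of what "TP1" versus a sharded shared expert means. It is not code from this PR or from SGLang: the function names, the gated-MLP shape, and the tiny dimensions are all hypothetical, and the PR description does not say why the shared expert cannot be sharded across 16 ranks. The sketch only shows that a replicated (TP1) expert computes the full MLP locally, while a sharded one splits the intermediate dimension and needs a sum across ranks (an all-reduce) to produce the same output.

```python
import math


def silu(v):
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    # w is a list of rows; returns w @ x
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def shared_expert_tp1(x, w_gate, w_up, w_down):
    """Replicated (TP1): each rank holds the full weights, no communication."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def shared_expert_sharded(x, w_gate, w_up, w_down, tp_size):
    """Sharded: the intermediate dimension is split evenly across tp_size ranks."""
    inter = len(w_gate)
    if inter % tp_size != 0:
        # Even sharding is one general constraint; it is NOT claimed to be
        # the reason behind this PR.
        raise ValueError("intermediate size must be divisible by tp_size")
    shard = inter // tp_size
    out = [0.0] * len(w_down)
    for rank in range(tp_size):
        lo, hi = rank * shard, (rank + 1) * shard
        gate = matvec(w_gate[lo:hi], x)            # column-parallel slice
        up = matvec(w_up[lo:hi], x)                # column-parallel slice
        hidden = [silu(g) * u for g, u in zip(gate, up)]
        partial = matvec([row[lo:hi] for row in w_down], hidden)  # row-parallel slice
        out = [o + p for o, p in zip(out, partial)]  # stands in for an all-reduce
    return out


def make_matrix(rows, cols, seed):
    # Small deterministic weights, for illustration only.
    return [[((i * 3 + j * 5 + seed) % 7 - 3) / 10 for j in range(cols)]
            for i in range(rows)]


# Hypothetical toy sizes: hidden size 4, intermediate size 8.
x = [0.5, -1.0, 0.25, 2.0]
w_gate = make_matrix(8, 4, 0)
w_up = make_matrix(8, 4, 1)
w_down = make_matrix(4, 8, 2)

ref = shared_expert_tp1(x, w_gate, w_up, w_down)
sharded = shared_expert_sharded(x, w_gate, w_up, w_down, tp_size=4)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(ref, sharded)))
```

In this toy setting both paths give the same result. The trade-off is that the TP1 path stores the full shared-expert weights on every rank but needs no cross-rank reduction for that layer.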
Modifications
Accuracy Tests
Command
GSM8K
MMLU
longbench_v2
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci