Deepseek V4 #23882
Merged
Commits
428 commits
3cc1c8b
add dsv4 flash coherence sanity tests
hnyls2002 623a314
add server sanity kit + dsv4 flash sanity tests
hnyls2002 df00677
nextn subclass owns post_load_weights is_nextn
hnyls2002 7d38986
remove deprecated environ
hnyls2002 ed98c6f
fix lint
hnyls2002 e055a5b
reduce one duplicate
hnyls2002 626c862
revert noisy log prefix in _try_load_model_cls
hnyls2002 c68c649
restore _build_hisparse_decode_batch docstring
hnyls2002 ce2ff84
Add manual AIME25 tests for DeepSeek-V4 cookbook launch configs (#24104)
Fridge003 5cb0a57
inline Compressor.compress_fused into Compressor.forward (single caller)
fzyzcjy f3e040b
remove redundant self.rotary_emb in MQALayer/Compressor/C4Indexer
fzyzcjy ba7ef7b
remove unused Compressor.overlap_transform / overlap_transform_decode
fzyzcjy 567b6a1
remove duplicate 'from sglang.srt.environ import envs' import
fzyzcjy 45bd371
remove unused rms_normalize function in deepseek_v4.py
fzyzcjy 354c102
remove unused Compressor.compute_state_len{,_indices} static helpers
fzyzcjy 5f9247d
remove unused freqs_cis param from MQALayer prepare/compute helpers
fzyzcjy 3a63091
remove unused debug_return_kv param from MQALayer.forward
fzyzcjy e8158c5
drop unused pp_proxy_tensors from inner DeepseekV4Model.forward
fzyzcjy d1907d3
drop unused 'as err' in encoding_dsv4 tool_call exception
fzyzcjy 2dba457
remove unused parse_message_from_completion_text in encoding_dsv4
fzyzcjy f10765d
remove unused expand_seq_lens helper in paged_prefill
fzyzcjy 8df1c9f
remove unused make_swa_ring_buffer_indices helper in paged_prefill
fzyzcjy 8aa52bc
delete unused paged_prefill module (prepare_swa_ring_buffer_cache)
fzyzcjy 6d83cd2
remove unused RaggedCoreMetadata and RaggedIndexerMetadata dataclasses
fzyzcjy 371f94b
remove unused init_c4_metadata / init_c128_metadata wrappers
fzyzcjy 642752b
drop dead seq_lens_sum=bs in IDLE replay branch
fzyzcjy 7d35c91
drop unused _is_cuda/_is_cpu/_is_cpu_amx_available/_use_aiter in deep…
fzyzcjy 86ca10d
remove unused fused_norm_rope_inplace wrapper in jit_kernel/deepseek_v4
fzyzcjy e4eb8ac
remove unused HiSparseCoordinator.get_front_topk_tokens
fzyzcjy 0353cb9
remove MOE/ATTN/COMPRESSOR_BIT_WISE_EQUAL_MODE flags
fzyzcjy 612aa66
Revert "remove unused parse_message_from_completion_text in encoding_…
fzyzcjy 00370f2
Revert "drop unused 'as err' in encoding_dsv4 tool_call exception"
fzyzcjy c8a561d
Revert "drop dead seq_lens_sum=bs in IDLE replay branch"
fzyzcjy 83b48df
unify cuda graph metadata dicts via _GraphBucket enum in deepseek_v4_…
fzyzcjy c8f263a
remove unused yarn_get_mscale helper in deepseek_v4
fzyzcjy 1bf0585
remove unused is_layer_sparse/_is_layer_sparse on DeepseekV4DecoderLayer
fzyzcjy 0ea8709
drop dead BumpAllocator setup in DeepseekV4Model.forward
fzyzcjy 37d499a
remove dead num_fused_shared_experts > 0 branches in DSv4
fzyzcjy ec693f8
remove unused self.padding_id in DeepseekV4ModelNextN
fzyzcjy da1e4bc
remove unused self.layers_to_capture in DeepseekV4ModelNextN
fzyzcjy 6382a13
remove unused self.enable_a2a_moe in DeepseekV4ModelNextN
fzyzcjy 2250c7b
Revert "remove dead num_fused_shared_experts > 0 branches in DSv4"
fzyzcjy 145be01
remove unused PagedCoreMetadata class
fzyzcjy 89b4fd1
remove unused DeepseekV4Metadata class
fzyzcjy 1cb30cd
remove unused CoreMetadata.init_swa_slice method
fzyzcjy 267ca2a
drop unread c4_positions field on DSV4AttnMetadataRadix
fzyzcjy 6b8cea5
drop unread c128_positions field on DSV4AttnMetadataRadix
fzyzcjy 4736050
drop unused real_metadata field on DSV4MetadataRawVerify/RawDecode
fzyzcjy a374819
remove unused apply_rotary_emb in deepseek_v4_rope
fzyzcjy 410ec93
remove unused tilelang_make_swa_prefill_indices and helper kernel
fzyzcjy 2c72be8
remove unused CompressorPrefillPlan.copy_ method
fzyzcjy 8631d6b
remove unused CompressorDecodePlan.copy_ method
fzyzcjy 0b8ac0c
Revert "remove unused CompressorDecodePlan.copy_ method"
fzyzcjy c4b9411
Revert "remove unused CompressorPrefillPlan.copy_ method"
fzyzcjy 17fd7c8
drop unused start_event field on HiSparseAct namedtuple
fzyzcjy 83f41d0
Revert "drop unused start_event field on HiSparseAct namedtuple"
fzyzcjy 9f849e3
fix: re-add is_hip import in deepseek_v4_topk
fzyzcjy fc84e1d
fix lint after dead-code cleanup
fzyzcjy b503ba6
rename is_deepseek_compressed; move dsv4 fp4 autodetect out of model_…
hnyls2002 c344aa8
fix dsv4 dataclass import-order crash; move auto-detect helper to wei…
hnyls2002 0d291a3
Revert "fix dsv4 dataclass import-order crash; move auto-detect helpe…
hnyls2002 1045918
Revert "rename is_deepseek_compressed; move dsv4 fp4 autodetect out o…
hnyls2002 cda4bea
fix DSv4 config dataclass ordering with transformers 5.6 PretrainedCo…
hnyls2002 3175d0b
fix v4 nextn post_load_weights signature accept is_nextn kwarg
hnyls2002 6a90273
stop on newline in determinism probe to avoid drift on continuation
hnyls2002 178dcdf
simplify determinism probe comments
hnyls2002 c4d2056
rename is_deepseek_compressed; move dsv4 fp4 autodetect out of model_…
hnyls2002 d7f7a72
fix dsv4 dataclass import-order crash; move auto-detect helper to wei…
hnyls2002 955eb87
move dsv4 fp4 autodetect from weight_utils into configs/deepseek_v4
hnyls2002 02a6950
add swa unit tests
ispobock 1a48196
Replace SGLANG_DSV4_FP4_EXPERTS env with ModelConfig.is_fp4_experts a…
hnyls2002 335a2b3
add marlin to MOE_RUNNER_BACKEND_CHOICES (CLI choices list was out of…
fzyzcjy 2a30899
Revert "Replace SGLANG_DSV4_FP4_EXPERTS env with ModelConfig.is_fp4_e…
hnyls2002 98d35bc
remove SGLANG_OPT_DEEPGEMM_SCALE_CONVERT_AT_INIT (default True), inli…
fzyzcjy c0d402e
fix(support_triton): restore "ascend" backend (lost in rebase merge)
fzyzcjy 8fe4670
remove duplicate get_libnuma/numa_bind_to_node from utils/common.py (…
fzyzcjy a5f4cc7
restore top-level 'import gc' in utils/common.py to match main
fzyzcjy 9d5a84e
MoEGate: gate prefill_cp F.linear shortcut on not is_deepseek_v4 (dro…
fzyzcjy 84cfee1
MoEGate: gate linear_bf16_fp32 fallback on is_deepseek_v4 (restore F.…
fzyzcjy 9144676
DeepseekV2MoE: drop incorrect 'assert hasattr(self, shared_experts)' …
fzyzcjy 7bd615b
remove SGLANG_OPT_ALLOW_SHARED_EXPERT_DUAL_STREAM (default True), inl…
fzyzcjy 27cb98c
topk: drop duplicate '_is_xpu = is_xpu()' assignment
fzyzcjy 1f7708d
Revert "topk: drop duplicate '_is_xpu = is_xpu()' assignment"
fzyzcjy c2bae49
DSV4 fp4 experts: env user override + try-detect → ModelConfig.is_fp4…
hnyls2002 c76ea69
cleanup swa fix env and test
ispobock 73d128e
Merge remote-tracking branch 'origin' into dsv4-rebase
hnyls2002 e9b8d3e
group all DSV4 envs into one section with sub-group headers
hnyls2002 a49605d
is_deepseek_nsa/v4: accept dict-or-object via shared _hf_arch/_hf_att…
hnyls2002 229c0fe
[Dep] Add tilelang to pyproject.toml (#24178)
Fridge003 111dba6
drop SGLANG_DISABLE_REQUEST_LOGGING doc entry; env has no reader
hnyls2002 a18ceb5
drop dead branches in DeepSeekV4SingleKVPool
hnyls2002 708f215
add is_swa_like_pool helper
hnyls2002 b5e4991
add BaseSWAKVPool ABC
hnyls2002 4523bc7
drop is_swa_like_pool; use BaseSWAKVPool isinstance
hnyls2002 a18f1c1
all req pools reserve slot 0 as padding
hnyls2002 66572a6
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 008e7b8
rename expected_free back to req_total_size to match main
hnyls2002 850281a
restore SWAChunkCache assert; allow HiSparse allocator
hnyls2002 8fe0104
restore session_held_mamba_slots chain
hnyls2002 a8149dd
fix mamba slot leak in release_session
hnyls2002 d96d731
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 2c4693b
align with main's dependencies
hnyls2002 3203982
drop CoreMetadata + IndexerMetadata dead code
hnyls2002 cf3d985
rename maybe_torch_compile -> compile_in_capture_mode; move to cuda_g…
hnyls2002 778ec36
align custom_all_reduce_v2 to main
hnyls2002 6de214b
drop SGLANG_DISABLE_REQUEST_LOGGING doc; env has no reader
hnyls2002 31ef73a
resolve hisparse conflict
xiezhq-hermann 5e68a6f
hisparse cleaning
xiezhq-hermann b3175f6
Merge origin/main into dsv4-rebase
hnyls2002 4d711cb
drop redundant f-prefix
hnyls2002 53e6703
rename topk
ispobock 0f2db17
restore docs
ispobock 38902df
fix rebase swa evict
ispobock 9c3e572
update
ispobock 022e293
add ut for leaf split
ispobock 5a65728
Fix weight checker float dtype detection
yueming-yuan 1dc3397
Align weight checker reset handling
yueming-yuan d6d1240
Handle DeepSeek KV cache scales in weight checker
yueming-yuan 60cb388
force DSV4 topk_group=n_group so router takes ungrouped sqrtsoftplus …
hnyls2002 591a42f
register deepseek_v4 via DeepseekV3Config subclass alias
hnyls2002 87d385c
move allreduce v2 env flag to match main location
hnyls2002 bed994e
restore HIP buf_numel_per_page assertion
hnyls2002 0f4d53a
restore HIP page_size assertion
hnyls2002 57e1b98
restore blank line in combine_a
hnyls2002 3f9037a
extract mega-moe to layers/moe/mega_moe.py (#24301)
hnyls2002 d6418f4
restore registry log message
hnyls2002 0cd8b6e
extract MxFP4 fused RSF+shared_add helper
hnyls2002 698797e
revert HIP page_size to main
hnyls2002 6bdb5e1
Merge origin/main into dsv4-rebase
hnyls2002 62cbbb8
extract dsv4 server-args hooks to arg_groups (#24326)
hnyls2002 d564605
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 66fa6ac
sync utils/common.py with main; revert NPU bugfix revert
hnyls2002 6025a50
extract hisparse hook to arg_groups
hnyls2002 61b26dc
drop SGLANG_ENABLE_THINKING force on HF path
hnyls2002 e107dd7
rename SGLANG_ENABLE_THINKING to SGLANG_DEFAULT_THINKING
hnyls2002 bc4fab4
rename deepseekv4_memory_pool to deepseek_v4_memory_pool
hnyls2002 8e88bb8
rename compress_state to deepseek_v4_compress_state
hnyls2002 bfa804b
rebase main with 91fa2340ed
hnyls2002 b56d72e
drop SGLANG_OPT_V4_DRAFT_EXTEND_CUDA_GRAPH
hnyls2002 ce278b1
rename DeepseekV4BackendRadix to DeepseekV4AttnBackend; drop Radix su…
hnyls2002 581c44d
lock deps to cu129 baseline; main adaptation pending
hnyls2002 6ea6048
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 009bbc7
move v4 pd pp=1 check to server args
hnyls2002 b22221f
revert rocm artifacts from dsv4-rebase (cuda-only scope) (#24339)
hnyls2002 df16061
rename mxfp4_deepseek to mxfp4_flashinfer_trtllm_moe
hnyls2002 7ce156b
reconcile v4 nixl with main; add v4 dispatch branch
hnyls2002 16afced
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 d88c329
move dsv4 metadata init kernel to compressed/
hnyls2002 77b5774
rename on_after_cuda_graph_warmup; tidy comments
hnyls2002 eecf0fb
rename attention/compressed to compression
hnyls2002 70d5e29
drop SGLANG_FIX_MTP_HC_HIDDEN; default-on
hnyls2002 e46b1e3
minor: use main
DarkSharpness 9267502
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 6617c26
rename compressed attn backend to dsv4; deprecate alias
hnyls2002 89411ab
rename is_swa_with_compressed_attention -> is_deepseek_v4_arch
hnyls2002 be95d76
rename init_compressed_metadata -> init_compression_metadata
hnyls2002 d1d3e57
rename SGLANG_REASONING_EFFORT -> SGLANG_DSV4_REASONING_EFFORT
hnyls2002 8996c5a
fix SGLANG_DSV4_ISOLATE type: EnvInt -> EnvBool
hnyls2002 d39be6d
describe head_dim/block_size asserts in fp8_paged_mqa_logits_torch
hnyls2002 8a2a47f
minor: clean up
DarkSharpness 1127c78
drop DeepseekRefRMSNorm; use standard RMSNorm in Compressor
hnyls2002 99e3141
drop unused methods on v4 compress state pool and attn backend
hnyls2002 2008b54
drop unused _DSV4_RAW_TYPES constant
hnyls2002 dc0c318
move Compressor/C4Indexer nn.Modules to layers/attention/compression
hnyls2002 470a380
rename Compressor/C4IndexerBackend to *BackendMixin
hnyls2002 48f7b07
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 a96fd95
import ReplicatedLinear from layers.linear instead of models.dbrx
hnyls2002 f325ce6
describe magic-number asserts in dsv4 attn / compressor / pool
hnyls2002 618317a
drop unused create_flashmla_metadata helper
hnyls2002 23eab17
rename layers/attention/compression to layers/attention/dsv4
hnyls2002 8f36a94
unify Dsv4/DSv4 mixed-case to DSV4
hnyls2002 6ce24a8
rename is_v4_model to is_dsv4_model
hnyls2002 88c5754
consolidate copy_metadata: drop duplicate _copy_metadata in v4 backend
hnyls2002 fcda6fc
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 884eaeb
use _skip_weight_check marker
yueming-yuan 3f55938
upd
Fridge003 0f27c21
upd
Fridge003 3bf02ce
align c4 hisparse translate_loc names with NSA
hnyls2002 38827bf
DeepGemm fixes for v4 rebasing (#24399)
Fridge003 17591f3
sync attention/utils.py with main
hnyls2002 62969b3
fix glm4-moe-lite missing is_hash attr
hnyls2002 ed35a74
minor: remove flag
DarkSharpness 40f07b0
minor: remove seemingly dangerous flags
DarkSharpness 85c63af
restore hisparse comments to align with main
hnyls2002 024911f
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 c398860
drop duplicate maybe_collect_indexer_topk left by merge
hnyls2002 66b9453
drop duplicate indexer_topk kwarg left by merge
hnyls2002 d688b21
isort dsv4/indexer.py imports
hnyls2002 e11e351
tiny recover comments
hnyls2002 25a9fab
rename state_type v4 -> dsv4
hnyls2002 a9ef99b
guard self.is_hash with getattr; drop glm4 patch
hnyls2002 7609096
move nsa _v4 artifacts to attention/dsv4
hnyls2002 fa3e030
extract dsv4 paged_mqa_logits to attention/dsv4/tilelang_kernel
hnyls2002 1b38b68
drop SGLANG_OPT_DG_PAGED_MQA_LOGITS_CHUNK_SIZE
hnyls2002 bcca716
Merge remote-tracking branch 'origin/main' into dsv4-rebase
hnyls2002 4e0b431
drop SGLANG_DSV4_ISOLATE
hnyls2002 8fa2b96
fixme: hisparse negative pool counter clamp
hnyls2002 584c86f
Remove dsv4 compress_state dead code (#24472)
hnyls2002 3593e22
restore non-dsv4 silu_and_mul fallback; gate dsv4 by swiglu_limit
hnyls2002 976a43d
remove requirement on fast hadamard for sm103
Fridge003 b0df768
tiny
hnyls2002 28dae6f
remove dead non-triton k cache quant
hnyls2002 c3d8770
drop dead swa_page_size and unused fields on CompressStatePool
hnyls2002 3039f4f
Merge branch 'main' into dsv4-rebase
hnyls2002 b69a724
drop duplicate TestDeepSeekV4Detector from rebase
hnyls2002 a3b4935
minor: remove selector
DarkSharpness 72679d7
fix has_attention_sinks unset for non-hybrid-swa
hnyls2002 973f098
split hisparse mla swap-in by buffer layout
hnyls2002 087d83b
small upgrade sglang-kernel version
Fridge003 3ffc34d
fix non-dsv4 cuda graph replay typeerror
hnyls2002 a17de73
try revert
hnyls2002 a5fff9b
guard on_after_cuda_graph_warmup for non-dsv4 draft backends
hnyls2002 5760cc1
Merge branch 'main' into dsv4-rebase
hnyls2002 c918888
fix swa eviction test mock req
hnyls2002 dba0adc
restore admit_request_direct on HiSparseCoordinator
hnyls2002 7298f63
Merge branch 'main' into dsv4-rebase
hnyls2002 4acbcca
gate masked deep_gemm V4 path on swiglu_limit
hnyls2002 fce5ad6
fix _postprocess_tensors test calls
hnyls2002 c822e91
Merge branch 'main' into dsv4-rebase
hnyls2002 23e62ce
fix _make_req mock missing seqlen
hnyls2002 c39f6d9
restore naive_load_topk on HiSparseCoordinator
hnyls2002 1875af4
fix: NSA prefill context parallel crash (dsv4-rebase) (#24560)
yhyang201 900eac3
Merge branch 'main' into dsv4-rebase
hnyls2002 cfdce82
Merge remote-tracking branch 'origin/dsv4-rebase' into dsv4-rebase
hnyls2002 1623ce0
Merge remote-tracking branch 'origin' into dsv4-rebase
hnyls2002 6efeee8
Port MXFP4 Marlin MoE support to JIT kernel path (#24490)
yhyang201 0826876
gate jit mask_topk_ids; default off
hnyls2002 f490e24
Merge remote-tracking branch 'origin' into dsv4-rebase
hnyls2002 cbcd3a3
Inline SGLANG_OPT_MXFP4_FUSE_RSF_SHARED_ADD (default True)
fzyzcjy 2267007
Inline SGLANG_OPT_MXFP4_STATIC_SCALE_ONES (default True)
fzyzcjy 5d8779f
Inline SGLANG_OPT_MXFP4_SKIP_DISPATCHER_MAPPING (default True)
fzyzcjy f64d12f
Remove unused SGLANG_FIX_ATTN_BACKEND_IDLE env var
fzyzcjy 1bc545e
Inline SGLANG_FIX_PD_IDLE (default True)
fzyzcjy fe60306
Drop unused envs import in mxfp4_flashinfer_trtllm_moe.py
fzyzcjy 0d7379f
Revert "gate jit mask_topk_ids; default off"
hnyls2002 3a469e5
fix lint
hnyls2002 a32a55c
Merge remote-tracking branch 'upstream/dsv4-rebase' into dsv4-rebase
fzyzcjy 4c3689b
fix error when model has no swiglu limit
fzyzcjy 47d6c2b
fix AMD build: guard enable_cluster with USE_ROCM
fzyzcjy 9e45b2d
add comments
fzyzcjy 7fde762
fix _mask_topk_ids_padded_region
fzyzcjy 37f4a77
fmt code
fzyzcjy f54b6ee
fix: cast kv_compressed to bf16 before rotate_activation in compressor
yhyang201 dfaf8a9
add B200 CI tests for DSV4 Flash FP4 (per-commit + nightly)
yhyang201 f81a88a
add H200 CI test for DSV4 Flash FP4 Marlin (per-commit sanity + GSM8K)
yhyang201 97d22df
fix glm5 grouped topk; route by model not n_group>topk_group
hnyls2002 ab450c6
topk init: explicit main default + dsv4 override
hnyls2002 b481bf0
minor adjust for hisparse
xiezhq-hermann 644dee6
[CI] Add flash_mla installation script for dsv4 ci tests (#24634)
Fridge003 7978aa7
Merge branch 'main' into dsv4-rebase
Fridge003