amd/deepseek_v4 integration 12/N enable triton swa prepare kernel 0506 #24535
Merged
HaiShaw merged 1 commit into sgl-project:amd/deepseek_v4 on May 6, 2026
HaiShaw approved these changes on May 6, 2026
Raiden-Makoto pushed a commit to Raiden-Makoto/squidward that referenced this pull request on May 6, 2026
Builds on the is_hip() guard from the prior commit to enable the
DeepseekV4BackendRadix code path on AMD/ROCm. Three flag flips relative
to upstream amd/deepseek_v4 HEAD's blessed AMD recipe:
- SGLANG_OPT_DPSK_V4_RADIX: 0 -> 1
- SGLANG_OPT_USE_OLD_COMPRESSOR: true -> false (paged KVAndScore layout)
- --disable-radix-cache server flag dropped (so requests actually hit
  the radix tree for prefix reuse)
These three flips are unvalidated by upstream:
- Issue sgl-project#23639 frames DSV4 radix as future work (UnifiedRadix design)
- PR sgl-project#24249 just landed fused-compressor optimizations for the OLD
  code path; we forfeit those by setting OLD_COMPRESSOR=false
- Upstream's run_dsv4.sh keeps RADIX=0 after 12 integration PRs
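The three flag flips listed above can be sketched as a small helper that assembles the environment and command line for each recipe. This is only an illustration, not the contents of run_dsv4.sh: the helper name `build_launch` and the model path argument are hypothetical, and the string forms of the flag values ("0"/"1", "true"/"false") are copied from the commit message without checking how the server parses them.

```python
import os

# Values taken from the commit message. UPSTREAM_ENV is the recipe the message
# says upstream keeps; RADIX_ENV is the recipe this commit switches to.
UPSTREAM_ENV = {
    "SGLANG_OPT_DPSK_V4_RADIX": "0",
    "SGLANG_OPT_USE_OLD_COMPRESSOR": "true",
}
RADIX_ENV = {
    "SGLANG_OPT_DPSK_V4_RADIX": "1",
    "SGLANG_OPT_USE_OLD_COMPRESSOR": "false",
}


def build_launch(model_path, use_radix, base_env=None):
    """Return (env, argv) for a server launch. Nothing is executed here.

    `build_launch` and `model_path` are hypothetical names for this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(RADIX_ENV if use_radix else UPSTREAM_ENV)

    argv = ["python3", "-m", "sglang.launch_server", "--model-path", model_path]
    if not use_radix:
        # The upstream recipe disables the radix cache; the radix recipe
        # drops this flag so requests can reuse cached prefixes.
        argv.append("--disable-radix-cache")
    return env, argv
```

Returning the environment and argument list instead of launching the process keeps the two recipes easy to diff and to test side by side.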
run_dsv4.sh:
- Rewritten with per-env documentation explaining the choice
- Auto-merged with PR sgl-project#24535 addition (SGLANG_OPT_USE_TRITON_SWA_PREPARE)
docs/references/amd_deepseek_v4_architecture_reading_order.md:
- New file documenting the radix vs non-radix code paths and pointing
  to the integration PRs (1/N..12/N) for context
test/registered/amd/test_deepseek_v4_fp8_radix.py:
- Nightly accuracy + perf test in suite
  'nightly-amd-8-gpu-mi35x-deepseek-v4-flash' (matches sibling
  test_deepseek_v4_fp8.py registration pattern)
- GSM8K accuracy floor 0.91, perf table at input_len=8192/output_len=1024
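The accuracy floor mentioned above amounts to a threshold check on an exact-match score. The sketch below is not the code in test_deepseek_v4_fp8_radix.py: the function names are hypothetical, answers are assumed to be already extracted and normalized strings, and treating the floor as inclusive (0.91 passes) is an assumption.

```python
GSM8K_ACCURACY_FLOOR = 0.91  # floor quoted in the commit message


def gsm8k_accuracy(predictions, references):
    """Exact-match accuracy over already-extracted final answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def check_accuracy_floor(accuracy, floor=GSM8K_ACCURACY_FLOOR):
    """Raise AssertionError if accuracy is below the floor (inclusive floor assumed)."""
    if accuracy < floor:
        raise AssertionError(
            f"GSM8K accuracy {accuracy:.3f} is below the floor {floor:.2f}"
        )
```

In a nightly test, `check_accuracy_floor` would be called on the score reported by the evaluation run, so a regression in the radix path fails the suite instead of passing silently.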
Co-authored-by: Cursor <cursoragent@cursor.com>
Motivation
Update amd/deepseek_v4 integration branch
The following PRs have a large set of conflicts, so we use this PR and the upstream amd/deepseek_v4 branch to integrate them in parallel.
#23600
#23608
Modifications
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process