amd/deepseek_v4 integration 12/N enable triton swa prepare kernel 0506 #24535

Merged
HaiShaw merged 1 commit into sgl-project:amd/deepseek_v4 from HaiShaw:amd/deepseek_v4_enable_swa_prepare_triton_0506
May 6, 2026

Conversation

@kkHuang-amd
Collaborator

@kkHuang-amd kkHuang-amd commented May 6, 2026

Motivation

Update amd/deepseek_v4 integration branch

The following PRs have a large set of conflicts, so we use this PR and the upstream amd/deepseek_v4 branch to integrate them in parallel:
#23600
#23608

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@HaiShaw HaiShaw merged commit 6ebfb4c into sgl-project:amd/deepseek_v4 May 6, 2026
6 checks passed
Raiden-Makoto pushed a commit to Raiden-Makoto/squidward that referenced this pull request May 6, 2026
Builds on the is_hip() guard from the prior commit to enable the
DeepseekV4BackendRadix code path on AMD/ROCm. Three flag flips relative
to upstream amd/deepseek_v4 HEAD's blessed AMD recipe:

  - SGLANG_OPT_DPSK_V4_RADIX: 0 -> 1
  - SGLANG_OPT_USE_OLD_COMPRESSOR: true -> false (paged KVAndScore layout)
  - --disable-radix-cache server flag dropped (so requests actually hit
    the radix tree for prefix reuse)
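
The three flips above can be sketched as a small launch helper. This is only an illustration, not the contents of run_dsv4.sh: the two environment variable names and values and the `--disable-radix-cache` flag come from this commit message, while `build_launch`, the model path, and the extra arguments are hypothetical.

```python
import os

# Values taken from the commit message; the previous recipe used 0 and true.
RADIX_ENV = {
    "SGLANG_OPT_DPSK_V4_RADIX": "1",
    "SGLANG_OPT_USE_OLD_COMPRESSOR": "false",
}


def build_launch(model_path, extra_args=()):
    """Return (env, argv) for a radix-enabled launch (hypothetical helper)."""
    env = dict(os.environ)
    env.update(RADIX_ENV)
    argv = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        *extra_args,
    ]
    # Drop --disable-radix-cache so requests can hit the radix tree.
    argv = [arg for arg in argv if arg != "--disable-radix-cache"]
    return env, argv


# Placeholder path and arguments, for illustration only.
env, argv = build_launch(
    "/path/to/model", ["--disable-radix-cache", "--tp-size", "8"]
)
```

The helper only assembles the environment and argument list; passing them to `subprocess.run(argv, env=env)` would start the server.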

These three flips are unvalidated by upstream:
  - Issue sgl-project#23639 frames DSV4 radix as future work (UnifiedRadix design)
  - PR sgl-project#24249 just landed fused-compressor optimizations for the OLD
    code path; we forfeit those by setting OLD_COMPRESSOR=false
  - Upstream's run_dsv4.sh keeps RADIX=0 after 12 integration PRs

run_dsv4.sh:
  - Rewritten with per-env documentation explaining the choice
  - Auto-merged with PR sgl-project#24535 addition (SGLANG_OPT_USE_TRITON_SWA_PREPARE)

docs/references/amd_deepseek_v4_architecture_reading_order.md:
  - New file documenting the radix vs non-radix code paths and pointing
    to the integration PRs (1/N..12/N) for context

test/registered/amd/test_deepseek_v4_fp8_radix.py:
  - Nightly accuracy + perf test in suite
    'nightly-amd-8-gpu-mi35x-deepseek-v4-flash' (matches sibling
    test_deepseek_v4_fp8.py registration pattern)
  - GSM8K accuracy floor 0.91, perf table at input_len=8192/output_len=1024
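
As a rough sketch of how the thresholds above might be expressed, assuming nothing about the real test file: the numbers 0.91, 8192, and 1024 come from this commit message, while `NightlyThresholds` and `accuracy_ok` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NightlyThresholds:
    """Thresholds quoted in the commit message (hypothetical container)."""
    gsm8k_accuracy_floor: float = 0.91
    input_len: int = 8192
    output_len: int = 1024


def accuracy_ok(score, thresholds=NightlyThresholds()):
    """Return True when a measured GSM8K score meets the floor."""
    return score >= thresholds.gsm8k_accuracy_floor
```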

Co-authored-by: Cursor <cursoragent@cursor.com>