amd/deepseek_v4 integration 10/N optimize mhc performance by kkHuang-amd · Pull Request #24355 · sgl-project/sglang

kkHuang-amd · 2026-05-04T08:29:24Z

co-author: @1am9trash

Motivation

Update amd/deepseek_v4 integration branch

The following PRs have a large number of conflicts, so we use this PR and the upstream amd/deepseek_v4 branch to integrate them in parallel:
#23600
#23608

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist

Code Review

This pull request integrates AMD-specific aiter multi-head compression (MHC) operations into the DeepSeek-V4 model, introduces new environment variables to control these features, and updates the run_dsv4.sh script with optimized configurations. The review feedback focuses on ensuring cross-platform compatibility by guarding aiter imports with platform checks, adhering to PEP-8 formatting standards, and improving performance by reducing frequent memory allocations during the forward pass. Additionally, redundant path assignments in the shell script were identified for cleanup.

gemini-code-assist · 2026-05-04T08:34:50Z

@@ -1889,6 +1889,23 @@ def hc_pre_torch_impl(x, hc_fn):
            # returned post should be [n, hc_mult]


The aiter library is an AMD-specific dependency and is not available on NVIDIA platforms. Since SGLANG_OPT_USE_AITER_MHC_PRE is enabled by default, this block will be executed on NVIDIA hardware if SGLANG_OPT_USE_TILELANG_MHC_PRE is disabled (which is necessary to use DeepGEMM), resulting in a ModuleNotFoundError. This check should be guarded by _is_hip to ensure portability.

Suggested change

# returned post should be [n, hc_mult]

if _is_hip and envs.SGLANG_OPT_USE_AITER_MHC_PRE.get():

HaiShaw

gemini-code-assist proposed the wrong diff: the line number is incorrect.

gemini-code-assist

You are absolutely correct, HaiShaw. My apologies for the incorrect line number in the previous suggestion. The if _is_hip condition should indeed guard the aiter block, specifically by modifying the line if envs.SGLANG_OPT_USE_AITER_MHC_PRE.get():.

Since the original comment was anchored to line 1889, I cannot provide a direct code suggestion for line 1892 in the UI. However, the correct modification would be to change line 1892 from:

if envs.SGLANG_OPT_USE_AITER_MHC_PRE.get():

to:

if _is_hip and envs.SGLANG_OPT_USE_AITER_MHC_PRE.get():

This ensures that the aiter library, which is an AMD-specific dependency, is only used on AMD platforms, preventing ModuleNotFoundError on NVIDIA hardware.
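The guard discussed above can be sketched in isolation as follows. This is a minimal illustration, not the code in this PR: `aiter_installed` and `select_mhc_pre_backend` are hypothetical helpers, and the boolean arguments stand in for `_is_hip` and `envs.SGLANG_OPT_USE_AITER_MHC_PRE.get()`.

```python
import importlib.util


def aiter_installed() -> bool:
    # find_spec returns None when a top-level module cannot be found,
    # so this probes for aiter without actually importing it.
    return importlib.util.find_spec("aiter") is not None


def select_mhc_pre_backend(is_hip: bool, use_aiter: bool, has_aiter: bool) -> str:
    # Hypothetical helper mirroring
    #   if _is_hip and envs.SGLANG_OPT_USE_AITER_MHC_PRE.get():
    # with an extra availability check so a missing package falls back
    # instead of raising ModuleNotFoundError.
    if is_hip and use_aiter and has_aiter:
        return "aiter"
    return "fallback"


if __name__ == "__main__":
    # On a non-HIP platform the aiter path is never selected,
    # even though the flag defaults to enabled.
    print(select_mhc_pre_backend(is_hip=False, use_aiter=True, has_aiter=aiter_installed()))
```

Checking the platform first keeps the default-on flag harmless on NVIDIA hardware; the availability probe is an additional assumption and is not part of the suggested change.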

gemini-code-assist · 2026-05-04T08:34:51Z

+    SGLANG_OPT_USE_AITER_MHC_POST= EnvBool(True)
    # fmt: on


There are missing spaces around the assignment operator = for the new environment variables. To maintain consistency with the rest of the file and adhere to PEP-8 standards, please add spaces around the equals sign.

Suggested change

SGLANG_OPT_USE_AITER_MHC_POST= EnvBool(True)

# fmt: on

SGLANG_OPT_USE_AITER_MHC_PRE = EnvBool(True)

SGLANG_OPT_USE_AITER_MHC_POST = EnvBool(True)
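For readers unfamiliar with default-on boolean flags, the behavior can be sketched roughly as below. `BoolFlag` is a hypothetical stand-in written for illustration; it is not SGLang's `EnvBool`, and the accepted string values are an assumption.

```python
import os


class BoolFlag:
    """Hypothetical stand-in for an EnvBool-style flag (not SGLang's implementation)."""

    _TRUE = {"1", "true", "yes", "on"}
    _FALSE = {"0", "false", "no", "off"}

    def __init__(self, name: str, default: bool):
        self.name = name
        self.default = default

    def get(self) -> bool:
        raw = os.environ.get(self.name)
        if raw is None:
            # Unset variable: the default applies, so a True default
            # means the feature is on unless explicitly disabled.
            return self.default
        value = raw.strip().lower()
        if value in self._TRUE:
            return True
        if value in self._FALSE:
            return False
        raise ValueError(f"{self.name}: cannot parse {raw!r} as a boolean")


if __name__ == "__main__":
    flag = BoolFlag("SGLANG_OPT_USE_AITER_MHC_POST", True)
    print(flag.get())
```

With a default of True, the flag stays enabled on every platform unless the variable is explicitly set to a false value, which is why the platform guard raised in the first review comment matters.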

gemini-code-assist · 2026-05-04T08:34:51Z

            result = mhc_post(x, residual, post, comb)
            return result

+        elif envs.SGLANG_OPT_USE_AITER_MHC_POST.get():


Allocating a new tensor with torch.empty_like(residual) in every layer during every forward pass can be inefficient due to frequent memory allocations. If the aiter kernel supports it, consider using a pre-allocated workspace or an in-place operation to improve performance.
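One possible shape for the pre-allocated workspace mentioned above is a small cache keyed by shape, dtype, and device. This is a framework-agnostic sketch: `WorkspaceCache` is a hypothetical name, and the `allocate` callable is an assumption that in practice would wrap something like `torch.empty`.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Key = Tuple[Tuple[int, ...], Hashable, Hashable]


class WorkspaceCache:
    """Hypothetical buffer cache: allocates once per (shape, dtype, device)."""

    def __init__(self, allocate: Callable[[Tuple[int, ...], Hashable, Hashable], object]):
        self._allocate = allocate
        self._buffers: Dict[Key, object] = {}

    def get(self, shape: Sequence[int], dtype: Hashable, device: Hashable) -> object:
        key: Key = (tuple(shape), dtype, device)
        buf = self._buffers.get(key)
        if buf is None:
            # First request for this key: allocate and remember the buffer.
            buf = self._allocate(key[0], dtype, device)
            self._buffers[key] = buf
        # Later requests with the same key return the same object.
        return buf


if __name__ == "__main__":
    calls = []
    cache = WorkspaceCache(lambda s, d, dev: calls.append(s) or bytearray(8))
    first = cache.get((4, 2), "bf16", "gpu:0")
    second = cache.get((4, 2), "bf16", "gpu:0")
    print(first is second, len(calls))
```

Reusing an output buffer is only safe if the previous result is no longer needed when the next call overwrites it, so whether this applies to the mhc_post output depends on how the result is consumed downstream. PyTorch's caching allocator also amortizes repeated allocations of the same size, so any gain should be confirmed by profiling.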

gemini-code-assist · 2026-05-04T08:34:51Z

+MODEL=/dockerx/data/deepseek-ai/DeepSeek-V4-Pro
+MODEL=/dockerx/data/sgl-project/DeepSeek-V4-Flash-FP8/


The MODEL variable is assigned twice consecutively, making the first assignment redundant. Additionally, these absolute paths are specific to a particular environment. It is recommended to use a single assignment and consider using a more generic path or an environment variable for better flexibility.

Suggested change

MODEL=/dockerx/data/deepseek-ai/DeepSeek-V4-Pro

MODEL=/dockerx/data/sgl-project/DeepSeek-V4-Flash-FP8/

MODEL=/dockerx/data/sgl-project/DeepSeek-V4-Flash-FP8/

HaiShaw

Let's update rocm.Dockerfile for the best config also

Replace naive mhc design to aiter design

5a2940d

github-actions Bot added the deepseek label May 4, 2026

kkHuang-amd and others added 2 commits May 4, 2026 16:30

Merge branch 'amd/deepseek_v4' into amd/deepseek_v4_mhc-improve_0504

eb21c40

Enable tilelang mhc

94ec513

gemini-code-assist Bot reviewed May 4, 2026

1am9trash requested review from BBuf, Edwardf0t1, Fridge003, HaiShaw, Ying1123, ch-wan, ispobock and merrymercy as code owners May 4, 2026 08:36

kkHuang-amd changed the title ~~Replace naive mhc design to aiter design~~ amd/deepseek_v4 integration 10/N optimize mhc performance May 4, 2026

HaiShaw reviewed May 4, 2026

Resolve geminiy review comments

2211112

kkHuang-amd requested review from ishandhanani and yctseng0211 as code owners May 4, 2026 09:59

github-actions Bot added the amd label May 4, 2026

HaiShaw approved these changes May 4, 2026

HaiShaw merged commit b4fe024 into sgl-project:amd/deepseek_v4 May 4, 2026
1 check passed

HaiShaw merged 4 commits into sgl-project:amd/deepseek_v4 from HaiShaw:amd/deepseek_v4_mhc-improve_0504
