amd/deepseek_v4 integration 10/N optimize mhc performance#24355
amd/deepseek_v4 integration 10/N optimize mhc performance#24355HaiShaw merged 4 commits intosgl-project:amd/deepseek_v4from
Conversation
There was a problem hiding this comment.
Code Review
This pull request integrates AMD-specific aiter multi-head compression (MHC) operations into the DeepSeek-V4 model, introduces new environment variables to control these features, and updates the run_dsv4.sh script with optimized configurations. The review feedback focuses on ensuring cross-platform compatibility by guarding aiter imports with platform checks, adhering to PEP-8 formatting standards, and improving performance by reducing frequent memory allocations during the forward pass. Additionally, redundant path assignments in the shell script were identified for cleanup.
| @@ -1889,6 +1889,23 @@ def hc_pre_torch_impl(x, hc_fn): | |||
| # returned post should be [n, hc_mult] | |||
There was a problem hiding this comment.
The aiter library is an AMD-specific dependency and is not available on NVIDIA platforms. Since SGLANG_OPT_USE_AITER_MHC_PRE is enabled by default, this block will be executed on NVIDIA hardware if SGLANG_OPT_USE_TILELANG_MHC_PRE is disabled (which is necessary to use DeepGEMM), resulting in a ModuleNotFoundError. This check should be guarded by _is_hip to ensure portability.
| # returned post should be [n, hc_mult] | |
| if _is_hip and envs.SGLANG_OPT_USE_AITER_MHC_PRE.get(): |
There was a problem hiding this comment.
gemini-code-assist proposed wrong diff, incorrect line #
There was a problem hiding this comment.
You are absolutely correct, HaiShaw. My apologies for the incorrect line number in the previous suggestion. The if _is_hip condition should indeed guard the aiter block, specifically modifying the line if envs.SGLANG_OPT_USE_AITER_MHC_PRE.get():.
Since the original comment was anchored to line 1889, I cannot provide a direct code suggestion for line 1892 in the UI. However, the correct modification would be to change line 1892 from:
if envs.SGLANG_OPT_USE_AITER_MHC_PRE.get():to:
if _is_hip and envs.SGLANG_OPT_USE_AITER_MHC_PRE.get():This ensures that the aiter library, which is an AMD-specific dependency, is only used on AMD platforms, preventing ModuleNotFoundError on NVIDIA hardware.
| SGLANG_OPT_USE_AITER_MHC_POST= EnvBool(True) | ||
| # fmt: on |
There was a problem hiding this comment.
There are missing spaces around the assignment operator = for the new environment variables. To maintain consistency with the rest of the file and adhere to PEP-8 standards, please add spaces around the equals sign.
| SGLANG_OPT_USE_AITER_MHC_POST= EnvBool(True) | |
| # fmt: on | |
| SGLANG_OPT_USE_AITER_MHC_PRE = EnvBool(True) | |
| SGLANG_OPT_USE_AITER_MHC_POST = EnvBool(True) |
| result = mhc_post(x, residual, post, comb) | ||
| return result | ||
|
|
||
| elif envs.SGLANG_OPT_USE_AITER_MHC_POST.get(): |
There was a problem hiding this comment.
| MODEL=/dockerx/data/deepseek-ai/DeepSeek-V4-Pro | ||
| MODEL=/dockerx/data/sgl-project/DeepSeek-V4-Flash-FP8/ |
There was a problem hiding this comment.
The MODEL variable is assigned twice consecutively, making the first assignment redundant. Additionally, these absolute paths are specific to a particular environment. It is recommended to use a single assignment and consider using a more generic path or an environment variable for better flexibility.
| MODEL=/dockerx/data/deepseek-ai/DeepSeek-V4-Pro | |
| MODEL=/dockerx/data/sgl-project/DeepSeek-V4-Flash-FP8/ | |
| MODEL=/dockerx/data/sgl-project/DeepSeek-V4-Flash-FP8/ |
HaiShaw
left a comment
There was a problem hiding this comment.
Let's update rocm.Dockerfile for the best config also
co-author: @1am9trash
Motivation
Update amd/deepseek_v4 integration branch
Following PRs have large set of conflict, we use this PR and upstream amd/deepseek_v4 branch to integrate in parallel.
#23600
#23608
Modifications
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci,/tag-run-ci-label,/rerun-failed-ci