
[AMD] Fix GLM-5 fp8 KV quant path dispatch on MI300#22314

Merged
HaiShaw merged 4 commits into sgl-project:main from 1am9trash:fix-mi300-quant-path
Apr 8, 2026
Conversation

Collaborator

@1am9trash 1am9trash commented Apr 8, 2026

Motivation

On MI300, running GLM-5-fp8 with an FP8 KV cache can fail (see the CI log).
The root cause is that the quant path does not dispatch the correct fused kernel, set_mla_kv_buffer_triton_fp8_quant.

Modifications

The flag self.nsa_kv_cache_store_fp8 is true only when the KV cache is stored in fp8 with scaling. Our attention path uses an fp8 KV cache without scaling, so it should not be gated by this flag.
This change moves the HIP + fp8 quant path out of the scaling-specific branch, ensuring MI300 dispatches the correct fused kernel (set_mla_kv_buffer_triton_fp8_quant).

Only the MI300 code path is affected by this change.
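The dispatch change described above can be sketched as follows. The flag names (_is_hip, use_nsa, nsa_kv_cache_store_fp8) and the fused kernel name follow the PR description, but the function itself and the other kernel names are illustrative stand-ins, not the actual sglang source; dtypes are represented as strings so the sketch runs without torch.

```python
# Hypothetical sketch of the dispatch fix. Before the fix, the HIP + fp8
# branch was nested under the scaling-specific nsa_kv_cache_store_fp8
# flag, so the unscaled fp8 KV-cache path on MI300 never reached the
# fused kernel. After the fix, the HIP + fp8 check stands on its own.

FP8_KV_DTYPES = ("float8_e4m3fn", "float8_e4m3fnuz")


def select_kv_write_kernel(
    is_hip: bool,
    use_nsa: bool,
    kv_dtype: str,
    nsa_kv_cache_store_fp8: bool,
) -> str:
    """Pick the kernel that writes the MLA KV buffer (illustrative)."""
    if is_hip and use_nsa and kv_dtype in FP8_KV_DTYPES:
        # Fused BF16/FP16 -> FP8 cast with the paged KV write,
        # no per-block scales (the path this PR fixes on MI300).
        return "set_mla_kv_buffer_triton_fp8_quant"
    if nsa_kv_cache_store_fp8:
        # fp8 storage *with* scaling; before the fix, the branch above
        # was gated behind this flag and therefore never taken.
        return "scaled_fp8_kv_store_kernel"  # hypothetical name
    return "default_kv_store_kernel"  # hypothetical name
```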

Accuracy Tests

GLM-5-fp8 with fp8 KV cache accuracy: 0.945
Also validated with the new CI script test_glm5_perf_amd.py prepared in PR #21710.



if (
    _is_hip
    and self.use_nsa
    and self.dtype in (torch.float8_e4m3fn, torch.float8_e4m3fnuz)
Collaborator

You can import fp8_dtype via "from sglang.srt.layers.quantization.fp8_kernel import fp8_dtype"
and use "self.dtype == fp8_dtype" for the condition check.

fp8_dtype is torch.float8_e4m3fnuz on MI300X and torch.float8_e4m3fn on MI35X.
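The suggestion above replaces the two-dtype tuple check with sglang's platform-aware fp8_dtype constant. A minimal stand-alone illustration of that selection logic (the helper below is hypothetical; the real constant lives in sglang.srt.layers.quantization.fp8_kernel, and dtypes are strings here so the sketch runs without torch):

```python
# Illustrative stand-in for sglang's fp8_kernel.fp8_dtype: the fnuz
# variant applies on fnuz-FP8 hardware such as MI300X, the fn variant
# elsewhere (e.g. MI35X).
def select_fp8_dtype(is_fp8_fnuz: bool) -> str:
    return "float8_e4m3fnuz" if is_fp8_fnuz else "float8_e4m3fn"


# With the real platform-aware constant, the dispatch condition
# collapses to a single equality check instead of tuple membership:
#   if _is_hip and self.use_nsa and self.dtype == fp8_dtype: ...
```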

Collaborator Author

@1am9trash 1am9trash Apr 8, 2026


Fixed, and the rerun passed. Really appreciate the reminder.

):
    # HIP FP8 path uses raw MLA KV layout (nope + rope) without per-block scales.
    # Fuse BF16/FP16 -> FP8 cast with paged KV write.
    fp8_dtype = torch.float8_e4m3fnuz if _is_fp8_fnuz else torch.float8_e4m3fn
Collaborator

@kkHuang-amd kkHuang-amd Apr 8, 2026


Remove line 1585 once you use "from sglang.srt.layers.quantization.fp8_kernel import fp8_dtype".

@HaiShaw HaiShaw merged commit 729b74d into sgl-project:main Apr 8, 2026
54 of 62 checks passed
michaelzhang-ai added a commit that referenced this pull request Apr 8, 2026
Now that #22314 (MI300 FP8 KV quant dispatch fix) and #22232 (NSA
indexer clone fix) are merged, re-enable FP8 KV cache for both
MI30x and MI35x perf tests.
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 8, 2026