
Fix: fix flashmla fp8 kv cache acc error #13841

Merged
Fridge003 merged 3 commits into sgl-project:main from FlamingoPg:flashmla-fp8kvcache
Nov 30, 2025
Conversation

@FlamingoPg
Collaborator

@FlamingoPg FlamingoPg commented Nov 24, 2025

Motivation

#13832

Modifications

Accuracy Tests

Fixed.
[Screenshot: accuracy test results]

Benchmarking and Profiling

Checklist


@Fridge003
Collaborator

@FlamingoPg Please fix the lint errors

@cscyuge

cscyuge commented Nov 25, 2025

launch error:

...
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/flashmla_backend.py", line 78, in __init__
    self.is_fp8_kvcache = model_runner.kv_cache_dtype.startswith("fp8")
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'torch.dtype' object has no attribute 'startswith'

I tried using model_runner.server_args.kv_cache_dtype.startswith("fp8") instead, but it didn't fix #13832.

I think it may be because the current deepseek_v2 model doesn't implement load_kv_cache_scales(), so any quantization_param_path is ignored, and MLATokenToKVPool simply casts the latent KV tensors to fp8 without storing or reusing scale factors. As a result, the KV cache relies on the raw FP8 range only, which introduces much larger quantization error than in other models that load and apply layer-wise scales.
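For reference, the AttributeError above occurs because model_runner.kv_cache_dtype is a torch.dtype object, not a string, so .startswith() does not exist on it. A minimal sketch of a duck-typed check that accepts either form (the helper name is hypothetical, not from the PR):

```python
def is_fp8_kv_cache(kv_cache_dtype) -> bool:
    """Return True if the KV-cache dtype is an fp8 variant.

    Accepts either a string such as "fp8_e4m3" (as stored in server_args)
    or a torch.dtype, whose str() form looks like "torch.float8_e4m3fn".
    """
    name = kv_cache_dtype if isinstance(kv_cache_dtype, str) else str(kv_cache_dtype)
    return "fp8" in name or "float8" in name
```

This sketches only the crash fix; per the diagnosis above, the accuracy issue itself would additionally need per-layer KV scales to be loaded and applied, which this check does not address.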

@akhoroshev

#13832 (comment)

@Fridge003 Fridge003 merged commit c72f075 into sgl-project:main Nov 30, 2025
114 of 130 checks passed
HanHan009527 pushed a commit to bytedance-iaas/sglang that referenced this pull request Dec 1, 2025
Co-authored-by: ybyang <ybyang7@iflytek.com>
harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025
Co-authored-by: ybyang <ybyang7@iflytek.com>
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
Co-authored-by: ybyang <ybyang7@iflytek.com>
5 participants