
Fix: fix flashmla fp8 kv cache acc error #13841

Merged
Fridge003 merged 3 commits into sgl-project:main from FlamingoPg:flashmla-fp8kvcache
Nov 30, 2025
Conversation

@FlamingoPg
Collaborator

@FlamingoPg FlamingoPg commented Nov 24, 2025

Motivation

#13832

Modifications

Accuracy Tests

Fixed.
[Screenshot: accuracy test results]

Benchmarking and Profiling

Checklist


@Fridge003
Collaborator

@FlamingoPg Please fix the lint errors

@cscyuge

cscyuge commented Nov 25, 2025

launch error:

...
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/flashmla_backend.py", line 78, in __init__
    self.is_fp8_kvcache = model_runner.kv_cache_dtype.startswith("fp8")
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'torch.dtype' object has no attribute 'startswith'

I tried using model_runner.server_args.kv_cache_dtype.startswith("fp8") instead, but it didn't fix #13832.

I think it may be because the current deepseek_v2 model doesn't implement load_kv_cache_scales(), so any quantization_param_path is ignored, and MLATokenToKVPool simply casts the latent KV tensors to fp8 without storing or reusing scale factors. As a result, the KV cache relies on the raw FP8 range only, which introduces much larger quantization error than in other models that load and apply layer-wise scales.
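For reference, the AttributeError above occurs because model_runner.kv_cache_dtype is a torch.dtype object, not a string, so .startswith() does not exist on it. A minimal sketch of a duck-typed check that accepts either form (the helper name is hypothetical, not from the PR):

```python
def is_fp8_kv_cache(kv_cache_dtype) -> bool:
    """Return True if the KV-cache dtype is an fp8 variant.

    Accepts either a string such as "fp8_e4m3" (as stored in server_args)
    or a torch.dtype, whose str() form looks like "torch.float8_e4m3fn".
    """
    name = kv_cache_dtype if isinstance(kv_cache_dtype, str) else str(kv_cache_dtype)
    return "fp8" in name or "float8" in name
```

This sketches only the crash fix; per the diagnosis above, the accuracy issue itself would additionally need per-layer KV scales to be loaded and applied, which this check does not address.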

@akhoroshev

#13832 (comment)

@Fridge003 Fridge003 merged commit c72f075 into sgl-project:main Nov 30, 2025
114 of 130 checks passed
HanHan009527 pushed a commit to bytedance-iaas/sglang that referenced this pull request Dec 1, 2025
Co-authored-by: ybyang <ybyang7@iflytek.com>
harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025
Co-authored-by: ybyang <ybyang7@iflytek.com>
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
Co-authored-by: ybyang <ybyang7@iflytek.com>
5 participants