Closed
Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
On Blackwell GPUs, the Tensor Cores support NVFP4. If the KV cache could be quantized to NVFP4, it would reduce KV cache memory usage and could speed up inference. TensorRT-LLM has this feature in PR NVIDIA/TensorRT-LLM#6244.
Does SGLang plan to support an NVFP4 KV cache?
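To put a rough number on the memory saving, here is a back-of-the-envelope sketch. It assumes the usual NVFP4 layout (4-bit values, with one 8-bit scale shared by each block of 16 elements, so about 4.5 bits per element) and ignores the per-tensor scale. The model shape and the `kv_cache_bytes` helper are hypothetical, chosen only for illustration; this is not SGLang or TensorRT-LLM code.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bits_per_element):
    """Approximate KV cache size in bytes for a given element width."""
    # K and V each hold num_layers * num_kv_heads * head_dim values per token.
    elements = 2 * num_layers * num_kv_heads * head_dim * num_tokens
    return elements * bits_per_element / 8


# Hypothetical model shape and context length (assumption, for illustration only).
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, num_tokens=32768)

# Assumed NVFP4 layout: 4-bit values plus one 8-bit scale per 16-element block.
NVFP4_BITS = 4 + 8 / 16

fp16 = kv_cache_bytes(**shape, bits_per_element=16)
fp8 = kv_cache_bytes(**shape, bits_per_element=8)
nvfp4 = kv_cache_bytes(**shape, bits_per_element=NVFP4_BITS)

GIB = 2**30
print(f"FP16 : {fp16 / GIB:.3f} GiB")
print(f"FP8  : {fp8 / GIB:.3f} GiB")
print(f"NVFP4: {nvfp4 / GIB:.3f} GiB")
print(f"NVFP4 vs FP16: {fp16 / nvfp4:.2f}x smaller")
print(f"NVFP4 vs FP8 : {fp8 / nvfp4:.2f}x smaller")
```

Under these assumptions the cache shrinks by about 3.56x relative to FP16 and about 1.78x relative to FP8. Whether that translates into faster inference depends on the attention kernels, which this sketch does not model.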
Related resources
No response