[Feature] support nvfp4 kv cache

### Checklist

- [ ] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [ ] 2. Please use English, otherwise it will be closed.

### Motivation

Blackwell GPUs support NVFP4 on Tensor Cores. Quantizing the KV cache to NVFP4 would reduce the memory needed to store the KV cache and could speed up inference. TensorRT-LLM already has this feature, added in this PR: https://github.com/NVIDIA/TensorRT-LLM/pull/6244.

Does SGLang plan to support an NVFP4 KV cache?
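
For a rough sense of the memory saving, here is a back-of-the-envelope sketch. It assumes the NVFP4 layout as NVIDIA describes it (4-bit values with one FP8 scale factor per block of 16 values) and ignores the per-tensor FP32 scale and any padding. The model shape is hypothetical and chosen only for illustration; `kv_cache_bytes_per_token` is not an SGLang function.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim,
                             bits_per_value, scale_bits=0, block_size=1):
    """Bytes of KV cache per token: K and V for every layer and KV head."""
    # Factor 2 covers keys and values.
    values = 2 * num_layers * num_kv_heads * head_dim
    total_bits = values * bits_per_value
    if scale_bits:
        # One scale factor is shared by each block of `block_size` values.
        total_bits += (values // block_size) * scale_bits
    return total_bits / 8


# Hypothetical model shape, for illustration only.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128)

bf16 = kv_cache_bytes_per_token(**shape, bits_per_value=16)
fp8 = kv_cache_bytes_per_token(**shape, bits_per_value=8)
# Assumed NVFP4 layout: 4-bit values + one 8-bit scale per 16 values.
nvfp4 = kv_cache_bytes_per_token(**shape, bits_per_value=4,
                                 scale_bits=8, block_size=16)

print(f"bf16 : {bf16:.0f} bytes/token")
print(f"fp8  : {fp8:.0f} bytes/token")
print(f"nvfp4: {nvfp4:.0f} bytes/token")
print(f"bf16 / nvfp4 = {bf16 / nvfp4:.2f}x, fp8 / nvfp4 = {fp8 / nvfp4:.2f}x")
```

Under these assumptions the effective cost is 4.5 bits per cached value, so the same GPU memory would hold roughly 3.5x as many tokens as a 16-bit KV cache, or about 1.8x as many as an FP8 one. Any actual speedup would depend on the attention kernels and is not estimated here.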

### Related resources

_No response_

[Feature] support nvfp4 kv cache #11907

Checklist

Motivation

Related resources

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature] support nvfp4 kv cache #11907

Description

Checklist

Motivation

Related resources

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions