
Conversation


@adityachatter commented Oct 17, 2025

  • Adds functional support for the FP8 Chunk Prefill kernel
  • Supports the FP8 E4M3FN and E5M2 datatypes. Expects Q, K, and V in FP8 precision, with descale factors for Q, K, and V in FP32 precision of shape (batch size, number of KV heads); see the sketch after this list
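For illustration, here is a minimal sketch (not taken from this PR) of producing FP8 Q/K/V together with the FP32 descale factors in the shape described above. The quantization granularity and the commented-out chunk_prefill_fp8 entry point are assumptions, not the PR's actual API; consult the kernel's real signature in sgl-kernel-xpu.

import torch

FP8_DTYPE = torch.float8_e4m3fn                # E5M2 via torch.float8_e5m2
FP8_MAX = torch.finfo(FP8_DTYPE).max           # 448.0 for E4M3FN

def quantize_fp8(x: torch.Tensor):
    """Quantize a (batch, seqlen, heads, head_dim) tensor to FP8 with one
    FP32 descale factor per (batch, head), matching the expected
    (batch size, number of KV heads) descale shape."""
    amax = x.abs().amax(dim=(1, 3)).clamp(min=1e-12)   # (B, H) per-head max
    scale = FP8_MAX / amax                             # maps values into FP8 range
    x_fp8 = (x * scale[:, None, :, None]).clamp(-FP8_MAX, FP8_MAX).to(FP8_DTYPE)
    return x_fp8, (1.0 / scale).float()                # descale stays in FP32

B, S, H, D = 2, 128, 8, 64                     # H = number of KV heads here
q_fp8, q_descale = quantize_fp8(torch.randn(B, S, H, D))
k_fp8, k_descale = quantize_fp8(torch.randn(B, S, H, D))
v_fp8, v_descale = quantize_fp8(torch.randn(B, S, H, D))
assert q_descale.shape == (B, H)               # (batch size, number of KV heads)
# out = chunk_prefill_fp8(q_fp8, k_fp8, v_fp8,            # hypothetical entry point
#                         q_descale=q_descale, k_descale=k_descale,
#                         v_descale=v_descale)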

Run FP8 Chunk Prefill unit tests:

cd sgl-kernel-xpu/tests
python3 -m pytest -v -s test_flash_attention.py -k dtype1
96 passed, 182 skipped, 278 deselected
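For context, a hedged sketch of the kind of reference check FP8 attention tests typically perform: dequantize the FP8 inputs with their descale factors and compare the kernel output against a plain float32 attention reference with loose tolerances. The actual test logic lives in test_flash_attention.py; everything below, including the tolerances, is illustrative.

import torch

def reference_attention(q8, k8, v8, dq, dk, dv):
    # Dequantize FP8 inputs with their (B, H) FP32 descale factors, then
    # run plain float32 attention as ground truth for the kernel output.
    q = q8.float() * dq[:, None, :, None]      # (B, S, H, D)
    k = k8.float() * dk[:, None, :, None]
    v = v8.float() * dv[:, None, :, None]
    scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / q.shape[-1] ** 0.5
    return torch.einsum("bhqk,bkhd->bqhd", scores.softmax(dim=-1), v)

# Tolerances must be loose, since FP8 carries only 2-3 mantissa bits, e.g.:
# torch.testing.assert_close(out.float(), ref, atol=2e-2, rtol=2e-2)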

@adityachatter force-pushed the achatter/fp8_chunk_prefill branch from d20fff8 to 06ae0d8 on October 27, 2025 07:08
Signed-off-by: Aditya Chatterjee <[email protected]>
@adityachatter marked this pull request as ready for review on October 29, 2025 08:49
Signed-off-by: Aditya Chatterjee <[email protected]>
@deepvars self-requested a review on November 6, 2025 04:31