
Enable DSR1 FP8 Optimizations #116

Merged
k50112113 merged 18 commits into shaoclee/ds_fp4_gemm from farlukas/dsfp8-fusedrmsnorm
Jan 8, 2026

@farlukas farlukas commented Jan 8, 2026

Enable the following optimizations on DSR1 FP8:

  • Fused RMSNorm + quant
  • Fused reduce + RMSNorm + quant
  • Triton preshuffled blockscale GEMM
  • Triton fused preshuffled blockscale GEMM + split + cat
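
For readers unfamiliar with the first item, here is a minimal pure-Python sketch of what fusing RMSNorm with quantization means in general. It is not the code from this PR: the function names, the `eps` value, the block size, and the choice of an FP8 E4M3 range with a maximum of 448 are all assumptions made for illustration (other FP8 variants have a different maximum), and rounding to the FP8 grid is left out.

```python
import math

# Assumption: FP8 E4M3 variant whose largest finite value is 448.
FP8_MAX = 448.0


def rmsnorm(x, weight, eps=1e-6):
    """Unfused reference: y_i = x_i / sqrt(mean(x^2) + eps) * w_i."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def quantize_blockwise(y, block_size):
    """Scale each block so its largest magnitude maps to FP8_MAX.

    Returns (scaled and clamped values, one scale per block).
    Only the scaling is modelled; rounding to FP8 is omitted.
    """
    q, scales = [], []
    for start in range(0, len(y), block_size):
        block = y[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / FP8_MAX if amax > 0.0 else 1.0
        scales.append(scale)
        q.extend(max(-FP8_MAX, min(FP8_MAX, v / scale)) for v in block)
    return q, scales


def fused_rmsnorm_quant(x, weight, block_size, eps=1e-6):
    """Hypothetical fused version: normalize and quantize block by block,
    so the full normalized row is never stored as a separate output."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    q, scales = [], []
    for start in range(0, len(x), block_size):
        stop = min(start + block_size, len(x))
        block = [x[i] * inv_rms * weight[i] for i in range(start, stop)]
        amax = max(abs(v) for v in block)
        scale = amax / FP8_MAX if amax > 0.0 else 1.0
        scales.append(scale)
        q.extend(max(-FP8_MAX, min(FP8_MAX, v / scale)) for v in block)
    return q, scales
```

The fused function should return the same values as calling `rmsnorm` followed by `quantize_blockwise`. On a GPU the benefit of fusing is that the normalized activations do not make a round trip through memory between two separate kernels.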

@farlukas farlukas marked this pull request as ready for review January 8, 2026 00:34
@k50112113 k50112113 merged commit 55fafbe into shaoclee/ds_fp4_gemm Jan 8, 2026
@k50112113 k50112113 deleted the farlukas/dsfp8-fusedrmsnorm branch January 8, 2026 19:17
