Currently the Quant-LLM kernel (backing FPx in torchao) only works with FP16. This creates a small divergence from the other quantization methods, which all work with BF16. Since all recent models are trained and released in BF16, adding BF16 support could potentially improve accuracy for FPx models.
I might be over-simplifying, but I think it's just a matter of modifying the dequant logic and the MMA instructions (as well as updating the dtype in the function signatures appropriately); rough sketches of both changes follow the code references below.
Hi @DevyRuxpin, I can understand that you want to work on this, but I have already made most of the necessary changes for this feature (see the diff here). Since your solution is not complete yet (it removes FP16 support), it's probably best to avoid doing further duplicate work.
Quant-LLM code: https://github.com/pytorch/ao/tree/main/torchao/csrc/cuda/fp6_llm
ao/torchao/csrc/cuda/fp6_llm/utils_parallel_dequant.cuh, lines 30 to 45 at 09b8b3c
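To make the dequant change concrete, here is a hedged scalar sketch (not the kernel's actual packed, four-values-at-a-time code; the helper names are made up, and the BF16 intrinsics assume sm_80+). The FP16 path embeds the FP6 E3M2 bits into an FP16 bit pattern and multiplies by a constant to fix the exponent-bias difference; for BF16, only the bit positions and that constant change.

```cuda
#include <cstdint>
#include <cuda_fp16.h>
#include <cuda_bf16.h>

// Scalar sketch only -- the real kernel dequantizes packed registers four
// values at a time, but the bit layout and the bias-correction constant are
// the parts that would change for BF16.

// FP6 E3M2 -> FP16: drop the s|eee|mm bits into the FP16 fields, then
// multiply by 2^(15 - 3) = 2^12 to correct the exponent-bias difference.
__device__ __forceinline__ half fp6_e3m2_to_fp16(uint8_t v) {
    uint16_t bits = ((uint16_t)(v & 0x20) << 10)   // sign -> bit 15
                  | ((uint16_t)(v & 0x1F) << 8);   // eee|mm -> exp/mantissa
    return __hmul(__ushort_as_half(bits), __ushort_as_half(0x6C00));  // * 2^12
}

// FP6 E3M2 -> BF16: same trick, but BF16 has an 8-bit exponent and a 7-bit
// mantissa, so both the shift and the constant differ: 2^(127 - 3) = 2^124.
__device__ __forceinline__ __nv_bfloat16 fp6_e3m2_to_bf16(uint8_t v) {
    uint16_t bits = ((uint16_t)(v & 0x20) << 10)   // sign -> bit 15
                  | ((uint16_t)(v & 0x1F) << 5);   // eee|mm -> exp/mantissa
    return __hmul(__ushort_as_bfloat16(bits),
                  __ushort_as_bfloat16(0x7D80));   // 0x7D80 == 2^124 in BF16
}
```

One thing to double-check in a real implementation: FP6 subnormals land on FP16/BF16 subnormals before the multiply, so this exact trick only carries over if BF16 denormals aren't flushed on the target arch.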
ao/torchao/csrc/cuda/fp6_llm/ptx_mma.cuh, lines 117 to 125 at 09b8b3c
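On the MMA side, a hedged sketch of the instruction swap, assuming the m16n8k16 shape with FP32 accumulation that the FP16 path uses (the wrapper name below is hypothetical, not the kernel's): on sm_80+ the BF16 variant keeps the same fragment layout as the f16 one, so only the type suffix in the PTX string changes.

```cuda
#include <cstdint>

// Hypothetical wrapper mirroring the FP16 MMA but with bf16 operands.
// Fragment layout is the same as the f16 variant, since bf16 is also 16-bit:
// A = 4 x .b32 registers, B = 2 x .b32 registers, C/D = 4 x .f32 registers.
__device__ __forceinline__ void mma_m16n8k16_bf16(float d[4],
                                                  const uint32_t a[4],
                                                  const uint32_t b[2]) {
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f32.bf16.bf16.f32 "
        "{%0, %1, %2, %3}, {%4, %5, %6, %7}, {%8, %9}, {%0, %1, %2, %3};\n"
        : "+f"(d[0]), "+f"(d[1]), "+f"(d[2]), "+f"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]));
}
```

Since bf16 and f16 have the same width, the fragment loads and the rest of the pipeline should be able to stay largely as-is.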
cc @msaroufim @HDCharles
I might try to do it myself, but I think it would also make an interesting good first issue. @tobiasvanderwerff Would you be interested?