Perform online FP8 quantization during weight loading rather than in process_weights_after_loading, reducing memory overhead
#17945
Loading
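
To illustrate why the ordering matters for memory, here is a minimal accounting sketch. It is a toy model and not the code from this change: the function names, the 2-byte (bf16) checkpoint dtype, and the one-tensor-at-a-time release behavior are all assumptions made for illustration.

```python
# Toy memory-accounting model (hypothetical; not the actual implementation).
BF16_BYTES = 2  # assumed dtype of the unquantized checkpoint weights
FP8_BYTES = 1   # one byte per element once quantized


def peak_quantize_after_loading(numels):
    """Load every tensor in bf16 first, then quantize them one by one."""
    live = sum(n * BF16_BYTES for n in numels)  # all bf16 weights resident
    peak = live
    for n in numels:
        live += n * FP8_BYTES       # fp8 copy is created
        peak = max(peak, live)
        live -= n * BF16_BYTES      # bf16 original is released
    return peak


def peak_quantize_while_loading(numels):
    """Quantize each tensor as soon as it is loaded, then drop the bf16 copy."""
    live = 0
    peak = 0
    for n in numels:
        live += n * BF16_BYTES      # load a single bf16 tensor
        live += n * FP8_BYTES       # quantize it immediately
        peak = max(peak, live)
        live -= n * BF16_BYTES      # release the bf16 tensor
    return peak


if __name__ == "__main__":
    layers = [1000, 1000, 1000, 1000]  # element counts, made up for the demo
    print("after loading:", peak_quantize_after_loading(layers))
    print("while loading:", peak_quantize_while_loading(layers))
```

Under these assumptions, the streaming variant never holds more than one unquantized tensor at a time, so its peak is bounded by the quantized model plus a single bf16 tensor rather than the full bf16 model.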