Enable FP8 quantization in SDXL using INC #1337
    default="disable",
    type=str,
    help="Quantization mode 'measure', 'quantize' or 'disable'",
)
Can you do the same as run_generation, and not introduce extra arguments?
    """

    quant_mode = kwargs["quant_mode"]
    if quant_mode == "measure" or quant_mode == "quantize":
Please check how run_generation does this.
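A minimal sketch of what the two review comments above seem to ask for, assuming run_generation picks the quantization mode from a JSON file named by the `QUANT_CONFIG` environment variable instead of a command-line flag. The helper name `quant_mode_from_env` and the top-level `"mode"` key are assumptions for illustration, not code from this PR.

```python
import json
import os


def quant_mode_from_env(env_var: str = "QUANT_CONFIG") -> str:
    """Return 'measure', 'quantize' or 'disable' based on a JSON config file.

    Hypothetical helper: assumes the file named by `env_var` carries a
    top-level "mode" key, e.g. {"mode": "MEASURE"}.
    """
    path = os.environ.get(env_var, "")
    if not path:
        # No config supplied: quantization stays off, no extra CLI flag needed.
        return "disable"
    with open(path, encoding="utf-8") as f:
        mode = str(json.load(f).get("mode", "")).lower()
    # Anything other than the two known modes is treated as "disable".
    return mode if mode in ("measure", "quantize") else "disable"
```

With this shape, the pipeline would call `quant_mode_from_env()` once at construction time, and the example script keeps its existing argument list.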
    if config.measure:
        self.unet = prepare(self.unet, config)
    elif config.quantize:
        self.unet = convert(self.unet, config)
Not all UNet steps should be converted to FP8. The recipe for good accuracy runs the last few steps in bf16, so you must have bf16 UNet support. Refer to the Model Garden reference for FP8: the HQT to INC conversion patch.
OK, I'll submit it as a separate PR. The original intention of this PR is to show how easy it is to quantize a model, so I prefer to keep the code change to a bare minimum.
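The recipe discussed in this thread (FP8 for most denoising steps, bf16 for the last few) can be sketched as a per-step choice between two UNet objects. This is only an illustration: `pick_unet`, `num_bf16_steps`, and the string stand-ins used in place of real models are hypothetical, and the right number of bf16 steps is not stated in this conversation.

```python
from typing import TypeVar

T = TypeVar("T")


def pick_unet(step: int, num_steps: int, num_bf16_steps: int,
              unet_fp8: T, unet_bf16: T) -> T:
    """Select the UNet to run at 0-based denoising step `step`.

    The last `num_bf16_steps` steps use the bf16 model; all earlier
    steps use the FP8-quantized one.
    """
    if not 0 <= step < num_steps:
        raise ValueError(f"step {step} is outside [0, {num_steps})")
    first_bf16_step = max(num_steps - num_bf16_steps, 0)
    return unet_bf16 if step >= first_bf16_step else unet_fp8


# Stand-in strings instead of real models, to show the resulting schedule.
schedule = [pick_unet(i, 5, 2, "fp8", "bf16") for i in range(5)]
print(schedule)  # ['fp8', 'fp8', 'fp8', 'bf16', 'bf16']
```

Keeping the selection in a small pure function means the denoising loop only has to hold both UNet variants and call the one returned for the current step.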
@splotnikv can you update the branch? Thanks.
I have not forgotten about this PR, but I am too busy right now to address the feedback. I'll try to push an update next week.
@splotnikv can you please review the comments and make the necessary changes?
@splotnikv if you can't finish it this week, we won't be able to pull it into the next release.
I don't have anything else to change in this PR. From my point of view, it is ready. I also don't mind if it goes into the next release.
@splotnikv, as part of the release commit for 1.19.0, we have already completed this feature and plan to merge PR #1519 for the FP8 flow.
@splotnikv We will merge #1519, which has FP8 support.
What does this PR do?
This PR illustrates how to enable FP8 quantization in SDXL using INC (Intel Neural Compressor): https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_FP8.html