Enable FP8 quantization in SDXL using INC #1337

Closed
splotnikv wants to merge 2 commits into huggingface:main from splotnikv:sdxl_quant

Conversation

@splotnikv
Contributor

What does this PR do?

This PR illustrates how to enable FP8 quantization in SDXL using INC (Intel Neural Compressor); see https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_FP8.html

@splotnikv splotnikv requested a review from regisss as a code owner September 17, 2024 21:56
default="disable",
type=str,
help="Quantization mode 'measure', 'quantize' or 'disable'",
)
Collaborator

Can you do the same as run_generation, and not introduce extra arguments?

Contributor Author

done
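
For readers following this thread, the pattern the reviewer points to can be sketched roughly as below. This is a minimal sketch, assuming the run_generation example picks the quantization mode from a JSON file named by a `QUANT_CONFIG` environment variable instead of a command-line flag. The helper name `quant_mode_from_env` and the top-level `"mode"` key are assumptions for illustration, not code from this PR.

```python
import json
import os


def quant_mode_from_env(env=None):
    """Return "measure", "quantize" or "disable" based on QUANT_CONFIG.

    Assumption: QUANT_CONFIG names a JSON file with a top-level "mode" key
    ("MEASURE" or "QUANTIZE"). Both names are illustrative, not from the PR.
    """
    env = os.environ if env is None else env
    path = env.get("QUANT_CONFIG", "")
    if not path:
        # No config file given: run the pipeline unquantized.
        return "disable"
    with open(path, encoding="utf-8") as f:
        mode = str(json.load(f).get("mode", "")).lower()
    # Anything other than the two known modes is treated as "disable".
    return mode if mode in ("measure", "quantize") else "disable"
```

With this shape, the script itself needs no quantization-specific arguments; the mode is chosen entirely by which config file the environment variable points to.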

"""

quant_mode=kwargs["quant_mode"]
if quant_mode == "measure" or quant_mode == "quantize":
Collaborator

Please check how run_generation does this.

Contributor Author

done

if config.measure:
self.unet = prepare(self.unet, config)
elif config.quantize:
self.unet = convert(self.unet, config)

@kumarans-ai kumarans-ai Sep 20, 2024

Not all UNet steps should be converted to FP8. The recipe for good accuracy runs the last few steps in bf16, so you must have bf16 UNet support. Refer to the Model Garden reference for FP8, the HQT to INC conversion patch.

Contributor Author

OK, I'll submit it as a separate PR. The original intention of this PR is to show how easy it is to quantize a model, so I prefer to keep the code change to a bare minimum.
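
The recipe raised in this review thread (FP8 for most denoising steps, bf16 for the last few) can be sketched as a per-step choice between two UNet instances. This is a minimal sketch under stated assumptions: `pick_unet`, `bf16_tail_steps`, and the idea of holding separate FP8 and bf16 UNet objects are hypothetical names for illustration, and the default number of bf16 steps is a placeholder, not a value from the thread.

```python
def pick_unet(step_index, num_steps, unet_fp8, unet_bf16, bf16_tail_steps=2):
    """Return the UNet to use at a given denoising step.

    The last `bf16_tail_steps` steps use the bf16 model; all earlier steps
    use the FP8-quantized model. The default of 2 is an arbitrary placeholder.
    """
    if not 0 <= step_index < num_steps:
        raise ValueError("step_index out of range")
    if step_index >= num_steps - bf16_tail_steps:
        return unet_bf16
    return unet_fp8


# Strings stand in for real model objects in this illustration.
schedule = [pick_unet(i, 5, "fp8", "bf16") for i in range(5)]
print(schedule)
```

Keeping the choice in one small function would leave the denoising loop unchanged apart from the line that selects which model to call.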

@libinta
Collaborator

libinta commented Nov 1, 2024

@splotnikv can you update the branch? Thanks.

@splotnikv
Contributor Author

I have not forgotten about this PR, but I am too busy right now to address the feedback. I'll try to push an update next week.

@hsubramony
Collaborator

@splotnikv can you please review the comments and make the necessary changes?

@libinta
Collaborator

libinta commented Nov 27, 2024

@splotnikv if you can't finish it this week, we won't be able to pull it into the next release.

@splotnikv
Contributor Author

> @splotnikv if you can't finish it this week, we won't be able to pull it into the next release.

I don't have anything else to change in this PR. From my point of view, it is ready. I also don't mind if it goes into the next release.

@kumarans-ai

@splotnikv, as part of the release commit for 1.19.0, we have already completed this feature and plan to merge PR #1519 for the FP8 flow.

@libinta
Collaborator

libinta commented Nov 29, 2024

@splotnikv We will merge #1519, which has FP8 support.

@libinta libinta closed this Nov 29, 2024

4 participants