
Supporting llama int4 inference using AutoGPTQ in HPU (#166) #1125

Closed
HolyFalafel wants to merge 3 commits into huggingface:main from HabanaAI:dev/danny/uint4_readme_us

Conversation

@HolyFalafel (Contributor) commented Jul 4, 2024:

Added support for AutoGPTQ when loading a quantized model and running inference on HPU. This will be available in v1.17. An illustrative usage sketch follows the commit list below.

* Supporting llama int4 quantization using AutoGPTQ

* cleanups in int4

* Blocking running hqt with int4

* Rename int4 param to gptq

* Added call to preprocessing in gptq

* Added call to preprocessing in gptq fix

* Added call to preprocessing in gptq fix2

* Removed call to preprocessing (found a better solution on AutoGPTQ)

* Fixed deprecated message for exllama
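For orientation, here is a minimal sketch of what UINT4 GPTQ inference on HPU looks like from the user side. The checkpoint name is hypothetical, and the plain `transformers` loading path is an assumption; the PR itself wires AutoGPTQ support through optimum-habana's text-generation example, so treat this as illustrative rather than the PR's exact API.

```python
# Illustrative sketch only: loading a pre-quantized GPTQ Llama checkpoint
# and generating on HPU. Assumes the HabanaAI AutoGPTQ fork is installed
# (see the install note further down) and that transformers' GPTQ
# integration picks up the checkpoint's quantization_config.
import torch
import habana_frameworks.torch.core  # noqa: F401  (registers the "hpu" device)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # hypothetical pre-quantized UINT4 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The quantized weights are loaded as-is; no on-the-fly quantization happens here,
# matching the PR's scope of UINT4 inference for pre-quantized models only.
model = AutoModelForCausalLM.from_pretrained(model_id).to("hpu")
model.eval()

inputs = tokenizer("Deep learning is", return_tensors="pt").to("hpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Judging from the commit messages above, the run-time switch ended up as a `gptq` parameter (renamed from `int4`), and running HQT (Habana's quantization toolkit) together with int4 appears to be blocked.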
@HolyFalafel requested a review from a user on Jul 4, 2024 at 14:29
@HolyFalafel requested a review from regisss as a code owner on Jul 4, 2024 at 14:29
@libinta added the `synapse 1.17_dependency` label (PR not backward compatible; can be merged only when Synapse 1.17 is available) on Jul 9, 2024
mounikamandava added a commit to emascarenhas/optimum-habana that referenced this pull request on Aug 2, 2024
@libinta removed the `synapse 1.17_dependency` label on Aug 5, 2024
@emascarenhas (Contributor) commented:
Please sync your PR with main/upstream and fix any merge conflicts. Thank you.

@yafshar (Contributor) commented Sep 10, 2024:

@HolyFalafel, please sync this PR with main and ping me to wrap it up. Thanks



> Llama2-7b in UINT4 is enabled using [AutoGPTQ Fork](https://github.com/HabanaAI/AutoGPTQ), which provides quantization capabilities in PyTorch.
> Currently, the support is for UINT4 inference of pre-quantized models only.
@yafshar (Contributor) commented on the diff, Sep 10, 2024:
@HolyFalafel please add the AutoGPTQ installation here:

```bash
BUILD_CUDA_EXT=0 pip install auto-gptq --no-build-isolation
```
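For context on the flags: `BUILD_CUDA_EXT=0` skips compiling auto-gptq's CUDA kernels at install time, which are not usable on HPU, and `--no-build-isolation` makes pip build against the already-installed torch rather than a fresh isolated build environment.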

@HolyFalafel (Contributor, Author) commented:
#1364 replaces this PR

@HolyFalafel closed this on Oct 6, 2024
